Azure Databricks

What is Azure Databricks?




    Azure Databricks is a cloud service that lets us set up and use a cluster of Azure instances with Apache Spark installed, arranged as master and worker nodes for distributed computing.
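The master-worker layout can be made concrete with a cluster definition. The sketch below builds a request body of the kind accepted by the Databricks Clusters API. The field names (`cluster_name`, `spark_version`, `node_type_id`, `num_workers`) follow that API, while the helper `build_cluster_spec` and every value in it are illustrative assumptions, not something taken from this post.

```python
import json


def build_cluster_spec(name: str, num_workers: int) -> dict:
    """Return a cluster definition with one driver and `num_workers` workers."""
    # This sketch insists on at least one worker to show the master-worker split.
    if num_workers < 1:
        raise ValueError("need at least one worker node")
    return {
        "cluster_name": name,
        # Runtime version string: an example value; pick one your workspace offers.
        "spark_version": "13.3.x-scala2.12",
        # Azure VM size used for the nodes: an example value.
        "node_type_id": "Standard_DS3_v2",
        # Worker count; the driver (master) node is provisioned in addition.
        "num_workers": num_workers,
    }


spec = build_cluster_spec("demo-cluster", num_workers=2)
print(json.dumps(spec, indent=2))
```

The same dictionary could be pasted into the cluster creation UI as JSON or sent to the REST API; which route to use depends on how the workspace is managed.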

Workspace

    A workspace is an environment for accessing all Azure Databricks assets. It organizes objects such as notebooks, libraries, dashboards, and experiments into folders and provides access to data objects and computational resources.

Objects contained in the Azure Databricks folders are:
  • Notebook
  • Dashboard
  • Library
  • Experiment
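To picture how a workspace arranges these four object types into folders, here is a small plain-Python model. It is only a toy: the `Folder` class, the `ObjectType` enum, and the example paths are all hypothetical and do not use any Databricks API.

```python
from dataclasses import dataclass, field
from enum import Enum


class ObjectType(Enum):
    NOTEBOOK = "notebook"
    DASHBOARD = "dashboard"
    LIBRARY = "library"
    EXPERIMENT = "experiment"


@dataclass
class Folder:
    name: str
    objects: dict = field(default_factory=dict)     # object name -> ObjectType
    subfolders: dict = field(default_factory=dict)  # folder name -> Folder

    def add(self, path: str, kind: ObjectType) -> None:
        """Place an object at a slash-separated path, creating folders as needed."""
        *folders, leaf = path.strip("/").split("/")
        node = self
        for part in folders:
            node = node.subfolders.setdefault(part, Folder(part))
        node.objects[leaf] = kind

    def list_paths(self, prefix: str = "") -> list:
        """Return sorted (path, type) pairs for every object under this folder."""
        here = f"{prefix}/{self.name}" if self.name else prefix
        found = [(f"{here}/{n}", k.value) for n, k in self.objects.items()]
        for sub in self.subfolders.values():
            found.extend(sub.list_paths(here))
        return sorted(found)


# Hypothetical workspace contents.
root = Folder("")
root.add("/Shared/etl/clean_orders", ObjectType.NOTEBOOK)
root.add("/Shared/etl/sales_overview", ObjectType.DASHBOARD)
root.add("/Shared/ml/churn_model", ObjectType.EXPERIMENT)
for path, kind in root.list_paths():
    print(kind, path)
```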

Notebook

    A notebook is a web-based interface to documents that contain runnable commands, visualizations, and narrative text.
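A notebook exported as Python source gives a feel for how narrative text and runnable commands sit side by side. In the sketch below, the `# Databricks notebook source` header, the `# COMMAND ----------` cell separators, and the `# MAGIC %md` prefix follow the export format as I understand it; the order data is made up for illustration.

```python
# Databricks notebook source
# MAGIC %md
# MAGIC ## Daily order totals
# MAGIC Narrative text lives in Markdown cells like this one.

# COMMAND ----------

# A runnable command cell. The order data is made up for illustration.
orders = [("2024-01-01", 120.0), ("2024-01-01", 80.0), ("2024-01-02", 50.0)]

# COMMAND ----------

# Aggregate amounts per day; in a notebook the result renders below the cell.
totals = {}
for day, amount in orders:
    totals[day] = totals.get(day, 0.0) + amount
print(totals)
```

Because the cell markers are ordinary comments, the same file also runs as a plain Python script outside Databricks.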

Dashboard

    An interface that provides organized access to visualizations.

Library

    A library is a package of code available to the notebook or job running on your cluster. Databricks runtimes include many libraries, and you can add your own.

Experiment

    A collection of MLflow runs for training a machine learning model.


Azure Databricks provides the latest versions of Apache Spark and allows you to integrate seamlessly with open source libraries. It operates out of a control plane and a data plane.

The control plane includes the backend services that Azure Databricks manages in its own Azure account. Notebook commands and many other workspace configurations are stored in the control plane and encrypted at rest.

The data plane is managed by your Azure account; it is where your data resides and is processed. Azure Databricks connectors let clusters connect to data sources outside your Azure account, either to ingest data or for storage. You can also ingest data from external streaming sources, such as event data and IoT data.
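As a rough illustration of ingesting data from storage into a cluster, the sketch below reads a CSV file from Azure Data Lake Storage Gen2. The `abfss://` URI layout and the `spark.read.format(...).option(...).load(...)` chain are standard; the helper names, the container, the storage account, and the path are hypothetical. It assumes the cluster already has credentials for the storage account, which is not shown here.

```python
def adls_uri(container: str, account: str, path: str) -> str:
    """Build an abfss:// URI for a file or folder in ADLS Gen2."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def read_csv(spark, uri: str):
    """Load a CSV file into a Spark DataFrame using the session passed in."""
    return (
        spark.read.format("csv")
        .option("header", "true")  # treat the first row as column names
        .load(uri)
    )


# Hypothetical container, storage account, and path.
uri = adls_uri("raw", "examplestorage", "/orders/2024/orders.csv")
print(uri)
# In a notebook attached to a cluster, `spark` is predefined:
# df = read_csv(spark, uri)
```

Passing the session in as a parameter keeps the helper usable both in a notebook, where `spark` already exists, and in a job that creates its own session.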




