Data in every organization is growing faster than ever. Recent innovations in generative AI, such as Azure OpenAI Service, are creating new opportunities for organizations to gain a competitive edge. But CIOs and data analytics leaders face the challenge of having to use multiple data products to get their data ready for use by AI models. Different teams and lines of business use their own technology for storing and processing data, which creates data silos. Data silos are one of the major obstacles to organizations being able to leverage their data for AI consumption. Even when an organization decides to use a single cloud provider like Microsoft Azure and adopts data services such as Azure Synapse Analytics, Azure Data Factory, Azure Databricks, Azure Stream Analytics, and Power BI to simplify its data analytics processes and solve data integration challenges, managing several separate Platform as a Service (PaaS) and Software as a Service (SaaS) offerings remains complex.
Microsoft Fabric is a Software as a Service (SaaS) offering that provides end-to-end analytics capability, covering data ingestion, data engineering, data warehousing, data science, real-time analytics, event-driven automation, and data visualization, all in one service with seamless integration between them. Microsoft Fabric has a single storage layer called OneLake and a single purchase model based on Capacity Units (CU). So, users don’t have to worry about managing separate storage for each compute engine or separate billing for each service.
Components of Microsoft Fabric
Below are the components of Microsoft Fabric that make end-to-end analytics possible.
OneLake
OneLake in Microsoft Fabric is designed to be the single storage layer for all the data an organization intends to use for analytics. It works like OneDrive in Microsoft 365. An admin can certify data in OneLake and make it available to different teams within the organization for use in analytics projects.
Microsoft Purview
Microsoft Purview provides data governance features in Microsoft Fabric like data lineage and data classification.
Data Factory
Data Factory in Microsoft Fabric provides data ingestion functionality.
Dataflow Gen2
Dataflow Gen2 provides a UI-driven way to transform data. It is ideal for medium-sized data volumes.
Data Pipeline
Data Pipeline performs the actual data movement from one location to another. It is also a UI-based utility, and it can use Dataflow Gen2 as a component to transform data during the movement.
Data Engineering
Data Engineering in Microsoft Fabric has tools for data transformation and feature engineering to get data ready for analytics.
Lakehouse
A Lakehouse in Microsoft Fabric is a storage container that holds flat files and tables created as Delta Tables. The Delta format adds relational table semantics to the stored data: Delta Tables support Create, Read, Update, and Delete (CRUD) operations, are Atomicity, Consistency, Isolation, and Durability (ACID) compliant to support transactions, and support versioning and time travel. Besides that, Delta Tables support both the batch and streaming APIs of Spark. Because the underlying data is stored in Parquet format, it is easily interoperable between different data processing systems.
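To make versioning and time travel more concrete, here is a toy model in plain Python. It is only a sketch of the general idea behind Delta tables (an append-only transaction log over immutable data files), not the Delta Lake or Microsoft Fabric API. The names ToyDeltaTable, Commit, commit, and snapshot, as well as the file names and rows, are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    """One entry in the toy transaction log."""
    added: dict = field(default_factory=dict)   # file name -> rows written
    removed: set = field(default_factory=set)   # file names logically deleted


class ToyDeltaTable:
    """Illustrative only: an append-only log of commits over immutable files."""

    def __init__(self):
        self.log = []  # position in this list is the table version

    def commit(self, added=None, removed=None):
        # A commit is all-or-nothing: readers see it only once it is appended.
        self.log.append(Commit(dict(added or {}), set(removed or ())))
        return len(self.log) - 1  # the new version number

    def snapshot(self, version=None):
        """Replay the log up to `version` (latest if None) and return the rows."""
        if version is None:
            version = len(self.log) - 1
        live = {}
        for entry in self.log[: version + 1]:
            for name in entry.removed:
                live.pop(name, None)
            live.update(entry.added)
        return [row for name in sorted(live) for row in live[name]]


table = ToyDeltaTable()
v0 = table.commit(added={"part-0": [{"id": 1, "city": "Oslo"}]})
v1 = table.commit(added={"part-1": [{"id": 2, "city": "Lima"}]})
# An update rewrites the affected file: remove the old one, add the new one.
v2 = table.commit(
    added={"part-2": [{"id": 1, "city": "Bergen"}]}, removed={"part-0"}
)

latest = table.snapshot()              # reflects the update to id 1
as_of_v1 = table.snapshot(version=v1)  # "time travel" to before the update
```

Because old files are never modified in place, reading an earlier version only requires replaying fewer log entries. That is why time travel comes almost for free in this kind of design.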
Notebook
Notebooks allow users to write Spark code for data ingestion, transformation, and any other kind of processing, including data science and machine learning. For ingesting large volumes of data, a Notebook is recommended over Data Pipeline and Dataflow Gen2.
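As a rough idea of what such notebook code can look like, the sketch below reads a CSV file with PySpark, drops exact duplicate rows, and saves the result as a Delta table. It assumes a Spark session is passed in as `spark` and that Delta support is available in that session. The function name, the file path, and the table name are hypothetical placeholders.

```python
def ingest_csv_to_delta(spark, source_path, table_name):
    """Read a CSV file with Spark, drop exact duplicate rows, save as a Delta table."""
    df = (
        spark.read.format("csv")
        .option("header", "true")       # first line holds the column names
        .option("inferSchema", "true")  # let Spark guess the column types
        .load(source_path)
    )
    cleaned = df.dropDuplicates()
    # Overwrite any existing table of the same name with the cleaned data.
    cleaned.write.format("delta").mode("overwrite").saveAsTable(table_name)
    return cleaned


# In a notebook cell with an existing Spark session (hypothetical path and table name):
# ingest_csv_to_delta(spark, "Files/raw/sales.csv", "sales")
```

Taking the session as a parameter keeps the function easy to reuse across notebooks and easy to test with a stand-in object.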
Data Science
Data Science in Microsoft Fabric allows users to create Data Science and Machine Learning models.
ML Model
ML Models in Microsoft Fabric store Machine Learning models.
Experiment
An Experiment in Microsoft Fabric contains the Machine Learning training and modeling.
Models and experiments can be created using Notebooks. Microsoft Fabric also supports creating custom environments with specific versions of libraries to further customize the model building process.
Data Warehouse
Data Warehouse in Microsoft Fabric allows users to create a Warehouse for relational data warehouse reporting and analytics.
Data Activator
Data Activator in Microsoft Fabric allows users to build event-driven automation that triggers an action when custom event criteria are met. The component used to define an action on an event is called a Reflex.
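The pattern of "custom criteria on an event, then an action" can be sketched in a few lines of plain Python. This is only a conceptual illustration and not how Data Activator or Reflex is implemented or configured. The Rule class, the process function, and the sensor events are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """Pairs a condition on an event with an action to run when it holds."""
    name: str
    condition: Callable[[dict], bool]
    action: Callable[[dict], None]


def process(events, rules):
    """Check every event against every rule; return the names of rules that fired."""
    fired = []
    for event in events:
        for rule in rules:
            if rule.condition(event):
                rule.action(event)
                fired.append(rule.name)
    return fired


alerts = []
too_hot = Rule(
    name="too_hot",
    condition=lambda e: e["temperature_c"] > 30,
    action=lambda e: alerts.append(f"{e['sensor']} reported {e['temperature_c']} C"),
)

events = [
    {"sensor": "freezer-1", "temperature_c": 4},
    {"sensor": "oven-2", "temperature_c": 41},
]
fired = process(events, [too_hot])  # only the oven-2 event meets the criteria
```

Keeping the condition and the action as separate callables mirrors the split between the event criteria and the action taken on them.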
Real-Time Analytics
Real-Time Analytics in Microsoft Fabric allows users to perform analytics on streaming data.
KQL Database
Data in Microsoft Fabric Real-Time Analytics is stored in a Kusto Query Language (KQL) Database, which can be queried using KQL.
KQL Queryset
KQL Database queries can be saved as a KQL Queryset in Microsoft Fabric.
Eventstream
Eventstream in Microsoft Fabric Real-Time Analytics is a no-code, UI-driven platform for capturing data from Azure Event Hubs or Azure IoT Hub and storing it in a Lakehouse or KQL Database. It also allows the creation of a Reflex to take action on specific events.
Power BI
Power BI provides Business Intelligence and data visualization in Microsoft Fabric through components such as Reports, Paginated Reports, Dashboards, Streaming Datasets, and Streaming Dataflows.
For a complete walkthrough of Microsoft Fabric, you can watch my YouTube video.