How to use OneLake in Microsoft Fabric to store and access data across multiple analytical engines

Microsoft Fabric is an all-in-one analytics solution that unifies data management, data engineering, data science, real-time analytics, and business intelligence. It is built on a lakehouse architecture that combines the Delta Lake table format with OneLake, a cloud-native storage layer that integrates with Microsoft 365 apps. Microsoft Fabric also offers an integrated user experience that brings together components from Power BI, Azure Synapse, and Azure Data Factory.

OneLake is the core of Microsoft Fabric. It is a single, unified, logical data lake for your whole organization. Like OneDrive, OneLake comes automatically with every Microsoft Fabric tenant and is designed to be the single place for all your analytics data. OneLake brings customers:

  • One data lake for the entire organization
  • One copy of data for use with multiple analytical engines
  • One security model living natively with the data in the lake (coming soon)
  • A centralized OneLake data hub for data discovery and management

In this blog post, we will show you how to use OneLake in Microsoft Fabric to store and access data across multiple analytical engines, such as SQL, Spark, and Python. We will also answer some common questions and share tips and best practices for getting the most out of OneLake.

How to store data in OneLake?

OneLake is built on top of Azure Data Lake Storage (ADLS) Gen2 and can support any type of file, structured or unstructured. You can store data in OneLake using various methods, such as:

  • Importing data from external sources using tools such as Azure Data Factory or Databricks
  • Loading data from Azure Synapse Analytics using tools such as Spark or SQL
  • Creating data items such as lakehouses or warehouses in Microsoft Fabric and loading data into them
  • Uploading files directly to OneLake using tools such as Azure Storage Explorer or the OneLake file explorer for Windows

OneLake stores all tabular data in Delta Parquet format: Parquet data files plus a Delta Lake transaction log. Delta Lake is an open-source table format that supports ACID transactions, schema evolution, and efficient compression, and it lets you run both batch and streaming analytics over the same tables. Non-tabular data, such as images or raw logs, can be stored in OneLake as ordinary files.

OneLake also supports the same ADLS Gen2 APIs and SDKs, so it is compatible with existing ADLS Gen2 applications, including Azure Databricks.
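Because OneLake exposes the ADLS Gen2 APIs, a file upload can be sketched with the Azure Storage SDK for Python. This is a minimal sketch rather than an official recipe. It assumes the azure-identity and azure-storage-file-datalake packages are installed and that your identity has access to the workspace. The helper names (onelake_file_path, upload_to_onelake) and the workspace, lakehouse, and file names are hypothetical placeholders.

```python
def onelake_file_path(item_name: str, item_type: str, relative_path: str) -> str:
    """Build the path of a file inside a Fabric item, relative to the workspace."""
    return f"{item_name}.{item_type}/{relative_path.lstrip('/')}"


def upload_to_onelake(workspace: str, file_path: str, data: bytes) -> None:
    """Upload bytes to OneLake via the ADLS Gen2 SDK (needs network and credentials)."""
    # Imported lazily so the path helper above works without the Azure packages.
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(
        account_url="https://onelake.dfs.fabric.microsoft.com",
        credential=DefaultAzureCredential(),
    )
    # In OneLake, the workspace plays the role of the ADLS Gen2 file system.
    file_system = service.get_file_system_client(workspace)
    file_client = file_system.get_file_client(file_path)
    file_client.upload_data(data, overwrite=True)


# Hypothetical lakehouse and file names, for illustration only.
path = onelake_file_path("SalesLakehouse", "Lakehouse", "Files/raw/orders.csv")
print(path)
```

With real credentials, a call such as upload_to_onelake("Analytics", path, b"id,amount\n1,9.99\n") would write the file into the Files area of the lakehouse. "Analytics" is again a placeholder workspace name.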

How to access data in OneLake?

You can access data in OneLake using various analytical engines, such as:

  • SQL: You can use T-SQL to query data in OneLake using the Fabric SQL experience or any SQL client that supports ODBC or JDBC connections. You can also use Power BI to visualize and analyze data in OneLake using the new Direct Lake mode in the Analysis Services engine.
  • Spark: You can use Spark SQL or PySpark to query and process data in OneLake using the Fabric Data Engineering or Data Science experiences. You can also use Databricks or other Spark applications to access data in OneLake using the ADLS Gen2 APIs or SDKs.
  • Python: You can use Python libraries such as pandas or numpy to manipulate data in OneLake using the Fabric Data Science experience. You can also use Azure Machine Learning or other Python applications to access data in OneLake using the ADLS Gen2 APIs or SDKs.
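As an illustration of the Spark route, the sketch below reads a lakehouse table by its OneLake path. It assumes a Spark session with Delta Lake support and OneLake access, such as the spark object that Fabric notebooks predefine. The function name read_onelake_table and the workspace, lakehouse, and table names are hypothetical.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def read_onelake_table(spark, workspace: str, lakehouse: str, table: str):
    """Load a Delta table from a lakehouse's Tables folder as a Spark DataFrame.

    `spark` is any SparkSession-like object with Delta Lake support.
    """
    uri = f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse/Tables/{table}"
    return spark.read.format("delta").load(uri)


# Usage in a notebook where `spark` already exists (names are placeholders):
# orders = read_onelake_table(spark, "Analytics", "SalesLakehouse", "orders")
# orders.groupBy("region").count().show()
# orders_pdf = orders.limit(1000).toPandas()  # hand a sample to pandas
```

The same table stays in place for every engine. Spark reads it by path here, while a SQL client or Power BI would reach the identical Delta files through their own endpoints.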

You can access all data in OneLake through data items, which are logical representations of your data that provide tailored experiences for each persona. For example, a lakehouse is a data item that gives you a Spark developer experience over your data. A warehouse is a data item that gives you a SQL developer experience over your data.

You create data items, such as lakehouses and warehouses, inside a Microsoft Fabric workspace. You can also import existing SQL scripts or views into Microsoft Fabric as data items.

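You can reference any table or file in OneLake by its path, which follows the ADLS Gen2 URI convention: the workspace takes the place of the storage container, followed by the data item and its type, then the path inside the item. For example, a table in a lakehouse is addressed as abfss://[workspace]@onelake.dfs.fabric.microsoft.com/[item].Lakehouse/Tables/[table], and a file as abfss://[workspace]@onelake.dfs.fabric.microsoft.com/[item].Lakehouse/Files/[folder]/[file]. The same locations are also reachable over HTTPS at https://onelake.dfs.fabric.microsoft.com/[workspace]/[item].Lakehouse/...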
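To make the addressing concrete, here is a small sketch that assembles OneLake locations in the ADLS Gen2 style, with the workspace in the container position followed by the item name and type. The OneLakePath class and all workspace, item, and file names are hypothetical. The sketch only formats strings and does not check that the locations exist.

```python
from dataclasses import dataclass

ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


@dataclass(frozen=True)
class OneLakePath:
    """A location inside a Fabric data item (illustrative helper, not an SDK type)."""

    workspace: str
    item: str
    item_type: str      # for example "Lakehouse" or "Warehouse"
    relative_path: str  # for example "Tables/orders" or "Files/raw/orders.csv"

    def _item_path(self) -> str:
        return f"{self.item}.{self.item_type}/{self.relative_path.lstrip('/')}"

    def abfss_uri(self) -> str:
        """URI form used by Spark and other ABFS-aware engines."""
        return f"abfss://{self.workspace}@{ONELAKE_HOST}/{self._item_path()}"

    def https_url(self) -> str:
        """URL form used by the ADLS Gen2 REST API and SDKs."""
        return f"https://{ONELAKE_HOST}/{self.workspace}/{self._item_path()}"


orders = OneLakePath("Analytics", "SalesLakehouse", "Lakehouse", "Tables/orders")
print(orders.abfss_uri())
print(orders.https_url())
```

Keeping the workspace, item, and relative path as separate fields makes it easy to emit either form from one description of the location.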

What are some of the benefits and advantages of using OneLake?

OneLake offers many benefits and advantages over traditional data lakes or warehouses, such as:

  • Simplified and unified data management: You don’t have to worry about managing multiple storage accounts or resources for your analytics needs. You have one single place to store and access all your analytics data.
  • Improved collaboration and governance: You can easily share and reuse your data across different teams and projects without creating silos or duplicating data. You can also apply consistent security and compliance policies across your entire organization.
  • Enhanced performance and scalability: You can leverage the power of Delta Lake and ADLS Gen2 to perform fast and reliable analytics on large and complex data. You can also scale your analytical engines on demand without affecting your storage layer.
  • Seamless integration and interoperability: You can use the analytical engine of your choice to access and query data in OneLake without any data movement or transformation. You can also integrate with other Microsoft 365 apps or Azure services to enrich and extend your analytics capabilities.

What are some of the common questions and tips on using OneLake?

Here are some of the common questions and tips on using OneLake:

  • How do I get started with OneLake? OneLake is provisioned automatically with your Microsoft Fabric tenant, so there is no storage account to create or link. To get started, create a Microsoft Fabric workspace, then import or create data items in it and access them using the Fabric experiences or other analytical engines. If you already have data in an ADLS Gen2 account, you can reference it from OneLake with a shortcut instead of copying it. For more information, see Creating a lakehouse with OneLake.
  • How do I monitor and optimize my OneLake usage and costs? You can use the Fabric portal or the Azure portal to monitor and optimize your OneLake usage and costs. You can also use tools such as Azure Cost Management or Azure Advisor to track and reduce your spending. For more information, see Monitoring and optimizing OneLake usage and costs.
  • How do I secure and control access to my data in OneLake? You secure and control access to data in OneLake through workspace roles and item permissions, which you manage in the Fabric portal. User identities are managed in Microsoft Entra ID (formerly Azure Active Directory). For more information, see OneLake security.
  • How do I discover and manage my data in OneLake? You can use the OneLake data hub in the Fabric portal to discover and manage your data in OneLake. You can also use Microsoft Purview (formerly Azure Purview) to catalog and classify your data assets. For more information, see OneLake data hub.

Conclusion

OneLake is a single, unified, logical data lake for your whole organization that comes automatically with every Microsoft Fabric tenant. It enables you to store and access data across multiple analytical engines without any data movement or duplication. It also simplifies data management, collaboration, and governance, and it lets compute scale independently of storage while staying interoperable with existing ADLS Gen2 tools.

In this blog post, we have shown you how to use OneLake in Microsoft Fabric to store and access data across multiple analytical engines. We have also answered some of the common questions and provided some tips and best practices on how to get the most out of OneLake.