How to Use OpenAI for Big Data with SynapseML in Microsoft Fabric

In the era of big data, extracting valuable insights from large datasets is a common challenge. OpenAI, a leading AI research organization, provides advanced artificial intelligence technologies that can help uncover hidden trends in your business data. Microsoft Fabric and SynapseML complement these with tools for integrating and analyzing data at scale. In this blog post, we will explore how to use OpenAI for big data with SynapseML in Microsoft Fabric.

OpenAI for Big Data

OpenAI models are available on Azure through the Azure OpenAI service, which can be used to solve a large number of natural language tasks by prompting the completion API. To make it easier to scale your prompting workflows from a few examples to large datasets, the Azure OpenAI service has been integrated with the distributed machine learning library SynapseML. This integration lets you use the Apache Spark distributed computing framework to process millions of prompts with the OpenAI service.

SynapseML in Microsoft Fabric

SynapseML is an ecosystem of tools aimed at expanding the distributed computing framework Apache Spark in several new directions. It adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK), LightGBM and OpenCV. SynapseML is preinstalled on Fabric, and if you want to use another version, you can install it with %%configure.

Combining OpenAI and SynapseML in Microsoft Fabric

Combining Azure OpenAI with SynapseML in Microsoft Fabric lets you apply large language models at a distributed scale. The steps below outline how to set this up with Azure OpenAI and SynapseML on Apache Spark.
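As a rough illustration of what this looks like in a notebook, the sketch below wires SynapseML's OpenAICompletion transformer to a column of prompts. The import path synapse.ml.services.openai is assumed to match recent SynapseML releases (older releases exposed the class under synapse.ml.cognitive). The function name build_completion, the column names, the 200-token limit, and the transformer_cls parameter are illustrative choices, not part of the original guide.

```python
def build_completion(key, deployment_name, service_name, transformer_cls=None):
    """Configure a completion transformer that reads prompts from a DataFrame column.

    transformer_cls is a hypothetical hook so the wiring can be checked
    without a Spark cluster; by default SynapseML's OpenAICompletion is used.
    """
    if transformer_cls is None:
        # Assumed import path for recent SynapseML releases;
        # older releases exposed the class under synapse.ml.cognitive.
        from synapse.ml.services.openai import OpenAICompletion
        transformer_cls = OpenAICompletion

    return (
        transformer_cls()
        .setSubscriptionKey(key)              # Azure OpenAI key
        .setDeploymentName(deployment_name)   # name of the model deployment
        .setCustomServiceName(service_name)   # Azure OpenAI resource name
        .setMaxTokens(200)                    # illustrative cap on completion length
        .setPromptCol("prompt")               # input column, one prompt per row
        .setErrorCol("error")                 # per-row errors are written here
        .setOutputCol("completions")          # per-row responses are written here
    )
```

On a cluster, build_completion(key, deployment_name, service_name).transform(df) would add completions and error columns to a DataFrame df that has a prompt column, with Spark distributing the requests across rows.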

Step 1: Prerequisites

The key prerequisites are a working Azure OpenAI resource and an Apache Spark cluster with SynapseML installed. In Fabric, SynapseML is preinstalled.
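One quick way to confirm the Spark-side prerequisite is to check that the relevant Python packages can be found in the notebook environment. This is a minimal sketch: the helper names has_module and check_prerequisites are made up for illustration, and the check only covers the Python packages, not the Azure OpenAI resource itself.

```python
import importlib.util


def has_module(name):
    """Return True if the named module can be located in this environment."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. "synapse") is itself missing.
        return False


def check_prerequisites():
    """Report which Spark-side packages are importable."""
    return {name: has_module(name) for name in ("pyspark", "synapse.ml")}


print(check_prerequisites())
```

If either entry is False, the cluster is likely missing that package or running a different environment than expected.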

Step 2: Import this guide as a notebook

Next, import this guide as a notebook so that its code can run on your Spark cluster.

Step 3: Fill in your service information

Next, edit the configuration cell in the notebook so that it points to your Azure OpenAI service.
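A configuration cell of this kind usually holds the resource name, the deployment name, and the key. The sketch below reads them from environment variables rather than hard-coding them. The variable names (AZURE_OPENAI_SERVICE_NAME and so on) and the helper load_service_info are hypothetical; substitute whatever secret store your workspace provides.

```python
import os


def load_service_info(env=os.environ):
    """Collect the values that identify an Azure OpenAI deployment.

    The environment variable names are hypothetical placeholders.
    """
    required = {
        "service_name": "AZURE_OPENAI_SERVICE_NAME",   # Azure OpenAI resource name
        "deployment_name": "AZURE_OPENAI_DEPLOYMENT",  # model deployment name
        "key": "AZURE_OPENAI_KEY",                     # API key (keep out of source control)
    }
    missing = [var for var in required.values() if not env.get(var)]
    if missing:
        raise KeyError(f"Missing service settings: {', '.join(missing)}")
    return {field: env[var] for field, var in required.items()}
```

Failing early on a missing value tends to be easier to debug than a rejected request from inside a distributed Spark job.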

Step 4: Create a dataset of prompts

Finally, create a dataset of prompts: a Spark DataFrame with one prompt per row.
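A minimal sketch of this step, assuming a SparkSession is already available in the notebook as spark. The example prompts, the column name prompt, and the helper names are illustrative.

```python
# Illustrative prompts; replace with your own data.
EXAMPLE_PROMPTS = [
    "Hello my name is",
    "The best code is code that's",
    "SynapseML is ",
]


def prompt_rows(prompts):
    """Wrap each prompt in a 1-tuple so Spark reads it as a single-column row."""
    return [(p,) for p in prompts]


def make_prompt_df(spark, prompts=EXAMPLE_PROMPTS):
    """Build a DataFrame with one prompt per row in a column named 'prompt'."""
    return spark.createDataFrame(prompt_rows(prompts), ["prompt"])
```

In a notebook, df = make_prompt_df(spark) gives a three-row DataFrame; for real workloads the prompts would more likely come from an existing table or file.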


Here are some frequently asked questions related to OpenAI for big data and SynapseML in Microsoft Fabric:

  1. Does Azure OpenAI work with the latest Python library released by OpenAI (version>=1.0)?
    • Yes. Azure OpenAI is supported by the latest release of the OpenAI Python library (version>=1.0). However, migrating your codebase with openai migrate is not supported and will not work with code that targets Azure OpenAI.
  2. Does Azure OpenAI support GPT-4?
    • Yes. Azure OpenAI supports the latest GPT-4 models, including both GPT-4 and GPT-4-32K.
  3. How do the capabilities of Azure OpenAI compare to OpenAI?
    • Azure OpenAI Service gives customers advanced language AI with OpenAI GPT-3, Codex, and DALL-E models with the security and enterprise promise of Azure.
  4. Does Azure OpenAI support VNETs and Private Endpoints?
    • Yes, as part of Azure AI services, Azure OpenAI supports VNETs and Private Endpoints.
  5. Do the GPT-4 models currently support image input?
    • No, GPT-4 is designed by OpenAI to be multimodal, but currently only text input and output are supported.
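To illustrate the first question above: in version 1.0 and later of the OpenAI Python library, Azure OpenAI is reached through the AzureOpenAI client class. The sketch below shows one way to construct it. The helper names, the default API version string, and the endpoint pattern built from the resource name are assumptions to check against your own resource.

```python
def azure_client_kwargs(resource_name, api_key, api_version="2024-02-01"):
    """Assemble constructor arguments for the AzureOpenAI client.

    The default api_version is an assumption; use one your resource supports.
    """
    return {
        "azure_endpoint": f"https://{resource_name}.openai.azure.com/",
        "api_key": api_key,
        "api_version": api_version,
    }


def make_client(resource_name, api_key, api_version="2024-02-01"):
    """Create the client; requires the openai package, version 1.0 or later."""
    from openai import AzureOpenAI  # imported lazily so the helper above works without it
    return AzureOpenAI(**azure_client_kwargs(resource_name, api_key, api_version))
```

This client is for single-machine use; for distributed prompting over a DataFrame, the SynapseML integration described earlier is the better fit.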

For more detailed information, please refer to the Azure OpenAI Service FAQ.


The integration of OpenAI for big data with SynapseML in Microsoft Fabric provides a powerful tool for data scientists and developers to extract valuable insights from large datasets. By leveraging the capabilities of these advanced technologies, you can uncover hidden trends in your business data and make more informed decisions. Happy data analyzing!