Data Factory Announcements at Fabric Community Conference Recap
Last week was an exciting one for Fabric during the Fabric Community Conference, filled with product announcements and sneak previews of upcoming new features.
Thanks to all of you who participated in the conference, either in person or through the many virtual conversations on blogs, Community forums, social media, and other channels. Thank you also for all your product feedback and Ideas forum suggestions that help us define the next wave of product enhancements.
To make sure you didn’t miss any of the Data Factory in Fabric announcements, we’ve put together this recap of all the new features.
New Data Pipelines capabilities
- Data Pipelines accessing on-premises data using the On-premises data gateway [Announcement]
- CI/CD support for Data Pipelines [Announcement]
- Data Pipelines activity limit increased from 40 to 80 activities [Announcement]
- Public APIs for Data Pipelines [Announcement]
- Semantic Model Refresh activity for Data Pipelines [Announcement]
- Unity Catalog support in Azure Databricks activity [Announcement]
- Improved Performance tuning tips experience [Announcement]
New Dataflows Gen2 capabilities
- Fast Copy [Announcement]
- Output destinations support for schema changes for Lakehouse & Azure SQL database [Announcement]
- Cancel dataflow refresh [Announcement]
- Privacy Levels support [Announcement]
- Manage connections experience enhancements [Announcement]
- Test Framework for Custom Connectors SDK in VS Code [Announcement]
New Get Data & Authentication capabilities
- Modern Get Data UX to browse Azure resources [Announcement]
- Azure Service Principal (SPN) support for on-premises and VNET data gateways [Announcement]
- Block sharing of Shareable Cloud Connections at tenant level [Announcement]
- VNET Data Gateway is generally available [Announcement]
- Mirroring for Azure SQL DB, Cosmos DB and Snowflake in Fabric [Announcement]
You can continue reading below for more information about each of these capabilities.
Data Pipelines accessing on-premises data
We are thrilled to announce the public preview of on-premises connectivity for Data pipelines in Microsoft Fabric.
Using the On-premises Data Gateway, customers can now connect to on-premises data sources from data pipelines in Data Factory in Microsoft Fabric. This enhancement significantly broadens the scope of data integration capabilities. In essence, organizations can keep databases and other data sources on their on-premises networks while securely integrating and orchestrating them with data pipelines in Microsoft Fabric.
Check out the following resources to help you get started:
- How to access on-premises data sources in Data Factory – Microsoft Fabric | Microsoft Learn
- Data pipeline connectors in Microsoft Fabric – Microsoft Fabric | Microsoft Learn
CI/CD support for Data Pipelines
When building successful data analytics projects, it is very important to have source control, continuous integration, continuous deployment, and collaborative development environments. Many Fabric engineers with previous Azure Synapse Analytics and Azure Data Factory experience have utilized the Git integration included in those PaaS offerings for those important capabilities. Now, we’re excited to share that we have added Git Integration and integration with built-in Deployment Pipelines to Data Factory data pipelines in Fabric as a public preview!
CI/CD features to utilize your own Git repo in Azure DevOps, or to use the built-in Deployment Pipelines in Fabric, will eventually light up and become available for all Fabric items. Now that data pipelines can be used with these features, read more about the current preview capabilities and limitations in the online documentation here.
Public APIs for Data Pipelines
REST APIs for CRUD operations on Fabric Data Factory pipelines are now available in public preview. Learn more about this enhancement here: REST APIs for Fabric Data Factory pipelines now available | Microsoft Fabric Blog | Microsoft Fabric
The ability to create and execute pipelines through a REST endpoint is an important capability that many of our Azure Data Factory (ADF) customers have used to build powerful patterns over the years. Now that the public REST APIs have been published, you can automate pipeline creation, management, and execution, and even invoke pipelines from other pipelines via the Web activity.
To get started using the create, read, update, delete, list operations from the new REST API please see our online documentation here: Fabric data pipeline public REST API (Preview) – Microsoft Fabric | Microsoft Learn
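As a sketch of what automating a pipeline run can look like, the snippet below builds and calls the on-demand job endpoint for a pipeline item. The workspace and pipeline IDs, and the bearer token, are placeholders you would supply yourself; check the exact endpoint shape against the current REST API reference, since the preview surface may evolve.

```python
import json
import urllib.request

# Base endpoint for the Fabric REST API (public preview).
FABRIC_API = "https://api.fabric.microsoft.com/v1"


def build_run_pipeline_url(workspace_id: str, pipeline_id: str) -> str:
    """Build the on-demand job URL for running a data pipeline item.

    Uses the jobs/instances endpoint with jobType=Pipeline; verify
    against the current Fabric REST API documentation.
    """
    return (f"{FABRIC_API}/workspaces/{workspace_id}"
            f"/items/{pipeline_id}/jobs/instances?jobType=Pipeline")


def run_pipeline(workspace_id: str, pipeline_id: str, token: str) -> int:
    """Trigger an on-demand pipeline run and return the HTTP status code."""
    req = urllib.request.Request(
        build_run_pipeline_url(workspace_id, pipeline_id),
        data=json.dumps({}).encode("utf-8"),  # optional execution payload
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

The same POST can be issued from a Web activity inside another pipeline, which is the pattern many ADF customers have used to chain pipeline executions.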
Semantic Model Refresh activity for Data Pipelines
We are excited to announce the availability of the Semantic Model Refresh activity for data pipelines. With this new activity, you can create connections to your Power BI semantic models and refresh them.
To learn more about this activity, read https://aka.ms/SemanticModelRefreshActivity
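The activity itself is configured in the pipeline UI. For context, a comparable refresh can be kicked off programmatically through the Power BI REST API; the sketch below builds and calls the dataset-refresh endpoint, with hypothetical workspace and semantic model IDs.

```python
import json
import urllib.request

# Power BI REST API base URL.
PBI_API = "https://api.powerbi.com/v1.0/myorg"


def build_refresh_url(workspace_id: str, semantic_model_id: str) -> str:
    """URL for the Power BI 'Refresh Dataset In Group' REST operation."""
    return f"{PBI_API}/groups/{workspace_id}/datasets/{semantic_model_id}/refreshes"


def refresh_semantic_model(workspace_id: str, semantic_model_id: str,
                           token: str) -> int:
    """Queue a semantic model refresh; the service replies 202 Accepted."""
    req = urllib.request.Request(
        build_refresh_url(workspace_id, semantic_model_id),
        data=json.dumps({"notifyOption": "NoNotification"}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```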
Unity Catalog support in Azure Databricks activity
We are excited to announce that the Azure Databricks activity now supports Unity Catalog. With this update, you can configure the Unity Catalog Access Mode for added data security.
Find this update under Additional cluster settings.
For more information about this activity, read https://aka.ms/AzureDatabricksActivity.
Improved “Performance tuning tips” experience
A more intuitive user experience and more insightful performance tuning tips are now available in Data Factory data pipelines. These tips provide useful, accurate advice on staging, degree of copy parallelism settings, and more to help you optimize your pipeline performance.
Fast Copy
Dataflows help with ingesting and transforming data. With the introduction of Dataflow Gen2 High-Scale Data Transformations, we are able to transform your data at scale. However, to do this at high scale, your data needs to be ingested first.
With the introduction of Fast Copy, you can ingest terabytes of data with the easy experience of dataflows, but with the scalable backend and high throughput of Pipeline’s Copy activity.
As part of the initial release of Fast Copy, we support Azure Data Lake Storage Gen2, Azure Blob Storage, Azure SQL Database, Azure PostgreSQL and Fabric Lakehouse as sources. We will continue expanding the breadth of Fast Copy sources in future updates.
You can learn more about Dataflows Fast Copy here: Fast copy in Dataflows Gen2
Output destinations support for schema changes for Lakehouse & Azure SQL database
One of the most requested enhancements for Output Destinations in Dataflows Gen2 has been having the ability to modify the schema of the destination table based on the schema of the latest results from your dataflow evaluations.
When loading into a new table, the automatic settings are on by default. With automatic settings, Dataflows Gen2 manages the mapping for you, which gives you the following behavior:
- Update method replace: Data will be replaced at every dataflow refresh. Any data in the destination will be removed. The data in the destination will be replaced with the output data of the dataflow.
- Managed mapping: Mapping is managed for you. When you change your data or query to add a column or change a data type, the mapping is automatically adjusted when you republish your dataflow. You do not have to go into the data destination experience every time you make changes, allowing for easy schema changes when you republish the dataflow.
- Drop and recreate table: To allow for these schema changes, on every dataflow refresh, the table will be dropped and recreated. Note that your dataflow refresh will fail if you have any relationships or measures depending on your table.
Manual settings
By turning off the automatic settings toggle, you get full control over how your data is loaded into the data destination. You can make any changes to the column mapping, change the source type, or exclude any column that you do not need in your data destination.
Cancel dataflow refresh
Canceling a dataflow refresh is useful when you want to stop a refresh during peak time, when a capacity is nearing its limits, or when a refresh is taking longer than expected. Use the refresh cancellation feature to stop refreshing dataflows.
To cancel a dataflow refresh, select the Cancel refresh option in the workspace list or lineage view for a dataflow with an in-progress refresh.
Once a dataflow refresh is canceled, the dataflow’s refresh history status is updated to reflect the cancellation.
Privacy Levels support
You can now set privacy levels for your connections in Dataflows. Privacy levels are critical to configure correctly so that sensitive data is only viewed by authorized users.
Furthermore, data sources must also be isolated from other data sources so that combining data has no undesirable data transfer impact. Incorrectly setting privacy levels may lead to sensitive data being leaked outside of a trusted environment. You can set the privacy level when creating a new connection.
Learn more about Privacy Levels in this article: Behind the scenes of the Data Privacy Firewall – Power Query | Microsoft Learn
Manage Connections experience enhancements
Manage connections is a feature that allows you to see at-a-glance the connections that you have in use for your Dataflows and the general information about those connections.
We are happy to release a new enhancement to this experience: you can now see a list of all the data sources available in your Dataflow, even the ones without a connection set for them!
For the data sources without a connection, you can set a new connection from within the manage connections experience by clicking the plus sign in the specific row of your source.
Furthermore, when you unlink a connection, the data source no longer disappears from this list if it still exists in your Dataflow definition. It simply appears as a data source without a connection set until you link a connection, either in this dialog or through the Power Query editor experience.
Test Framework for Custom Connectors SDK in VS Code
We’re excited to announce the availability of a new Test Framework in the latest release of Power Query SDK! The Test Framework allows Power Query SDK Developers to have access to standard tests and a test harness to verify the direct query (DQ) capabilities of an extension connector. With this new capability, developers will have a standard way of verifying connectors and a platform for adding additional custom tests. We envision this as the first step in enhancing the developer workflow with increased flexibility & productivity in terms of the testing capabilities provided by the Power Query SDK.
The Power Query SDK Test Framework is available on GitHub. It requires the latest release of the Power Query SDK, which wraps the Microsoft.PowerQuery.SdkTools NuGet package containing the PQTest compare command.
What is the Power Query SDK Test Framework?
The Power Query SDK Test Framework is a ready-to-go test harness with pre-built tests that standardizes the testing of new and existing extension connectors. It provides the ability to perform functional, compliance, and regression testing, can be extended for testing at scale, and helps address the need for a comprehensive test framework to satisfy the testing needs of extension connectors.
Follow the links below to get started:
- Power Query SDK overview
- Create your first Power Query custom connector
- Get started with the new Test Framework for the Power Query SDK
Modern Get Data UX to browse Azure resources
Using the regular path in Get Data to create a new connection, you always need to fill in your endpoint, URL, or server and database name when connecting to Azure resources like Azure Blob Storage, Azure Data Lake Storage Gen2, and Synapse. This is a tedious process and does not allow for easy data discovery.
With the new ‘browse Azure’ functionality in Get Data, you can easily browse all your Azure resources and automatically connect to them without manually setting up a connection, saving you a lot of time.
Azure Service Principal (SPN) support for on-premises and VNET data gateways
You can now authenticate your on-premises and VNET data gateway connections using SPNs. Learn more about SPN in Data Factory.
An Azure service principal (SPN) is an application-based security identity that can be assigned permissions to access your data sources. Service principals are used to safely connect to data without a user identity.
Within Microsoft Fabric, service principal authentication is supported in Semantic Models, dataflows (both Dataflow Gen1 and Dataflow Gen2), and Datamarts.
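Under the hood, SPN authentication relies on the Microsoft Entra ID client-credentials flow. The sketch below builds the token request an application would send to acquire an access token for the Power BI / Fabric resource; the tenant, client ID, and secret are placeholders, and the scope shown is an assumption to verify against the SPN documentation linked above.

```python
import urllib.parse
import urllib.request

# Microsoft Entra ID token endpoint template (client-credentials flow).
TOKEN_URL = "https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
# Resource scope commonly used for Power BI / Fabric API calls.
PBI_SCOPE = "https://analysis.windows.net/powerbi/api/.default"


def build_token_request(tenant_id: str, client_id: str,
                        client_secret: str) -> tuple[str, bytes]:
    """Return (url, form-encoded body) for a client-credentials token request."""
    url = TOKEN_URL.format(tenant_id=tenant_id)
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": PBI_SCOPE,
    }).encode("utf-8")
    return url, body


def acquire_token(tenant_id: str, client_id: str, client_secret: str) -> int:
    """POST the token request; a 200 response carries the access token JSON."""
    url, body = build_token_request(tenant_id, client_id, client_secret)
    req = urllib.request.Request(url, data=body, method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

In practice you would more likely use a library such as MSAL or azure-identity, which implement this flow with token caching and retry built in.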
Block sharing of Shareable Cloud Connections at tenant level
By default, any user in Fabric can share their connections if they have the following user role on the connection:
- Connection owner or admin
- Connection user with sharing
Sharing a connection in Fabric is sometimes needed for collaboration within the same workload or when sharing the workload with others. Connection sharing in Fabric makes this easy by providing a secure way to share connections with others for collaboration, but without exposing the secrets at any time. These connections can only be used within the Fabric environment.
If your organization does not allow connection sharing or wants to limit the sharing of connections, a tenant admin can restrict sharing as a tenant policy. The policy allows you to block sharing within the entire tenant.
General Availability of VNET Data Gateway
The VNET Data Gateway is a network security offering that lets you connect your Azure and other data services to Microsoft Fabric and the Power Platform. You can run Dataflow Gen2, Power BI Semantic Models, Power Platform Dataflows, and Power BI Paginated Reports on top of a VNET Data Gateway to ensure that no traffic is exposed to public endpoints. In addition, you can force all traffic to your data source to go through a gateway, allowing for comprehensive auditing of secure data sources.
To learn more and get started, read this article: VNET Data Gateways.
Mirroring for Azure SQL DB, Cosmos DB and Snowflake in Fabric
We are excited to announce that Mirroring, previously announced at Ignite in November 2023, is now available to customers in public preview. You can now seamlessly bring your databases into OneLake in Microsoft Fabric, enabling zero-ETL, near real-time insights on your data and unlocking warehousing, BI, AI, and more.
Data-driven insights are important for every business. When you need to make smart decisions, innovate, and improve your products or services, time to value is everything. Yet this can be difficult when you have data in different places, like apps, databases, and data warehouses. These places typically store data differently, so you can’t easily analyze and cross-reference them; you have to laboriously move their data to a place where you can analyze and harmonize at scale. Doing this takes time, money, and, typically, costly expertise to build complex, connected solutions. By the time you do this, your data is old and your insights are out of date. Decision makers need to be able to ask questions about their data without time-consuming complexity that adds risk and can impact mission-critical workloads.
Mirroring simplifies this process into clicks and seconds, not complex processes and hours, days, or weeks. You get a modern, fast, and safe way of continuously and seamlessly accessing and ingesting data from databases or data warehouses into Fabric’s OneLake in near real time, without the need for cumbersome pipelines. Combined with the rest of your organization’s data in OneLake, you can quickly unify and govern your data estate, removing data silos.
As part of the initial public preview launch, Azure Cosmos DB, Azure SQL DB, and Snowflake customers on any cloud are able to mirror their data in OneLake and unlock all the capabilities of Fabric’s Data Warehouse, Direct Lake mode in Power BI, notebooks, and much more. Beyond Azure Cosmos DB, Azure SQL Database, and Snowflake, many more data sources will be added in the future based on your feedback.
Learn more about Mirroring in Fabric by reading this article: Mirroring – Microsoft Fabric | Microsoft Learn
Thank you for your feedback, and keep it coming!
We wanted to thank you for your support, usage, excitement, and feedback around Data Factory in Fabric. We’re very excited to continue learning from you regarding your Data Integration needs and how Data Factory in Fabric can be enhanced to empower you to achieve more with data.
Please continue to share your feedback and feature ideas with us via our official Community channels, and stay tuned to our public roadmap page for updates on what will come next: