Microsoft Fabric Updates Blog

Enhancing data quality with Copilot for Data Factory

In today’s world, cleaning and transforming data is incredibly important. Clean data means removing errors, duplicates, and irrelevant information to ensure its correct and complete, making the data more trustworthy for analysis. Transforming data means changing it into a format that is easy to use and analyze. Together, these steps improve data quality, leading to better insights and smarter decisions. Without clean and well-organized data, businesses might make poor choices based on faulty information, leading to missed opportunities. Investing in good data cleaning and transformation processes is crucial for any business that wants to use data to its advantage.

Cleaning and transforming data ensure accuracy and usability, enhancing decision-making. Reliable data leads to better insights and avoids missed opportunities in business.

Using Copilot for Data Factory to clean and transform data

Contoso Retailers, a fictitious retail company, is dedicated to applying data analysis techniques on their data to track sales trends and gain insights to implement new strategies based on those trends. They aim to make sure that they enhance the data quality by cleaning and transforming their data to prepare it for analysis, machine learning model training and visualization to gain insights on employee and sales performance over a period of time.

To achieve this, you can use Copilot for Data Factory to assist you in ingesting data, create new transformations on existing queries to add new columns for further analysis, clean data to remove empty values and create new queries with new synthetic data.

1 – Update existing queries to clean & transform data with Copilot

Before you can clean and transform data, you need to create a Dataflow Gen 2 that will assist you in ingesting your data. You can learn more about Dataflows and how to ingest data by referring to this training.

  • Inside of your Dataflow Gen2, you can use the Copilot panel to understand an existing query to give you better idea of how to transform your data.
  • The requirements for this scenario are to get a better analysis of the sales performance. As an initial step you can use Copilot to enable your team to get the revenue generated by the sales so they can build better models and visualizations.

Add a column ‘Gross Revenue’ that is a product of ‘UnitPrice’ and ‘OrderQty’, the result is rounded to two decimal places.

Figure 1 – Adding a gross revenue column using Copilot
  • For sales analysis, you can take it a step further by adding column to get the monetary value of the discount from each sale.

Example prompt: Add a column ‘Discount Value’ that is a product of ‘Gross Revenue’ and ‘UnitPriceDiscount’, the result is rounded to two decimal places.

Note: It’s important to transform your data based on your scenario and requirements.

  • Within a sales scenario you can also add other columns like Net Revenue and Days to ship which can allows to make better business decisions for the impact the time it takes to ship a product has on the revenue generated.

2 – Create new queries and generate synthetic data

When using Copilot for Data Factory or any other Copilots, it is important to remember it is Copilot not Autopilot which means you still have control over how you transform your data. For this requirement, we manually create a new query to update it later.

  • For this sales scenario, you need a dimension date table to help slice the data based on Fiscal Year, Quarter and Month. Copilot can assist you to generate synthetic dates based on the range given.

Example prompt: Add a step to the query to add date values to the ‘DateKey’ column starting from 1/1/2012 to 12/31/2013.

  • To get a better performance from Copilot with the date values you can split the new synthetic date data into columns like day, month, month name and year.
  • In this scenario, Contoso Retailers fiscal year starts on July of every year. Using Copilot, you can be able to identify the start of the Fiscal Year for better analysis when looking at sales performance.

Example prompt: Add 1 to ‘Year’ if the ‘Date’ month is greater than or equal to 7 and keep as is if the ‘Date’ month is less than or equal to 6. Store the result in a new column ‘Fiscal Year’ and take the last two digits of the result combine them with the prefix ‘FY’.

  • Finally, you can enable your team to further slice the sales performance data by Fiscal Year and the quarter for each year. This will allow Contoso retailers to compare the performance of FY12 Q1 and FY13 Q1 to make better data driven decisions. Copilot can assist you in adding a Quarter column added.

Example prompt: Add a column ‘Quarter’ with values as ‘Q1’ when the ‘Month’ is equal to 7 or 8 or 9, ‘Q2’ when the ‘Month’ is equal to 10 or 11 or 12, ‘Q3’ when the ‘Month’ is equal to 1 or 2 or 3, ‘Q4’ when the ‘Month’ is equal to 4 or 5 or 6.

Figure 2 – Complete dimension date table

There’s more that you can do for a retail/sales scenario with Copilot for Data Factory, we provided a baseline of how you would approach data transformations and cleaning using Copilot. This can be applied to other scenarios; you would have to be mindful what transformations your scenario requires.

You can find a deep dive of using Copilot for Data Factory in our Copilot Learning Hub for Data professional’s tutorial.

Resources

Start your Copilot learning journey

Copilot for Data Factory overview

Data Factory overview

Related blog posts

Enhancing data quality with Copilot for Data Factory

November 5, 2025 by Pradeep Srikakolapu

In our earlier announcement, we shared that newly created data warehouses, lakehouses and other items in Microsoft Fabric would no longer automatically generate default semantic models. This change allows customers to have more control over their modeling experience and to explicitly choose when and how to create semantic models. Starting end of October 2025, Microsoft … Continue reading “Decoupling Default Semantic Models for Existing Items in Microsoft Fabric”

October 29, 2025 by Ye Xu

Copy job is the go-to solution in Microsoft Fabric Data Factory for simplified data movement, whether you’re moving data across clouds, from on-premises systems, or between services. With native support for multiple delivery styles, including bulk copy, incremental copy, and change data capture (CDC) replication, Copy job offers the flexibility to handle a wide range … Continue reading “Simplifying Data Ingestion with Copy job – More File Formats with Enhancements”