Microsoft Fabric Updates Blog

Enhancing data quality with Copilot for Data Factory

In today’s world, cleaning and transforming data is incredibly important. Clean data means removing errors, duplicates, and irrelevant information to ensure its correct and complete, making the data more trustworthy for analysis. Transforming data means changing it into a format that is easy to use and analyze. Together, these steps improve data quality, leading to better insights and smarter decisions. Without clean and well-organized data, businesses might make poor choices based on faulty information, leading to missed opportunities. Investing in good data cleaning and transformation processes is crucial for any business that wants to use data to its advantage.

Cleaning and transforming data ensure accuracy and usability, enhancing decision-making. Reliable data leads to better insights and avoids missed opportunities in business.

Using Copilot for Data Factory to clean and transform data

Contoso Retailers, a fictitious retail company, is dedicated to applying data analysis techniques on their data to track sales trends and gain insights to implement new strategies based on those trends. They aim to make sure that they enhance the data quality by cleaning and transforming their data to prepare it for analysis, machine learning model training and visualization to gain insights on employee and sales performance over a period of time.

To achieve this, you can use Copilot for Data Factory to assist you in ingesting data, create new transformations on existing queries to add new columns for further analysis, clean data to remove empty values and create new queries with new synthetic data.

1 – Update existing queries to clean & transform data with Copilot

Before you can clean and transform data, you need to create a Dataflow Gen 2 that will assist you in ingesting your data. You can learn more about Dataflows and how to ingest data by referring to this training.

  • Inside of your Dataflow Gen2, you can use the Copilot panel to understand an existing query to give you better idea of how to transform your data.
  • The requirements for this scenario are to get a better analysis of the sales performance. As an initial step you can use Copilot to enable your team to get the revenue generated by the sales so they can build better models and visualizations.

Add a column ‘Gross Revenue’ that is a product of ‘UnitPrice’ and ‘OrderQty’, the result is rounded to two decimal places.

Figure 1 – Adding a gross revenue column using Copilot
  • For sales analysis, you can take it a step further by adding column to get the monetary value of the discount from each sale.

Example prompt: Add a column ‘Discount Value’ that is a product of ‘Gross Revenue’ and ‘UnitPriceDiscount’, the result is rounded to two decimal places.

Note: It’s important to transform your data based on your scenario and requirements.

  • Within a sales scenario you can also add other columns like Net Revenue and Days to ship which can allows to make better business decisions for the impact the time it takes to ship a product has on the revenue generated.

2 – Create new queries and generate synthetic data

When using Copilot for Data Factory or any other Copilots, it is important to remember it is Copilot not Autopilot which means you still have control over how you transform your data. For this requirement, we manually create a new query to update it later.

  • For this sales scenario, you need a dimension date table to help slice the data based on Fiscal Year, Quarter and Month. Copilot can assist you to generate synthetic dates based on the range given.

Example prompt: Add a step to the query to add date values to the ‘DateKey’ column starting from 1/1/2012 to 12/31/2013.

  • To get a better performance from Copilot with the date values you can split the new synthetic date data into columns like day, month, month name and year.
  • In this scenario, Contoso Retailers fiscal year starts on July of every year. Using Copilot, you can be able to identify the start of the Fiscal Year for better analysis when looking at sales performance.

Example prompt: Add 1 to ‘Year’ if the ‘Date’ month is greater than or equal to 7 and keep as is if the ‘Date’ month is less than or equal to 6. Store the result in a new column ‘Fiscal Year’ and take the last two digits of the result combine them with the prefix ‘FY’.

  • Finally, you can enable your team to further slice the sales performance data by Fiscal Year and the quarter for each year. This will allow Contoso retailers to compare the performance of FY12 Q1 and FY13 Q1 to make better data driven decisions. Copilot can assist you in adding a Quarter column added.

Example prompt: Add a column ‘Quarter’ with values as ‘Q1’ when the ‘Month’ is equal to 7 or 8 or 9, ‘Q2’ when the ‘Month’ is equal to 10 or 11 or 12, ‘Q3’ when the ‘Month’ is equal to 1 or 2 or 3, ‘Q4’ when the ‘Month’ is equal to 4 or 5 or 6.

Figure 2 – Complete dimension date table

There’s more that you can do for a retail/sales scenario with Copilot for Data Factory, we provided a baseline of how you would approach data transformations and cleaning using Copilot. This can be applied to other scenarios; you would have to be mindful what transformations your scenario requires.

You can find a deep dive of using Copilot for Data Factory in our Copilot Learning Hub for Data professional’s tutorial.

Resources

Start your Copilot learning journey

Copilot for Data Factory overview

Data Factory overview

Related blog posts

Enhancing data quality with Copilot for Data Factory

November 24, 2025 by Jianlei Shen

This milestone marks a major step forward in unifying and simplifying data movement experiences across Data Factory. With Copy Job Activity, users can now enjoy the simplicity and speed of Copy Job while leveraging the orchestration power and flexibility of Data Factory pipelines. What is the Copy job Activity  Copy Job Activity allows you to … Continue reading “Announcing Copy Job Activity in Data Factory Pipeline (Generally Available)”

November 21, 2025 by Penny Zhou

Troubleshooting pipeline failures can be overwhelming, especially when a single run throws dozens or even hundreds of errors. The new Error Insights Copilot in Fabric makes this process faster, smarter, and easier. Powered by AI, Copilot provides clear explanations, root cause analysis, and actionable recommendations, so you can resolve issues without getting lost in technical … Continue reading “AI-powered troubleshooting for Fabric pipeline error messages”