Enhancing data quality with Copilot for Data Factory
In today’s world, cleaning and transforming data is incredibly important. Clean data means removing errors, duplicates, and irrelevant information to ensure its correct and complete, making the data more trustworthy for analysis. Transforming data means changing it into a format that is easy to use and analyze. Together, these steps improve data quality, leading to better insights and smarter decisions. Without clean and well-organized data, businesses might make poor choices based on faulty information, leading to missed opportunities. Investing in good data cleaning and transformation processes is crucial for any business that wants to use data to its advantage.
Cleaning and transforming data ensure accuracy and usability, enhancing decision-making. Reliable data leads to better insights and avoids missed opportunities in business.
Using Copilot for Data Factory to clean and transform data
Contoso Retailers, a fictitious retail company, is dedicated to applying data analysis techniques on their data to track sales trends and gain insights to implement new strategies based on those trends. They aim to make sure that they enhance the data quality by cleaning and transforming their data to prepare it for analysis, machine learning model training and visualization to gain insights on employee and sales performance over a period of time.
To achieve this, you can use Copilot for Data Factory to assist you in ingesting data, create new transformations on existing queries to add new columns for further analysis, clean data to remove empty values and create new queries with new synthetic data.
1 – Update existing queries to clean & transform data with Copilot
Before you can clean and transform data, you need to create a Dataflow Gen 2 that will assist you in ingesting your data. You can learn more about Dataflows and how to ingest data by referring to this training.
- Inside of your Dataflow Gen2, you can use the Copilot panel to understand an existing query to give you better idea of how to transform your data.
- The requirements for this scenario are to get a better analysis of the sales performance. As an initial step you can use Copilot to enable your team to get the revenue generated by the sales so they can build better models and visualizations.
Add a column ‘Gross Revenue’ that is a product of ‘UnitPrice’ and ‘OrderQty’, the result is rounded to two decimal places.

- For sales analysis, you can take it a step further by adding column to get the monetary value of the discount from each sale.
Example prompt: Add a column ‘Discount Value’ that is a product of ‘Gross Revenue’ and ‘UnitPriceDiscount’, the result is rounded to two decimal places.
Note: It’s important to transform your data based on your scenario and requirements.
- Within a sales scenario you can also add other columns like Net Revenue and Days to ship which can allows to make better business decisions for the impact the time it takes to ship a product has on the revenue generated.
2 – Create new queries and generate synthetic data
When using Copilot for Data Factory or any other Copilots, it is important to remember it is Copilot not Autopilot which means you still have control over how you transform your data. For this requirement, we manually create a new query to update it later.
- For this sales scenario, you need a dimension date table to help slice the data based on Fiscal Year, Quarter and Month. Copilot can assist you to generate synthetic dates based on the range given.
Example prompt: Add a step to the query to add date values to the ‘DateKey’ column starting from 1/1/2012 to 12/31/2013.
- To get a better performance from Copilot with the date values you can split the new synthetic date data into columns like day, month, month name and year.
- In this scenario, Contoso Retailers fiscal year starts on July of every year. Using Copilot, you can be able to identify the start of the Fiscal Year for better analysis when looking at sales performance.
Example prompt: Add 1 to ‘Year’ if the ‘Date’ month is greater than or equal to 7 and keep as is if the ‘Date’ month is less than or equal to 6. Store the result in a new column ‘Fiscal Year’ and take the last two digits of the result combine them with the prefix ‘FY’.
- Finally, you can enable your team to further slice the sales performance data by Fiscal Year and the quarter for each year. This will allow Contoso retailers to compare the performance of FY12 Q1 and FY13 Q1 to make better data driven decisions. Copilot can assist you in adding a Quarter column added.
Example prompt: Add a column ‘Quarter’ with values as ‘Q1’ when the ‘Month’ is equal to 7 or 8 or 9, ‘Q2’ when the ‘Month’ is equal to 10 or 11 or 12, ‘Q3’ when the ‘Month’ is equal to 1 or 2 or 3, ‘Q4’ when the ‘Month’ is equal to 4 or 5 or 6.

There’s more that you can do for a retail/sales scenario with Copilot for Data Factory, we provided a baseline of how you would approach data transformations and cleaning using Copilot. This can be applied to other scenarios; you would have to be mindful what transformations your scenario requires.
You can find a deep dive of using Copilot for Data Factory in our Copilot Learning Hub for Data professional’s tutorial.
Resources
Start your Copilot learning journey