Microsoft Fabric Updates Blog

Privacy by Design: PII Detection and Anonymization with PySpark on Microsoft Fabric

Introduction: Whether you’re building analytics pipelines or conversational AI systems, the risk of exposing sensitive data is real. AI models trained on unfiltered datasets can inadvertently memorize and regurgitate PII, leading to compliance violations and reputational damage. This blog explores how to build scalable, secure, and compliant data workflows using PySpark, Microsoft Presidio, and Faker, covering …

Playbook for metadata driven Lakehouse implementation in Microsoft Fabric

Co-Authors: Gyani Sinha, Abhishek Narain. Overview: A well-architected lakehouse enables organizations to efficiently manage and process data for analytics, machine learning, and reporting. Achieving governance, scalability, operational excellence, and optimal performance requires a structured, metadata-driven approach to lakehouse implementation. Building on our previous blog, Demystifying Data Ingestion in Fabric, this post …

Demystifying Data Ingestion in Fabric: fundamental components for ingesting data into a Fabric Lakehouse using Fabric Data pipelines

Co-Author: Abhishek Narain. Overview: Building an effective Lakehouse starts with establishing a robust ingestion layer. Ingestion is the process of collecting, importing, and processing raw data from various sources into the data lake. It is fundamental to the success of a data lake because it enables the consolidation, exploration, and processing of …