Microsoft Fabric Updates Blog

Announcing the Fabric Apache Spark Diagnostic Emitter: Collect Logs and Metrics

Fabric Apache Spark Diagnostic Emitter for Logs and Metrics is now in public preview. This new feature allows Apache Spark users to collect Spark logs, job events, and metrics from their Spark applications and send them to various destinations, including Azure Event Hubs, Azure Storage, and Azure Log Analytics. It provides robust support for monitoring and troubleshooting Spark applications, enhancing your visibility into application performance.

What Does the Diagnostic Emitter Do?

The Fabric Apache Spark Diagnostic Emitter enables Apache Spark applications to emit critical logs and metrics that can be used for real-time monitoring, analysis, and troubleshooting. Whether you’re sending logs to Azure Event Hubs, Azure Storage, or Azure Log Analytics, this emitter simplifies the process, allowing you to collect data seamlessly and store it in your preferred destinations.

Key Benefits of the Apache Spark Diagnostic Emitter

  • Centralized Monitoring: Send logs and metrics to Azure Event Hubs, Azure Log Analytics, or Azure Storage for real-time data streaming, deep analysis and querying, as well as long-term retention.
  • Flexible Configuration: Easily configure Spark to emit logs and metrics to one or more destinations, with support for connection strings, Azure Key Vault integration, and more.
  • Comprehensive Metrics: Collect a wide range of logs and metrics, including driver and executor logs, event logs, and detailed Spark application metrics.

Below is a quick step-by-step guide for the one-time configuration of the destination for collecting logs and metrics.

Step 1: Create the Azure resource to use as your destination

To begin, you’ll need an Azure Event Hubs instance, an Azure Log Analytics workspace, or an Azure Blob Storage account, based on your preference. If you don’t already have one, you can quickly create one via the Azure portal.
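If you prefer the command line, the destination resources can also be provisioned with the Azure CLI. Here is a rough sketch; the resource group `rg-spark-diag` and all resource names are illustrative placeholders, and you would typically pick only the destination you intend to use:

```
# Event Hubs namespace plus an event hub to receive diagnostic records
az eventhubs namespace create --resource-group rg-spark-diag --name eh-spark-diag --location westus
az eventhubs eventhub create --resource-group rg-spark-diag --namespace-name eh-spark-diag --name spark-logs

# Or a Log Analytics workspace for querying with KQL
az monitor log-analytics workspace create --resource-group rg-spark-diag --workspace-name law-spark-diag

# Or a storage account for long-term retention
az storage account create --resource-group rg-spark-diag --name sparkdiagstore --location westus --sku Standard_LRS
```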

Step 2: Configure Your Fabric Environment Artifact for Apache Spark

Next, you’ll need to create a Fabric Environment Artifact in Microsoft Fabric and configure it with the required Spark properties.

Here are some example key configuration properties available for the diagnostic emitter:

  • spark.synapse.diagnostic.emitters: Comma-separated names of diagnostic emitters.
  • spark.synapse.diagnostic.emitter.<destination>.type: The destination type (e.g., AzureEventHub).
  • spark.synapse.diagnostic.emitter.<destination>.categories: The log categories to be collected (e.g., DriverLog, ExecutorLog, EventLog, Metrics).
  • spark.synapse.diagnostic.emitter.<destination>.secret: The Azure Event Hubs connection string.
  • spark.synapse.diagnostic.emitter.<destination>.secret.keyVault: Azure Key Vault name for storing the connection string.

For a full list of configuration options, refer to the official documentation below.

Step 3: Attach the Environment Artifact

Once configured, attach your environment artifact to a Notebook or Spark Job Definition.

  • For Notebooks or Spark jobs: Navigate to the specific notebook or Spark job definition and attach the environment with the configured Spark properties.
  • To set the environment as the default for the workspace: Go to your Workspace Settings in Microsoft Fabric, find the Spark settings, and select the configured environment.

After this configuration, you can run your Notebooks or Spark jobs as you normally do. You can now efficiently collect and analyze logs and metrics from your Apache Spark applications using your preferred destination. This feature simplifies monitoring and debugging, allowing you to focus on your core business logic. Additionally, you can query and aggregate the data in Azure Monitor, and create custom alerts that evaluate your logs and metrics at regular intervals and trigger based on your defined criteria.

Log Data Sample

Here is a sample log record in JSON format, showing how Spark logs and metrics are captured:

```json
{
  "timestamp": "2024-09-06T03:09:37.235Z",
  "category": "Log|EventLog|Metrics",
  "fabricLivyId": "<fabric-livy-id>",
  "applicationId": "<application-id>",
  "applicationName": "<application-name>",
  "executorId": "<driver-or-executor-id>",
  "properties": {
    "message": "Initialized BlockManager: BlockManagerId(1, vm-04b22223, 34319, None)",
    "logger_name": "org.apache.spark.storage.BlockManager",
    "level": "INFO"
  }
}
```
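A downstream consumer of these records (for example, reading them off an Event Hubs stream) might flatten each one before filtering or alerting on it. A minimal sketch in Python, assuming each record arrives as a JSON string shaped like the sample above (the concrete field values here are made up for illustration):

```python
import json

def parse_diagnostic_record(raw: str) -> dict:
    """Flatten one emitted diagnostic record from its JSON form."""
    record = json.loads(raw)
    props = record.get("properties", {})
    # Pull out the fields common to every category, plus the log payload.
    return {
        "timestamp": record["timestamp"],
        "category": record["category"],
        "executorId": record.get("executorId"),
        "level": props.get("level"),
        "message": props.get("message"),
    }

sample = """{
  "timestamp": "2024-09-06T03:09:37.235Z",
  "category": "Log",
  "fabricLivyId": "livy-0001",
  "applicationId": "app-0001",
  "applicationName": "demo-app",
  "executorId": "1",
  "properties": {
    "message": "Initialized BlockManager",
    "logger_name": "org.apache.spark.storage.BlockManager",
    "level": "INFO"
  }
}"""

parsed = parse_diagnostic_record(sample)
print(parsed["level"], parsed["category"])  # prints: INFO Log
```

From here, routing records by `category` (logs vs. metrics) or dropping anything below a chosen log level is a one-line filter.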

Stay tuned for more updates, and happy coding!

Related documents:

Published September 26, 2024, by Ye Xu