Microsoft Fabric Updates Blog

Gain Deeper Insights into Spark Jobs with JobInsight in Microsoft Fabric

JobInsight is a powerful Java-based diagnostic library designed to help developers and data engineers analyze completed Spark applications in Microsoft Fabric. With JobInsight, you can programmatically access Spark execution metrics and logs—all from within a Fabric Notebook.

Whether you’re investigating performance bottlenecks, debugging task execution, or running post-run analysis across jobs, stages, or executors, JobInsight gives you the underlying execution data in a form you can query directly.

What is JobInsight?

JobInsight provides two core capabilities:

  • Interactive Spark Job Analysis
    Offers structured APIs that return execution data—such as Spark queries, jobs, stages, tasks, and executors—as Spark Datasets for deep-dive analysis.
  • Spark Event Log Access
    Allows users to copy Spark event logs to a OneLake or ADLS Gen2 directory for long-term storage or custom offline diagnostics.

With these tools, you can analyze, debug, and monitor Spark applications—all within your Fabric workspace.

Key Features and How to Use Them

Analyze Completed Spark Applications

You can extract critical execution data with just a few lines of code:

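import com.microsoft.jobinsight.diagnostic.SparkDiagnostic

// Placeholders: replace these with the identifiers of the Spark application to analyze
val workspaceId = "<workspace-id>"
val artifactId = "<artifact-id>"
val livyId = "<livy-id>"
val jobType = "sessions"
val stateStorePath = "<state-store-path>"
val attemptId = 1

val jobInsight = SparkDiagnostic.analyze(
  workspaceId,
  artifactId,
  livyId,
  jobType,         // e.g., "sessions" or "batches"
  stateStorePath,  // Output path to store analysis results
  attemptId        // Optional; defaults to 1
)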

val queries = jobInsight.queries
val jobs = jobInsight.jobs
val stages = jobInsight.stages
val tasks = jobInsight.tasks
val executors = jobInsight.executors

You can then apply standard Spark operations to explore trends, detect anomalies, and optimize performance.
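As one example of anomaly detection, a common check is to flag straggler tasks: tasks that run far longer than the typical task in the same stage. The sketch below does this with plain Scala collections so that it runs without Spark or JobInsight. The `TaskMetric` case class and its fields (`stageId`, `taskId`, `durationMs`) are hypothetical stand-ins rather than JobInsight's actual schema, and the 3x threshold is an arbitrary default, so inspect the columns of `jobInsight.tasks` before adapting it.

```scala
object StragglerCheck {
  // Hypothetical shape of one task record; JobInsight's real column names may differ.
  final case class TaskMetric(stageId: Int, taskId: Long, durationMs: Long)

  // Median of a non-empty sequence of durations.
  def median(xs: Seq[Long]): Double = {
    val sorted = xs.sorted
    val n = sorted.size
    if (n % 2 == 1) sorted(n / 2).toDouble
    else (sorted(n / 2 - 1) + sorted(n / 2)) / 2.0
  }

  // Flags tasks whose duration exceeds `factor` times the median duration of their stage.
  def stragglers(tasks: Seq[TaskMetric], factor: Double = 3.0): Seq[TaskMetric] =
    tasks
      .groupBy(_.stageId)
      .toSeq
      .sortBy(_._1)
      .flatMap { case (_, stageTasks) =>
        val m = median(stageTasks.map(_.durationMs))
        stageTasks.filter(_.durationMs > factor * m)
      }

  def main(args: Array[String]): Unit = {
    // Made-up durations for illustration only.
    val sample = Seq(
      TaskMetric(1, 0, 100), TaskMetric(1, 1, 120), TaskMetric(1, 2, 110), TaskMetric(1, 3, 900),
      TaskMetric(2, 4, 50), TaskMetric(2, 5, 55)
    )
    stragglers(sample).foreach { t =>
      println(s"stage ${t.stageId} task ${t.taskId}: ${t.durationMs} ms")
    }
  }
}
```

On the real `tasks` Dataset, the same idea maps onto a `groupBy` on the stage column, a per-stage median (for example, an approximate percentile aggregate), and a join back to keep only the slow tasks.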

Reuse Past Analyses

There’s no need to rerun diagnostics on the same job; reload the results previously saved to stateStorePath instead:

val jobInsight = SparkDiagnostic.loadJobInsight(stateStorePath)

val queries = jobInsight.queries
val jobs = jobInsight.jobs
// and so on...

This makes historical analysis and iterative debugging easy and efficient.
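One way to combine the two calls is a load-or-analyze helper: reuse the saved state when it exists and run the analysis only otherwise. The sketch below is hypothetical and takes the existence check, the loader, and the analyzer as functions, so it compiles without the JobInsight library. In a notebook you might pass `SparkDiagnostic.loadJobInsight` as the loader and a closure over `SparkDiagnostic.analyze(...)` as the analyzer; how you check whether `stateStorePath` already holds results is left as an assumption.

```scala
object JobInsightCache {
  // Hypothetical helper: reuse saved analysis state when present, otherwise compute it.
  // The caller supplies `exists`, `load`, and `analyze`, so nothing here depends on
  // the JobInsight API itself.
  def loadOrAnalyze[A](
      stateStorePath: String,
      exists: String => Boolean,
      load: String => A,
      analyze: String => A
  ): A =
    if (exists(stateStorePath)) load(stateStorePath)
    else analyze(stateStorePath)

  def main(args: Array[String]): Unit = {
    // Simulated state store: the set of paths that already hold saved results.
    val saved = scala.collection.mutable.Set("/state/app-1")
    val load = (p: String) => s"loaded $p"
    val analyze = (p: String) => { saved += p; s"analyzed $p" }

    println(loadOrAnalyze("/state/app-1", saved.contains, load, analyze)) // loaded /state/app-1
    println(loadOrAnalyze("/state/app-2", saved.contains, load, analyze)) // analyzed /state/app-2
    println(loadOrAnalyze("/state/app-2", saved.contains, load, analyze)) // loaded /state/app-2
  }
}
```

Keeping the I/O behind function parameters also makes the caching decision easy to test in isolation, as the simulated state store in `main` shows.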

Save Metrics and Logs to a Lakehouse

You can persist analysis outputs into Lakehouse tables for reporting or integration:

val df = jobInsight.queries

df.write
  .format("delta")
  .mode("overwrite")
  .saveAsTable("sparkdiagnostic_lh.Queries")

Apply the same pattern to the other datasets: jobs, stages, tasks, and executors.
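Rather than repeating the write for each dataset, you could loop over them. The hypothetical helper below takes (name, dataset) pairs and a write function, derives a table name such as `sparkdiagnostic_lh.Queries` for each, and returns the names it wrote. It is generic in the dataset type so the sketch runs without Spark; in a notebook, the write function would wrap the `df.write.format("delta").mode("overwrite").saveAsTable(table)` call shown above.

```scala
object PersistDiagnostics {
  // Hypothetical helper: write every (name, dataset) pair to "<lakehouse>.<Name>".
  // `write` performs the actual save, e.g. a Delta saveAsTable call in a notebook.
  def persistAll[D](lakehouse: String, datasets: Seq[(String, D)])(write: (D, String) => Unit): Seq[String] =
    datasets.map { case (name, data) =>
      val table = s"$lakehouse.${name.capitalize}"
      write(data, table)
      table
    }

  def main(args: Array[String]): Unit = {
    // Stand-in datasets (row counts); in a notebook these would be jobInsight.queries, jobInsight.jobs, ...
    val fake = Seq("queries" -> 3, "jobs" -> 5, "stages" -> 8)
    val written = persistAll("sparkdiagnostic_lh", fake) { (rows, table) =>
      println(s"writing $rows rows to $table")
    }
    println(written.mkString(", "))
  }
}
```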

Copy Event Logs to Lakehouse or ADLS Gen2

You can also copy event logs for deeper inspection or long-term retention:

import com.microsoft.jobinsight.diagnostic.LogUtils

val contentLength = LogUtils.copyEventLog(
  workspaceId,
  artifactId,
  livyId,
  jobType,
  targetDirectory,
  asyncMode = true, // Use async mode for best performance
  attemptId = 1
)
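The comment in the snippet recommends async mode for performance. If you want to confirm that for your own logs, a small timing wrapper is enough. The `timed` helper below is a generic sketch, not part of JobInsight: it runs a block once and returns its result together with the elapsed wall-clock time in milliseconds, so you could wrap one `LogUtils.copyEventLog` call with `asyncMode = true` and one with `asyncMode = false` and compare.

```scala
object Timing {
  // Runs `body` once and returns its result together with elapsed wall-clock milliseconds.
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    (result, elapsedMs)
  }

  def main(args: Array[String]): Unit = {
    // Stand-in workload; in a notebook, replace it with a LogUtils.copyEventLog(...) call.
    val (sum, ms) = timed { (1 to 1000000).map(_.toLong).sum }
    println(s"result=$sum in $ms ms")
  }
}
```

A single run is noisy, so compare a few runs of each mode, writing each to its own target directory, before drawing conclusions.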

Example

val lakehouseBaseDir = "abfss://<workspace>@<onelake>/Files/eventlog/0513"
val jobType = "sessions"

// Copy the same event log twice, once in each mode, into separate directories
LogUtils.copyEventLog(
  workspaceId,
  artifactId,
  livyId,
  jobType,
  s"$lakehouseBaseDir/$jobType/async",
  asyncMode = true,
  attemptId = 1
)

LogUtils.copyEventLog(
  workspaceId,
  artifactId,
  livyId,
  jobType,
  s"$lakehouseBaseDir/$jobType/sync",
  asyncMode = false,
  attemptId = 1
)

Use this method when you want to store raw logs outside the UI for further analysis.
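Once the raw logs are copied, you can analyze them with any tool. Spark event logs are newline-delimited JSON with one listener event per line, each carrying an `"Event"` field such as `SparkListenerTaskEnd`. The sketch below counts events by type using only the standard library. To stay dependency-free it extracts the field with a regular expression instead of a JSON parser, which assumes an uncompressed log written in compact form (`"Event":"..."` with no extra spacing). For anything beyond a quick count, a real JSON parser or `spark.read.json` on the copied files is the sturdier choice.

```scala
import scala.io.Source

object EventLogSummary {
  // Matches the event type in a compact JSON line like {"Event":"SparkListenerTaskEnd",...}
  private val EventField = "\"Event\":\"([^\"]+)\"".r

  // Counts events by type across the given log lines; lines without an Event field are skipped.
  def countEvents(lines: Iterator[String]): Map[String, Int] =
    lines
      .flatMap(line => EventField.findFirstMatchIn(line).map(_.group(1)))
      .foldLeft(Map.empty[String, Int]) { (acc, event) =>
        acc.updated(event, acc.getOrElse(event, 0) + 1)
      }

  def main(args: Array[String]): Unit = {
    // Usage: pass the path of a locally readable, uncompressed copy of an event log file.
    val source = Source.fromFile(args(0), "UTF-8")
    try {
      countEvents(source.getLines()).toSeq.sortBy(-_._2).foreach { case (event, n) =>
        println(f"$n%8d  $event")
      }
    } finally source.close()
  }
}
```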

Get Started

JobInsight makes Spark diagnostics in Microsoft Fabric easier, faster, and more powerful. By integrating with familiar Spark APIs and offering deep access to execution logs and metrics, it empowers you to:

  • Visualize execution breakdowns.
  • Monitor and tune resource usage.
  • Identify and troubleshoot performance bottlenecks.
  • Reuse and automate reproducible diagnostics.

For full documentation, check out: Job insight diagnostics library (Preview).
