Spark Application Lifecycle

 

The lifecycle of a Spark application encompasses seven distinct phases: submission, resource negotiation, planning and optimization, DAG scheduling, task execution, shuffle, and result collection with termination. Understanding this lifecycle is critical for debugging, optimizing, and predicting application behavior: from the moment you submit code to when results appear, a complex orchestration happens behind the scenes.

 

The Seven Phases of a Spark Application Lifecycle

Phase 1: Application Submission

What happens: You execute the spark-submit command:

spark-submit --master yarn \
  --num-executors 10 \
  --executor-cores 4 \
  --executor-memory 4g \
  my_spark_app.py

Behind the scenes:

  • Spark launcher starts on your machine
  • Reads your Python/Scala/Java code
  • Initializes SparkContext or SparkSession
  • Submits application to Cluster Manager
  • Timeline: 1-5 seconds

    Key components involved: Spark launcher, Cluster Manager
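 

For reference, here is a minimal sketch of what my_spark_app.py might contain (illustrative only; the app name and input path are assumptions, not part of the example above):

from pyspark.sql import SparkSession

# Creating the SparkSession initializes the SparkContext, which connects to the
# Cluster Manager and requests executors (Phase 2)
spark = SparkSession.builder.appName("my_spark_app").getOrCreate()

df = spark.read.parquet("data.parquet")   # assumed input path
df.show(5)

spark.stop()   # tells the Cluster Manager the application is finished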

     

    Phase 2: Resource Negotiation

What happens: The Cluster Manager allocates machines for your executors.

Driver: "I need 10 executors with 4 cores and 4GB each"
        ↓
Cluster Manager: "Checking availability…"
                 "Yes, I have 10 suitable machines"
                 "Launching executors…"
        ↓
Worker Nodes: "Executor JVMs starting…"

    Behind the scenes:

  • Cluster Manager checks available resources
  • Allocates machines for Driver and Executors
  • Launches Driver JVM on one machine
  • Launches Executor JVMs on worker machines
  • Executors register with Driver
  • Timeline: 30-120 seconds (Kubernetes: 10-30s, YARN: 30-60s)

    Key components involved: Cluster Manager, Driver, Executors
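 

The same resource request can also be expressed as configuration properties instead of spark-submit flags; a sketch using standard Spark property names (the app name is an assumption):

from pyspark.sql import SparkSession

# Equivalent of the spark-submit flags from Phase 1, expressed as config properties
spark = (SparkSession.builder
         .appName("my_spark_app")
         .config("spark.executor.instances", "10")   # --num-executors 10
         .config("spark.executor.cores", "4")        # --executor-cores 4
         .config("spark.executor.memory", "4g")      # --executor-memory 4g
         .getOrCreate())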

     

    Phase 3: Planning and Optimization

What happens: Your code is analyzed and converted into an execution plan.

Your code:

df = spark.read.parquet("data.parquet")
result = df.filter(df.age > 25).groupBy("city").count()
result.write.parquet("output")

     

Step 1: Parse into Logical Plan
├─ Read parquet
├─ Filter age > 25
├─ GroupBy city
├─ Count
└─ Write parquet

Step 2: Optimize with Catalyst
├─ Reorder operations
├─ Push filters down (read only age > 25 records)
├─ Combine narrow transformations
└─ Choose best algorithms

Step 3: Create Physical Plan
├─ Read 100 partitions in parallel
├─ Filter in map phase
├─ Shuffle for groupBy
├─ Aggregate to get count
└─ Write results

     

     

    Behind the scenes:

  • Driver parses your code
  • Catalyst optimizer reorders operations
  • Chooses execution strategy (sort-based shuffle vs hash-based)
  • Determines memory requirements
  • Plans data locality
  • Timeline: 1-10 seconds (usually negligible)

    Key components involved: Driver, Catalyst Optimizer
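 

You can see the plans Catalyst produces for the example above with explain(); a short sketch (the input path is assumed):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("plan_inspection").getOrCreate()

df = spark.read.parquet("data.parquet")
result = df.filter(df.age > 25).groupBy("city").count()

# Prints the parsed, analyzed, and optimized logical plans plus the physical plan.
# In the physical plan's Parquet scan node, PushedFilters shows the age > 25
# predicate that Catalyst pushed down to the reader.
result.explain(mode="extended")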

     

    Phase 4: DAG Scheduling and Task Creation

What happens: Spark breaks the physical plan into stages, and the stages into tasks.

Physical Plan
    ↓
DAG (Directed Acyclic Graph)
    ├─ Stage 0: Read + Filter (no shuffle needed)
    │  ├─ Task 0 on Executor 1: Process partition 0
    │  ├─ Task 1 on Executor 2: Process partition 1
    │  ├─ Task 2 on Executor 3: Process partition 2
    │  └─ (100 tasks total, one per partition)
    │
    ├─ [SHUFFLE BARRIER]
    │  └─ Shuffle data across the network
    │
    └─ Stage 1: Aggregate (post-shuffle)
       ├─ Task 100 on Executor 1: Aggregate group 0
       ├─ Task 101 on Executor 2: Aggregate group 1
       └─ Task 102 on Executor 3: Aggregate group 2
           (200 tasks total, one per output partition)

     

     

    Behind the scenes:

  • DAGScheduler analyzes dependencies
  • Groups tasks that don't require data movement (stages)
  • Identifies shuffle boundaries (stage separators)
  • Creates task objects with code and data references
  • Builds task queue for executors
  • Timeline: 1-5 seconds

    Key components involved: DAGScheduler , Driver
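 

The task counts above are not arbitrary: Stage 0 gets one task per input partition, while the post-shuffle stage gets one task per shuffle partition. A small illustration, continuing the Phase 3 sketch (spark.sql.shuffle.partitions is a standard setting whose default is 200):

# spark.sql.shuffle.partitions controls how many partitions (and therefore tasks)
# the post-shuffle stage has; 200 is Spark's default, matching Stage 1 above
spark.conf.set("spark.sql.shuffle.partitions", "200")

result = df.filter(df.age > 25).groupBy("city").count()
print(result.rdd.getNumPartitions())   # 200, unless adaptive execution coalesces them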

     

    Phase 5: Task Execution

What happens: Tasks execute in parallel on the executors.

Driver task queue:
[Task 0, Task 1, Task 2, …, Task 299]

Executor 1:  Processing [Task 0, Task 1, Task 2]
Executor 2:  Processing [Task 3, Task 4, Task 5]
Executor 3:  Processing [Task 6, Task 7, Task 8]
…
Executor 10: Processing [Task 27, Task 28, Task 29]

(all executing in parallel)

     

     

    Behind the scenes:

  • Driver assigns tasks to available executors
  • Each executor maintains a task queue
  • Executor deserializes task code
  • Loads partition from storage
  • Executes task on partition
  • Stores results (in memory or cache)
  • Timeline: Depends on data size and complexity (seconds to hours)

    Key components involved: Driver, Executors, Storage System

    Typical performance: 100MB/second per executor core (ranges from 10MB-1000MB/s)
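 

Parallelism in this phase is driven by partitioning: each partition becomes one task, and each executor core runs one task at a time. A short sketch of checking and adjusting this, continuing the same example (the partition count is illustrative):

# One task per partition; with 10 executors x 4 cores, 40 tasks run at any moment
print(df.rdd.getNumPartitions())   # e.g. 100 partitions -> 100 tasks in the stage

# If there are too few partitions to keep all cores busy, repartitioning can help,
# but note that repartition() itself triggers a shuffle
df = df.repartition(400)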

     

Phase 6: Shuffle (If Needed)

What happens: If your operation requires data movement (groupBy, join), a shuffle occurs between stages.

After Stage 0 execution:

Executor 1: [city="NYC": rows 1-1000,    city="LA": rows 1001-2000]
Executor 2: [city="NYC": rows 2001-3000, city="LA": rows 3001-4000]
Executor 3: [city="NYC": rows 4001-5000, city="LA": rows 5001-6000]

[SHUFFLE: Reorganize data]

After shuffle:

Executor 1: [city="NYC": all rows]  (ready for aggregation)
Executor 2: [city="LA":  all rows]  (ready for aggregation)
Executor 3: [city="SF":  all rows]  (ready for aggregation)

Now Stage 1 can execute: the data is already grouped by city!

     

     

    Behind the scenes:

  • Shuffle write: Executors write their data to intermediate files
  • Shuffle files are sorted by key
  • Network traffic: All executor machines send to all executor machines
  • Shuffle read: Next stage reads shuffled data
  • Executors load appropriate shuffle files
  • Timeline: Can be very long for large operations (seconds to hours)

    Performance: Shuffle is typically the bottleneck. Usually 10-50MB/second per node.
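 

One common way to avoid a shuffle entirely is a broadcast join, which works when one side of a join is small. A hedged sketch (the DataFrame names are placeholders, not from the example above):

from pyspark.sql import functions as F

# The small reference table is copied to every executor, so the large table is
# joined in place instead of being shuffled across the network
joined = large_df.join(F.broadcast(small_ref_df), "join_key")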

     

    Phase 7: Result Collection and Application Termination

    What happens:Results are collected, written to storage, and the application shuts down.

    Executors Complete Final Tasks

        ↓

    Results stored (either in output storage or Driver memory)

        ↓

    Driver reports success

        ↓

    Cluster Manager receives "application complete" signal

        ↓

    Cluster Manager kills Executor JVMs

        ↓

    Machines freed and available for other applications

        ↓

    Driver process terminates

     

     

    Behind the scenes:

  • Last stage completes
  • Results written to output location (S3, HDFS, database)
  • Driver collects any remaining results
  • Application logs written
  • Cluster Manager notified
  • Resources released
  • Application terminates
  • Timeline: 1-30 seconds

    Key components involved: Driver, Executors, Cluster Manager, Storage
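 

A minimal sketch of a safe finish, continuing the same example: write results to distributed storage rather than collecting them into Driver memory, then stop the session (the output path is an assumption):

# Writing to distributed storage keeps results off the Driver;
# collect() would pull everything into Driver memory and can cause an OOM
result.write.mode("overwrite").parquet("s3://my-bucket/output/")   # hypothetical path

spark.stop()   # signals the Cluster Manager to release the executors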

     

    Complete Timeline Example: A Real Spark Job

Let's trace a realistic Spark job from submission to shutdown:

T+0:00   User runs: spark-submit --num-executors 50 my_job.py

T+0:05   Cluster Manager allocates 50 machines
         Executor JVMs starting on worker nodes

T+0:30   Executors registered with Driver

T+0:31   Driver parses code
         Catalyst optimizes plan

T+0:32   DAG created: 3 stages total
           Stage 0: Read CSV + Filter (400 tasks)
           Stage 1: Join with reference data (800 tasks)
           Stage 2: Aggregate and write (200 tasks)

T+0:33   Stage 0 begins: Read + Filter
         50 executors x 8 cores = 400 parallel tasks
         Processing a 100GB CSV file at 50MB/s per core

         Total throughput: 50 executors x 8 cores x 50MB/s = 20GB/s
         100GB ÷ 20GB/s = 5 seconds of reading

T+0:40   Stage 0 complete (400 tasks finished)
         Results ready for shuffle

T+0:42   Shuffle phase
         50 executors reorganizing data by join key
         ~50GB of data moving across the network
         Network bandwidth: 1GB/s per executor
         Total: 50 executors x 1GB/s = 50GB/s
         50GB ÷ 50GB/s = 1 second (network bound)

         But shuffle also writes and reads intermediate files:
         file I/O is slower, so the shuffle takes ~5 minutes total

T+5:47   Stage 1 begins: Join
         Data already grouped by key from the shuffle
         50 executors joining in parallel
         Shuffle output: 50GB
         Processing at 100MB/s per executor
         50 executors x 100MB/s = 5GB/s
         50GB ÷ 5GB/s = 10 seconds

T+5:58   Stage 1 complete

T+6:00   Stage 2 begins: Aggregate + Write
         Aggregating joined data
         Writing results to S3

         Output: 10GB
         Write speed: 50MB/s per executor
         50 executors x 50MB/s = 2.5GB/s
         10GB ÷ 2.5GB/s = 4 seconds

T+6:05   All stages complete

T+6:10   Executors killed by Cluster Manager
         Resources released

Total job time: 6 minutes 10 seconds

     

     

     

    Monitoring the Lifecycle: Spark UI

    Spark provides a Web UI showing the lifecycle in real-time:

http://<driver-hostname>:4040

     

Tabs available:
├─ Jobs: Shows each job (the action that triggered execution)
├─ Stages: Shows each stage and task performance
├─ Storage: Shows cached data in executors
├─ Environment: Shows configuration
└─ Executors: Shows executor health and metrics

     

     

    Each tab shows:

  • Execution progress (%)
  • Task completion times
  • Shuffle read/write volumes
  • GC time on executors
  • Memory usage
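 

The UI at port 4040 disappears when the application ends. To keep this history for finished jobs, the event log can be enabled so a History Server can replay it; a sketch using standard Spark properties (the log directory is a placeholder):

from pyspark.sql import SparkSession

# Event logging must be enabled before the SparkContext starts
spark = (SparkSession.builder
         .appName("my_spark_app")
         .config("spark.eventLog.enabled", "true")
         .config("spark.eventLog.dir", "hdfs:///spark-event-logs")   # placeholder path
         .getOrCreate())

# A separately running History Server (default port 18080) reads the same directory
# via its spark.history.fs.logDirectory setting and serves the UI for completed jobs.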

    Common Bottlenecks by Phase

Phase                  Bottleneck          Symptom                        Fix
Resource Negotiation   Cluster busy        Long wait before tasks start   Reduce executor count or wait
Planning               Complex query       Takes minutes to optimize      Simplify the query structure
Task Execution         Slow I/O            Tasks take a long time         Increase parallelism or tune storage
Shuffle                Network bandwidth   Shuffle takes hours            Reduce shuffle volume or partition better
Result Collection      Driver memory       Driver runs out of memory      Avoid collect(), write to storage

     

    💡 Did You Know?

    •   Cold start: First Spark job on a cluster is 5-10x slower (JVM warm-up, JIT compilation)

•   Shuffle can be 80% of job time in shuffle-heavy jobs: Optimizing shuffle is often the most impactful optimization

    •   Spark UI is the best debugging tool: 90% of issues visible in Spark UI

    •   Dynamic allocation: Executors can be added/removed during job execution

    •   Speculative execution: Spark can re-run slow tasks on other executors to speed up stragglers
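 

Both dynamic allocation and speculative execution are controlled by standard configuration properties; a hedged sketch of turning them on (the values and app name are illustrative):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("elastic_job")
         # Re-run unusually slow tasks on another executor and keep whichever finishes first
         .config("spark.speculation", "true")
         # Let Spark grow and shrink the executor pool while the job runs
         .config("spark.dynamicAllocation.enabled", "true")
         .config("spark.dynamicAllocation.minExecutors", "2")
         .config("spark.dynamicAllocation.maxExecutors", "50")
         # On YARN, dynamic allocation typically also needs the external shuffle service
         .config("spark.shuffle.service.enabled", "true")
         .getOrCreate())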

     

     

    📚 Study Notes

    •   7 phases: Submission → Resource Negotiation → Planning → DAG → Execution → Shuffle → Termination

    •   Submission: spark-submit command launches application

    •   Resource negotiation: Cluster Manager allocates machines

    •   Planning: Catalyst optimizes logical plan to physical plan

    •   DAG creation: Physical plan broken into stages and tasks

    •   Execution: Tasks run in parallel on executors

    •   Shuffle: Data reorganized between stages (slow phase)

    •   Termination: Results written, executors killed, resources freed

    •   Typical job time distribution: 5% planning, 60% execution, 30% shuffle, 5% overhead

    •   Bottlenecks: Shuffle, network, I/O, or task execution

    •   Monitoring: Spark UI shows real-time progress for each phase

     

     
