Posted to user@spark.apache.org by Michel Hubert <mi...@vsnsystemen.nl> on 2015/07/09 09:21:29 UTC
spark streaming performance
Hi,
I've developed a POC Spark Streaming application.
But it seems to perform better on my development machine than on our cluster.
I submit it to YARN on our Cloudera cluster.
But my first question is more detailed:
In the application UI (:4040) I see in the streaming section that the batch processing took 6 sec.
Then when I look at the stages, I indeed see a stage with a duration of 5 s.
For example:
  Stage Id | Description | Submitted | Duration | Tasks: Succeeded/Total | Shuffle Write
  1678 | map at LogonAnalysis.scala:215 | 2015/07/09 09:17:00 | 5 s | 50/50 | 173.5 KB
But when I look into the details of stage 1678, it tells me the duration was 14 ms, and the aggregated metrics by executor show 1.0 s as Task Time.
What is responsible for the gap between 14 ms, 1 s and 5 s?
Details for Stage 1678
* Total task time across all tasks: 0.8 s
* Shuffle write: 173.5 KB / 2031
Summary Metrics for 50 Completed Tasks
  Metric | Min | 25th percentile | Median | 75th percentile | Max
  Duration | 14 ms | 14 ms | 15 ms | 15 ms | 24 ms
  GC Time | 0 ms | 0 ms | 0 ms | 0 ms | 0 ms
  Shuffle Write Size / Records | 2.6 KB / 28 | 3.1 KB / 35 | 3.5 KB / 42 | 3.9 KB / 46 | 4.4 KB / 53
Aggregated Metrics by Executor
  Executor ID | Address | Task Time | Total Tasks | Failed Tasks | Succeeded Tasks | Shuffle Write Size / Records
  2 | xxxx:44231 | 1.0 s | 50 | 0 | 50 | 173.5 KB / 2031
RE: spark streaming performance
Posted by Michel Hubert <mi...@vsnsystemen.nl>.
What do you mean by “full task list of all the 50 tasks”?
From: Tathagata Das [mailto:tdas@databricks.com]
Sent: Thursday, 9 July 2015 10:28
To: Michel Hubert
Cc: user@spark.apache.org
Subject: Re: spark streaming performance
Can you provide the full task list of all the 50 tasks? I am curious to find out what's going on.
And spark.executor.cores needs to be set, or --executor-cores in spark-submit.
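For a YARN submission like the one in this thread, the flag would be passed roughly as in this sketch (the class name, jar name, and the counts are made-up placeholders, not values from the thread):

```shell
# Hypothetical spark-submit invocation; --executor-cores is the relevant flag.
# Class, jar, and the numbers themselves are placeholders.
spark-submit \
  --master yarn \
  --num-executors 2 \
  --executor-cores 8 \
  --class LogonAnalysis \
  logon-analysis.jar

# Equivalent config-property form:
#   spark-submit --conf spark.executor.cores=8 ...
```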
On Thu, Jul 9, 2015 at 1:05 AM, Michel Hubert <mi...@vsnsystemen.nl> wrote:
Hi,
Thanks for your reply!
@What were the number of cores in the executor? It could be that you had only one core in the executor which did all the 50 tasks serially, so 50 tasks x 15 ms = ~1 second.
What metric/page shows the number of cores in the executor? Should I specify “spark.executor.cores” when submitting the application?
@Could you take a look at the task details in the stage page to see when the tasks were added, and whether that explains the 5 seconds?
Stage 704 seems to be submitted 4 sec later. What could be the reason, other than that stage 703 finished in 4 to 5 sec?
I've copy-pasted some more detailed info (masked out some server names as xxxNUMBERxxx).
Completed Stages (530)
  Stage Id | Description | Submitted | Duration | Tasks: Succeeded/Total | Shuffle Read | Shuffle Write
  706 | print at LogonAnalysis.scala:217 | 2015/07/09 09:56:24 | 14 ms | 1/1 | - | -
  704 | print at LogonAnalysis.scala:217 | 2015/07/09 09:56:24 | 74 ms | 1/1 | 1377.0 B | -
  703 | count at LogonAnalysis.scala:217 | 2015/07/09 09:56:20 | 5 s | 51/51 | - | 1377.0 B
Details for Stage 703
• Total task time across all tasks: 0.6 s
• Shuffle write: 1377.0 B / 51
Summary Metrics for 51 Completed Tasks
  Metric | Min | 25th percentile | Median | 75th percentile | Max
  Duration | 11 ms | 12 ms | 12 ms | 13 ms | 23 ms
  GC Time | 0 ms | 0 ms | 0 ms | 0 ms | 0 ms
  Shuffle Write Size / Records | 27.0 B / 1 | 27.0 B / 1 | 27.0 B / 1 | 27.0 B / 1 | 27.0 B / 1
Aggregated Metrics by Executor
  Executor ID | Address | Task Time | Total Tasks | Failed Tasks | Succeeded Tasks | Shuffle Write Size / Records
  1 | xxx81440xxx:51081 | 0.8 s | 51 | 0 | 51 | 1377.0 B / 51
Tasks
  Index | ID | Attempt | Status | Locality Level | Executor ID / Host | Launch Time | Duration | Shuffle Write Size / Records
  50 | 9308 | 0 | SUCCESS | PROCESS_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:20 | 11 ms | 27.0 B / 1
  0 | 9309 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:23 | 14 ms | 27.0 B / 1
  1 | 9310 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:23 | 12 ms | 27.0 B / 1
  2 | 9311 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:23 | 12 ms | 27.0 B / 1
  3 | 9312 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:23 | 12 ms | 27.0 B / 1
  4 | 9313 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:23 | 13 ms | 27.0 B / 1
  5 | 9314 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:23 | 11 ms | 27.0 B / 1
  6 | 9315 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:23 | 13 ms | 27.0 B / 1
  7 | 9316 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:23 | 14 ms | 27.0 B / 1
  8 | 9317 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:23 | 13 ms | 27.0 B / 1
  9 | 9318 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:23 | 12 ms | 27.0 B / 1
  10 | 9319 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:23 | 12 ms | 27.0 B / 1
  11 | 9320 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:23 | 14 ms | 27.0 B / 1
  12 | 9321 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:23 | 12 ms | 27.0 B / 1
  13 | 9322 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 11 ms | 27.0 B / 1
  14 | 9323 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
  15 | 9324 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
  16 | 9325 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
  17 | 9326 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 11 ms | 27.0 B / 1
  18 | 9327 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
  19 | 9328 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
  20 | 9329 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 13 ms | 27.0 B / 1
  21 | 9330 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
  22 | 9331 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
  23 | 9332 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 13 ms | 27.0 B / 1
  24 | 9333 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
  25 | 9334 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 11 ms | 27.0 B / 1
  26 | 9335 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 11 ms | 27.0 B / 1
  27 | 9336 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 13 ms | 27.0 B / 1
  28 | 9337 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 13 ms | 27.0 B / 1
  29 | 9338 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 11 ms | 27.0 B / 1
  30 | 9339 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
  31 | 9340 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 13 ms | 27.0 B / 1
  32 | 9341 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 13 ms | 27.0 B / 1
  33 | 9342 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
  34 | 9343 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 11 ms | 27.0 B / 1
  35 | 9344 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
  36 | 9345 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
  37 | 9346 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 11 ms | 27.0 B / 1
  38 | 9347 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 11 ms | 27.0 B / 1
  39 | 9348 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 23 ms | 27.0 B / 1
  40 | 9349 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 13 ms | 27.0 B / 1
  41 | 9350 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
  42 | 9351 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
  43 | 9352 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 11 ms | 27.0 B / 1
  44 | 9353 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 13 ms | 27.0 B / 1
  45 | 9354 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
  46 | 9355 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
  47 | 9356 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
  48 | 9357 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
  49 | 9358 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 13 ms | 27.0 B / 1
Streaming
• Started at: Thu Jul 09 09:27:01 CEST 2015
• Time since start: 24 minutes
• Network receivers: 1
• Batch interval: 10 seconds
• Processed batches: 143
• Waiting batches: 1
• Received records: 295016
• Processed records: 292937
Statistics over last 100 processed batches
Receiver Statistics
  Receiver | Status | Location | Records in last batch [2015/07/09 09:51:01] | Minimum rate [records/sec] | Median rate [records/sec] | Maximum rate [records/sec] | Last Error
  SocketReceiver-0 | ACTIVE | xxx81441xx | 2.1 K | 196 | 205 | 218 | -
Batch Processing Statistics
  Metric | Last batch | Minimum | 25th percentile | Median | 75th percentile | Maximum
  Processing Time | 4 seconds 679 ms | 4 seconds 653 ms | 4 seconds 702 ms | 4 seconds 736 ms | 4 seconds 775 ms | 4 seconds 926 ms
  Scheduling Delay | 0 ms | 0 ms | 0 ms | 0 ms | 0 ms | 1 ms
  Total Delay | 4 seconds 679 ms | 4 seconds 653 ms | 4 seconds 702 ms | 4 seconds 737 ms | 4 seconds 775 ms | 4 seconds 926 ms
Executors (3)
• Memory: 0.0 B Used (1326.0 MB Total)
• Disk: 0.0 B Used
  Executor ID | Address | RDD Blocks | Memory Used | Disk Used | Active Tasks | Failed Tasks | Complete Tasks | Total Tasks | Task Time | Input | Shuffle Read | Shuffle Write
  1 | xxx81440xxx:51081 | 0 | 0.0 B / 530.3 MB | 0.0 B | 0 | 0 | 8159 | 8159 | 3.0 m | 0.0 B | 392.0 B | 206.8 KB
  2 | xxx81441xxx:34114 | 0 | 0.0 B / 530.3 MB | 0.0 B | 1 | 0 | 35 | 36 | 5.1 s | 0.0 B | 364.0 B | 700.0 B
  <driver> | xxx81436xxx:38325 | 0 | 0.0 B / 265.4 MB | 0.0 B | 0 | 0 | 0 | 0 | 0 ms | 0.0 B | 0.0 B | 0.0 B
From: Tathagata Das [mailto:tdas@databricks.com]
Sent: Thursday, 9 July 2015 09:38
To: Michel Hubert
Cc: user@spark.apache.org
Subject: Re: spark streaming performance
What were the number of cores in the executor? It could be that you had only one core in the executor, which did all the 50 tasks serially, so 50 tasks x 15 ms = ~1 second.
Could you take a look at the task details in the stage page to see when the tasks were added, and whether that explains the 5 seconds?
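That back-of-the-envelope number can be checked directly (assuming, as above, ~15 ms per task and fully serial execution on a single core):

```shell
# 50 tasks run back to back at ~15 ms each on one core:
tasks=50
per_task_ms=15
serial_ms=$((tasks * per_task_ms))
echo "${serial_ms} ms serial"   # 750 ms, on the order of the ~1 s task time per executor
```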
Re: spark streaming performance
Posted by Tathagata Das <td...@databricks.com>.
I am not sure why you are getting NODE_LOCAL and not PROCESS_LOCAL. Also, there is probably no better documentation than the configuration page - http://spark.apache.org/docs/latest/configuration.html (search for locality).
RE: spark streaming performance
Posted by Michel Hubert <mi...@vsnsystemen.nl>.
StorageLevel.MEMORY_ONLY
Streaming / Batch Processing Statistics / Processing Time:
Executor-cores NOT specified: 5s
Executor-cores 8: 400 ms
Processing time has significantly reduced with executor-cores set to 8.
But what is the general rule of thumb for a good executor-cores setting?
But 257 records/sec still isn't very fast, is it?
Statistics over last 95 processed batches
Receiver Statistics
  Receiver | Status | Location | Records in last batch [2015/07/09 14:49:47] | Minimum rate [records/sec] | Median rate [records/sec] | Maximum rate [records/sec] | Last Error
  SocketReceiver-0 | ACTIVE | bfravicsvr81440-cld.opentsp.com | 1352 | 0 | 257 | 279 | -
“After that, somehow the system is not able to launch any process local task”
Where can I look for an answer why some tasks are NODE_LOCAL and others are PROCESS_LOCAL?
Where should I look for the reason the locality wait expires?
Details for Stage 147
• Total task time across all tasks: 0.4 s
• Shuffle write: 704.0 B / 26
Summary Metrics for 26 Completed Tasks
  Metric | Min | 25th percentile | Median | 75th percentile | Max
  Duration | 9 ms | 11 ms | 12 ms | 24 ms | 30 ms
  GC Time | 0 ms | 0 ms | 0 ms | 0 ms | 0 ms
  Shuffle Write Size / Records | 27.0 B / 1 | 27.0 B / 1 | 27.0 B / 1 | 27.0 B / 1 | 28.0 B / 1
Aggregated Metrics by Executor
  Executor ID | Address | Task Time | Total Tasks | Failed Tasks | Succeeded Tasks | Shuffle Write Size / Records
  1 | xxx81440xxx:38882 | 0.8 s | 25 | 0 | 25 | 677.0 B / 25
  2 | xxx81441xxx:46832 | 45 ms | 1 | 0 | 1 | 27.0 B / 1
Tasks
  Index | ID | Attempt | Status | Locality Level | Executor ID / Host | Launch Time | Duration | Shuffle Write Size / Records
  0 | 1029 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 30 ms | 27.0 B / 1
  2 | 1031 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 29 ms | 27.0 B / 1
  1 | 1030 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 30 ms | 27.0 B / 1
  5 | 1034 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 25 ms | 27.0 B / 1
  4 | 1033 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 28 ms | 27.0 B / 1
  3 | 1032 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 27 ms | 27.0 B / 1
  25 | 1036 | 0 | SUCCESS | PROCESS_LOCAL | 2 / xxx81441xxx | 2015/07/09 14:43:55 | 23 ms | 27.0 B / 1
  6 | 1035 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 24 ms | 27.0 B / 1
  7 | 1037 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 10 ms | 27.0 B / 1
  8 | 1038 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 12 ms | 27.0 B / 1
  9 | 1039 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 11 ms | 27.0 B / 1
  10 | 1040 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 11 ms | 27.0 B / 1
  11 | 1041 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 11 ms | 27.0 B / 1
  12 | 1042 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 12 ms | 27.0 B / 1
  13 | 1043 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 15 ms | 27.0 B / 1
  14 | 1044 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 9 ms | 27.0 B / 1
  15 | 1045 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 12 ms | 28.0 B / 1
  16 | 1046 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 10 ms | 27.0 B / 1
  17 | 1047 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 11 ms | 27.0 B / 1
  18 | 1048 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 11 ms | 27.0 B / 1
  19 | 1049 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 11 ms | 27.0 B / 1
  20 | 1050 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 12 ms | 27.0 B / 1
  21 | 1051 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 16 ms | 28.0 B / 1
  22 | 1052 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 12 ms | 27.0 B / 1
  23 | 1053 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 10 ms | 27.0 B / 1
  24 | 1054 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 12 ms | 27.0 B / 1
From: Tathagata Das [mailto:tdas@databricks.com]
Sent: Thursday, 9 July 2015 12:46
To: Michel Hubert
Cc: user@spark.apache.org
Subject: Re: spark streaming performance
I think I know what is going on. What is the storage level that you are using for your receivers? Are you using replication in the storage level?
What's going on here is the following: there is one task with PROCESS_LOCAL locality that is launched immediately, as soon as the stage is submitted. After that, somehow the system is not able to launch any process local tasks, so it has to wait for spark.locality.wait = 3 seconds to allow the lower levels of locality, which is rack local. Then it launches the rest of the tasks, which finish within 1 second in total. Overall, it's taking 3 seconds of wait + 1 second + some overhead = approx 5 seconds.
Increasing the number of cores per container will definitely increase performance.
TD
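The 3-second wait referred to above is the default of spark.locality.wait, which can be tuned at submit time. A sketch only, with example values rather than a recommendation from this thread (the class and jar names are placeholders):

```shell
# spark.locality.wait is how long the scheduler waits before falling back to a
# less local level; per-level overrides also exist (spark.locality.wait.process,
# spark.locality.wait.node, spark.locality.wait.rack). Example values only.
spark-submit \
  --master yarn \
  --executor-cores 8 \
  --conf spark.locality.wait=500ms \
  --class LogonAnalysis \
  logon-analysis.jar
```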
On Thu, Jul 9, 2015 at 2:09 AM, Michel Hubert <mi...@vsnsystemen.nl> wrote:
What do you mean by “full task list of all the 50 tasks”?
This?
Details for Stage 271
• Total task time across all tasks: 0.7 s
• Shuffle write: 1377.0 B / 51
Summary Metrics for 51 Completed Tasks
  Metric | Min | 25th percentile | Median | 75th percentile | Max
  Duration | 13 ms | 13 ms | 14 ms | 14 ms | 15 ms
  GC Time | 0 ms | 0 ms | 0 ms | 0 ms | 0 ms
  Shuffle Write Size / Records | 27.0 B / 1 | 27.0 B / 1 | 27.0 B / 1 | 27.0 B / 1 | 27.0 B / 1
Aggregated Metrics by Executor
  Executor ID | Address | Task Time | Total Tasks | Failed Tasks | Succeeded Tasks | Shuffle Write Size / Records
  2 | xxx81441xxx:47790 | 0.9 s | 51 | 0 | 51 | 1377.0 B / 51
Tasks
  Index | ID | Attempt | Status | Locality Level | Executor ID / Host | Launch Time | Duration | Shuffle Write Size / Records
  50 | 3549 | 0 | SUCCESS | PROCESS_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:10 | 13 ms | 27.0 B / 1
  0 | 3550 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:13 | 15 ms | 27.0 B / 1
  1 | 3551 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:13 | 14 ms | 27.0 B / 1
  2 | 3552 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:13 | 14 ms | 27.0 B / 1
  3 | 3553 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:13 | 14 ms | 27.0 B / 1
  4 | 3554 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:13 | 13 ms | 27.0 B / 1
  5 | 3555 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:13 | 13 ms | 27.0 B / 1
  6 | 3556 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:13 | 14 ms | 27.0 B / 1
  7 | 3557 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:13 | 15 ms | 27.0 B / 1
  8 | 3558 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
  9 | 3559 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
  10 | 3560 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
  11 | 3561 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
  12 | 3562 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
  13 | 3563 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 15 ms | 27.0 B / 1
  14 | 3564 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
  15 | 3565 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
  16 | 3566 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
  17 | 3567 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
  18 | 3568 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
  19 | 3569 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
  20 | 3570 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
  21 | 3571 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
  22 | 3572 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
  23 | 3573 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
  24 | 3574 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
  25 | 3575 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
  26 | 3576 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
  27 | 3577 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
  28 | 3578 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 15 ms | 27.0 B / 1
  29 | 3579 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
  30 | 3580 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
  31 | 3581 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
  32 | 3582 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
  33 | 3583 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 15 ms | 27.0 B / 1
  34 | 3584 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
  35 | 3585 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
  36 | 3586 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
  37 | 3587 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
  38 | 3588 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
  39 | 3589 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
  40 | 3590 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
  41 | 3591 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
  42 | 3592 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
  43 | 3593 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 15 ms | 27.0 B / 1
  44 | 3594 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
  45 | 3595 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
  46 | 3596 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
  47 | 3597 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
  48 | 3598 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
  49 | 3599 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 15 ms | 27.0 B / 1
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
12 ms
27.0 B / 1
49
9358
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
13 ms
27.0 B / 1
Streaming
• Started at: Thu Jul 09 09:27:01 CEST 2015
• Time since start: 24 minutes
• Network receivers: 1
• Batch interval: 10 seconds
• Processed batches: 143
• Waiting batches: 1
• Received records: 295016
• Processed records: 292937
Statistics over last 100 processed batches
Receiver Statistics
• Receiver
• Status
• Location
• Records in last batch
• [2015/07/09 09:51:01]
• Minimum rate
• [records/sec]
• Median rate
• [records/sec]
• Maximum rate
• [records/sec]
• Last Error
SocketReceiver-0
ACTIVE
xxx81441xx
2.1 K
196
205
218
-
Batch Processing Statistics
Metric
Last batch
Minimum
25th percentile
Median
75th percentile
Maximum
Processing Time
4 seconds 679 ms
4 seconds 653 ms
4 seconds 702 ms
4 seconds 736 ms
4 seconds 775 ms
4 seconds 926 ms
Scheduling Delay
0 ms
0 ms
0 ms
0 ms
0 ms
1 ms
Total Delay
4 seconds 679 ms
4 seconds 653 ms
4 seconds 702 ms
4 seconds 737 ms
4 seconds 775 ms
4 seconds 926 ms
Executors (3)
• Memory: 0.0 B Used (1326.0 MB Total)
• Disk: 0.0 B Used
Executor ID
Address
RDD Blocks
Memory Used
Disk Used
Active Tasks
Failed Tasks
Complete Tasks
Total Tasks
Task Time
Input
Shuffle Read
Shuffle Write
Logs
Thread Dump
1
xxx81440xxx:51081
0
0.0 B / 530.3 MB
0.0 B
0
0
8159
8159
3.0 m
0.0 B
392.0 B
206.8 KB
stdout
stderr
Thread Dump
2
xxx81441xxx:34114
0
0.0 B / 530.3 MB
0.0 B
1
0
35
36
5.1 s
0.0 B
364.0 B
700.0 B
stdout
stderr
Thread Dump
<driver>
xxx81436xxx:38325
0
0.0 B / 265.4 MB
0.0 B
0
0
0
0
0 ms
0.0 B
0.0 B
0.0 B
Thread Dump
From: Tathagata Das [mailto:tdas@databricks.com<ma...@databricks.com>]
Sent: donderdag 9 juli 2015 09:38
To: Michel Hubert
Cc: user@spark.apache.org<ma...@spark.apache.org>
Subject: Re: spark streaming performance
What was the number of cores in the executor? It could be that you had only one core in the executor, which did all the 50 tasks serially, so 50 tasks × 15 ms ≈ 1 second.
Could you take a look at the task details in the stage page to see when the tasks were launched, and whether that explains the 5 seconds?
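As a sketch of the suggestion above (all flag values here are illustrative, not taken from this thread), the executor core count can be set on the spark-submit command line or as a configuration property:

```shell
# Illustrative only: request 4 cores per YARN executor so the ~50 short
# tasks can run in parallel rather than serially on a single core.
spark-submit \
  --master yarn-cluster \
  --num-executors 2 \
  --executor-cores 4 \
  streaming-app.jar

# Equivalent configuration property:
#   --conf spark.executor.cores=4
```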
On Thu, Jul 9, 2015 at 12:21 AM, Michel Hubert <mi...@vsnsystemen.nl>> wrote:
Hi,
I’ve developed a POC Spark Streaming application.
But it seems to perform better on my development machine than on our cluster.
I submit it to yarn on our cloudera cluster.
But my first question is more detailed:
In the application UI (:4040) I see in the streaming section that the batch processing took 6 sec.
Then when I look at the stages I indeed see a stage with duration 5s.
For example:
1678
map at LogonAnalysis.scala:215
2015/07/09 09:17:00
5 s
50/50
173.5 KB
But when I look into the details of stage 1678, it tells me the duration was 14 ms, and the aggregated metrics by executor show 1.0 s as Task Time.
What is responsible for the gap between 14 ms, 1s and 5 sec?
Details for Stage 1678
• Total task time across all tasks: 0.8 s
• Shuffle write: 173.5 KB / 2031
Show additional metrics
Summary Metrics for 50 Completed Tasks
Metric
Min
25th percentile
Median
75th percentile
Max
Duration
14 ms
14 ms
15 ms
15 ms
24 ms
GC Time
0 ms
0 ms
0 ms
0 ms
0 ms
Shuffle Write Size / Records
2.6 KB / 28
3.1 KB / 35
3.5 KB / 42
3.9 KB / 46
4.4 KB / 53
Aggregated Metrics by Executor
Executor ID
Address
Task Time
Total Tasks
Failed Tasks
Succeeded Tasks
Shuffle Write Size / Records
2
xxxx:44231
1.0 s
50
0
50
173.5 KB / 2031
Re: spark streaming performance
Posted by Tathagata Das <td...@databricks.com>.
I think I know what is going on. What is the storage level that you are
using for your receivers? Are you using replication in the storage level?
What's going on here is the following: one task with PROCESS_LOCAL
locality is launched immediately, as soon as the stage is submitted.
After that, the system is somehow unable to launch any more process-local
tasks, so it has to wait for spark.locality.wait = 3 seconds before falling
back to the lower locality level, which is RACK_LOCAL. Then it launches the
rest of the tasks, which finish within 1 second in total. Overall, that is a
3-second wait + 1 second + some overhead ≈ 5 seconds.
Increasing the number of cores per container will definitely increase
performance.
TD
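A hedged sketch of how the locality wait could be tuned to test this diagnosis (the value below is an example, not a recommendation from this thread): lowering spark.locality.wait makes the scheduler fall back from PROCESS_LOCAL to RACK_LOCAL sooner, which should shrink the ~3-second gap between the first task and the rest.

```shell
# Illustrative only: shorten the per-locality-level wait (default 3s)
# so non-process-local tasks are launched on rack-local executors sooner.
spark-submit \
  --master yarn-cluster \
  --conf spark.locality.wait=500ms \
  streaming-app.jar
```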
On Thu, Jul 9, 2015 at 2:09 AM, Michel Hubert <mi...@vsnsystemen.nl>
wrote:
> What do you mean with “full task list of all the 50 tasks”?
>
>
>
> This?
> Details for Stage 271
>
> · *Total task time across all tasks:* 0.7 s
>
> · *Shuffle write:* 1377.0 B / 51
>
> *Show additional metrics*
> Summary Metrics for 51 Completed Tasks
>
> *Metric*
>
> *Min*
>
> *25th percentile*
>
> *Median*
>
> *75th percentile*
>
> *Max*
>
> Duration
>
> 13 ms
>
> 13 ms
>
> 14 ms
>
> 14 ms
>
> 15 ms
>
> GC Time
>
> 0 ms
>
> 0 ms
>
> 0 ms
>
> 0 ms
>
> 0 ms
>
> Shuffle Write Size / Records
>
> 27.0 B / 1
>
> 27.0 B / 1
>
> 27.0 B / 1
>
> 27.0 B / 1
>
> 27.0 B / 1
> Aggregated Metrics by Executor
>
> *Executor ID*
>
> *Address*
>
> *Task Time*
>
> *Total Tasks*
>
> *Failed Tasks*
>
> *Succeeded Tasks*
>
> *Shuffle Write Size / Records*
>
> 2
>
> bfravicsvr81441-cld.opentsp.com:47790
>
> 0.9 s
>
> 51
>
> 0
>
> 51
>
> 1377.0 B / 51
> Tasks
>
> *Index*
>
> *ID*
>
> *Attempt*
>
> *Status*
>
> *Locality Level*
>
> *Executor ID / Host*
>
> *Launch Time*
>
> *Duration*
>
> *GC Time*
>
> *Write Time*
>
> *Shuffle Write Size / Records*
>
> *Errors*
>
> 50
>
> 3549
>
> 0
>
> SUCCESS
>
> PROCESS_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:10
>
> 13 ms
>
> 27.0 B / 1
>
> 0
>
> 3550
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:13
>
> 15 ms
>
> 27.0 B / 1
>
> 1
>
> 3551
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:13
>
> 14 ms
>
> 27.0 B / 1
>
> 2
>
> 3552
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:13
>
> 14 ms
>
> 27.0 B / 1
>
> 3
>
> 3553
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:13
>
> 14 ms
>
> 27.0 B / 1
>
> 4
>
> 3554
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:13
>
> 13 ms
>
> 27.0 B / 1
>
> 5
>
> 3555
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:13
>
> 13 ms
>
> 27.0 B / 1
>
> 6
>
> 3556
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:13
>
> 14 ms
>
> 27.0 B / 1
>
> 7
>
> 3557
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:13
>
> 15 ms
>
> 27.0 B / 1
>
> 8
>
> 3558
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 13 ms
>
> 27.0 B / 1
>
> 9
>
> 3559
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 14 ms
>
> 27.0 B / 1
>
> 10
>
> 3560
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 14 ms
>
> 27.0 B / 1
>
> 11
>
> 3561
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 13 ms
>
> 27.0 B / 1
>
> 12
>
> 3562
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 13 ms
>
> 27.0 B / 1
>
> 13
>
> 3563
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 15 ms
>
> 27.0 B / 1
>
> 14
>
> 3564
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 13 ms
>
> 27.0 B / 1
>
> 15
>
> 3565
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 13 ms
>
> 27.0 B / 1
>
> 16
>
> 3566
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 14 ms
>
> 27.0 B / 1
>
> 17
>
> 3567
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 13 ms
>
> 27.0 B / 1
>
> 18
>
> 3568
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 14 ms
>
> 27.0 B / 1
>
> 19
>
> 3569
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 13 ms
>
> 27.0 B / 1
>
> 20
>
> 3570
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 13 ms
>
> 27.0 B / 1
>
> 21
>
> 3571
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 14 ms
>
> 27.0 B / 1
>
> 22
>
> 3572
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 14 ms
>
> 27.0 B / 1
>
> 23
>
> 3573
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 13 ms
>
> 27.0 B / 1
>
> 24
>
> 3574
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 13 ms
>
> 27.0 B / 1
>
> 25
>
> 3575
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 14 ms
>
> 27.0 B / 1
>
> 26
>
> 3576
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 13 ms
>
> 27.0 B / 1
>
> 27
>
> 3577
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 14 ms
>
> 27.0 B / 1
>
> 28
>
> 3578
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 15 ms
>
> 27.0 B / 1
>
> 29
>
> 3579
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 14 ms
>
> 27.0 B / 1
>
> 30
>
> 3580
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 13 ms
>
> 27.0 B / 1
>
> 31
>
> 3581
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 14 ms
>
> 27.0 B / 1
>
> 32
>
> 3582
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 14 ms
>
> 27.0 B / 1
>
> 33
>
> 3583
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 15 ms
>
> 27.0 B / 1
>
> 34
>
> 3584
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 13 ms
>
> 27.0 B / 1
>
> 35
>
> 3585
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 14 ms
>
> 27.0 B / 1
>
> 36
>
> 3586
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 14 ms
>
> 27.0 B / 1
>
> 37
>
> 3587
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 13 ms
>
> 27.0 B / 1
>
> 38
>
> 3588
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 13 ms
>
> 27.0 B / 1
>
> 39
>
> 3589
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 14 ms
>
> 27.0 B / 1
>
> 40
>
> 3590
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 13 ms
>
> 27.0 B / 1
>
> 41
>
> 3591
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 13 ms
>
> 27.0 B / 1
>
> 42
>
> 3592
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 13 ms
>
> 27.0 B / 1
>
> 43
>
> 3593
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 15 ms
>
> 27.0 B / 1
>
> 44
>
> 3594
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 14 ms
>
> 27.0 B / 1
>
> 45
>
> 3595
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 14 ms
>
> 27.0 B / 1
>
> 46
>
> 3596
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 14 ms
>
> 27.0 B / 1
>
> 47
>
> 3597
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 14 ms
>
> 27.0 B / 1
>
> 48
>
> 3598
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 14 ms
>
> 27.0 B / 1
>
> 49
>
> 3599
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 2 / bfravicsvr81441-cld.opentsp.com
>
> 2015/07/09 11:09:14
>
> 15 ms
>
> 27.0 B / 1
>
>
>
>
>
> *From:* Tathagata Das [mailto:tdas@databricks.com]
> *Sent:* donderdag 9 juli 2015 10:28
> *To:* Michel Hubert
> *Cc:* user@spark.apache.org
> *Subject:* Re: spark streaming performance
>
>
>
> Can you provide the full task list of all the 50 tasks? I am curious to
> find out what's going on.
>
>
>
> And spark.executor.cores needs to be set. Or --executor-cores in
> spark-submit
>
>
>
>
>
> On Thu, Jul 9, 2015 at 1:05 AM, Michel Hubert <mi...@vsnsystemen.nl>
> wrote:
>
>
>
>
>
> Hi,
>
>
>
> Thanks for you reply!
>
>
>
>
>
> *@What was the number of cores in the executor? It could be that you had
> only one core in the executor, which did all the 50 tasks serially, so 50
> tasks × 15 ms ≈ 1 second.*
>
>
>
> What metric/page shows the number of cores in the executor? Should I
> specify “spark.executor.cores” when submitting the application?
>
>
>
>
>
>
>
> @*Could you take a look at the task details in the stage page to see when
> the tasks were added to see whether it explains the 5 second?*
>
>
>
> Stage 704 seems to be submitted 4 seconds later. What could be the reason
> other than that stage 703 finished in 4 to 5 seconds?
>
>
>
>
>
> I’ve copy pasted some more detailed info (masked out some servernames
> xxxNUMBERxxx).
> Completed Stages (530)
>
> *Stage Id*
>
> *Description*
>
> *Submitted*
>
> *Duration*
>
> *Tasks: Succeeded/Total*
>
> *Input*
>
> *Output*
>
> *Shuffle Read*
>
> *Shuffle Write*
>
> 706
>
> print at LogonAnalysis.scala:217
>
> 2015/07/09 09:56:24
>
> 14 ms
>
> 1/1
>
> 704
>
> print at LogonAnalysis.scala:217
>
> 2015/07/09 09:56:24
>
> 74 ms
>
> 1/1
>
> 1377.0 B
>
> 703
>
> count at LogonAnalysis.scala:217
>
> 2015/07/09 09:56:20
>
> 5 s
>
> 51/51
>
> 1377.0 B
>
>
>
>
>
>
>
> *Details for Stage 703*
>
> · *Total task time across all tasks: *0.6 s
>
> · *Shuffle write: *1377.0 B / 51
>
> *Show additional metrics*
>
> *Summary Metrics for 51 Completed Tasks*
>
> *Metric*
>
> *Min*
>
> *25th percentile*
>
> *Median*
>
> *75th percentile*
>
> *Max*
>
> Duration
>
> 11 ms
>
> 12 ms
>
> 12 ms
>
> 13 ms
>
> 23 ms
>
> GC Time
>
> 0 ms
>
> 0 ms
>
> 0 ms
>
> 0 ms
>
> 0 ms
>
> Shuffle Write Size / Records
>
> 27.0 B / 1
>
> 27.0 B / 1
>
> 27.0 B / 1
>
> 27.0 B / 1
>
> 27.0 B / 1
>
> *Aggregated Metrics by Executor*
>
> *Executor ID*
>
> *Address*
>
> *Task Time*
>
> *Total Tasks*
>
> *Failed Tasks*
>
> *Succeeded Tasks*
>
> *Shuffle Write Size / Records*
>
> 1
>
> xxx81440xxx:51081
>
> 0.8 s
>
> 51
>
> 0
>
> 51
>
> 1377.0 B / 51
>
> *Tasks*
>
> *Index*
>
> *ID*
>
> *Attempt*
>
> *Status*
>
> *Locality Level*
>
> *Executor ID / Host*
>
> *Launch Time*
>
> *Duration*
>
> *GC Time*
>
> *Write Time*
>
> *Shuffle Write Size / Records*
>
> *Errors*
>
> 50
>
> 9308
>
> 0
>
> SUCCESS
>
> PROCESS_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:20
>
> 11 ms
>
> 27.0 B / 1
>
> 0
>
> 9309
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:23
>
> 14 ms
>
> 27.0 B / 1
>
> 1
>
> 9310
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:23
>
> 12 ms
>
> 27.0 B / 1
>
> 2
>
> 9311
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:23
>
> 12 ms
>
> 27.0 B / 1
>
> 3
>
> 9312
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:23
>
> 12 ms
>
> 27.0 B / 1
>
> 4
>
> 9313
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:23
>
> 13 ms
>
> 27.0 B / 1
>
> 5
>
> 9314
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:23
>
> 11 ms
>
> 27.0 B / 1
>
> 6
>
> 9315
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:23
>
> 13 ms
>
> 27.0 B / 1
>
> 7
>
> 9316
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:23
>
> 14 ms
>
> 27.0 B / 1
>
> 8
>
> 9317
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:23
>
> 13 ms
>
> 27.0 B / 1
>
> 9
>
> 9318
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:23
>
> 12 ms
>
> 27.0 B / 1
>
> 10
>
> 9319
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:23
>
> 12 ms
>
> 27.0 B / 1
>
> 11
>
> 9320
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:23
>
> 14 ms
>
> 27.0 B / 1
>
> 12
>
> 9321
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:23
>
> 12 ms
>
> 27.0 B / 1
>
> 13
>
> 9322
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 11 ms
>
> 27.0 B / 1
>
> 14
>
> 9323
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 12 ms
>
> 27.0 B / 1
>
> 15
>
> 9324
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 12 ms
>
> 27.0 B / 1
>
> 16
>
> 9325
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 12 ms
>
> 27.0 B / 1
>
> 17
>
> 9326
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 11 ms
>
> 27.0 B / 1
>
> 18
>
> 9327
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 12 ms
>
> 27.0 B / 1
>
> 19
>
> 9328
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 12 ms
>
> 27.0 B / 1
>
> 20
>
> 9329
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 13 ms
>
> 27.0 B / 1
>
> 21
>
> 9330
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 12 ms
>
> 27.0 B / 1
>
> 22
>
> 9331
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 12 ms
>
> 27.0 B / 1
>
> 23
>
> 9332
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 13 ms
>
> 27.0 B / 1
>
> 24
>
> 9333
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 12 ms
>
> 27.0 B / 1
>
> 25
>
> 9334
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 11 ms
>
> 27.0 B / 1
>
> 26
>
> 9335
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 11 ms
>
> 27.0 B / 1
>
> 27
>
> 9336
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 13 ms
>
> 27.0 B / 1
>
> 28
>
> 9337
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 13 ms
>
> 27.0 B / 1
>
> 29
>
> 9338
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 11 ms
>
> 27.0 B / 1
>
> 30
>
> 9339
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 12 ms
>
> 27.0 B / 1
>
> 31
>
> 9340
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 13 ms
>
> 27.0 B / 1
>
> 32
>
> 9341
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 13 ms
>
> 27.0 B / 1
>
> 33
>
> 9342
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 12 ms
>
> 27.0 B / 1
>
> 34
>
> 9343
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 11 ms
>
> 27.0 B / 1
>
> 35
>
> 9344
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 12 ms
>
> 27.0 B / 1
>
> 36
>
> 9345
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 12 ms
>
> 27.0 B / 1
>
> 37
>
> 9346
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 11 ms
>
> 27.0 B / 1
>
> 38
>
> 9347
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 11 ms
>
> 27.0 B / 1
>
> 39
>
> 9348
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 23 ms
>
> 27.0 B / 1
>
> 40
>
> 9349
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 13 ms
>
> 27.0 B / 1
>
> 41
>
> 9350
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 12 ms
>
> 27.0 B / 1
>
> 42
>
> 9351
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 12 ms
>
> 27.0 B / 1
>
> 43
>
> 9352
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 11 ms
>
> 27.0 B / 1
>
> 44
>
> 9353
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 13 ms
>
> 27.0 B / 1
>
> 45
>
> 9354
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 12 ms
>
> 27.0 B / 1
>
> 46
>
> 9355
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 12 ms
>
> 27.0 B / 1
>
> 47
>
> 9356
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 12 ms
>
> 27.0 B / 1
>
> 48
>
> 9357
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 12 ms
>
> 27.0 B / 1
>
> 49
>
> 9358
>
> 0
>
> SUCCESS
>
> RACK_LOCAL
>
> 1 / xxx81440xxx
>
> 2015/07/09 09:56:24
>
> 13 ms
>
> 27.0 B / 1
>
>
>
>
>
> *Streaming*
>
> · *Started at: *Thu Jul 09 09:27:01 CEST 2015
>
> · *Time since start: *24 minutes
>
> · *Network receivers: *1
>
> · *Batch interval: *10 seconds
>
> · *Processed batches: *143
>
> · *Waiting batches: *1
>
> · *Received records: *295016
>
> · *Processed records: *292937
>
>
>
> *Statistics over last 100 processed batches*
>
> *Receiver Statistics*
>
> · *Receiver*
>
> · *Status*
>
> · *Location*
>
> · *Records in last batch*
>
> · *[2015/07/09 09:51:01]*
>
> · *Minimum rate*
>
> · *[records/sec]*
>
> · *Median rate*
>
> · *[records/sec]*
>
> · *Maximum rate*
>
> · *[records/sec]*
>
> · *Last Error*
>
> SocketReceiver-0
>
> ACTIVE
>
> xxx81441xx
>
> 2.1 K
>
> 196
>
> 205
>
> 218
>
> -
>
> *Batch Processing Statistics*
>
> *Metric*
>
> *Last batch*
>
> *Minimum*
>
> *25th percentile*
>
> *Median*
>
> *75th percentile*
>
> *Maximum*
>
> Processing Time
>
> 4 seconds 679 ms
>
> 4 seconds 653 ms
>
> 4 seconds 702 ms
>
> 4 seconds 736 ms
>
> 4 seconds 775 ms
>
> 4 seconds 926 ms
>
> Scheduling Delay
>
> 0 ms
>
> 0 ms
>
> 0 ms
>
> 0 ms
>
> 0 ms
>
> 1 ms
>
> Total Delay
>
> 4 seconds 679 ms
>
> 4 seconds 653 ms
>
> 4 seconds 702 ms
>
> 4 seconds 737 ms
>
> 4 seconds 775 ms
>
> 4 seconds 926 ms
>
>
>
>
>
>
>
> *Executors (3)*
>
> · *Memory:* 0.0 B Used (1326.0 MB Total)
>
> · *Disk:* 0.0 B Used
>
> *Executor ID*
>
> *Address*
>
> *RDD Blocks*
>
> *Memory Used*
>
> *Disk Used*
>
> *Active Tasks*
>
> *Failed Tasks*
>
> *Complete Tasks*
>
> *Total Tasks*
>
> *Task Time*
>
> *Input*
>
> *Shuffle Read*
>
> *Shuffle Write*
>
> *Logs*
>
> *Thread Dump*
>
> 1
>
> xxx81440xxx:51081
>
> 0
>
> 0.0 B / 530.3 MB
>
> 0.0 B
>
> 0
>
> 0
>
> 8159
>
> 8159
>
> 3.0 m
>
> 0.0 B
>
> 392.0 B
>
> 206.8 KB
>
> stdout
>
> stderr
>
> Thread Dump
>
> 2
>
> xxx81441xxx:34114
>
> 0
>
> 0.0 B / 530.3 MB
>
> 0.0 B
>
> 1
>
> 0
>
> 35
>
> 36
>
> 5.1 s
>
> 0.0 B
>
> 364.0 B
>
> 700.0 B
>
> stdout
>
> stderr
>
> Thread Dump
>
> <driver>
>
> xxx81436xxx:38325
>
> 0
>
> 0.0 B / 265.4 MB
>
> 0.0 B
>
> 0
>
> 0
>
> 0
>
> 0
>
> 0 ms
>
> 0.0 B
>
> 0.0 B
>
> 0.0 B
>
> Thread Dump
>
>
>
>
>
>
>
>
>
>
>
>
>
> *From:* Tathagata Das [mailto:tdas@databricks.com]
> *Sent:* donderdag 9 juli 2015 09:38
> *To:* Michel Hubert
> *Cc:* user@spark.apache.org
> *Subject:* Re: spark streaming performance
>
>
>
> What was the number of cores in the executor? It could be that you had
> only one core in the executor, which did all the 50 tasks serially, so 50
> tasks × 15 ms ≈ 1 second.
>
> Could you take a look at the task details in the stage page to see when
> the tasks were launched, and whether that explains the 5 seconds?
>
>
>
> On Thu, Jul 9, 2015 at 12:21 AM, Michel Hubert <mi...@vsnsystemen.nl>
> wrote:
>
>
>
> Hi,
>
>
>
> I’ve developed a POC Spark Streaming application.
>
> But it seems to perform better on my development machine than on our
> cluster.
>
> I submit it to yarn on our cloudera cluster.
>
>
>
> But my first question is more detailed:
>
>
>
> In the application UI (:4040) I see in the streaming section that the batch
> processing took 6 sec.
>
> Then when I look at the stages I indeed see a stage with duration 5s.
>
>
>
> For example:
>
> 1678
>
> map at LogonAnalysis.scala:215
>
> 2015/07/09 09:17:00
>
> 5 s
>
> 50/50
>
> 173.5 KB
>
>
>
> But when I look into the details of stage 1678, it tells me the duration
> was 14 ms, and the aggregated metrics by executor show 1.0 s as Task Time.
>
> What is responsible for the gap between 14 ms, 1s and 5 sec?
>
>
>
>
>
> *Details for Stage 1678*
>
> · *Total task time across all tasks: *0.8 s
>
> · *Shuffle write: *173.5 KB / 2031
>
> *Show additional metrics*
>
> *Summary Metrics for 50 Completed Tasks*
>
> *Metric*
>
> *Min*
>
> *25th percentile*
>
> *Median*
>
> *75th percentile*
>
> *Max*
>
> Duration
>
> 14 ms
>
> 14 ms
>
> 15 ms
>
> 15 ms
>
> 24 ms
>
> GC Time
>
> 0 ms
>
> 0 ms
>
> 0 ms
>
> 0 ms
>
> 0 ms
>
> Shuffle Write Size / Records
>
> 2.6 KB / 28
>
> 3.1 KB / 35
>
> 3.5 KB / 42
>
> 3.9 KB / 46
>
> 4.4 KB / 53
>
> *Aggregated Metrics by Executor*
>
> *Executor ID*
>
> *Address*
>
> *Task Time*
>
> *Total Tasks*
>
> *Failed Tasks*
>
> *Succeeded Tasks*
>
> *Shuffle Write Size / Records*
>
> 2
>
> xxxx:44231
>
> 1.0 s
>
> 50
>
> 0
>
> 50
>
> 173.5 KB / 2031
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
RE: spark streaming performance
Posted by Michel Hubert <mi...@vsnsystemen.nl>.
What do you mean with “full task list of all the 50 tasks”?
This?
Details for Stage 271
· Total task time across all tasks: 0.7 s
· Shuffle write: 1377.0 B / 51
Show additional metrics
Summary Metrics for 51 Completed Tasks
Metric
Min
25th percentile
Median
75th percentile
Max
Duration
13 ms
13 ms
14 ms
14 ms
15 ms
GC Time
0 ms
0 ms
0 ms
0 ms
0 ms
Shuffle Write Size / Records
27.0 B / 1
27.0 B / 1
27.0 B / 1
27.0 B / 1
27.0 B / 1
Aggregated Metrics by Executor
Executor ID
Address
Task Time
Total Tasks
Failed Tasks
Succeeded Tasks
Shuffle Write Size / Records
2
bfravicsvr81441-cld.opentsp.com:47790
0.9 s
51
0
51
1377.0 B / 51
Tasks
Index
ID
Attempt
Status
Locality Level
Executor ID / Host
Launch Time
Duration
GC Time
Write Time
Shuffle Write Size / Records
Errors
50
3549
0
SUCCESS
PROCESS_LOCAL
2 / bfravicsvr81441-cld.opentsp.com
2015/07/09 11:09:10
13 ms
27.0 B / 1
0
3550
0
SUCCESS
RACK_LOCAL
2 / bfravicsvr81441-cld.opentsp.com
2015/07/09 11:09:13
15 ms
27.0 B / 1
1
3551
0
SUCCESS
RACK_LOCAL
2 / bfravicsvr81441-cld.opentsp.com
2015/07/09 11:09:13
14 ms
27.0 B / 1
2
3552
0
SUCCESS
RACK_LOCAL
2 / bfravicsvr81441-cld.opentsp.com
2015/07/09 11:09:13
14 ms
27.0 B / 1
3
3553
0
SUCCESS
RACK_LOCAL
2 / bfravicsvr81441-cld.opentsp.com
2015/07/09 11:09:13
14 ms
27.0 B / 1
4
3554
0
SUCCESS
RACK_LOCAL
2 / bfravicsvr81441-cld.opentsp.com
2015/07/09 11:09:13
13 ms
27.0 B / 1
5
3555
0
SUCCESS
RACK_LOCAL
2 / bfravicsvr81441-cld.opentsp.com
2015/07/09 11:09:13
13 ms
27.0 B / 1
6
3556
0
SUCCESS
RACK_LOCAL
2 / bfravicsvr81441-cld.opentsp.com
2015/07/09 11:09:13
14 ms
27.0 B / 1
7
3557
0
SUCCESS
RACK_LOCAL
2 / bfravicsvr81441-cld.opentsp.com
2015/07/09 11:09:13
15 ms
27.0 B / 1
8
3558
0
SUCCESS
RACK_LOCAL
2 / bfravicsvr81441-cld.opentsp.com
2015/07/09 11:09:14
13 ms
27.0 B / 1
9
3559
0
SUCCESS
RACK_LOCAL
2 / bfravicsvr81441-cld.opentsp.com
2015/07/09 11:09:14
14 ms
27.0 B / 1
10
3560
0
SUCCESS
RACK_LOCAL
2 / bfravicsvr81441-cld.opentsp.com
2015/07/09 11:09:14
14 ms
27.0 B / 1
11
3561
0
SUCCESS
RACK_LOCAL
2 / bfravicsvr81441-cld.opentsp.com
2015/07/09 11:09:14
13 ms
27.0 B / 1
12
3562
0
SUCCESS
All of the following tasks: Attempt 0, Status SUCCESS, Locality RACK_LOCAL, Executor 2 / bfravicsvr81441-cld.opentsp.com, Launch Time 2015/07/09 11:09:14, Shuffle Write Size / Records 27.0 B / 1. (The first row below is the tail of a task row truncated at the start of this excerpt; its Index and ID are not shown.)

Index | ID   | Duration
…     | …    | 13 ms
13    | 3563 | 15 ms
14    | 3564 | 13 ms
15    | 3565 | 13 ms
16    | 3566 | 14 ms
17    | 3567 | 13 ms
18    | 3568 | 14 ms
19    | 3569 | 13 ms
20    | 3570 | 13 ms
21    | 3571 | 14 ms
22    | 3572 | 14 ms
23    | 3573 | 13 ms
24    | 3574 | 13 ms
25    | 3575 | 14 ms
26    | 3576 | 13 ms
27    | 3577 | 14 ms
28    | 3578 | 15 ms
29    | 3579 | 14 ms
30    | 3580 | 13 ms
31    | 3581 | 14 ms
32    | 3582 | 14 ms
33    | 3583 | 15 ms
34    | 3584 | 13 ms
35    | 3585 | 14 ms
36    | 3586 | 14 ms
37    | 3587 | 13 ms
38    | 3588 | 13 ms
39    | 3589 | 14 ms
40    | 3590 | 13 ms
41    | 3591 | 13 ms
42    | 3592 | 13 ms
43    | 3593 | 15 ms
44    | 3594 | 14 ms
45    | 3595 | 14 ms
46    | 3596 | 14 ms
47    | 3597 | 14 ms
48    | 3598 | 14 ms
49    | 3599 | 15 ms
From: Tathagata Das [mailto:tdas@databricks.com]
Sent: donderdag 9 juli 2015 10:28
To: Michel Hubert
Cc: user@spark.apache.org
Subject: Re: spark streaming performance
Can you provide the full task list of all the 50 tasks? I am curious to find out what's going on.
And spark.executor.cores needs to be set. Or --executor-cores in spark-submit
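For reference, a submission along these lines sets the executor core count explicitly (the class name is taken from the LogonAnalysis.scala references in this thread; the jar name and the sizing values are illustrative placeholders, not values from this application):

```shell
# Hypothetical spark-submit invocation pinning executor cores on YARN.
# Jar name, executor count, and memory are placeholders.
spark-submit \
  --master yarn \
  --num-executors 2 \
  --executor-cores 4 \
  --executor-memory 1g \
  --class LogonAnalysis \
  logon-analysis.jar
# equivalently: --conf spark.executor.cores=4
```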
On Thu, Jul 9, 2015 at 1:05 AM, Michel Hubert <mi...@vsnsystemen.nl> wrote:
Hi,
Thanks for your reply!
@What were the number of cores in the executor? It could be that you had only one core in the executor which did all the 50 tasks serially, so 50 tasks × 15 ms ≈ 1 second.
What metric/page shows the number of cores in the executor? Should I specify “spark.executor.cores” when submitting the application?
@Could you take a look at the task details in the stage page to see when the tasks were added, and whether that explains the 5 seconds?
Stage 704 seems to be submitted 4 sec later. What could be the reason, other than that stage 703 finished in 4 to 5 sec?
I’ve copy pasted some more detailed info (masked out some servernames xxxNUMBERxxx).
Completed Stages (530)

Stage Id | Description | Submitted | Duration | Tasks: Succeeded/Total | Input | Output | Shuffle Read | Shuffle Write
706 | print at LogonAnalysis.scala:217 <http://bfravicsvr81434.opentsp.com:8088/proxy/application_1432720182377_0443/stages/stage?id=706&attempt=0> | 2015/07/09 09:56:24 | 14 ms | 1/1 | - | - | - | -
704 | print at LogonAnalysis.scala:217 <http://bfravicsvr81434.opentsp.com:8088/proxy/application_1432720182377_0443/stages/stage?id=704&attempt=0> | 2015/07/09 09:56:24 | 74 ms | 1/1 | - | - | 1377.0 B | -
703 | count at LogonAnalysis.scala:217 <http://bfravicsvr81434.opentsp.com:8088/proxy/application_1432720182377_0443/stages/stage?id=703&attempt=0> | 2015/07/09 09:56:20 | 5 s | 51/51 | - | - | - | 1377.0 B
Details for Stage 703
• Total task time across all tasks: 0.6 s
• Shuffle write: 1377.0 B / 51

Summary Metrics for 51 Completed Tasks

Metric | Min | 25th percentile | Median | 75th percentile | Max
Duration | 11 ms | 12 ms | 12 ms | 13 ms | 23 ms
GC Time | 0 ms | 0 ms | 0 ms | 0 ms | 0 ms
Shuffle Write Size / Records | 27.0 B / 1 | 27.0 B / 1 | 27.0 B / 1 | 27.0 B / 1 | 27.0 B / 1

Aggregated Metrics by Executor

Executor ID | Address | Task Time | Total Tasks | Failed Tasks | Succeeded Tasks | Shuffle Write Size / Records
1 | xxx81440xxx:51081 | 0.8 s | 51 | 0 | 51 | 1377.0 B / 51
Tasks

All 51 tasks: Attempt 0, Status SUCCESS, launched 2015/07/09, executor 1 / xxx81440xxx, Shuffle Write Size / Records 27.0 B / 1; the GC Time, Write Time, and Errors columns were empty. Task 50 ran PROCESS_LOCAL; all others ran RACK_LOCAL.

Index | ID   | Launch Time | Duration
50    | 9308 | 09:56:20    | 11 ms
0     | 9309 | 09:56:23    | 14 ms
1     | 9310 | 09:56:23    | 12 ms
2     | 9311 | 09:56:23    | 12 ms
3     | 9312 | 09:56:23    | 12 ms
4     | 9313 | 09:56:23    | 13 ms
5     | 9314 | 09:56:23    | 11 ms
6     | 9315 | 09:56:23    | 13 ms
7     | 9316 | 09:56:23    | 14 ms
8     | 9317 | 09:56:23    | 13 ms
9     | 9318 | 09:56:23    | 12 ms
10    | 9319 | 09:56:23    | 12 ms
11    | 9320 | 09:56:23    | 14 ms
12    | 9321 | 09:56:23    | 12 ms
13    | 9322 | 09:56:24    | 11 ms
14    | 9323 | 09:56:24    | 12 ms
15    | 9324 | 09:56:24    | 12 ms
16    | 9325 | 09:56:24    | 12 ms
17    | 9326 | 09:56:24    | 11 ms
18    | 9327 | 09:56:24    | 12 ms
19    | 9328 | 09:56:24    | 12 ms
20    | 9329 | 09:56:24    | 13 ms
21    | 9330 | 09:56:24    | 12 ms
22    | 9331 | 09:56:24    | 12 ms
23    | 9332 | 09:56:24    | 13 ms
24    | 9333 | 09:56:24    | 12 ms
25    | 9334 | 09:56:24    | 11 ms
26    | 9335 | 09:56:24    | 11 ms
27    | 9336 | 09:56:24    | 13 ms
28    | 9337 | 09:56:24    | 13 ms
29    | 9338 | 09:56:24    | 11 ms
30    | 9339 | 09:56:24    | 12 ms
31    | 9340 | 09:56:24    | 13 ms
32    | 9341 | 09:56:24    | 13 ms
33    | 9342 | 09:56:24    | 12 ms
34    | 9343 | 09:56:24    | 11 ms
35    | 9344 | 09:56:24    | 12 ms
36    | 9345 | 09:56:24    | 12 ms
37    | 9346 | 09:56:24    | 11 ms
38    | 9347 | 09:56:24    | 11 ms
39    | 9348 | 09:56:24    | 23 ms
40    | 9349 | 09:56:24    | 13 ms
41    | 9350 | 09:56:24    | 12 ms
42    | 9351 | 09:56:24    | 12 ms
43    | 9352 | 09:56:24    | 11 ms
44    | 9353 | 09:56:24    | 13 ms
45    | 9354 | 09:56:24    | 12 ms
46    | 9355 | 09:56:24    | 12 ms
47    | 9356 | 09:56:24    | 12 ms
48    | 9357 | 09:56:24    | 12 ms
49    | 9358 | 09:56:24    | 13 ms
Streaming
• Started at: Thu Jul 09 09:27:01 CEST 2015
• Time since start: 24 minutes
• Network receivers: 1
• Batch interval: 10 seconds
• Processed batches: 143
• Waiting batches: 1
• Received records: 295016
• Processed records: 292937
Statistics over last 100 processed batches
Receiver Statistics

Receiver | Status | Location | Records in last batch [2015/07/09 09:51:01] | Minimum rate [records/sec] | Median rate [records/sec] | Maximum rate [records/sec] | Last Error
SocketReceiver-0 | ACTIVE | xxx81441xx | 2.1 K | 196 | 205 | 218 | -
Batch Processing Statistics

Metric | Last batch | Minimum | 25th percentile | Median | 75th percentile | Maximum
Processing Time | 4 seconds 679 ms | 4 seconds 653 ms | 4 seconds 702 ms | 4 seconds 736 ms | 4 seconds 775 ms | 4 seconds 926 ms
Scheduling Delay | 0 ms | 0 ms | 0 ms | 0 ms | 0 ms | 1 ms
Total Delay | 4 seconds 679 ms | 4 seconds 653 ms | 4 seconds 702 ms | 4 seconds 737 ms | 4 seconds 775 ms | 4 seconds 926 ms
Executors (3)
• Memory: 0.0 B Used (1326.0 MB Total)
• Disk: 0.0 B Used

Executor ID | Address | RDD Blocks | Memory Used | Disk Used | Active Tasks | Failed Tasks | Complete Tasks | Total Tasks | Task Time | Input | Shuffle Read | Shuffle Write
1 | xxx81440xxx:51081 | 0 | 0.0 B / 530.3 MB | 0.0 B | 0 | 0 | 8159 | 8159 | 3.0 m | 0.0 B | 392.0 B | 206.8 KB
2 | xxx81441xxx:34114 | 0 | 0.0 B / 530.3 MB | 0.0 B | 1 | 0 | 35 | 36 | 5.1 s | 0.0 B | 364.0 B | 700.0 B
<driver> | xxx81436xxx:38325 | 0 | 0.0 B / 265.4 MB | 0.0 B | 0 | 0 | 0 | 0 | 0 ms | 0.0 B | 0.0 B | 0.0 B
From: Tathagata Das [mailto:tdas@databricks.com]
Sent: donderdag 9 juli 2015 09:38
To: Michel Hubert
Cc: user@spark.apache.org
Subject: Re: spark streaming performance
What were the number of cores in the executor? It could be that you had only one core in the executor which did all the 50 tasks serially, so 50 tasks × 15 ms ≈ 1 second.
Could you take a look at the task details in the stage page to see when the tasks were added, and whether that explains the 5 seconds?
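The arithmetic behind the single-core hypothesis can be sketched directly (task count and per-task duration are the stage-1678 figures from this thread; the one-core assumption is the hypothesis being tested, not a measured fact):

```shell
# With a single executor core, N short tasks run one after another,
# so total task time is roughly ceil(tasks / cores) * per-task duration.
tasks=50
per_task_ms=15
cores=1
serial_ms=$(( (tasks + cores - 1) / cores * per_task_ms ))
echo "${serial_ms} ms"   # 50 tasks x 15 ms on 1 core -> 750 ms, i.e. ~1 s
```

Note this only accounts for the ~1 s of aggregated task time; the remaining seconds of the 5 s stage duration would have to come from scheduling or launch delays, which is why the per-task launch times matter.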
On Thu, Jul 9, 2015 at 12:21 AM, Michel Hubert <mi...@vsnsystemen.nl> wrote:
Hi,
I’ve developed a POC Spark Streaming application.
But it seems to perform better on my development machine than on our cluster.
I submit it to YARN on our Cloudera cluster.
But my first question is more detailed:
In the application UI (:4040), I see in the streaming section that the batch processing took 6 sec.
Then when I look at the stages I indeed see a stage with duration 5s.
For example:
Stage Id | Description | Submitted | Duration | Tasks: Succeeded/Total | Shuffle Write
1678 | map at LogonAnalysis.scala:215 | 2015/07/09 09:17:00 | 5 s | 50/50 | 173.5 KB
But when I look into the details of stage 1678, it tells me the duration was 14 ms, and the aggregated metrics by executor show 1.0 s as Task Time.
What is responsible for the gap between 14 ms, 1s and 5 sec?
Details for Stage 1678
• Total task time across all tasks: 0.8 s
• Shuffle write: 173.5 KB / 2031

Summary Metrics for 50 Completed Tasks

Metric | Min | 25th percentile | Median | 75th percentile | Max
Duration | 14 ms | 14 ms | 15 ms | 15 ms | 24 ms
GC Time | 0 ms | 0 ms | 0 ms | 0 ms | 0 ms
Shuffle Write Size / Records | 2.6 KB / 28 | 3.1 KB / 35 | 3.5 KB / 42 | 3.9 KB / 46 | 4.4 KB / 53

Aggregated Metrics by Executor

Executor ID | Address | Task Time | Total Tasks | Failed Tasks | Succeeded Tasks | Shuffle Write Size / Records
2 | xxxx:44231 | 1.0 s | 50 | 0 | 50 | 173.5 KB / 2031
RE: spark streaming performance
Posted by Michel Hubert <mi...@vsnsystemen.nl>.
Hi,
Thanks for you reply!
@What were the number of cores in the executor? It could be that you had only one core in the executor which did all the 50 tasks serially so 50 task X 15 ms = ~ 1 second.
What metric/page shows the number of cores in the executor? Should I specify “spark.executor.cores” when submitting the application?
@Could you take a look at the task details in the stage page to see when the tasks were added to see whether it explains the 5 second?
Stage 704 seems to be submitted 4 sec later. What could be the reason other than stage 703 finished in 4 to 5 sec?
I’ve copy pasted some more detailed info (masked out some servernames xxxNUMBERxxx).
Completed Stages (530)
Stage Id
Description
Submitted
Duration
Tasks: Succeeded/Total
Input
Output
Shuffle Read
Shuffle Write
706
print at LogonAnalysis.scala:217<http://bfravicsvr81434.opentsp.com:8088/proxy/application_1432720182377_0443/stages/stage?id=706&attempt=0>+details
2015/07/09 09:56:24
14 ms
1/1
704
print at LogonAnalysis.scala:217<http://bfravicsvr81434.opentsp.com:8088/proxy/application_1432720182377_0443/stages/stage?id=704&attempt=0>+details
2015/07/09 09:56:24
74 ms
1/1
1377.0 B
703
count at LogonAnalysis.scala:217<http://bfravicsvr81434.opentsp.com:8088/proxy/application_1432720182377_0443/stages/stage?id=703&attempt=0>+details
2015/07/09 09:56:20
5 s
51/51
1377.0 B
Details for Stage 703
· Total task time across all tasks: 0.6 s
· Shuffle write: 1377.0 B / 51
Show additional metrics
Summary Metrics for 51 Completed Tasks
Metric
Min
25th percentile
Median
75th percentile
Max
Duration
11 ms
12 ms
12 ms
13 ms
23 ms
GC Time
0 ms
0 ms
0 ms
0 ms
0 ms
Shuffle Write Size / Records
27.0 B / 1
27.0 B / 1
27.0 B / 1
27.0 B / 1
27.0 B / 1
Aggregated Metrics by Executor
Executor ID
Address
Task Time
Total Tasks
Failed Tasks
Succeeded Tasks
Shuffle Write Size / Records
1
xxx81440xxx:51081
0.8 s
51
0
51
1377.0 B / 51
Tasks
Index
ID
Attempt
Status
Locality Level
Executor ID / Host
Launch Time
Duration
GC Time
Write Time
Shuffle Write Size / Records
Errors
50
9308
0
SUCCESS
PROCESS_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:20
11 ms
27.0 B / 1
0
9309
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:23
14 ms
27.0 B / 1
1
9310
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:23
12 ms
27.0 B / 1
2
9311
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:23
12 ms
27.0 B / 1
3
9312
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:23
12 ms
27.0 B / 1
4
9313
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:23
13 ms
27.0 B / 1
5
9314
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:23
11 ms
27.0 B / 1
6
9315
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:23
13 ms
27.0 B / 1
7
9316
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:23
14 ms
27.0 B / 1
8
9317
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:23
13 ms
27.0 B / 1
9
9318
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:23
12 ms
27.0 B / 1
10
9319
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:23
12 ms
27.0 B / 1
11
9320
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:23
14 ms
27.0 B / 1
12
9321
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:23
12 ms
27.0 B / 1
13
9322
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
11 ms
27.0 B / 1
14
9323
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
12 ms
27.0 B / 1
15
9324
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
12 ms
27.0 B / 1
16
9325
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
12 ms
27.0 B / 1
17
9326
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
11 ms
27.0 B / 1
18
9327
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
12 ms
27.0 B / 1
19
9328
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
12 ms
27.0 B / 1
20
9329
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
13 ms
27.0 B / 1
21
9330
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
12 ms
27.0 B / 1
22
9331
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
12 ms
27.0 B / 1
23
9332
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
13 ms
27.0 B / 1
24
9333
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
12 ms
27.0 B / 1
25
9334
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
11 ms
27.0 B / 1
26
9335
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
11 ms
27.0 B / 1
27
9336
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
13 ms
27.0 B / 1
28
9337
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
13 ms
27.0 B / 1
29
9338
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
11 ms
27.0 B / 1
30
9339
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
12 ms
27.0 B / 1
31
9340
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
13 ms
27.0 B / 1
32
9341
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
13 ms
27.0 B / 1
33
9342
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
12 ms
27.0 B / 1
34
9343
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
11 ms
27.0 B / 1
35
9344
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
12 ms
27.0 B / 1
36
9345
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
12 ms
27.0 B / 1
37
9346
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
11 ms
27.0 B / 1
38
9347
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
11 ms
27.0 B / 1
39
9348
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
23 ms
27.0 B / 1
40
9349
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
13 ms
27.0 B / 1
41
9350
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
12 ms
27.0 B / 1
42
9351
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
12 ms
27.0 B / 1
43
9352
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
11 ms
27.0 B / 1
44
9353
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
13 ms
27.0 B / 1
45
9354
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
12 ms
27.0 B / 1
46
9355
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
12 ms
27.0 B / 1
47
9356
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
12 ms
27.0 B / 1
48
9357
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
12 ms
27.0 B / 1
49
9358
0
SUCCESS
RACK_LOCAL
1 / xxx81440xxx
2015/07/09 09:56:24
13 ms
27.0 B / 1
Streaming
· Started at: Thu Jul 09 09:27:01 CEST 2015
· Time since start: 24 minutes
· Network receivers: 1
· Batch interval: 10 seconds
· Processed batches: 143
· Waiting batches: 1
· Received records: 295016
· Processed records: 292937
Statistics over last 100 processed batches
Receiver Statistics
· Receiver
· Status
· Location
· Records in last batch
· [2015/07/09 09:51:01]
· Minimum rate
· [records/sec]
· Median rate
· [records/sec]
· Maximum rate
· [records/sec]
· Last Error
SocketReceiver-0
ACTIVE
xxx81441xx
2.1 K
196
205
218
-
Batch Processing Statistics

Metric            Last Batch      Minimum         25th percentile  Median          75th percentile  Maximum
Processing Time   4 s 679 ms      4 s 653 ms      4 s 702 ms       4 s 736 ms      4 s 775 ms       4 s 926 ms
Scheduling Delay  0 ms            0 ms            0 ms             0 ms            0 ms             1 ms
Total Delay       4 s 679 ms      4 s 653 ms      4 s 702 ms       4 s 737 ms      4 s 775 ms       4 s 926 ms
Executors (3)
· Memory: 0.0 B Used (1326.0 MB Total)
· Disk: 0.0 B Used

Executor ID  Address            RDD Blocks  Memory Used       Disk Used  Active Tasks  Failed Tasks  Complete Tasks  Total Tasks  Task Time  Input  Shuffle Read  Shuffle Write
1            xxx81440xxx:51081  0           0.0 B / 530.3 MB  0.0 B      0             0             8159            8159         3.0 m      0.0 B  392.0 B       206.8 KB
2            xxx81441xxx:34114  0           0.0 B / 530.3 MB  0.0 B      1             0             35              36           5.1 s      0.0 B  364.0 B       700.0 B
<driver>     xxx81436xxx:38325  0           0.0 B / 265.4 MB  0.0 B      0             0             0               0            0 ms       0.0 B  0.0 B         0.0 B

(stdout/stderr log links and thread-dump links omitted)
From: Tathagata Das [mailto:tdas@databricks.com]
Sent: Thursday, 9 July 2015 09:38
To: Michel Hubert
Cc: user@spark.apache.org
Subject: Re: spark streaming performance
How many cores did the executor have? It could be that the executor had only one core, which ran all 50 tasks serially, so 50 tasks × 15 ms ≈ 1 second.
Could you take a look at the task details on the stage page to see when the tasks were launched, and whether that explains the 5 seconds?
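The suggestion to set executor cores could be sketched as a submit command like the one below. This is only an illustrative sketch: the class name, jar name, and the executor/core counts are placeholders for the POC, not values from the actual application, and should be tuned to the cluster.

```shell
# Illustrative only: class and jar names are placeholders for the POC app.
# --executor-cores gives each executor several cores, so the ~50 short
# tasks per stage can run in parallel rather than serially on one core.
spark-submit \
  --master yarn \
  --num-executors 2 \
  --executor-cores 4 \
  --executor-memory 1g \
  --class com.example.LogonAnalysis \
  logon-analysis-poc.jar
```

Equivalently, spark.executor.cores can be set in spark-defaults.conf or on the SparkConf before the streaming context is created.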
Re: spark streaming performance
Posted by Tathagata Das <td...@databricks.com>.
How many cores did the executor have? It could be that the executor had
only one core, which ran all 50 tasks serially, so 50 tasks × 15 ms ≈ 1 second.
Could you take a look at the task details on the stage page to see when the
tasks were launched, and whether that explains the 5 seconds?