Posted to user@spark.apache.org by Michel Hubert <mi...@vsnsystemen.nl> on 2015/07/09 09:21:29 UTC

spark streaming performance

Hi,

I've developed a POC Spark Streaming application.
But it seems to perform better on my development machine than on our cluster.
I submit it to YARN on our Cloudera cluster.

But my first question is more detailed:

In the application UI (:4040) I see in the streaming section that the batch processing took 6 sec.
Then when I look at the stages I indeed see a stage with a duration of 5 s.

For example:
Stage Id | Description | Submitted | Duration | Tasks: Succeeded/Total | Shuffle Write
1678 | map at LogonAnalysis.scala:215 | 2015/07/09 09:17:00 | 5 s | 50/50 | 173.5 KB


But when I look into the details of stage 1678 it tells me the duration was 14 ms, and the aggregated metrics by executor show 1.0 s as Task Time.
What is responsible for the gap between 14 ms, 1 s and 5 sec?


Details for Stage 1678
*         Total task time across all tasks: 0.8 s
*         Shuffle write: 173.5 KB / 2031
Summary Metrics for 50 Completed Tasks
Metric | Min | 25th percentile | Median | 75th percentile | Max
Duration | 14 ms | 14 ms | 15 ms | 15 ms | 24 ms
GC Time | 0 ms | 0 ms | 0 ms | 0 ms | 0 ms
Shuffle Write Size / Records | 2.6 KB / 28 | 3.1 KB / 35 | 3.5 KB / 42 | 3.9 KB / 46 | 4.4 KB / 53

Aggregated Metrics by Executor
Executor ID | Address | Task Time | Total Tasks | Failed Tasks | Succeeded Tasks | Shuffle Write Size / Records
2 | xxxx:44231 | 1.0 s | 50 | 0 | 50 | 173.5 KB / 2031







RE: spark streaming performance

Posted by Michel Hubert <mi...@vsnsystemen.nl>.
What do you mean by “full task list of all the 50 tasks”?

From: Tathagata Das [mailto:tdas@databricks.com]
Sent: Thursday, 9 July 2015 10:28
To: Michel Hubert
Cc: user@spark.apache.org
Subject: Re: spark streaming performance

Can you provide the full task list of all the 50 tasks? I am curious to find out what's going on.

And spark.executor.cores needs to be set, or --executor-cores in spark-submit.
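
For illustration, a minimal sketch of both ways to set the executor core count; the values, executor count and app name below are placeholders, not taken from this thread:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Option 1: set the core count programmatically before the context is created.
// "8" is only an example value; size it to your YARN container configuration.
val conf = new SparkConf()
  .setAppName("LogonAnalysis")            // placeholder app name
  .set("spark.executor.cores", "8")       // cores per executor
  .set("spark.executor.instances", "2")   // executors requested from YARN

val ssc = new StreamingContext(conf, Seconds(10))   // 10-second batch interval, as in this app

// Option 2: leave the code alone and pass the equivalent flags to spark-submit, e.g.
//   spark-submit --master yarn --executor-cores 8 --num-executors 2 ...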


On Thu, Jul 9, 2015 at 1:05 AM, Michel Hubert <mi...@vsnsystemen.nl> wrote:


Hi,

Thanks for your reply!


@What were the number of cores in the executor? It could be that you had only one core in the executor, which did all the 50 tasks serially, so 50 tasks x 15 ms = ~1 second.

What metric/page shows the number of cores in the executor? Should I specify “spark.executor.cores” when submitting the application?



@Could you take a look at the task details in the stage page to see when the tasks were added, to see whether it explains the 5 seconds?

Stage 704 seems to be submitted 4 sec later. What could be the reason, other than that stage 703 finished in 4 to 5 sec?


I've copy-pasted some more detailed info (masked out some server names as xxxNUMBERxxx).
Completed Stages (530)
Stage Id | Description | Submitted | Duration | Tasks: Succeeded/Total | Shuffle Read | Shuffle Write
706 | print at LogonAnalysis.scala:217 | 2015/07/09 09:56:24 | 14 ms | 1/1 | - | -
704 | print at LogonAnalysis.scala:217 | 2015/07/09 09:56:24 | 74 ms | 1/1 | 1377.0 B | -
703 | count at LogonAnalysis.scala:217 | 2015/07/09 09:56:20 | 5 s | 51/51 | - | 1377.0 B




Details for Stage 703
•         Total task time across all tasks: 0.6 s
•         Shuffle write: 1377.0 B / 51
Summary Metrics for 51 Completed Tasks
Metric | Min | 25th percentile | Median | 75th percentile | Max
Duration | 11 ms | 12 ms | 12 ms | 13 ms | 23 ms
GC Time | 0 ms | 0 ms | 0 ms | 0 ms | 0 ms
Shuffle Write Size / Records | 27.0 B / 1 | 27.0 B / 1 | 27.0 B / 1 | 27.0 B / 1 | 27.0 B / 1

Aggregated Metrics by Executor
Executor ID | Address | Task Time | Total Tasks | Failed Tasks | Succeeded Tasks | Shuffle Write Size / Records
1 | xxx81440xxx:51081 | 0.8 s | 51 | 0 | 51 | 1377.0 B / 51

Tasks
Index | ID | Attempt | Status | Locality Level | Executor ID / Host | Launch Time | Duration | Shuffle Write Size / Records
50 | 9308 | 0 | SUCCESS | PROCESS_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:20 | 11 ms | 27.0 B / 1
0 | 9309 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:23 | 14 ms | 27.0 B / 1
1 | 9310 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:23 | 12 ms | 27.0 B / 1
2 | 9311 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:23 | 12 ms | 27.0 B / 1
3 | 9312 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:23 | 12 ms | 27.0 B / 1
4 | 9313 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:23 | 13 ms | 27.0 B / 1
5 | 9314 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:23 | 11 ms | 27.0 B / 1
6 | 9315 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:23 | 13 ms | 27.0 B / 1
7 | 9316 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:23 | 14 ms | 27.0 B / 1
8 | 9317 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:23 | 13 ms | 27.0 B / 1
9 | 9318 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:23 | 12 ms | 27.0 B / 1
10 | 9319 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:23 | 12 ms | 27.0 B / 1
11 | 9320 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:23 | 14 ms | 27.0 B / 1
12 | 9321 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:23 | 12 ms | 27.0 B / 1
13 | 9322 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 11 ms | 27.0 B / 1
14 | 9323 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
15 | 9324 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
16 | 9325 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
17 | 9326 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 11 ms | 27.0 B / 1
18 | 9327 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
19 | 9328 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
20 | 9329 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 13 ms | 27.0 B / 1
21 | 9330 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
22 | 9331 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
23 | 9332 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 13 ms | 27.0 B / 1
24 | 9333 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
25 | 9334 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 11 ms | 27.0 B / 1
26 | 9335 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 11 ms | 27.0 B / 1
27 | 9336 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 13 ms | 27.0 B / 1
28 | 9337 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 13 ms | 27.0 B / 1
29 | 9338 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 11 ms | 27.0 B / 1
30 | 9339 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
31 | 9340 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 13 ms | 27.0 B / 1
32 | 9341 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 13 ms | 27.0 B / 1
33 | 9342 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
34 | 9343 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 11 ms | 27.0 B / 1
35 | 9344 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
36 | 9345 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
37 | 9346 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 11 ms | 27.0 B / 1
38 | 9347 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 11 ms | 27.0 B / 1
39 | 9348 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 23 ms | 27.0 B / 1
40 | 9349 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 13 ms | 27.0 B / 1
41 | 9350 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
42 | 9351 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
43 | 9352 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 11 ms | 27.0 B / 1
44 | 9353 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 13 ms | 27.0 B / 1
45 | 9354 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
46 | 9355 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
47 | 9356 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
48 | 9357 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 12 ms | 27.0 B / 1
49 | 9358 | 0 | SUCCESS | RACK_LOCAL | 1 / xxx81440xxx | 2015/07/09 09:56:24 | 13 ms | 27.0 B / 1



Streaming
•         Started at: Thu Jul 09 09:27:01 CEST 2015
•         Time since start: 24 minutes
•         Network receivers: 1
•         Batch interval: 10 seconds
•         Processed batches: 143
•         Waiting batches: 1
•         Received records: 295016
•         Processed records: 292937


Statistics over last 100 processed batches

Receiver Statistics
Receiver | Status | Location | Records in last batch [2015/07/09 09:51:01] | Minimum rate [records/sec] | Median rate [records/sec] | Maximum rate [records/sec] | Last Error
SocketReceiver-0 | ACTIVE | xxx81441xx | 2.1 K | 196 | 205 | 218 | -

Batch Processing Statistics
Metric | Last batch | Minimum | 25th percentile | Median | 75th percentile | Maximum
Processing Time | 4 seconds 679 ms | 4 seconds 653 ms | 4 seconds 702 ms | 4 seconds 736 ms | 4 seconds 775 ms | 4 seconds 926 ms
Scheduling Delay | 0 ms | 0 ms | 0 ms | 0 ms | 0 ms | 1 ms
Total Delay | 4 seconds 679 ms | 4 seconds 653 ms | 4 seconds 702 ms | 4 seconds 737 ms | 4 seconds 775 ms | 4 seconds 926 ms




Executors (3)
•         Memory: 0.0 B Used (1326.0 MB Total)
•         Disk: 0.0 B Used
Executor ID | Address | RDD Blocks | Memory Used | Disk Used | Active Tasks | Failed Tasks | Complete Tasks | Total Tasks | Task Time | Input | Shuffle Read | Shuffle Write
1 | xxx81440xxx:51081 | 0 | 0.0 B / 530.3 MB | 0.0 B | 0 | 0 | 8159 | 8159 | 3.0 m | 0.0 B | 392.0 B | 206.8 KB
2 | xxx81441xxx:34114 | 0 | 0.0 B / 530.3 MB | 0.0 B | 1 | 0 | 35 | 36 | 5.1 s | 0.0 B | 364.0 B | 700.0 B
<driver> | xxx81436xxx:38325 | 0 | 0.0 B / 265.4 MB | 0.0 B | 0 | 0 | 0 | 0 | 0 ms | 0.0 B | 0.0 B | 0.0 B







From: Tathagata Das [mailto:tdas@databricks.com]
Sent: Thursday, 9 July 2015 09:38
To: Michel Hubert
Cc: user@spark.apache.org
Subject: Re: spark streaming performance

What were the number of cores in the executor? It could be that you had only one core in the executor, which did all the 50 tasks serially, so 50 tasks x 15 ms = ~1 second.
Could you take a look at the task details in the stage page to see when the tasks were added, to see whether it explains the 5 seconds?










Re: spark streaming performance

Posted by Tathagata Das <td...@databricks.com>.
I am not sure why you are getting node_local and not process_local. Also,
there is probably no good documentation other than the configuration
page - http://spark.apache.org/docs/latest/configuration.html (search for
locality)
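
For reference, a minimal sketch of the locality-related settings described on that page; 3 s is the documented default for spark.locality.wait, and the per-level keys below only illustrate how it can be overridden (older Spark versions also accept the value in milliseconds, e.g. "3000"):

import org.apache.spark.SparkConf

// Locality wait settings from the configuration page. spark.locality.wait is the
// shared default; the per-level keys override it for one specific locality level.
val conf = new SparkConf()
  .set("spark.locality.wait", "3s")          // wait before falling back to a less local level
  .set("spark.locality.wait.process", "3s")  // wait for PROCESS_LOCAL before trying NODE_LOCAL
  .set("spark.locality.wait.node", "3s")     // wait for NODE_LOCAL before trying RACK_LOCAL
  .set("spark.locality.wait.rack", "3s")     // wait for RACK_LOCAL before running anywhere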


RE: spark streaming performance

Posted by Michel Hubert <mi...@vsnsystemen.nl>.


StorageLevel.MEMORY_ONLY

Streaming / Batch Processing Statistics / Processing Time:
Executor-cores NOT specified: 5 s
Executor-cores 8: 400 ms

Processing time has been reduced significantly with executor-cores set to 8.
But what is the general rule of thumb for a good executor-cores setting?

But 257 records/sec still isn't very fast, is it?

Statistics over last 95 processed batches

Receiver Statistics
Receiver | Status | Location | Records in last batch [2015/07/09 14:49:47] | Minimum rate [records/sec] | Median rate [records/sec] | Maximum rate [records/sec] | Last Error
SocketReceiver-0 | ACTIVE | bfravicsvr81440-cld.opentsp.com | 1352 | 0 | 257 | 279 | -




“After that somehow the system is not able to launch any process local task”
Where can I look for an answer as to why some tasks are NODE_LOCAL and others are PROCESS_LOCAL?
Where should I look for the reason the locality wait expires?

Details for Stage 147
·         Total task time across all tasks: 0.4 s
·         Shuffle write: 704.0 B / 26
Summary Metrics for 26 Completed Tasks
Metric | Min | 25th percentile | Median | 75th percentile | Max
Duration | 9 ms | 11 ms | 12 ms | 24 ms | 30 ms
GC Time | 0 ms | 0 ms | 0 ms | 0 ms | 0 ms
Shuffle Write Size / Records | 27.0 B / 1 | 27.0 B / 1 | 27.0 B / 1 | 27.0 B / 1 | 28.0 B / 1

Aggregated Metrics by Executor
Executor ID | Address | Task Time | Total Tasks | Failed Tasks | Succeeded Tasks | Shuffle Write Size / Records
1 | xxx81440xxx:38882 | 0.8 s | 25 | 0 | 25 | 677.0 B / 25
2 | xxx81441xxx:46832 | 45 ms | 1 | 0 | 1 | 27.0 B / 1

Tasks
Index | ID | Attempt | Status | Locality Level | Executor ID / Host | Launch Time | Duration | Shuffle Write Size / Records
0 | 1029 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 30 ms | 27.0 B / 1
2 | 1031 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 29 ms | 27.0 B / 1
1 | 1030 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 30 ms | 27.0 B / 1
5 | 1034 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 25 ms | 27.0 B / 1
4 | 1033 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 28 ms | 27.0 B / 1
3 | 1032 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 27 ms | 27.0 B / 1
25 | 1036 | 0 | SUCCESS | PROCESS_LOCAL | 2 / xxx81441xxx | 2015/07/09 14:43:55 | 23 ms | 27.0 B / 1
6 | 1035 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 24 ms | 27.0 B / 1
7 | 1037 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 10 ms | 27.0 B / 1
8 | 1038 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 12 ms | 27.0 B / 1
9 | 1039 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 11 ms | 27.0 B / 1
10 | 1040 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 11 ms | 27.0 B / 1
11 | 1041 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 11 ms | 27.0 B / 1
12 | 1042 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 12 ms | 27.0 B / 1
13 | 1043 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 15 ms | 27.0 B / 1
14 | 1044 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 9 ms | 27.0 B / 1
15 | 1045 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 12 ms | 28.0 B / 1
16 | 1046 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 10 ms | 27.0 B / 1
17 | 1047 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 11 ms | 27.0 B / 1
18 | 1048 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 11 ms | 27.0 B / 1
19 | 1049 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 11 ms | 27.0 B / 1
20 | 1050 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 12 ms | 27.0 B / 1
21 | 1051 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 16 ms | 28.0 B / 1
22 | 1052 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 12 ms | 27.0 B / 1
23 | 1053 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 10 ms | 27.0 B / 1
24 | 1054 | 0 | SUCCESS | NODE_LOCAL | 1 / xxx81440xxx | 2015/07/09 14:43:55 | 12 ms | 27.0 B / 1






From: Tathagata Das [mailto:tdas@databricks.com]
Sent: Thursday, 9 July 2015 12:46
To: Michel Hubert
Cc: user@spark.apache.org
Subject: Re: spark streaming performance

I think I know what is going on. What is the storage level that you are using for your receivers? Are you using replication in the storage level?

What's going on here is the following. There is 1 task with process-local locality that is launched immediately, as soon as the stage is submitted. After that, somehow the system is not able to launch any more process-local tasks, so it has to wait for spark.locality.wait = 3 seconds to allow the lower level of locality, which is rack local. Then it launches the rest of the 49 tasks, which finish within 1 second in total. Overall, it's taking 3 seconds of wait + 1 second + something = approx 5 seconds.

Increasing the number of cores per container will definitely increase performance.

TD
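
As an illustration of the storage-level point above, a minimal sketch of where a socket receiver's storage level is chosen; the host, port and the trivial count/print pipeline are placeholders, not the actual LogonAnalysis code:

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("LogonAnalysis")   // placeholder app name
val ssc  = new StreamingContext(conf, Seconds(10))

// Without the third argument, socketTextStream stores received blocks as
// MEMORY_AND_DISK_SER_2 (serialized and replicated to a second executor).
// An explicit level such as MEMORY_ONLY keeps a single in-memory copy, which
// also limits where tasks can be scheduled with PROCESS_LOCAL locality.
val lines = ssc.socketTextStream("receiver-host", 9999, StorageLevel.MEMORY_ONLY)

lines.count().print()   // stand-in for the real per-batch logic

ssc.start()
ssc.awaitTermination()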

On Thu, Jul 9, 2015 at 2:09 AM, Michel Hubert <mi...@vsnsystemen.nl> wrote:
What do you mean by “full task list of all the 50 tasks”?

This?
Details for Stage 271
•         Total task time across all tasks: 0.7 s
•         Shuffle write: 1377.0 B / 51
Summary Metrics for 51 Completed Tasks
Metric | Min | 25th percentile | Median | 75th percentile | Max
Duration | 13 ms | 13 ms | 14 ms | 14 ms | 15 ms
GC Time | 0 ms | 0 ms | 0 ms | 0 ms | 0 ms
Shuffle Write Size / Records | 27.0 B / 1 | 27.0 B / 1 | 27.0 B / 1 | 27.0 B / 1 | 27.0 B / 1

Aggregated Metrics by Executor
Executor ID | Address | Task Time | Total Tasks | Failed Tasks | Succeeded Tasks | Shuffle Write Size / Records
2 | xxx81441xxx:47790 | 0.9 s | 51 | 0 | 51 | 1377.0 B / 51

Tasks
Index | ID | Attempt | Status | Locality Level | Executor ID / Host | Launch Time | Duration | Shuffle Write Size / Records
50 | 3549 | 0 | SUCCESS | PROCESS_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:10 | 13 ms | 27.0 B / 1
0 | 3550 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:13 | 15 ms | 27.0 B / 1
1 | 3551 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:13 | 14 ms | 27.0 B / 1
2 | 3552 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:13 | 14 ms | 27.0 B / 1
3 | 3553 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:13 | 14 ms | 27.0 B / 1
4 | 3554 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:13 | 13 ms | 27.0 B / 1
5 | 3555 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:13 | 13 ms | 27.0 B / 1
6 | 3556 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:13 | 14 ms | 27.0 B / 1
7 | 3557 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:13 | 15 ms | 27.0 B / 1
8 | 3558 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
9 | 3559 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
10 | 3560 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
11 | 3561 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
12 | 3562 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
13 | 3563 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 15 ms | 27.0 B / 1
14 | 3564 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
15 | 3565 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
16 | 3566 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
17 | 3567 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
18 | 3568 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
19 | 3569 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
20 | 3570 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
21 | 3571 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
22 | 3572 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
23 | 3573 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
24 | 3574 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
25 | 3575 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
26 | 3576 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
27 | 3577 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
28 | 3578 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 15 ms | 27.0 B / 1
29 | 3579 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
30 | 3580 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
31 | 3581 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
32 | 3582 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
33 | 3583 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 15 ms | 27.0 B / 1
34 | 3584 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
35 | 3585 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
36 | 3586 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
37 | 3587 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
38 | 3588 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
39 | 3589 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
40 | 3590 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
41 | 3591 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
42 | 3592 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 13 ms | 27.0 B / 1
43 | 3593 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 15 ms | 27.0 B / 1
44 | 3594 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
45 | 3595 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
46 | 3596 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
47 | 3597 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
48 | 3598 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 14 ms | 27.0 B / 1
49 | 3599 | 0 | SUCCESS | RACK_LOCAL | 2 / xxx81441xxx | 2015/07/09 11:09:14 | 15 ms | 27.0 B / 1





Re: spark streaming performance

Posted by Tathagata Das <td...@databricks.com>.
I think I know what is going on. What is the storage level that you are
using for your receivers? Are you using replication in the storage level?

What's going on here is the following. There is 1 task with process-local
locality that is launched immediately, as soon as the stage is submitted.
After that, somehow the system is not able to launch any more process-local
tasks, so it has to wait for spark.locality.wait = 3 seconds to allow the
lower level of locality, which is rack local. Then it launches the rest of
the 49 tasks, which finish within 1 second in total. Overall, it's taking
3 seconds of wait + 1 second + something = approx 5 seconds.

Increasing the number of cores per container will definitely increase
performance.

TD

>
> 0
>
> 0
>
> 8159
>
> 8159
>
> 3.0 m
>
> 0.0 B
>
> 392.0 B
>
> 206.8 KB
>
> stdout
>
> stderr
>
> Thread Dump
>
> 2
>
> xxx81441xxx:34114
>
> 0
>
> 0.0 B / 530.3 MB
>
> 0.0 B
>
> 1
>
> 0
>
> 35
>
> 36
>
> 5.1 s
>
> 0.0 B
>
> 364.0 B
>
> 700.0 B
>
> stdout
>
> stderr
>
> Thread Dump
>
> <driver>
>
> xxx81436xxx:38325
>
> 0
>
> 0.0 B / 265.4 MB
>
> 0.0 B
>
> 0
>
> 0
>
> 0
>
> 0
>
> 0 ms
>
> 0.0 B
>
> 0.0 B
>
> 0.0 B
>
> Thread Dump
>
>
>
>
>
>
>
>
>
>
>
>
>
> *From:* Tathagata Das [mailto:tdas@databricks.com]
> *Sent:* donderdag 9 juli 2015 09:38
> *To:* Michel Hubert
> *Cc:* user@spark.apache.org
> *Subject:* Re: spark streaming performance
>
>
>
> What were the number of cores in the executor? It could be that you had
> only one core in the executor which did all the 50 tasks serially so 50
> task X 15 ms = ~ 1 second.
>
> Could you take a look at the task details in the stage page to see when
> the tasks were added to see whether it explains the 5 second?
>
>
>
> On Thu, Jul 9, 2015 at 12:21 AM, Michel Hubert <mi...@vsnsystemen.nl>
> wrote:
>
>
>
> Hi,
>
>
>
> I’ve developed a POC Spark Streaming application.
>
> But it seems to perform better on my development machine  than on our
> cluster.
>
> I submit it to yarn on our cloudera cluster.
>
>
>
> But my first question is more detailed:
>
>
>
> In de application UI (:4040) I see in the streaming section that the batch
> processing took 6 sec.
>
> Then when I look at the stages I indeed see a stage with duration 5s.
>
>
>
> For example:
>
> 1678
>
> map at LogonAnalysis.scala:215+details
>
> 2015/07/09 09:17:00
>
> 5 s
>
> 50/50
>
> 173.5 KB
>
>
>
> But when I look into the details of state 1678 it tells me the duration
> was 14 ms and the aggregated metrics by executor has 1.0s as Task Time.
>
> What is responsible for the gap between 14 ms, 1s and 5 sec?
>
>
>
>
>
> *Details for Stage 1678*
>
> ·         *Total task time across all tasks: *0.8 s
>
> ·         *Shuffle write: *173.5 KB / 2031
>
>  *Show additional metrics*
>
> *Summary Metrics for 50 Completed Tasks*
>
> *Metric*
>
> *Min*
>
> *25th percentile*
>
> *Median*
>
> *75th percentile*
>
> *Max*
>
> Duration
>
> 14 ms
>
> 14 ms
>
> 15 ms
>
> 15 ms
>
> 24 ms
>
> GC Time
>
> 0 ms
>
> 0 ms
>
> 0 ms
>
> 0 ms
>
> 0 ms
>
> Shuffle Write Size / Records
>
> 2.6 KB / 28
>
> 3.1 KB / 35
>
> 3.5 KB / 42
>
> 3.9 KB / 46
>
> 4.4 KB / 53
>
> *Aggregated Metrics by Executor*
>
> *Executor ID*
>
> *Address*
>
> *Task Time*
>
> *Total Tasks*
>
> *Failed Tasks*
>
> *Succeeded Tasks*
>
> *Shuffle Write Size / Records*
>
> 2
>
> xxxx:44231
>
> 1.0 s
>
> 50
>
> 0
>
> 50
>
> 173.5 KB / 2031
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>

RE: spark streaming performance

Posted by Michel Hubert <mi...@vsnsystemen.nl>.
What do you mean with “full task list of all the 50 tasks”?

This?
Details for Stage 271
·         Total task time across all tasks: 0.7 s
·         Shuffle write: 1377.0 B / 51

Summary Metrics for 51 Completed Tasks
Metric                       | Min        | 25th percentile | Median     | 75th percentile | Max
Duration                     | 13 ms      | 13 ms           | 14 ms      | 14 ms           | 15 ms
GC Time                      | 0 ms       | 0 ms            | 0 ms       | 0 ms            | 0 ms
Shuffle Write Size / Records | 27.0 B / 1 | 27.0 B / 1      | 27.0 B / 1 | 27.0 B / 1      | 27.0 B / 1

Aggregated Metrics by Executor
Executor ID | Address                               | Task Time | Total Tasks | Failed Tasks | Succeeded Tasks | Shuffle Write Size / Records
2           | bfravicsvr81441-cld.opentsp.com:47790 | 0.9 s     | 51          | 0            | 51              | 1377.0 B / 51

Tasks (all 51 tasks: Attempt 0, Status SUCCESS, Executor ID / Host 2 / bfravicsvr81441-cld.opentsp.com, Shuffle Write Size / Records 27.0 B / 1; GC Time, Write Time and Errors were empty)
Index | ID   | Locality Level | Launch Time         | Duration
50    | 3549 | PROCESS_LOCAL  | 2015/07/09 11:09:10 | 13 ms
0     | 3550 | RACK_LOCAL     | 2015/07/09 11:09:13 | 15 ms
1     | 3551 | RACK_LOCAL     | 2015/07/09 11:09:13 | 14 ms
2     | 3552 | RACK_LOCAL     | 2015/07/09 11:09:13 | 14 ms
3     | 3553 | RACK_LOCAL     | 2015/07/09 11:09:13 | 14 ms
4     | 3554 | RACK_LOCAL     | 2015/07/09 11:09:13 | 13 ms
5     | 3555 | RACK_LOCAL     | 2015/07/09 11:09:13 | 13 ms
6     | 3556 | RACK_LOCAL     | 2015/07/09 11:09:13 | 14 ms
7     | 3557 | RACK_LOCAL     | 2015/07/09 11:09:13 | 15 ms
8     | 3558 | RACK_LOCAL     | 2015/07/09 11:09:14 | 13 ms
9     | 3559 | RACK_LOCAL     | 2015/07/09 11:09:14 | 14 ms
10    | 3560 | RACK_LOCAL     | 2015/07/09 11:09:14 | 14 ms
11    | 3561 | RACK_LOCAL     | 2015/07/09 11:09:14 | 13 ms
12    | 3562 | RACK_LOCAL     | 2015/07/09 11:09:14 | 13 ms
13    | 3563 | RACK_LOCAL     | 2015/07/09 11:09:14 | 15 ms
14    | 3564 | RACK_LOCAL     | 2015/07/09 11:09:14 | 13 ms
15    | 3565 | RACK_LOCAL     | 2015/07/09 11:09:14 | 13 ms
16    | 3566 | RACK_LOCAL     | 2015/07/09 11:09:14 | 14 ms
17    | 3567 | RACK_LOCAL     | 2015/07/09 11:09:14 | 13 ms
18    | 3568 | RACK_LOCAL     | 2015/07/09 11:09:14 | 14 ms
19    | 3569 | RACK_LOCAL     | 2015/07/09 11:09:14 | 13 ms
20    | 3570 | RACK_LOCAL     | 2015/07/09 11:09:14 | 13 ms
21    | 3571 | RACK_LOCAL     | 2015/07/09 11:09:14 | 14 ms
22    | 3572 | RACK_LOCAL     | 2015/07/09 11:09:14 | 14 ms
23    | 3573 | RACK_LOCAL     | 2015/07/09 11:09:14 | 13 ms
24    | 3574 | RACK_LOCAL     | 2015/07/09 11:09:14 | 13 ms
25    | 3575 | RACK_LOCAL     | 2015/07/09 11:09:14 | 14 ms
26    | 3576 | RACK_LOCAL     | 2015/07/09 11:09:14 | 13 ms
27    | 3577 | RACK_LOCAL     | 2015/07/09 11:09:14 | 14 ms
28    | 3578 | RACK_LOCAL     | 2015/07/09 11:09:14 | 15 ms
29    | 3579 | RACK_LOCAL     | 2015/07/09 11:09:14 | 14 ms
30    | 3580 | RACK_LOCAL     | 2015/07/09 11:09:14 | 13 ms
31    | 3581 | RACK_LOCAL     | 2015/07/09 11:09:14 | 14 ms
32    | 3582 | RACK_LOCAL     | 2015/07/09 11:09:14 | 14 ms
33    | 3583 | RACK_LOCAL     | 2015/07/09 11:09:14 | 15 ms
34    | 3584 | RACK_LOCAL     | 2015/07/09 11:09:14 | 13 ms
35    | 3585 | RACK_LOCAL     | 2015/07/09 11:09:14 | 14 ms
36    | 3586 | RACK_LOCAL     | 2015/07/09 11:09:14 | 14 ms
37    | 3587 | RACK_LOCAL     | 2015/07/09 11:09:14 | 13 ms
38    | 3588 | RACK_LOCAL     | 2015/07/09 11:09:14 | 13 ms
39    | 3589 | RACK_LOCAL     | 2015/07/09 11:09:14 | 14 ms
40    | 3590 | RACK_LOCAL     | 2015/07/09 11:09:14 | 13 ms
41    | 3591 | RACK_LOCAL     | 2015/07/09 11:09:14 | 13 ms
42    | 3592 | RACK_LOCAL     | 2015/07/09 11:09:14 | 13 ms
43    | 3593 | RACK_LOCAL     | 2015/07/09 11:09:14 | 15 ms
44    | 3594 | RACK_LOCAL     | 2015/07/09 11:09:14 | 14 ms
45    | 3595 | RACK_LOCAL     | 2015/07/09 11:09:14 | 14 ms
46    | 3596 | RACK_LOCAL     | 2015/07/09 11:09:14 | 14 ms
47    | 3597 | RACK_LOCAL     | 2015/07/09 11:09:14 | 14 ms
48    | 3598 | RACK_LOCAL     | 2015/07/09 11:09:14 | 14 ms
49    | 3599 | RACK_LOCAL     | 2015/07/09 11:09:14 | 15 ms



Re: spark streaming performance

Posted by Tathagata Das <td...@databricks.com>.
Can you provide the full task list of all the 50 tasks? I am curious to
find out what's going on.

And spark.executor.cores needs to be set. Or --executor-cores in
spark-submit
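
A minimal sketch of what that could look like; the resource numbers below are illustrative assumptions, not values taken from this thread:

    // Sketch only: the executor count and core count here are assumed for illustration.
    // On YARN the same resources can be requested at submit time, e.g.:
    //   spark-submit --master yarn-cluster --num-executors 2 --executor-cores 4 --executor-memory 1g ...
    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("LogonAnalysis")
      .set("spark.executor.instances", "2")  // number of executors on YARN (assumed value)
      .set("spark.executor.cores", "4")      // cores per executor (assumed value)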


RE: spark streaming performance

Posted by Michel Hubert <mi...@vsnsystemen.nl>.

Hi,

Thanks for your reply!


@What were the number of cores in the executor? It could be that you had only one core in the executor which did all the 50 tasks serially so 50 task X 15 ms = ~ 1 second.

What metric/page shows the number of cores in the executor? Should I specify “spark.executor.cores” when submitting the application?



@Could you take a look at the task details in the stage page to see when the tasks were added to see whether it explains the 5 second?

Stage 704 seems to be submitted 4 sec later. What could be the reason other than stage 703 finished in 4 to 5 sec?


I’ve copy pasted some more detailed info (masked out some servernames xxxNUMBERxxx).
Completed Stages (530)
Stage Id | Description                      | Submitted           | Duration | Tasks: Succeeded/Total | Shuffle Read | Shuffle Write
706      | print at LogonAnalysis.scala:217 | 2015/07/09 09:56:24 | 14 ms    | 1/1                    |              |
704      | print at LogonAnalysis.scala:217 | 2015/07/09 09:56:24 | 74 ms    | 1/1                    | 1377.0 B     |
703      | count at LogonAnalysis.scala:217 | 2015/07/09 09:56:20 | 5 s      | 51/51                  |              | 1377.0 B
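
For context, stages named "count at LogonAnalysis.scala:217" and "print at LogonAnalysis.scala:217" usually come from a DStream pipeline roughly like the sketch below. This is only an assumption based on the UI (one socket receiver, 10-second batches); the real LogonAnalysis code is not shown in this thread, and the host, port and map step are placeholders:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object LogonAnalysisSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("LogonAnalysisSketch")
        // 10-second batch interval, matching "Batch interval: 10 seconds" in the Streaming tab
        val ssc = new StreamingContext(conf, Seconds(10))

        // One network receiver, shown as SocketReceiver-0 in the Receiver Statistics
        val lines = ssc.socketTextStream("some-host", 9999)

        // Placeholder for the real per-record processing step
        val events = lines.map(identity)

        // count() involves a small shuffle ("count at ...") and print() collects it ("print at ...")
        events.count().print()

        ssc.start()
        ssc.awaitTermination()
      }
    }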




Details for Stage 703
·         Total task time across all tasks: 0.6 s
·         Shuffle write: 1377.0 B / 51

Summary Metrics for 51 Completed Tasks
Metric                       | Min        | 25th percentile | Median     | 75th percentile | Max
Duration                     | 11 ms      | 12 ms           | 12 ms      | 13 ms           | 23 ms
GC Time                      | 0 ms       | 0 ms            | 0 ms       | 0 ms            | 0 ms
Shuffle Write Size / Records | 27.0 B / 1 | 27.0 B / 1      | 27.0 B / 1 | 27.0 B / 1      | 27.0 B / 1

Aggregated Metrics by Executor
Executor ID | Address           | Task Time | Total Tasks | Failed Tasks | Succeeded Tasks | Shuffle Write Size / Records
1           | xxx81440xxx:51081 | 0.8 s     | 51          | 0            | 51              | 1377.0 B / 51

Tasks (all 51 tasks: Attempt 0, Status SUCCESS, Executor ID / Host 1 / xxx81440xxx, Shuffle Write Size / Records 27.0 B / 1; GC Time, Write Time and Errors were empty)
Index | ID   | Locality Level | Launch Time         | Duration
50    | 9308 | PROCESS_LOCAL  | 2015/07/09 09:56:20 | 11 ms
0     | 9309 | RACK_LOCAL     | 2015/07/09 09:56:23 | 14 ms
1     | 9310 | RACK_LOCAL     | 2015/07/09 09:56:23 | 12 ms
2     | 9311 | RACK_LOCAL     | 2015/07/09 09:56:23 | 12 ms
3     | 9312 | RACK_LOCAL     | 2015/07/09 09:56:23 | 12 ms
4     | 9313 | RACK_LOCAL     | 2015/07/09 09:56:23 | 13 ms
5     | 9314 | RACK_LOCAL     | 2015/07/09 09:56:23 | 11 ms
6     | 9315 | RACK_LOCAL     | 2015/07/09 09:56:23 | 13 ms
7     | 9316 | RACK_LOCAL     | 2015/07/09 09:56:23 | 14 ms
8     | 9317 | RACK_LOCAL     | 2015/07/09 09:56:23 | 13 ms
9     | 9318 | RACK_LOCAL     | 2015/07/09 09:56:23 | 12 ms
10    | 9319 | RACK_LOCAL     | 2015/07/09 09:56:23 | 12 ms
11    | 9320 | RACK_LOCAL     | 2015/07/09 09:56:23 | 14 ms
12    | 9321 | RACK_LOCAL     | 2015/07/09 09:56:23 | 12 ms
13    | 9322 | RACK_LOCAL     | 2015/07/09 09:56:24 | 11 ms
14    | 9323 | RACK_LOCAL     | 2015/07/09 09:56:24 | 12 ms
15    | 9324 | RACK_LOCAL     | 2015/07/09 09:56:24 | 12 ms
16    | 9325 | RACK_LOCAL     | 2015/07/09 09:56:24 | 12 ms
17    | 9326 | RACK_LOCAL     | 2015/07/09 09:56:24 | 11 ms
18    | 9327 | RACK_LOCAL     | 2015/07/09 09:56:24 | 12 ms
19    | 9328 | RACK_LOCAL     | 2015/07/09 09:56:24 | 12 ms
20    | 9329 | RACK_LOCAL     | 2015/07/09 09:56:24 | 13 ms
21    | 9330 | RACK_LOCAL     | 2015/07/09 09:56:24 | 12 ms
22    | 9331 | RACK_LOCAL     | 2015/07/09 09:56:24 | 12 ms
23    | 9332 | RACK_LOCAL     | 2015/07/09 09:56:24 | 13 ms
24    | 9333 | RACK_LOCAL     | 2015/07/09 09:56:24 | 12 ms
25    | 9334 | RACK_LOCAL     | 2015/07/09 09:56:24 | 11 ms
26    | 9335 | RACK_LOCAL     | 2015/07/09 09:56:24 | 11 ms
27    | 9336 | RACK_LOCAL     | 2015/07/09 09:56:24 | 13 ms
28    | 9337 | RACK_LOCAL     | 2015/07/09 09:56:24 | 13 ms
29    | 9338 | RACK_LOCAL     | 2015/07/09 09:56:24 | 11 ms
30    | 9339 | RACK_LOCAL     | 2015/07/09 09:56:24 | 12 ms
31    | 9340 | RACK_LOCAL     | 2015/07/09 09:56:24 | 13 ms
32    | 9341 | RACK_LOCAL     | 2015/07/09 09:56:24 | 13 ms
33    | 9342 | RACK_LOCAL     | 2015/07/09 09:56:24 | 12 ms
34    | 9343 | RACK_LOCAL     | 2015/07/09 09:56:24 | 11 ms
35    | 9344 | RACK_LOCAL     | 2015/07/09 09:56:24 | 12 ms
36    | 9345 | RACK_LOCAL     | 2015/07/09 09:56:24 | 12 ms
37    | 9346 | RACK_LOCAL     | 2015/07/09 09:56:24 | 11 ms
38    | 9347 | RACK_LOCAL     | 2015/07/09 09:56:24 | 11 ms
39    | 9348 | RACK_LOCAL     | 2015/07/09 09:56:24 | 23 ms
40    | 9349 | RACK_LOCAL     | 2015/07/09 09:56:24 | 13 ms
41    | 9350 | RACK_LOCAL     | 2015/07/09 09:56:24 | 12 ms
42    | 9351 | RACK_LOCAL     | 2015/07/09 09:56:24 | 12 ms
43    | 9352 | RACK_LOCAL     | 2015/07/09 09:56:24 | 11 ms
44    | 9353 | RACK_LOCAL     | 2015/07/09 09:56:24 | 13 ms
45    | 9354 | RACK_LOCAL     | 2015/07/09 09:56:24 | 12 ms
46    | 9355 | RACK_LOCAL     | 2015/07/09 09:56:24 | 12 ms
47    | 9356 | RACK_LOCAL     | 2015/07/09 09:56:24 | 12 ms
48    | 9357 | RACK_LOCAL     | 2015/07/09 09:56:24 | 12 ms
49    | 9358 | RACK_LOCAL     | 2015/07/09 09:56:24 | 13 ms



Streaming
·         Started at: Thu Jul 09 09:27:01 CEST 2015
·         Time since start: 24 minutes
·         Network receivers: 1
·         Batch interval: 10 seconds
·         Processed batches: 143
·         Waiting batches: 1
·         Received records: 295016
·         Processed records: 292937

Statistics over last 100 processed batches

Receiver Statistics
Receiver         | Status | Location   | Records in last batch [2015/07/09 09:51:01] | Minimum rate [records/sec] | Median rate [records/sec] | Maximum rate [records/sec] | Last Error
SocketReceiver-0 | ACTIVE | xxx81441xx | 2.1 K                                       | 196                        | 205                       | 218                        | -

Batch Processing Statistics
Metric           | Last batch       | Minimum          | 25th percentile  | Median           | 75th percentile  | Maximum
Processing Time  | 4 seconds 679 ms | 4 seconds 653 ms | 4 seconds 702 ms | 4 seconds 736 ms | 4 seconds 775 ms | 4 seconds 926 ms
Scheduling Delay | 0 ms             | 0 ms             | 0 ms             | 0 ms             | 0 ms             | 1 ms
Total Delay      | 4 seconds 679 ms | 4 seconds 653 ms | 4 seconds 702 ms | 4 seconds 737 ms | 4 seconds 775 ms | 4 seconds 926 ms




Executors (3)
·         Memory: 0.0 B Used (1326.0 MB Total)
·         Disk: 0.0 B Used
Executor ID | Address           | RDD Blocks | Memory Used      | Disk Used | Active Tasks | Failed Tasks | Complete Tasks | Total Tasks | Task Time | Input | Shuffle Read | Shuffle Write
1           | xxx81440xxx:51081 | 0          | 0.0 B / 530.3 MB | 0.0 B     | 0            | 0            | 8159           | 8159        | 3.0 m     | 0.0 B | 392.0 B      | 206.8 KB
2           | xxx81441xxx:34114 | 0          | 0.0 B / 530.3 MB | 0.0 B     | 1            | 0            | 35             | 36          | 5.1 s     | 0.0 B | 364.0 B      | 700.0 B
<driver>    | xxx81436xxx:38325 | 0          | 0.0 B / 265.4 MB | 0.0 B     | 0            | 0            | 0              | 0           | 0 ms      | 0.0 B | 0.0 B        | 0.0 B







Re: spark streaming performance

Posted by Tathagata Das <td...@databricks.com>.
What were the number of cores in the executor? It could be that you had
only one core in the executor, which did all the 50 tasks serially, so
50 tasks x 15 ms = ~1 second.
Could you take a look at the task details in the stage page to see when
the tasks were added, to see whether that explains the 5 seconds?
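
For anyone wanting to see when tasks are launched without clicking through the UI, a listener can log it; a rough sketch (illustrative only, not something used in this thread):

    // Sketch: print stage submission and task launch times, so gaps between
    // "stage submitted" and "task launched" (e.g. locality wait) become visible.
    import org.apache.spark.SparkContext
    import org.apache.spark.scheduler.{SparkListener, SparkListenerStageSubmitted, SparkListenerTaskStart}

    def addTimingListener(sc: SparkContext): Unit = {
      sc.addSparkListener(new SparkListener {
        override def onStageSubmitted(stageSubmitted: SparkListenerStageSubmitted): Unit =
          println(s"stage ${stageSubmitted.stageInfo.stageId} submitted at ${System.currentTimeMillis()}")
        override def onTaskStart(taskStart: SparkListenerTaskStart): Unit =
          println(s"stage ${taskStart.stageId} task ${taskStart.taskInfo.index} " +
            s"launched at ${taskStart.taskInfo.launchTime} (${taskStart.taskInfo.taskLocality})")
      })
    }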
