Posted to user@spark.apache.org by Sachin Aggarwal <di...@gmail.com> on 2016/01/28 12:00:10 UTC

Explanation of info shown in UI

Hi

I am executing a streaming word count with Kafka,
using one test topic with 2 partitions.
My cluster has three Spark executors.

Each batch interval is 10 seconds.

For every batch (e.g. batch time 02:51:00 below) I see 3 entries in the Spark
UI, as shown below.

My questions:
1) As the first column's label Job Id suggests, does Spark submit 3 jobs for
each batch?
2) When I decrease the number of executors/nodes, the job count also changes.
What is the relation to the number of executors?
3) Only one job actually executes the stages; the other two show them as
skipped. Why were the other jobs created?

Job Id | Description | Submitted | Duration | Stages: Succeeded/Total | Tasks (for all stages): Succeeded/Total
221 | Streaming job from [output operation 0, batch time 02:51:00] print at StreamingWordCount.scala:54 | 2016/01/28 02:51:00 | 46 ms | 1/1 (1 skipped) | 1/1 (3 skipped)
220 | Streaming job from [output operation 0, batch time 02:51:00] print at StreamingWordCount.scala:54 | 2016/01/28 02:51:00 | 47 ms | 1/1 (1 skipped) | 4/4 (3 skipped)
219 | Streaming job from [output operation 0, batch time 02:51:00] print at StreamingWordCount.scala:54 | 2016/01/28 02:51:00 | 48 ms | 2/2 | 4/4
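
For reference, a minimal sketch of what the word count application might look
like (the actual StreamingWordCount.scala is not included in this thread, so
the broker address, topic name, and exact transformations below are
assumptions; the print at line 54 in the UI is assumed to be the
counts.print() in the sketch):

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamingWordCount")
    // 10-second batch interval, as described above
    val ssc = new StreamingContext(conf, Seconds(10))

    // Direct stream over the 2-partition test topic (broker list assumed)
    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("test"))

    val counts = stream.map(_._2)      // drop the Kafka key, keep the message
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.print()                     // the single output operation

    ssc.start()
    ssc.awaitTermination()
  }
}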

-- 

Thanks & Regards

Sachin Aggarwal
7760502772

Re: Explanation of info shown in UI

Posted by Yogesh Mahajan <ym...@snappydata.io>.
The jobs depend on the number of output operations (print, foreachRDD,
saveAs*Files) and the number of RDD actions in those output operations.

For example:
dstream1.foreachRDD { rdd => rdd.count }    // ONE Spark job per batch
dstream1.foreachRDD { rdd => { rdd.count; rdd.count } }    // TWO Spark jobs per batch
dstream1.foreachRDD { rdd => rdd.count }; dstream2.foreachRDD { rdd => rdd.count }    // TWO Spark jobs per batch
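
This also connects to the UI entries above: print() is itself an output
operation built on an RDD action. Roughly, it behaves like the sketch below
(a simplified paraphrase, not the exact Spark source), and rdd.take() may
launch more than one job per batch while it collects enough elements; the
later jobs reuse stages already computed earlier in the batch, which the UI
reports as skipped.

// Simplified paraphrase of what DStream.print() does per batch (default 10 elements)
dstream1.foreachRDD { rdd =>
  val firstNum = rdd.take(10 + 1)       // take() may submit one or more jobs
  println("-------------------------------------------")
  firstNum.take(10).foreach(println)    // show the first 10 elements
  if (firstNum.length > 10) println("...")
}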

Regards,
Yogesh Mahajan
SnappyData Inc (snappydata.io)
