Posted to user@spark.apache.org by Yogesh Mahajan <ym...@snappydata.io> on 2016/02/02 05:53:52 UTC

Re: Explanation for info shown in UI

The number of jobs per batch depends on the number of output operations
(print, foreachRDD, saveAs*Files) and on the number of RDD actions inside
those output operations.

For example:
dstream1.foreachRDD { rdd => rdd.count }   // ONE Spark job per batch
dstream1.foreachRDD { rdd => { rdd.count; rdd.count } }   // TWO Spark jobs per batch
dstream1.foreachRDD { rdd => rdd.count }; dstream2.foreachRDD { rdd => rdd.count }   // TWO Spark jobs per batch
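
A self-contained version of the same idea, runnable locally (a hypothetical
demo, assuming the Spark 1.x streaming API; queueStream just stands in for a
real source such as Kafka):

import scala.collection.mutable
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object JobsPerBatchDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("JobsPerBatchDemo")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Queue-backed stream as a stand-in source for this demo.
    val queue = mutable.Queue(ssc.sparkContext.makeRDD(Seq("hello", "world")))
    val dstream1 = ssc.queueStream(queue)

    // Output operation 0: one RDD action => ONE job for this op per batch.
    dstream1.foreachRDD { rdd => rdd.count() }

    // Output operation 1: two RDD actions => TWO jobs for this op per batch.
    dstream1.foreachRDD { rdd =>
      rdd.count()
      rdd.count()
    }

    // The Jobs tab of the UI should show THREE jobs for the first batch.
    ssc.start()
    ssc.awaitTermination()
  }
}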

Regards,
Yogesh Mahajan
SnappyData Inc (snappydata.io)

On Thu, Jan 28, 2016 at 4:30 PM, Sachin Aggarwal <different.sachin@gmail.com> wrote:

> Hi
>
> I am executing a streaming word count with Kafka, with one test topic
> that has 2 partitions. My cluster has three Spark executors.
>
> Each batch is 10 seconds.
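>
> (A minimal sketch of a program of this shape, assuming the Spark 1.x
> direct Kafka API; hypothetical broker address and topic handling, not the
> actual StreamingWordCount.scala, which is not shown:)
>
> import kafka.serializer.StringDecoder
> import org.apache.spark.SparkConf
> import org.apache.spark.streaming.{Seconds, StreamingContext}
> import org.apache.spark.streaming.kafka.KafkaUtils
>
> object StreamingWordCount {
>   def main(args: Array[String]): Unit = {
>     val conf = new SparkConf().setAppName("StreamingWordCount")
>     val ssc = new StreamingContext(conf, Seconds(10))  // 10-second batches
>
>     // Hypothetical broker address; the topic "test" has 2 partitions.
>     val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
>     val lines = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
>       ssc, kafkaParams, Set("test"))
>
>     val counts = lines.map(_._2)
>       .flatMap(_.split("\\s+"))
>       .map(word => (word, 1L))
>       .reduceByKey(_ + _)
>
>     counts.print()  // the single output operation; appears as "print at ..." in the UI
>
>     ssc.start()
>     ssc.awaitTermination()
>   }
> }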
>
> For every batch (e.g., *batch time 02:51:00* below) I see 3 entries in
> the Spark UI, as shown below.
>
> My questions:
> 1) As the first column's label says Job Id, does Spark submit 3 jobs for
> each batch?
> 2) When I tried decreasing the number of executors/nodes, the job count
> also changed. What is the relation with the number of executors?
> 3) Only one job actually executes the stages; the other two show them as
> skipped. Why did the other jobs get created?
>
> Job Id | Description                                                                                        | Submitted           | Duration | Stages: Succeeded/Total | Tasks (for all stages): Succeeded/Total
> 221    | Streaming job from [output operation 0, batch time 02:51:00] print at StreamingWordCount.scala:54 | 2016/01/28 02:51:00 | 46 ms    | 1/1 (1 skipped)         | 1/1 (3 skipped)
> 220    | Streaming job from [output operation 0, batch time 02:51:00] print at StreamingWordCount.scala:54 | 2016/01/28 02:51:00 | 47 ms    | 1/1 (1 skipped)         | 4/4 (3 skipped)
> 219    | Streaming job from [output operation 0, batch time 02:51:00] print at StreamingWordCount.scala:54 | 2016/01/28 02:51:00 | 48 ms    | 2/2                     | 4/4
>
> --
>
> Thanks & Regards
>
> Sachin Aggarwal
> 7760502772
>