Posted to user@spark.apache.org by Sachin Aggarwal <di...@gmail.com> on 2016/01/28 12:00:10 UTC
Explanation for info shown in UI
Hi

I am running a streaming word count with Kafka, reading from one test topic
with 2 partitions. My cluster has three Spark executors, and each batch
interval is 10 seconds.

For every batch (e.g. batch time 02:51:00 below) I see 3 entries in the
Spark UI, as shown below.

My questions:
1) The first column is labelled Job Id - does Spark submit 3 jobs for each
batch?
2) When I decrease the number of executors/nodes, the job count also
changes. What is the relation to the number of executors?
3) Only one job actually executes the stages; the other two show them as
skipped. Why were those other jobs created?
Job Id | Description | Submitted | Duration | Stages: Succeeded/Total | Tasks (for all stages): Succeeded/Total
221 | Streaming job from [output operation 0, batch time 02:51:00] print at StreamingWordCount.scala:54 | 2016/01/28 02:51:00 | 46 ms | 1/1 (1 skipped) | 1/1 (3 skipped)
220 | Streaming job from [output operation 0, batch time 02:51:00] print at StreamingWordCount.scala:54 | 2016/01/28 02:51:00 | 47 ms | 1/1 (1 skipped) | 4/4 (3 skipped)
219 | Streaming job from [output operation 0, batch time 02:51:00] print at StreamingWordCount.scala:54 | 2016/01/28 02:51:00 | 48 ms | 2/2 | 4/4
--
Thanks & Regards
Sachin Aggarwal
7760502772
Re: Explanation for info shown in UI
Posted by Yogesh Mahajan <ym...@snappydata.io>.
The number of jobs per batch depends on the number of output operations
(print, foreachRDD, saveAs*Files) and on the number of RDD actions inside
those output operations. For example:

dstream1.foreachRDD { rdd => rdd.count }  // ONE Spark job per batch
dstream1.foreachRDD { rdd => rdd.count; rdd.count }  // TWO Spark jobs per batch
dstream1.foreachRDD { rdd => rdd.count }; dstream2.foreachRDD { rdd => rdd.count }  // TWO Spark jobs per batch
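To make the rule concrete, here is a small toy model (my own sketch, not
the Spark API or scheduler): the jobs submitted per batch are simply the
total number of RDD actions summed over all registered output operations.

```scala
// Toy model of the rule above -- NOT the real Spark scheduler.
// Each output operation contributes one job per batch for every RDD
// action (count, collect, saveAs*, the collect behind print, ...) it runs.
def jobsPerBatch(actionsPerOutputOp: Seq[Int]): Int =
  actionsPerOutputOp.sum

// dstream1.foreachRDD { rdd => rdd.count }              -> one action
println(jobsPerBatch(Seq(1)))     // 1
// dstream1.foreachRDD { rdd => rdd.count; rdd.count }   -> two actions
println(jobsPerBatch(Seq(2)))     // 2
// two dstreams, one count in each output operation      -> two actions
println(jobsPerBatch(Seq(1, 1)))  // 2
```
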
Regards,
Yogesh Mahajan
SnappyData Inc (snappydata.io)