Posted to user@spark.apache.org by Gil Vernik <GI...@il.ibm.com> on 2017/10/18 17:03:10 UTC

possible cause: same TeraGen job sometimes slow and sometimes fast

I ran a series of TeraGen jobs via spark-submit (each job generated a 
dataset of equal size into a different S3 bucket).
I noticed that some jobs were fast and some were slow.

Slow jobs always had many log lines like
DEBUG TaskSchedulerImpl: parentName: , name: TaskSet_1.0, runningTasks: 1 
(or 2, etc.)

Fast jobs always had only a few of those lines.
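
To compare runs, those lines can be counted per driver log. Below is a 
minimal Python sketch, assuming every such line has the same layout as the 
single sample quoted above. The function name is hypothetical, and the regex 
is inferred from that one sample only, not from the Spark source.

```python
import re
from collections import Counter

# Regex inferred from the one DEBUG sample quoted in this post (an assumption,
# not taken from the Spark source): it captures the runningTasks value.
LINE_RE = re.compile(
    r"DEBUG TaskSchedulerImpl: parentName: .*?, name: \S+?, runningTasks: (\d+)"
)

def summarize_scheduler_lines(lines):
    """Return (total matching lines, {runningTasks value: line count})."""
    running = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            running[int(match.group(1))] += 1
    return sum(running.values()), dict(running)
```

Passing an open log file (or any iterable of lines) from a slow run and from 
a fast run gives two totals that can be compared directly.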

Can someone explain why the number of those debug lines varies between 
different executions of the same job? The more of those lines I see, the 
slower the job is.
Has anyone experienced the same behavior?

Thanks
Gil.