Posted to user@spark.apache.org by randylu <ra...@gmail.com> on 2015/08/14 05:01:29 UTC

Always two tasks slower than others, and then job fails

  It is strange that there are always two tasks slower than the others, and
the corresponding partitions' data are larger, no matter how many partitions
I use.


Executor ID   Address                   Task Time   Total Tasks   Shuffle Read Size / Records
1             slave129.vsvs.com:56691   16 s        1             99.5 MB / 18865432
*10           slave317.vsvs.com:59281   0 ms        0             413.5 MB / 311001318*
100           slave290.vsvs.com:60241   19 s        1             110.8 MB / 27075926
101           slave323.vsvs.com:36246   14 s        1             126.1 MB / 25052808

  The task time and record count of Executor 10 seem strange, and the CPUs on
that node are all 100% busy.

  Has anyone met the same problem? Thanks in advance for any answer!




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Always-two-tasks-slower-than-others-and-then-job-fails-tp24257.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.



Re: Always two tasks slower than others, and then job fails

Posted by Zoltán Zvara <zo...@gmail.com>.
Data skew is still a problem with Spark.

- If you use groupByKey, try to express your logic without it (for example
with reduceByKey or aggregateByKey, which combine values map-side).
- If you really need groupByKey, all you can do is scale vertically.
- If you can, repartition with a finer HashPartitioner. You will get many
more tasks per stage, but tasks are lightweight in Spark, so this should not
introduce heavy overhead. If you have your own domain partitioner, try to
rewrite it by introducing a secondary key (see the sketch below).
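
A minimal sketch of these ideas (Scala, RDD API). The data set, key names
and partition counts below are made up for illustration; only the Spark
calls (reduceByKey, partitionBy, HashPartitioner) are real, and the number
of partitions and salt buckets would have to be tuned for your data.

import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import scala.util.Random

object SkewSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("skew-sketch").setMaster("local[*]"))

    // Hypothetical skewed pair RDD: one hot key dominates everything else.
    val pairs = sc.parallelize(
      Seq.fill(100000)(("hot", 1L)) ++ Seq.tabulate(1000)(i => (s"k$i", 1L)))

    // 1) Prefer reduceByKey over groupByKey: values are combined map-side,
    //    so the hot key ships far less data through the shuffle.
    val summed = pairs.reduceByKey(_ + _)

    // 2) Repartition with a finer HashPartitioner: more, smaller tasks per
    //    stage, which are cheap to schedule.
    val finer = summed.partitionBy(new HashPartitioner(200))

    // 3) Secondary key ("salting"): split the hot key across many partitions,
    //    reduce per (key, salt), then drop the salt and reduce again.
    val salted = pairs
      .map { case (k, v) => ((k, Random.nextInt(16)), v) }
      .reduceByKey(_ + _)
      .map { case ((k, _), v) => (k, v) }
      .reduceByKey(_ + _)

    val hotSum = salted.lookup("hot").head
    println(s"distinct keys: ${finer.count()}, hot key sum: $hotSum")
    sc.stop()
  }
}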

I hope this gives some insight and helps.

On Fri, Aug 14, 2015 at 9:37 AM Jeff Zhang <zj...@gmail.com> wrote:

> Data skew? Maybe your partition key has some special value like null or
> an empty string.
>

Re: Always two tasks slower than others, and then job fails

Posted by Jeff Zhang <zj...@gmail.com>.
Data skew? Maybe your partition key has some special value like null or an
empty string.
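
If you want to confirm that, here is a quick sketch (Scala, meant for
spark-shell; `pairs` is a hypothetical name for the pair RDD that feeds the
shuffle which fails):

// Count records per key and print the heaviest ones; a null or empty-string
// key showing up here with a huge count would explain the two slow tasks.
val keyCounts = pairs.map { case (k, _) => (k, 1L) }.reduceByKey(_ + _)

keyCounts.top(5)(Ordering.by(_._2)).foreach { case (k, n) =>
  val shown = Option(k).getOrElse("<null>")
  println(s"key=$shown count=$n")
}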


-- 
Best Regards

Jeff Zhang