Posted to user@spark.apache.org by randylu <ra...@gmail.com> on 2015/08/14 05:01:29 UTC
Always two tasks slower than others, and then job fails
It is strange that there are always two tasks slower than the others, and
their corresponding partitions' data are larger, no matter how many
partitions I use.
Executor ID   Address                   Task Time   Total Tasks   Shuffle Read Size / Records
1             slave129.vsvs.com:56691   16 s        1             99.5 MB  / 18865432
*10           slave317.vsvs.com:59281   0 ms        0             413.5 MB / 311001318*
100           slave290.vsvs.com:60241   19 s        1             110.8 MB / 27075926
101           slave323.vsvs.com:36246   14 s        1             126.1 MB / 25052808
The task time and record count of Executor 10 seem strange, and the CPUs on
that node are all 100% busy.
Has anyone met the same problem? Thanks in advance for any answer!
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Always-two-tasks-slower-than-others-and-then-job-fails-tp24257.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
Re: Always two tasks slower than others, and then job fails
Posted by Zoltán Zvara <zo...@gmail.com>.
Data skew is still a problem with Spark.
- If you use groupByKey, try to express your logic without groupByKey.
- If you must use groupByKey, all you can do is scale vertically.
- If you can, repartition with a finer-grained HashPartitioner. You will
have more tasks per stage, but tasks are lightweight in Spark, so this
should not introduce heavy overhead. If you have your own domain
partitioner, try rewriting it to introduce a secondary key.
I hope this gives you some insight and helps.
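The secondary-key (salting) idea above can be sketched in plain Python: the
hot key is split across several sub-keys, pre-aggregated, and then combined,
which is what a two-phase reduceByKey with a salted key does in Spark. The
records and the SALT value below are made up for illustration, and the
shuffle is only simulated on a local list:

```python
from collections import defaultdict

SALT = 4  # number of sub-keys to spread each hot key across (illustrative)

# Made-up (key, value) records with one heavily skewed key.
records = [("hot", 1)] * 6 + [("cold", 1)] * 2

# Phase 1: pre-aggregate under a salted key. In Spark this would be a
# reduceByKey on ((key, i % SALT), value), so the hot key's records are
# handled by up to SALT tasks instead of landing in a single one.
partial = defaultdict(int)
for i, (key, value) in enumerate(records):
    partial[(key, i % SALT)] += value

# Phase 2: drop the salt and combine the partial sums. This step moves
# only up to SALT partial results per key, not every raw record.
totals = defaultdict(int)
for (key, _salt), value in partial.items():
    totals[key] += value

print(dict(totals))  # {'hot': 6, 'cold': 2}
```

The trade-off is one extra aggregation step in exchange for spreading the
heaviest key over several tasks.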
On Fri, Aug 14, 2015 at 9:37 AM Jeff Zhang <zj...@gmail.com> wrote:
> Data skew ? May your partition key has some special value like null or
> empty string
Re: Always two tasks slower than others, and then job fails
Posted by Jeff Zhang <zj...@gmail.com>.
Data skew? Maybe your partition key has some special value, like null or an
empty string.
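One quick way to check this hypothesis is to count key frequencies on a
small sample before the shuffle. A plain-Python sketch; in a real job the
keys would come from the actual pair RDD (for example via something like
pairs.sample(False, 0.001).keys().collect(), where `pairs` stands in for
your RDD), and the sample below is made up:

```python
from collections import Counter

# Made-up sample of keys; a dominant None/empty key would point to skew.
sampled_keys = ["a", None, "", None, "b", None, "", None]

top = Counter(sampled_keys).most_common(2)
print(top)  # [(None, 4), ('', 2)]
```

If one key dominates the sample, filtering it out or handling it separately
is usually cheaper than tuning the rest of the job around it.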
--
Best Regards
Jeff Zhang