Posted to dev@spark.apache.org by Priya Ch <le...@gmail.com> on 2016/05/26 14:40:52 UTC

Spark Job Execution halts during shuffle...

Hello Team,


 I am trying to join two RDDs, where one is about 800 MB and the
other is about 190 MB. During the join step, my job halts and I don't see any
progress in the execution.
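
For reference, the join step looks roughly like the sketch below (simplified;
the paths, key fields and downstream logic are only placeholders for the real code):

  // pair RDDs keyed on the join column (illustrative only)
  val large = sc.textFile("hdfs:///data/large_input")      // ~800 MB
    .map { line => val f = line.split(","); (f(0), line) }
  val small = sc.textFile("hdfs:///data/small_input")      // ~190 MB
    .map { line => val f = line.split(","); (f(0), line) }

  val joined = large.join(small)   // the shuffle triggered here is where it hangs
  joined.count()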

This is the message I see on the console -

INFO spark.MapOutputTrackerMasterEndPoint: Asked to send map output
locations for shuffle 0 to <hostname1>:40000
INFO spark.MapOutputTrackerMasterEndPoint: Asked to send map output
locations for shuffle 1 to <hostname2>:40000

After these messages, I don't see any progress. I am using Spark 1.6.0
with the YARN scheduler (running in YARN client mode). My cluster
configuration is a 3-node cluster (1 master and 2 slaves). Each slave has
1 TB of hard disk space, 300 GB of memory and 32 cores.

HDFS block size is 128 MB.

Thanks,
Padma Ch

Re: Spark Job Execution halts during shuffle...

Posted by Priya Ch <le...@gmail.com>.
Hi,
Can someone throw some light on this? The issue does not happen frequently;
sometimes the job just halts with the above messages.

Regards,
Padma Ch


Re: Spark Job Execution halts during shuffle...

Posted by Ted Yu <yu...@gmail.com>.
Priya:
Have you checked the executor logs on hostname1 and hostname2?
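
(While the application is still running you can reach the executor stdout/stderr
from the Executors tab of the Spark web UI; once it has finished on YARN,
"yarn logs -applicationId <application id>" will collect them for you.)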

Cheers


Re: Spark Job Execution halts during shuffle...

Posted by Takeshi Yamamuro <li...@gmail.com>.
Hi,

If your job gets stuck or fails, one of the best practices is to increase
the number of shuffle partitions.
Also, you'd be better off using DataFrames instead of RDDs to get better join
optimization.
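
For instance, something along these lines (just a sketch; the RDD names,
column names and the partition count of 200 are assumptions to adapt to your data):

  // RDD API: pass an explicit partition count to the join
  val joined = largeRdd.join(smallRdd, 200)

  // DataFrame API: let Catalyst plan the join and broadcast the small side
  import org.apache.spark.sql.functions.broadcast
  import sqlContext.implicits._

  sqlContext.setConf("spark.sql.shuffle.partitions", "200")
  val largeDF = largeRdd.toDF("key", "value1")   // largeRdd/smallRdd are the pair RDDs being joined
  val smallDF = smallRdd.toDF("key", "value2")
  val joinedDF = largeDF.join(broadcast(smallDF), "key")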

// maropu





-- 
---
Takeshi Yamamuro