Posted to user@hive.apache.org by Srinivas Surasani <hi...@gmail.com> on 2013/04/25 23:33:49 UTC

map tasks are taking forever when running job on 24 TB

Hi,

I'm running a Hive job on a 24 TB dataset (34560 partitions). About 500
to 1000 mappers succeed (out of 80000 total) and the rest of the mappers
take forever (their status stays at 0% the whole time). Is there any
limitation on the number of partitions or the dataset size? Are there any
parameters to set here?

The same job succeeds on 18 TB (25920 partitions).

I have already set the following in my Hive query:
set mapreduce.jobtracker.split.metainfo.maxsize=-1;


Regards,
Srinivas
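
For context, 80000 map tasks generally means one mapper per small file or
partition chunk, and at that scale task-launch overhead alone can make a job
look frozen. A minimal sketch of settings sometimes used to coalesce small
splits into fewer mappers; the split sizes are assumed values, not
recommendations for this particular cluster:

-- Combine many small files/partitions into larger splits so fewer mappers are launched.
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
-- Target split sizes are illustrative assumptions; tune to the cluster's block size.
set mapred.max.split.size=1073741824;            -- ~1 GB per split
set mapred.min.split.size.per.node=268435456;    -- ~256 MB
set mapred.min.split.size.per.rack=268435456;    -- ~256 MB
-- Keep the split meta-info limit disabled, as in the original query.
set mapreduce.jobtracker.split.metainfo.maxsize=-1;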

Re: map tasks are taking forever when running job on 24 TB

Posted by Srinivas Surasani <hi...@gmail.com>.
Sanjay, I got the job working on 25920 partitions (18 TB); it is only the
24 TB job that is failing.

Hi Viral, yes, we have enough space on the cluster, and I could not find
any log indicating that kind of failure.






-- 
Regards,
-- Srinivas
Srinivas@cloudwick.com

Re: map tasks are taking forever when running job on 24 TB

Posted by Viral Bajaria <vi...@gmail.com>.
How about running it via sub-queries, where each query runs over a subset of
the data and has a better chance of finishing? I fear that the amount of
data to shuffle might be too big and you might be running out of
scratch/temp space. Did you verify that the job does not fail due to running
out of disk space before the shuffle/reduce can kick in?

-Viral
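
A rough sketch of what splitting by partition ranges could look like in
HiveQL; the table names and the partition column "dt" below are hypothetical
placeholders, not taken from the original job:

-- Run the aggregation over one slice of the partition range instead of all
-- 34560 partitions at once; "source_table", "result_table" and partition
-- column "dt" are assumed names.
INSERT INTO TABLE result_table
SELECT dt, count(*)
FROM source_table
WHERE dt >= '2013-01-01' AND dt < '2013-02-01'
GROUP BY dt;

-- Repeat with the next range (e.g. '2013-02-01' to '2013-03-01') so each job
-- only has to shuffle a fraction of the 24 TB.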



Re: map tasks are taking forever when running job on 24 TB

Posted by Sanjay Subramanian <Sa...@wizecommerce.com>.
That’s a lot of partitions for one Hive job! Not sure if that itself is the root of the issue; there have been quite a few discussions suggesting roughly 1000-ish partitions as a sensible maximum.
Is your use case conducive to using combiners (though they cannot be guaranteed to be called)?
Thanks
sanjay
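
One way to get combiner-like behaviour in Hive is map-side aggregation; a
minimal sketch, with the hash-table memory fraction as an assumed value:

-- Hash-based aggregation in the mappers (combiner-style) cuts the volume of
-- data shuffled to the reducers for aggregate queries.
set hive.map.aggr=true;
-- Fraction of mapper memory the aggregation hash table may use (assumed value).
set hive.map.aggr.hash.percentmemory=0.5;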
