Posted to user@hive.apache.org by Frank Luo <jl...@merkleinc.com> on 2013/04/24 16:02:25 UTC

how to limit mappers for a hive job

I am trying to query a huge file with 370 blocks, but the query fails with the message "number of mappers exceeds limit". My cluster has "mapred.tasktracker.map.tasks.maximum" set to 50.

I have tried setting parameters such as hive.exec.mappers.max, mapred.tasktracker.tasks, and mapred.tasktracker.map.tasks.maximum through Beeswax, but none of them seems to have any effect.

I can change "mapred.tasktracker.map.tasks.maximum" and the query then goes through, but what I really want is to limit the number of concurrent tasks per job.

Any suggestions? I am running Cloudera 4.5.

Re: how to limit mappers for a hive job

Posted by Edward Capriolo <ed...@gmail.com>.
Also make sure Hive is using CombineHiveInputFormat (not just
HiveInputFormat). The combining format is the default in newer versions.
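
A minimal sketch of how this could be checked and set from a Hive session, assuming the input format is controlled by the hive.input.format property and that the class lives in the org.apache.hadoop.hive.ql.io package (neither is spelled out in this thread, so verify against your Hive version):

```sql
-- Print the input format currently in effect for this session.
SET hive.input.format;

-- Assumption: switch to the combining input format if the line above
-- reports plain HiveInputFormat.
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
```

With the combining format, several blocks can be packed into one split, so the split size settings determine how many mappers are launched.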


On Wed, Apr 24, 2013 at 10:51 AM, Sanjay Subramanian <
Sanjay.Subramanian@wizecommerce.com> wrote:

>  I use the following
>
>  To specify the Mapper Input Split Size (134217728 is in bytes)
> ==============================================================
> SET mapreduce.input.fileinputformat.split.maxsize=134217728;

Re: how to limit mappers for a hive job

Posted by Sanjay Subramanian <Sa...@wizecommerce.com>.
I use the following to set the maximum mapper input split size (134217728 bytes = 128 MB):

SET mapreduce.input.fileinputformat.split.maxsize=134217728;
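
As a rough sketch of how the split size relates to the mapper count: the thread does not state the HDFS block size, so the numbers below assume 128 MB blocks, which would put the 370-block file at about 46 GB. Under that assumption a 128 MB cap gives roughly one mapper per block, while a larger cap lets CombineHiveInputFormat pack several blocks into each split. The 1 GB value and the use of the older mapred.max.split.size name are illustrative assumptions, not settings taken from this thread.

```sql
-- Assumed: 370 blocks x 128 MB = 47360 MB of input.
-- With a 1 GB (1073741824 bytes) maximum split size that is roughly
-- 47360 / 1024 = 46.25, i.e. about 47 mappers. The real count can be
-- somewhat higher because splits are grouped by node and rack.
SET mapreduce.input.fileinputformat.split.maxsize=1073741824;

-- Older property name for the same setting; assumed to be the one
-- honoured on clusters that predate the mapreduce.* names.
SET mapred.max.split.size=1073741824;
```

This changes the total number of map tasks for the query, not how many of them run at the same time.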
