Posted to common-user@hadoop.apache.org by sandeep das <ya...@gmail.com> on 2016/01/05 12:09:17 UTC

Increasing input size to MAP tasks

Hi All,

I have a Pig script that runs over YARN. Each map task created by this Pig
script takes only 128 MB as input, never more than that.

I want to increase the input size of each map task. I've read that the input
split size is determined using the following formula:

max(min split size, min(block size, max split size)).

Following are the values I'm setting for these parameters:

dfs.blocksize = 134217728
mapreduce.input.fileinputformat.split.maxsize = 1610612736
mapreduce.input.fileinputformat.split.minsize = 805306368
mapreduce.input.fileinputformat.split.minsize.per.node = 222298112
mapreduce.input.fileinputformat.split.minsize.per.rack = 222298112

According to the configured values, the input split size should be 805306368
(768 MB), but it is still 134217728 (128 MB), which is the same as dfs.blocksize.
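
If I plug these values into the formula, I get:

    max(805306368, min(134217728, 1610612736)) = max(805306368, 134217728) = 805306368

i.e. 768 MB per split by the formula, yet each map still reads only 128 MB.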

However, every time I increase dfs.blocksize to a higher value, the input to each
map task increases by the same amount.


Following is the setup:
Cloudera: 5.5.1
Hadoop: 2.6.0
Pig: 0.12.0


Regards,
Sandeep

Re: Increasing input size to MAP tasks

Posted by sandeep das <ya...@gmail.com>.
Hi All,

You can ignore this mail. I've found the configuration parameters I was
looking for: pig.maxCombinedSplitSize and pig.splitCombination.
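
For anyone who hits the same issue, a minimal sketch of how these can be set at
the top of a Pig script (the 768 MB value below is only an example target; pick
whatever per-map input size suits your data):

    -- split combination is enabled by default in recent Pig versions;
    -- pig.maxCombinedSplitSize caps the bytes fed to a single map task
    set pig.splitCombination true;
    set pig.maxCombinedSplitSize 805306368;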


Regards,
Sandeep

On Tue, Jan 5, 2016 at 4:39 PM, sandeep das <ya...@gmail.com> wrote:

> Hi All,
>
> I have a Pig script that runs over YARN. Each map task created by this Pig
> script takes only 128 MB as input, never more than that.
>
> I want to increase the input size of each map task. I've read that the input
> split size is determined using the following formula:
>
> max(min split size, min(block size, max split size)).
>
> Following are the values I'm setting for these parameters:
>
> dfs.blocksize = 134217728
> mapreduce.input.fileinputformat.split.maxsize = 1610612736
> mapreduce.input.fileinputformat.split.minsize = 805306368
> mapreduce.input.fileinputformat.split.minsize.per.node = 222298112
> mapreduce.input.fileinputformat.split.minsize.per.rack = 222298112
>
> According to the configured values, the input split size should be 805306368
> (768 MB), but it is still 134217728 (128 MB), which is the same as dfs.blocksize.
>
> However, every time I increase dfs.blocksize to a higher value, the input to each
> map task increases by the same amount.
>
>
> Following is the setup:
> Cloudera: 5.5.1
> Hadoop: 2.6.0
> Pig: 0.12.0
>
>
> Regards,
> Sandeep
>
