Posted to common-user@hadoop.apache.org by sandeep das <ya...@gmail.com> on 2016/01/05 12:09:17 UTC
Increasing input size to MAP tasks
Hi All,
I have a Pig script that runs on YARN. Each map task created by this Pig
script reads 128MB of input and no more.
I want to increase the input size of each map task. I have read that the
input split size is determined by the following formula:
max(min split size, min(block size, max split size)).
Following are the values I'm setting for these parameters:
dfs.blocksize = 134217728
mapreduce.input.fileinputformat.split.maxsize = 1610612736
mapreduce.input.fileinputformat.split.minsize = 805306368
mapreduce.input.fileinputformat.split.minsize.per.node = 222298112
mapreduce.input.fileinputformat.split.minsize.per.rack = 222298112
According to the values configured, the input split size should be
805306368, but it is still 134217728, the same as dfs.blocksize.
Every time I increase dfs.blocksize, the input to the map tasks
increases by the same amount.
Following is the setup:
Cloudera : 5.5.1
Hadoop: 2.6.0
Pig: 0.12.0
Regards,
Sandeep
Re: Increasing input size to MAP tasks
Posted by sandeep das <ya...@gmail.com>.
Hi All,
You can ignore this mail. I've found the configuration parameters I
was looking for: pig.maxCombinedSplitSize and pig.splitCombination.
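For anyone finding this thread later: one way to apply these parameters is with SET statements at the top of the Pig script. The property names come from this thread; the 768MB value below is only an illustration, not a recommendation:

```pig
-- Combine small input splits into larger ones (a sketch; values are illustrative)
SET pig.splitCombination true;
SET pig.maxCombinedSplitSize 805306368;  -- 768 MB per combined split
```

The same properties can also be passed on the command line, e.g. `pig -Dpig.maxCombinedSplitSize=805306368 script.pig`.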
Regards,
Sandeep
On Tue, Jan 5, 2016 at 4:39 PM, sandeep das <ya...@gmail.com> wrote:
> Hi All,
>
> I've a pig script which runs over YARN. Each MAP task created by this pig
> script is taking 128MB as input and not more than that.
>
> I want to increase the input size of each map job. I've read that input
> size is determined using following formula:
>
> max(min split size, min(block size, max split size)).
>
> Following are the values I'm setting for these parameters:
>
> dfs.blocksize = 134217728
> mapreduce.input.fileinputformat.split.maxsize = 1610612736
> mapreduce.input.fileinputformat.split.minsize = 805306368
> mapreduce.input.fileinputformat.split.minsize.per.node = 222298112
> mapreduce.input.fileinputformat.split.minsize.per.rack = 222298112
>
> According the values configured the input size should be 805306368 but it
> is still 134217728 which same as dfs.blocksize.
>
> But every time I change my dfs.blocksize to higher value the input to MAP
> tasks increase by the same amount.
>
>
> Following is the setup:
> Cloudera : 5.5.1
> Hadoop: 2.6.0
> Pig: 0.12.0
>
>
> Regards,
> Sandeep
>