Posted to mapreduce-user@hadoop.apache.org by Answer Agrawal <yr...@gmail.com> on 2015/05/16 16:49:23 UTC

How to set mapreduce.input.fileinputformat.split.maxsize for a specific job

Hi,

In the XML configuration files of Hadoop 2.x,
"mapreduce.input.fileinputformat.split.minsize" is listed and can be set,
but how do I set "mapreduce.input.fileinputformat.split.maxsize" in an XML file?
I need to set it in my MapReduce code.

Thanks,

Re: How to set mapreduce.input.fileinputformat.split.maxsize for a specific job

Posted by Shahab Yunus <sh...@gmail.com>.
What do you think the type of the property value you are trying to
write is? Is it a string, or numeric? Have you checked the documentation
of the Configuration class that I sent earlier?

There are multiple setXXX methods depending on the type of the property
value being set:
https://hadoop.apache.org/docs/current/api/org/apache/hadoop/conf/Configuration.html#setLong(java.lang.String, long)
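For context on why the value is numeric: it is a byte count. As far as I can tell from the Hadoop 2.x sources, FileInputFormat reads it with getLong(..., Long.MAX_VALUE) and combines it with the minimum split size and the block size as max(minSize, min(maxSize, blockSize)). The sketch below reproduces only that formula in plain Java, with no Hadoop dependency; the class name SplitSizeSketch and the 128 MB block size are illustrative assumptions.

```java
// Plain-Java sketch of the split-size formula used by FileInputFormat in
// Hadoop 2.x; no Hadoop classes are needed to run it.
final class SplitSizeSketch {

    // max(minSize, min(maxSize, blockSize)): maxsize can only shrink a split
    // below the block size, and minsize can only grow it.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // assumed 128 MB block size
        long minSize = 1L;                   // effective minimum when minsize is unset

        // maxsize unset: FileInputFormat falls back to Long.MAX_VALUE
        System.out.println(computeSplitSize(blockSize, minSize, Long.MAX_VALUE)); // 134217728
        // maxsize = 102400 bytes, as in this thread
        System.out.println(computeSplitSize(blockSize, minSize, 102400L));        // 102400
    }
}
```

So setting maxsize only has a visible effect when it is smaller than the block size.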


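For the other case below, why are you passing null as the job object (the
first parameter)?
FileInputFormat.setMaxInputSplitSize(null, 102400);
The method stores the value in that job's configuration, so it needs your
actual Job instance; with null it fails with a NullPointerException.
Check out the documentation here: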
http://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.html#setMaxInputSplitSize(org.apache.hadoop.mapreduce.Job, long)
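To see what a limit such as setMaxInputSplitSize(job, 102400) does in practice, here is a rough plain-Java sketch of the loop FileInputFormat.getSplits uses to cut one file: it keeps emitting full-size splits while the remainder is more than 1.1 times the split size (its SPLIT_SLOP constant), then emits the rest as one last split. It assumes an uncompressed, splittable file whose block size is at least 102400 bytes, so the split size works out to exactly 102400; SplitCountSketch and the 1 MiB file length are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of how one file is cut into splits once the split size
// is known; modeled on FileInputFormat.getSplits, not a copy of it.
final class SplitCountSketch {

    // Full-size splits are cut only while the remainder exceeds 110% of the split size.
    static final double SPLIT_SLOP = 1.1;

    static List<Long> splitLengths(long fileLength, long splitSize) {
        List<Long> lengths = new ArrayList<>();
        long remaining = fileLength;
        while (((double) remaining) / splitSize > SPLIT_SLOP) {
            lengths.add(splitSize);
            remaining -= splitSize;
        }
        if (remaining != 0) {
            lengths.add(remaining); // last, possibly smaller or slightly larger, split
        }
        return lengths;
    }

    public static void main(String[] args) {
        // Hypothetical 1 MiB file with a 102400-byte split size.
        List<Long> splits = splitLengths(1048576L, 102400L);
        System.out.println(splits.size() + " splits, last = " + splits.get(splits.size() - 1));
        // prints "11 splits, last = 24576"
    }
}
```

Each split normally becomes one map task, so a smaller maxsize means more mappers for the same input.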

Lastly,
conf.set("mapreduce.input.fileinputformat.split.maxsize", "102400");
vs.
job.getConfiguration().set("mapreduce.input.fileinputformat.split.maxsize",
"102400");
is just a matter of how you reference the configuration object: either
through its own variable or through a chained call on the job object.
That is a programming-style decision and makes no difference, as long as
conf refers to the same Configuration object the job uses.
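One case where the two forms do behave differently: according to its Javadoc, Job.getInstance(conf) makes a copy of conf, so settings added to conf after the job is created never reach the job. The toy below models just that copy-on-create behavior with java.util.Properties; ToyJob is a hypothetical stand-in, not Hadoop's Job class, and has none of its other behavior.

```java
import java.util.Properties;

// Toy stand-in for org.apache.hadoop.mapreduce.Job, used only to illustrate
// copy-on-create. It is NOT Hadoop's class.
final class ToyJob {
    private final Properties conf;

    private ToyJob(Properties conf) {
        this.conf = conf;
    }

    // Like Job.getInstance(conf): the job keeps its own copy of the caller's settings.
    static ToyJob getInstance(Properties callerConf) {
        Properties copy = new Properties();
        copy.putAll(callerConf);
        return new ToyJob(copy);
    }

    Properties getConfiguration() {
        return conf;
    }

    public static void main(String[] args) {
        String key = "mapreduce.input.fileinputformat.split.maxsize";
        Properties conf = new Properties();
        ToyJob job = ToyJob.getInstance(conf);

        conf.setProperty(key, "102400");                             // too late: job holds a copy
        System.out.println(job.getConfiguration().getProperty(key)); // null

        job.getConfiguration().setProperty(key, "102400");           // the job sees this one
        System.out.println(job.getConfiguration().getProperty(key)); // 102400
    }
}
```

In real job code the safe habits are either to finish setting conf before calling Job.getInstance(conf), or to set values through job.getConfiguration() afterwards.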

Regards,
Shahab

On Sun, May 17, 2015 at 10:17 AM, Answer Agrawal <yr...@gmail.com>
wrote:

> Thanks,
> Is this the correct way to write ?
> conf.set("mapreduce.input.fileinputformat.split.maxsize", "102400");
> or
> job.getConfiguration().set("mapreduce.input.fileinputformat.split.maxsize",
> "102400");
>
> I think another ways as
> FileInputFormat.setMaxInputSplitSize(null, 102400);
>
> Is this all right ? Are these both solve the same purpose or something
> else ?
>
> Thanks,
>
> On Sat, May 16, 2015 at 8:48 PM, Shahab Yunus <sh...@gmail.com>
> wrote:
>
>> You can either pass them on as command line argument using -D option.
>> Assuming your job is implementing the standard Tool interface:
>>
>> https://hadoop.apache.org/docs/current/api/org/apache/hadoop/util/Tool.html
>>
>> Or you can set them in the code using the various 'set' methods to set
>> key/value values in the configuration object.
>>
>> ...
>> Job job = Job.getInstance(getConf());
>> job.setJarByClass(MyJob.class);
>>
>> job.getConfiguration().set("<property-name>",<value>);
>> ....
>>
>> Docs for Configuration class:
>> https://hadoop.apache.org/docs/current/api/org/apache/hadoop/conf/Configuration.html
>>
>> This will work as long as the property is not marked final
>>
>> Regards,
>> Shahab
>>
>>
>> On Sat, May 16, 2015 at 10:49 AM, Answer Agrawal <yr...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> In xmls configuration file of Hadoop-2.x,
>>> "mapreduce.input.fileinputformat.split.minsize" is given which can be set
>>> but how to set "mapreduce.input.fileinputformat.split.maxsize" in xml file.
>>> I need to set it in my mapreduce code.
>>>
>>> Thanks,
>>>
>>
>>
>

Re: How to set mapreduce.input.fileinputformat.split.maxsize for a specific job

Posted by Answer Agrawal <yr...@gmail.com>.
Thanks,
Is this the correct way to write it?
conf.set("mapreduce.input.fileinputformat.split.maxsize", "102400");
or
job.getConfiguration().set("mapreduce.input.fileinputformat.split.maxsize",
"102400");

I think another way is:
FileInputFormat.setMaxInputSplitSize(null, 102400);

Is this all right? Do both of these serve the same purpose, or
something else?

Thanks,

On Sat, May 16, 2015 at 8:48 PM, Shahab Yunus <sh...@gmail.com>
wrote:

> You can either pass them on as command line argument using -D option.
> Assuming your job is implementing the standard Tool interface:
> https://hadoop.apache.org/docs/current/api/org/apache/hadoop/util/Tool.html
>
> Or you can set them in the code using the various 'set' methods to set
> key/value values in the configuration object.
>
> ...
> Job job = Job.getInstance(getConf());
> job.setJarByClass(MyJob.class);
>
> job.getConfiguration().set("<property-name>",<value>);
> ....
>
> Docs for Configuration class:
> https://hadoop.apache.org/docs/current/api/org/apache/hadoop/conf/Configuration.html
>
> This will work as long as the property is not marked final
>
> Regards,
> Shahab
>
>
> On Sat, May 16, 2015 at 10:49 AM, Answer Agrawal <yr...@gmail.com>
> wrote:
>
>> Hi,
>>
>> In xmls configuration file of Hadoop-2.x,
>> "mapreduce.input.fileinputformat.split.minsize" is given which can be set
>> but how to set "mapreduce.input.fileinputformat.split.maxsize" in xml file.
>> I need to set it in my mapreduce code.
>>
>> Thanks,
>>
>
>

Re: How to set mapreduce.input.fileinputformat.split.maxsize for a specific job

Posted by Shahab Yunus <sh...@gmail.com>.
You can pass them as command-line arguments using the -D option,
assuming your job implements the standard Tool interface:
https://hadoop.apache.org/docs/current/api/org/apache/hadoop/util/Tool.html

Or you can set them in code, using the configuration object's various
'set' methods to set key/value pairs:

...
Job job = Job.getInstance(getConf());
job.setJarByClass(MyJob.class);

job.getConfiguration().set("<property-name>", "<value>");
...

Docs for Configuration class:
https://hadoop.apache.org/docs/current/api/org/apache/hadoop/conf/Configuration.html

This works as long as the property is not marked final
(<final>true</final>) in the cluster configuration.

Regards,
Shahab


On Sat, May 16, 2015 at 10:49 AM, Answer Agrawal <yr...@gmail.com>
wrote:

> Hi,
>
> In xmls configuration file of Hadoop-2.x,
> "mapreduce.input.fileinputformat.split.minsize" is given which can be set
> but how to set "mapreduce.input.fileinputformat.split.maxsize" in xml file.
> I need to set it in my mapreduce code.
>
> Thanks,
>