Posted to user@hadoop.apache.org by samir das mohapatra <sa...@gmail.com> on 2013/06/03 11:34:54 UTC

How to get the intermediate mapper output file name

Hi all,
   How can I get the mapper output file name inside the mapper?

  or

How can I change the mapper output file name?
By default it looks like part-m-00000, part-m-00001, etc.

Regards,
samir.

Re: How to get the intermediate mapper output file name

Posted by Rahul Bhattacharjee <ra...@gmail.com>.
I think the naming format of the mapper and reducer output files is
hard-wired into the Hadoop code; however, you can prepend something to the
file name, or even a directory, using a multiple output format.

thanks,
Rahul




Re: How to get the intermediate mapper output file name

Posted by Serega Sheypak <se...@gmail.com>.
See
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/MultipleOutputFormat.html

- Case two: This class is used for a map-only job. The job wants to use an
  output file name that is either a part of the input file name of the
  input data, or some derivation of it.
- Case three: This class is used for a map-only job. The job wants to use
  an output file name that depends on both the keys and the input file name.
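As a rough illustration of the two cases, here is a stand-alone sketch in plain Java (no Hadoop classes) of how an output name could be derived from the input file name alone, or from the key plus the input file name. The class and method names are made up for this example, and "part-00000" simply stands in for the default leaf name. In a real job, logic like this would probably go into a MultipleOutputFormat subclass that overrides generateFileNameForKeyValue(key, value, name).

```java
/** Hypothetical helper, not Hadoop code: derives output names from input paths and keys. */
final class KeyAndInputBasedNames {

    /** Returns the last path component, e.g. "app.log" for "/logs/2013/06/03/app.log". */
    static String leafName(String inputPath) {
        int slash = inputPath.lastIndexOf('/');
        return slash < 0 ? inputPath : inputPath.substring(slash + 1);
    }

    /** Case two: the output name is a derivation of the input file name only. */
    static String fromInput(String inputPath, String defaultName) {
        return leafName(inputPath) + "-" + defaultName;
    }

    /** Case three: the output name depends on both the key and the input file name. */
    static String fromKeyAndInput(String key, String inputPath, String defaultName) {
        // The key becomes a subdirectory so records with the same key end up together.
        return key + "/" + leafName(inputPath) + "-" + defaultName;
    }

    public static void main(String[] args) {
        String input = "/logs/2013/06/03/app.log";
        System.out.println(fromInput(input, "part-00000"));
        System.out.println(fromKeyAndInput("ERROR", input, "part-00000"));
    }
}
```

Putting the key first as a directory is only one possible layout; a prefix or suffix on the file name would serve equally well.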


Re: How to get the intermediate mapper output file name

Posted by Rahul Bhattacharjee <ra...@gmail.com>.
Thanks Dino, good to know this.


On Mon, Jun 3, 2013 at 3:12 PM, Dino Kečo <di...@gmail.com> wrote:

> Hi Samir,
>
> File naming is defined in the FileOutputFormat class, and there is a
> property, mapreduce.output.basename, which you can use to adjust the
> file naming.
>
> Please check this code for more details (line 272):
> http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/com.cloudera.hadoop/hadoop-core/0.20.2-737/org/apache/hadoop/mapreduce/lib/output/FileOutputFormat.java#FileOutputFormat
>
> HTH
>
> Regards,
>
> Dino Kečo
> mail: dino.keco@gmail.com
> skype: dino.keco
> phone: +387 61 507 851

Re: How to get the intermediate mapper output file name

Posted by Dino Kečo <di...@gmail.com>.
Hi Samir,

File naming is defined in the FileOutputFormat class, and there is a
property, mapreduce.output.basename, which you can use to adjust the
file naming.

Please check this code for more details (line 272):
http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/com.cloudera.hadoop/hadoop-core/0.20.2-737/org/apache/hadoop/mapreduce/lib/output/FileOutputFormat.java#FileOutputFormat
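
To make the naming scheme concrete, here is a small stand-alone sketch in plain Java that imitates how a name such as part-m-00000 is put together: a base name, then "m" or "r" for the task type, then a five-digit partition number. This is an imitation for illustration, not the Hadoop source; the class and method names are invented for this example. In a driver, the base name would presumably be changed with something like job.getConfiguration().set("mapreduce.output.basename", "events").

```java
import java.util.Locale;

/** Hypothetical stand-alone imitation of part file naming; not Hadoop code. */
final class PartFileNames {

    /** Base name assumed when no custom base name is configured. */
    static final String DEFAULT_BASENAME = "part";

    /** Builds "<basename>-<m|r>-<five-digit partition><extension>". */
    static String uniqueFile(String basename, boolean isMapTask, int partition, String extension) {
        String base = (basename == null) ? DEFAULT_BASENAME : basename;
        char type = isMapTask ? 'm' : 'r';
        // %05d zero-pads the partition number to five digits, as in part-m-00000.
        return String.format(Locale.ROOT, "%s-%c-%05d%s", base, type, partition, extension);
    }

    public static void main(String[] args) {
        System.out.println(uniqueFile(null, true, 0, ""));         // default base name
        System.out.println(uniqueFile("events", true, 1, ".gz"));  // custom base name
    }
}
```

Inside a mapper, the partition number in this sketch would correspond to the numeric ID of the running map task, which is how a task could work out its own output file name.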

HTH

Regards,

Dino Kečo
mail: dino.keco@gmail.com
skype: dino.keco
phone: +387 61 507 851



Re: How to get the intermediate mapper output file name

Posted by dvohra <dv...@yahoo.com>.

The part-m-00000, part-m-00001 file names follow Hadoop's naming
convention. To use custom output file names, use the MultipleOutputs class:

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html


With MultipleOutputs the file name may be customized as:

<namedOutput>_<multiName>-(m|r)-<part-number>
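
The pattern above can be spelled out with a small stand-alone sketch in plain Java. It only formats strings according to the pattern and does not call Hadoop; the class and method names are invented for this example, and treating a null multiName as "no multi part" is an assumption of the sketch.

```java
import java.util.Locale;

/** Hypothetical formatter for the <namedOutput>_<multiName>-(m|r)-<part-number> pattern. */
final class NamedOutputFileNames {

    /**
     * Formats a file name for a named output.
     * A null multiName is treated here as "no multi part" (an assumption of this sketch).
     */
    static String fileName(String namedOutput, String multiName, boolean isMapTask, int partition) {
        String base = (multiName == null) ? namedOutput : namedOutput + "_" + multiName;
        char type = isMapTask ? 'm' : 'r';
        return String.format(Locale.ROOT, "%s-%c-%05d", base, type, partition);
    }

    public static void main(String[] args) {
        System.out.println(fileName("text", "a", true, 0));
        System.out.println(fileName("seq", null, false, 12));
    }
}
```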



Re: How to get the intermediate mapper output file name

Posted by Raj K Singh <ra...@gmail.com>.
You can use getInputFileBasedOutputFileName(JobConf job, String name),
which generates the output file name based on a given name and the input
file name.
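
As a rough sketch of the idea, and not the Hadoop implementation, the following plain Java keeps a chosen number of trailing components of the input path and falls back to the given name when there is no input path or nothing is requested. The class name, method name, and parameters are made up for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper, not Hadoop code: builds an output name from trailing input path components. */
final class TrailingLegs {

    /**
     * Returns the last {@code legs} components of {@code inputPath} joined by "/",
     * or {@code fallbackName} when there is no input path or legs is not positive.
     */
    static String lastComponents(String inputPath, int legs, String fallbackName) {
        if (inputPath == null || legs <= 0) {
            return fallbackName;
        }
        Deque<String> kept = new ArrayDeque<>();
        String[] parts = inputPath.split("/");
        // Walk from the file name backwards towards the root.
        for (int i = parts.length - 1; i >= 0 && kept.size() < legs; i--) {
            if (parts[i].isEmpty()) {
                break; // reached the leading slash of an absolute path
            }
            kept.addFirst(parts[i]);
        }
        return kept.isEmpty() ? fallbackName : String.join("/", kept);
    }

    public static void main(String[] args) {
        System.out.println(lastComponents("/logs/2013/06/app.log", 2, "part-00000"));
        System.out.println(lastComponents(null, 2, "part-00000"));
    }
}
```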

thanks

::::::::::::::::::::::::::::::::::::::::
Raj K Singh
http://www.rajkrrsingh.blogspot.com
Mobile  Tel: +91 (0)9899821370


