Posted to mapreduce-user@hadoop.apache.org by Decimus Phostle <de...@gmail.com> on 2011/08/08 18:06:42 UTC

Using MultipleTextOutputFormat to control output filename in MapReduce

Hello Folks,

I needed some help with using MultipleTextOutputFormat to control the
output filename in MapReduce.

Currently I am using it as shown below (or at
http://pastebin.com/gJxkdwRd), and it seems to work fine. However, what
I am trying to change is how the fields that determine the filename
get picked.

Instead of hardcoding them to field[0] or field[3] (as is the case in
the sample), I would like to pick the offsets up dynamically, say
from JobConf, as field[jobConf.get("id.offset")] or
field[jobConf.get("date.offset")]. Does anyone here know how I could
go about doing this (or something to this effect, i.e. it doesn't have
to be JobConf per se)?
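
One detail worth noting about the expression above: `JobConf.get` returns a `String`, so it cannot be used directly as an array index; the value has to be parsed to an `int` first (Hadoop's `Configuration.getInt(name, defaultValue)` does this for you). The sketch below mimics that with a plain `Map<String, String>` standing in for the job configuration. The class name `OffsetConfig` and the default offsets are hypothetical, chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: a Map<String, String> stands in for JobConf,
// which likewise stores every property value as a string.
public class OffsetConfig {

    // Parse an integer property, falling back to a default when it is absent,
    // roughly what Configuration.getInt(name, defaultValue) does.
    static int getInt(Map<String, String> conf, String name, int defaultValue) {
        String raw = conf.get(name);
        if (raw == null) {
            return defaultValue;
        }
        return Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("id.offset", "0");
        conf.put("date.offset", "3");

        String[] fields = "42\tfoo\tbar\t2011-08-08 18:06:42".split("\t");
        int idOffset = getInt(conf, "id.offset", 0);
        int dateOffset = getInt(conf, "date.offset", 3);

        System.out.println(fields[idOffset]);                    // prints 42
        System.out.println(fields[dateOffset].substring(0, 10)); // prints 2011-08-08
    }
}
```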

Any pointers, suggestions or tips would be most appreciated. Thanks.

PS:

1) Current usage of MultipleTextOutputFormat:

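import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

// ID is a helper class not shown in this post;
// ID.getPartitionNumber(id) maps a record id to its partition number.
public class FooBarMultipleTextOutputFormat
        extends MultipleTextOutputFormat<NullWritable, Text> {

    @Override
    protected String generateFileNameForKeyValue(NullWritable key,
                                                 Text value,
                                                 String name) {
        // Split the tab-separated record once and reuse the fields.
        String[] fields = value.toString().split("\t");

        //TODO: I would like to parameterize the field that is picked
        //here. Something akin to using JobConf in Mapper.
        //i.e. instead of hard-coding [3] or [0], I would like
        //to get it from JobConf (or some other configuration) in some fashion

        String date = fields[3].substring(0, 10); // keep the date portion only
        String id = fields[0];

        String partitionNumber = String.format("%05d", ID.getPartitionNumber(id));

        // Output path: <date>/pn_<5-digit partition number>
        return date + "/pn_" + partitionNumber;
    }
}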

2) This is also an SO question here: http://goo.gl/FX7QN, if you would
rather answer it there. TIA.
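
Going back to the generateFileNameForKeyValue code in 1): a minimal sketch of how configurable offsets could be applied, assuming the old org.apache.hadoop.mapred API that MultipleTextOutputFormat belongs to. MultipleOutputFormat.getRecordWriter(FileSystem, JobConf, String, Progressable) receives the job's JobConf, so one possible approach (not necessarily the one given on SO) is to override it, read the offsets with job.getInt("id.offset", 0) and job.getInt("date.offset", 3), store them in fields, and then return super.getRecordWriter(...); generateFileNameForKeyValue would then delegate to a plain helper such as the hypothetical FileNameBuilder below. The helper takes the partition lookup as a function because ID.getPartitionNumber is the poster's own class and is not shown.

```java
import java.util.function.ToIntFunction;

// Hypothetical helper: the field-picking logic of generateFileNameForKeyValue,
// pulled out so the offsets come from configuration instead of literals.
public final class FileNameBuilder {

    private final int idOffset;
    private final int dateOffset;
    // Stand-in for ID.getPartitionNumber(id), which is not shown in the post.
    private final ToIntFunction<String> partitionOf;

    public FileNameBuilder(int idOffset, int dateOffset, ToIntFunction<String> partitionOf) {
        this.idOffset = idOffset;
        this.dateOffset = dateOffset;
        this.partitionOf = partitionOf;
    }

    // Builds "<date>/pn_<5-digit partition>" from one tab-separated line.
    public String fileNameFor(String line) {
        String[] fields = line.split("\t", -1); // split once, keep trailing empty fields
        String id = fields[idOffset];
        String date = fields[dateOffset].substring(0, 10);
        String partitionNumber = String.format("%05d", partitionOf.applyAsInt(id));
        return date + "/pn_" + partitionNumber;
    }

    public static void main(String[] args) {
        // Hypothetical partition function, for illustration only.
        FileNameBuilder builder = new FileNameBuilder(0, 3, id -> Integer.parseInt(id) % 100);
        System.out.println(builder.fileNameFor("42\tfoo\tbar\t2011-08-08 18:06:42"));
        // prints 2011-08-08/pn_00042
    }
}
```

Keeping this logic free of Hadoop types also makes it easy to unit-test without running a job.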

Re: Using MultipleTextOutputFormat to control output filename in MapReduce

Posted by Decimus Phostle <de...@gmail.com>.
This was answered on SO (at http://goo.gl/FX7QN); replying here in
case someone has a similar question in the future.
