Posted to mapreduce-user@hadoop.apache.org by Decimus Phostle <de...@gmail.com> on 2011/08/08 18:06:42 UTC
Using MultipleTextOutputFormat to control output filename in MapReduce
Hello Folks,
I needed some help with using MultipleTextOutputFormat to control the
output filename in MapReduce.
Currently I am using it as shown below (also at
http://pastebin.com/gJxkdwRd), and it seems to work fine. What I am
trying to change is how the fields that determine the filename get
picked.
Instead of hardcoding them to field[0] or field[3] (as is the case in
the sample), I would like to pick the index up dynamically, say from
JobConf, as field[jobConf.get("id.offset")] or
field[jobConf.get("date.offset")]. Does anyone here know how I could
go about doing this (or something to this effect, i.e. it doesn't have
to be JobConf per se)?
Any pointers/suggestions/tips et al. would be most appreciated. Thanks.
PS:
1) Current usage of MultipleTextOutputFormat:
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

public class FooBarMultipleTextOutputFormat
        extends MultipleTextOutputFormat<NullWritable, Text> {

    @Override
    protected String generateFileNameForKeyValue(NullWritable key,
                                                 Text value,
                                                 String name) {
        // Split the tab-separated record once and reuse the fields.
        String[] fields = value.toString().split("\t");

        //TODO: I would like to parameterize the field that is picked
        //here. Something akin to using JobConf in Mapper.
        //i.e. instead of hard-coding [3] or [0], I would like
        //to get it from JobConf (or some other configuration) in some fashion
        String date = fields[3].substring(0, 10);
        String id = fields[0];

        // ID is my own helper class (not shown here).
        String partitionNumber = String.format("%05d", ID.getPartitionNumber(id));
        return date + "/pn_" + partitionNumber;
    }
}
2) This is also an SO question here: http://goo.gl/FX7QN, if you would
rather answer it there. TIA.
Re: Using MultipleTextOutputFormat to control output filename in MapReduce
Posted by Decimus Phostle <de...@gmail.com>.
This was answered on SO (at http://goo.gl/FX7QN); replying here in
case someone has a similar question in the future.
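
For readers who do not follow the link, here is a minimal sketch of the
parameterized filename logic asked about in the original message. It is
not taken from the SO answer. To keep it runnable without Hadoop, a plain
Map stands in for JobConf, the class name ConfigurableFileNameSketch is
made up, and partitionNumber() is a hypothetical placeholder for
ID.getPartitionNumber(), whose implementation was never posted. The keys
"id.offset" and "date.offset" are the ones named in the question; the
defaults 0 and 3 mirror the hard-coded indexes. In a real job, one
possible place to read these values is an override of the old-API
getRecordWriter(FileSystem, JobConf, String, Progressable), which is
handed the JobConf; that wiring is an assumption and is not shown here.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a Map<String, String> stands in for Hadoop's JobConf.
class ConfigurableFileNameSketch {
    private final int idOffset;
    private final int dateOffset;

    ConfigurableFileNameSketch(Map<String, String> conf) {
        // Fall back to the indexes that were hard-coded in the original class.
        this.idOffset = Integer.parseInt(conf.getOrDefault("id.offset", "0"));
        this.dateOffset = Integer.parseInt(conf.getOrDefault("date.offset", "3"));
    }

    // Hypothetical placeholder for the poster's ID.getPartitionNumber(id).
    static int partitionNumber(String id) {
        return Math.floorMod(id.hashCode(), 100);
    }

    // Same logic as generateFileNameForKeyValue, with configurable indexes.
    String fileNameFor(String line) {
        String[] fields = line.split("\t", -1);
        String date = fields[dateOffset].substring(0, 10);
        String id = fields[idOffset];
        return date + "/pn_" + String.format("%05d", partitionNumber(id));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("id.offset", "1");   // id is the second field in this layout
        conf.put("date.offset", "0"); // date is the first field in this layout
        ConfigurableFileNameSketch sketch = new ConfigurableFileNameSketch(conf);
        System.out.println(sketch.fileNameFor("2011-08-08 18:06:42\tsome-id"));
    }
}
```

Parsing the offsets once in the constructor, rather than on every record,
keeps the per-record path cheap. A record with fewer fields than the
configured offset will still throw, just as it would with the hard-coded
indexes.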