Posted to user@hadoop.apache.org by jamal sasha <ja...@gmail.com> on 2013/10/12 00:19:21 UTC
Writing to multiple directories in hadoop
Hi,
I am trying to write my reducer output to separate folders.
My driver has the following code:
    FileOutputFormat.setOutputPath(job, new Path(output));
    // MultipleOutputs.addNamedOutput(job, namedOutput, outputFormatClass, keyClass, valueClass)
    MultipleOutputs.addNamedOutput(job, "foo", TextOutputFormat.class, NullWritable.class, Text.class);
    MultipleOutputs.addNamedOutput(job, "bar", TextOutputFormat.class, Text.class, NullWritable.class);
    MultipleOutputs.addNamedOutput(job, "foobar", TextOutputFormat.class, Text.class, NullWritable.class);
And then my reducer has the following code:
    mos.write("foo", NullWritable.get(), new Text(jsn.toString()));
    mos.write("bar", key, NullWritable.get());
    mos.write("foobar", key, NullWritable.get());
But in the output, I see:
output/foo-r-0001
output/foo-r-0002
output/foobar-r-0001
output/bar-r-0001
But what I am trying to get is:
output/foo/part-r-0001
output/foo/part-r-0002
output/bar/part-r-0001
output/foobar/part-r-0001
How do I do this?
Thanks
Re: Writing to multiple directories in hadoop
Posted by Sonal Goyal <so...@gmail.com>.
Hi Jamal,
If I remember correctly, you can use the write(key, value, basePath) method
of MultipleOutputs in your reducer to get different directories.
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html#write(KEYOUT, VALUEOUT, java.lang.String)
Here is what the API says:
Use MultipleOutputs.write(KEYOUT key, VALUEOUT value, String baseOutputPath) to
write key and value to a path specified by baseOutputPath, with no need to
specify a named output:
    private MultipleOutputs<Text, Text> out;

    public void setup(Context context) {
        out = new MultipleOutputs<Text, Text>(context);
        ...
    }

    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text t : values) {
            out.write(key, t, generateFileName(<parameter list...>));
        }
    }

    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        out.close();
    }
Use your own code in generateFileName() to create a custom path to your
results. '/' characters in baseOutputPath will be translated into directory
levels in your file system. Also, append your custom-generated path with
"part" or similar, otherwise your output will be -00000, -00001 etc. No
call to context.write() is necessary. See example generateFileName() code
below.
    private String generateFileName(Text k) {
        // expect Text k in format "Surname|Forename"
        String[] kStr = k.toString().split("\\|");
        String sName = kStr[0];
        String fName = kStr[1];
        // example for k = Smith|John
        // output written to /user/hadoop/path/to/output/Smith/John-r-00000 (etc)
        return sName + "/" + fName;
    }
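
For the layout asked about in this thread (output/foo/part-r-..., output/bar/part-r-...), the base path could be built roughly as in the sketch below. It is plain Java with no Hadoop dependency, so it only models the naming. OutputPaths, basePathFor and reducerFileName are hypothetical names made up for this illustration, and the "-r-" plus five-digit suffix is an assumption taken from the "John-r-00000" example above, not a definitive statement of what Hadoop does.

```java
import java.util.Locale;

// Hypothetical helper (not part of Hadoop): builds baseOutputPath strings so
// that each logical output ends up in its own sub-directory.
final class OutputPaths {
    private OutputPaths() {}

    // Turns a logical output name into a base path, e.g. "foo" -> "foo/part".
    // The '/' becomes a directory level; "part" becomes the file name prefix.
    static String basePathFor(String name) {
        if (name == null || name.isEmpty() || name.contains("/")) {
            throw new IllegalArgumentException("invalid output name: " + name);
        }
        return name + "/part";
    }

    // Assumption: models the suffix appended for reducer output, following the
    // "John-r-00000" example quoted in this thread.
    static String reducerFileName(String baseOutputPath, int partition) {
        return String.format(Locale.ROOT, "%s-r-%05d", baseOutputPath, partition);
    }

    public static void main(String[] args) {
        // Prints the modelled file name under "output/" for each logical output.
        for (String name : new String[] {"foo", "bar", "foobar"}) {
            System.out.println("output/" + reducerFileName(basePathFor(name), 0));
        }
    }
}
```

In the reducer, the string from basePathFor("foo") would be passed as the baseOutputPath argument of write(). If I read the API correctly, MultipleOutputs also has a write(namedOutput, key, value, baseOutputPath) overload, so something like mos.write("foo", NullWritable.get(), new Text(jsn.toString()), OutputPaths.basePathFor("foo")) should keep the named outputs and still put them in separate directories, but please check that against the Javadoc for your Hadoop version.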
Best Regards,
Sonal
Nube Technologies <http://www.nubetech.co>
<http://in.linkedin.com/in/sonalgoyal>