You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@hadoop.apache.org by jamal sasha <ja...@gmail.com> on 2013/10/12 00:19:21 UTC

Writing to multiple directories in hadoop

Hi,

I am trying to separate my output from reducer to different folders..

My dirver has the following code:
 FileOutputFormat.setOutputPath(job, new Path(output));
            //MultipleOutputs.addNamedOutput(job, namedOutput,
outputFormatClass, keyClass, valueClass)
            //MultipleOutputs.addNamedOutput(job, namedOutput,
outputFormatClass, keyClass, valueClass)
            MultipleOutputs.addNamedOutput(job, "foo",
TextOutputFormat.class, NullWritable.class, Text.class);
            MultipleOutputs.addNamedOutput(job, "bar",
TextOutputFormat.class, Text.class,NullWritable.class);
            MultipleOutputs.addNamedOutput(job, "foobar",
TextOutputFormat.class, Text.class, NullWritable.class);

And then my reducer has the following code:
mos.write("foo",NullWritable.get(),new Text(jsn.toString()));
mos.write("bar", key,NullWritable.get());
mos.write("foobar", key,NullWritable.get());

But in the output, I see:

output/foo-r-0001
output/foo-r-0002
output/foobar-r-0001
output/bar-r-0001


But what I am trying is :

output/foo/part-r-0001
output/foo/part-r-0002
output/bar/part-r-0001
output/foobar/part-r-0001

How do I do this?
Thanks

Re: Writing to multiple directories in hadoop

Posted by Sonal Goyal <so...@gmail.com>.

Hi Jamal,

If I remember correctly, you can use the write(key, value, basePath) method
 of MultipleOutput in your reducer to get different directories.

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html#write(KEYOUT,
VALUEOUT, java.lang.String)

Here is what the API says

Use MultipleOutputs.write(KEYOUT key, VALUEOUT value, String baseOutputPath) to
write key and value to a path specified by baseOutputPath, with no need to
specify a named output:

 private MultipleOutputs out;

 public void setup(Context context) {
   out = new MultipleOutputs(context);
   ...
 }

 public void reduce(Text key, Iterable values, Context context) throws
IOException, InterruptedException {
 for (Text t : values) {
   out.write(key, t, generateFileName(<*parameter list...*>));
   }
 }

 protected void cleanup(Context context) throws IOException,
InterruptedException {
   out.close();
 }


Use your own code in generateFileName() to create a custom path to your
results. '/' characters in baseOutputPath will be translated into directory
levels in your file system. Also, append your custom-generated path with
"part" or similar, otherwise your output will be -00000, -00001 etc. No
call to context.write() is necessary. See example generateFileName() code
below.

 private String generateFileName(Text k) {
   // expect Text k in format "Surname|Forename"
   String[] kStr = k.toString().split("\\|");

   String sName = kStr[0];
   String fName = kStr[1];

   // example for k = Smith|John
   // output written to /user/hadoop/path/to/output/Smith/John-r-00000 (etc)
   return sName + "/" + fName;
 }


Best Regards,
Sonal
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>




On Sat, Oct 12, 2013 at 3:49 AM, jamal sasha <ja...@gmail.com> wrote:

> Hi,
>
> I am trying to separate my output from reducer to different folders..
>
> My dirver has the following code:
>  FileOutputFormat.setOutputPath(job, new Path(output));
>             //MultipleOutputs.addNamedOutput(job, namedOutput,
> outputFormatClass, keyClass, valueClass)
>             //MultipleOutputs.addNamedOutput(job, namedOutput,
> outputFormatClass, keyClass, valueClass)
>             MultipleOutputs.addNamedOutput(job, "foo",
> TextOutputFormat.class, NullWritable.class, Text.class);
>             MultipleOutputs.addNamedOutput(job, "bar",
> TextOutputFormat.class, Text.class,NullWritable.class);
>             MultipleOutputs.addNamedOutput(job, "foobar",
> TextOutputFormat.class, Text.class, NullWritable.class);
>
> And then my reducer has the following code:
> mos.write("foo",NullWritable.get(),new Text(jsn.toString()));
> mos.write("bar", key,NullWritable.get());
> mos.write("foobar", key,NullWritable.get());
>
> But in the output, I see:
>
> output/foo-r-0001
> output/foo-r-0002
> output/foobar-r-0001
> output/bar-r-0001
>
>
> But what I am trying is :
>
> output/foo/part-r-0001
> output/foo/part-r-0002
> output/bar/part-r-0001
> output/foobar/part-r-0001
>
> How do I do this?
> Thanks
>

Re: Writing to multiple directories in hadoop

Posted by Sonal Goyal <so...@gmail.com>.

Hi Jamal,

If I remember correctly, you can use the write(key, value, basePath) method
 of MultipleOutput in your reducer to get different directories.

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html#write(KEYOUT,
VALUEOUT, java.lang.String)

Here is what the API says

Use MultipleOutputs.write(KEYOUT key, VALUEOUT value, String baseOutputPath) to
write key and value to a path specified by baseOutputPath, with no need to
specify a named output:

 private MultipleOutputs out;

 public void setup(Context context) {
   out = new MultipleOutputs(context);
   ...
 }

 public void reduce(Text key, Iterable values, Context context) throws
IOException, InterruptedException {
 for (Text t : values) {
   out.write(key, t, generateFileName(<*parameter list...*>));
   }
 }

 protected void cleanup(Context context) throws IOException,
InterruptedException {
   out.close();
 }


Use your own code in generateFileName() to create a custom path to your
results. '/' characters in baseOutputPath will be translated into directory
levels in your file system. Also, append your custom-generated path with
"part" or similar, otherwise your output will be -00000, -00001 etc. No
call to context.write() is necessary. See example generateFileName() code
below.

 private String generateFileName(Text k) {
   // expect Text k in format "Surname|Forename"
   String[] kStr = k.toString().split("\\|");

   String sName = kStr[0];
   String fName = kStr[1];

   // example for k = Smith|John
   // output written to /user/hadoop/path/to/output/Smith/John-r-00000 (etc)
   return sName + "/" + fName;
 }


Best Regards,
Sonal
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>




On Sat, Oct 12, 2013 at 3:49 AM, jamal sasha <ja...@gmail.com> wrote:

> Hi,
>
> I am trying to separate my output from reducer to different folders..
>
> My dirver has the following code:
>  FileOutputFormat.setOutputPath(job, new Path(output));
>             //MultipleOutputs.addNamedOutput(job, namedOutput,
> outputFormatClass, keyClass, valueClass)
>             //MultipleOutputs.addNamedOutput(job, namedOutput,
> outputFormatClass, keyClass, valueClass)
>             MultipleOutputs.addNamedOutput(job, "foo",
> TextOutputFormat.class, NullWritable.class, Text.class);
>             MultipleOutputs.addNamedOutput(job, "bar",
> TextOutputFormat.class, Text.class,NullWritable.class);
>             MultipleOutputs.addNamedOutput(job, "foobar",
> TextOutputFormat.class, Text.class, NullWritable.class);
>
> And then my reducer has the following code:
> mos.write("foo",NullWritable.get(),new Text(jsn.toString()));
> mos.write("bar", key,NullWritable.get());
> mos.write("foobar", key,NullWritable.get());
>
> But in the output, I see:
>
> output/foo-r-0001
> output/foo-r-0002
> output/foobar-r-0001
> output/bar-r-0001
>
>
> But what I am trying is :
>
> output/foo/part-r-0001
> output/foo/part-r-0002
> output/bar/part-r-0001
> output/foobar/part-r-0001
>
> How do I do this?
> Thanks
>

Re: Writing to multiple directories in hadoop

Posted by Sonal Goyal <so...@gmail.com>.

Hi Jamal,

If I remember correctly, you can use the write(key, value, basePath) method
 of MultipleOutput in your reducer to get different directories.

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html#write(KEYOUT,
VALUEOUT, java.lang.String)

Here is what the API says

Use MultipleOutputs.write(KEYOUT key, VALUEOUT value, String baseOutputPath) to
write key and value to a path specified by baseOutputPath, with no need to
specify a named output:

 private MultipleOutputs out;

 public void setup(Context context) {
   out = new MultipleOutputs(context);
   ...
 }

 public void reduce(Text key, Iterable values, Context context) throws
IOException, InterruptedException {
 for (Text t : values) {
   out.write(key, t, generateFileName(<*parameter list...*>));
   }
 }

 protected void cleanup(Context context) throws IOException,
InterruptedException {
   out.close();
 }


Use your own code in generateFileName() to create a custom path to your
results. '/' characters in baseOutputPath will be translated into directory
levels in your file system. Also, append your custom-generated path with
"part" or similar, otherwise your output will be -00000, -00001 etc. No
call to context.write() is necessary. See example generateFileName() code
below.

 private String generateFileName(Text k) {
   // expect Text k in format "Surname|Forename"
   String[] kStr = k.toString().split("\\|");

   String sName = kStr[0];
   String fName = kStr[1];

   // example for k = Smith|John
   // output written to /user/hadoop/path/to/output/Smith/John-r-00000 (etc)
   return sName + "/" + fName;
 }


Best Regards,
Sonal
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>




On Sat, Oct 12, 2013 at 3:49 AM, jamal sasha <ja...@gmail.com> wrote:

> Hi,
>
> I am trying to separate my output from reducer to different folders..
>
> My dirver has the following code:
>  FileOutputFormat.setOutputPath(job, new Path(output));
>             //MultipleOutputs.addNamedOutput(job, namedOutput,
> outputFormatClass, keyClass, valueClass)
>             //MultipleOutputs.addNamedOutput(job, namedOutput,
> outputFormatClass, keyClass, valueClass)
>             MultipleOutputs.addNamedOutput(job, "foo",
> TextOutputFormat.class, NullWritable.class, Text.class);
>             MultipleOutputs.addNamedOutput(job, "bar",
> TextOutputFormat.class, Text.class,NullWritable.class);
>             MultipleOutputs.addNamedOutput(job, "foobar",
> TextOutputFormat.class, Text.class, NullWritable.class);
>
> And then my reducer has the following code:
> mos.write("foo",NullWritable.get(),new Text(jsn.toString()));
> mos.write("bar", key,NullWritable.get());
> mos.write("foobar", key,NullWritable.get());
>
> But in the output, I see:
>
> output/foo-r-0001
> output/foo-r-0002
> output/foobar-r-0001
> output/bar-r-0001
>
>
> But what I am trying is :
>
> output/foo/part-r-0001
> output/foo/part-r-0002
> output/bar/part-r-0001
> output/foobar/part-r-0001
>
> How do I do this?
> Thanks
>

Re: Writing to multiple directories in hadoop

Posted by Sonal Goyal <so...@gmail.com>.

Hi Jamal,

If I remember correctly, you can use the write(key, value, basePath) method
 of MultipleOutput in your reducer to get different directories.

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html#write(KEYOUT,
VALUEOUT, java.lang.String)

Here is what the API says

Use MultipleOutputs.write(KEYOUT key, VALUEOUT value, String baseOutputPath) to
write key and value to a path specified by baseOutputPath, with no need to
specify a named output:

 private MultipleOutputs out;

 public void setup(Context context) {
   out = new MultipleOutputs(context);
   ...
 }

 public void reduce(Text key, Iterable values, Context context) throws
IOException, InterruptedException {
 for (Text t : values) {
   out.write(key, t, generateFileName(<*parameter list...*>));
   }
 }

 protected void cleanup(Context context) throws IOException,
InterruptedException {
   out.close();
 }


Use your own code in generateFileName() to create a custom path to your
results. '/' characters in baseOutputPath will be translated into directory
levels in your file system. Also, append your custom-generated path with
"part" or similar, otherwise your output will be -00000, -00001 etc. No
call to context.write() is necessary. See example generateFileName() code
below.

 private String generateFileName(Text k) {
   // expect Text k in format "Surname|Forename"
   String[] kStr = k.toString().split("\\|");

   String sName = kStr[0];
   String fName = kStr[1];

   // example for k = Smith|John
   // output written to /user/hadoop/path/to/output/Smith/John-r-00000 (etc)
   return sName + "/" + fName;
 }


Best Regards,
Sonal
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>




On Sat, Oct 12, 2013 at 3:49 AM, jamal sasha <ja...@gmail.com> wrote:

> Hi,
>
> I am trying to separate my output from reducer to different folders..
>
> My dirver has the following code:
>  FileOutputFormat.setOutputPath(job, new Path(output));
>             //MultipleOutputs.addNamedOutput(job, namedOutput,
> outputFormatClass, keyClass, valueClass)
>             //MultipleOutputs.addNamedOutput(job, namedOutput,
> outputFormatClass, keyClass, valueClass)
>             MultipleOutputs.addNamedOutput(job, "foo",
> TextOutputFormat.class, NullWritable.class, Text.class);
>             MultipleOutputs.addNamedOutput(job, "bar",
> TextOutputFormat.class, Text.class,NullWritable.class);
>             MultipleOutputs.addNamedOutput(job, "foobar",
> TextOutputFormat.class, Text.class, NullWritable.class);
>
> And then my reducer has the following code:
> mos.write("foo",NullWritable.get(),new Text(jsn.toString()));
> mos.write("bar", key,NullWritable.get());
> mos.write("foobar", key,NullWritable.get());
>
> But in the output, I see:
>
> output/foo-r-0001
> output/foo-r-0002
> output/foobar-r-0001
> output/bar-r-0001
>
>
> But what I am trying is :
>
> output/foo/part-r-0001
> output/foo/part-r-0002
> output/bar/part-r-0001
> output/foobar/part-r-0001
>
> How do I do this?
> Thanks
>