Posted to mapreduce-user@hadoop.apache.org by Geoffry Roberts <ge...@gmail.com> on 2009/12/08 21:40:03 UTC

Re: Does Using MultipleTextOutputFormat Require the Deprecated API?

All,

This one has me stumped.

What I want to do is have my reducer output multiple files, one for each key
value. I also want to avoid any deprecated parts of the API.

As suggested, I switched from using MultipleTextOutputFormat to
MultipleOutputs but have run into an impasse.  MultipleOutputs' getCollector
method requires a Reporter as a parameter, but as far as I can tell, the new
API doesn't expose one.  The only reporter I can find is inside the context
object, and it is declared protected.

Am I stuck, or just missing something?

My code:

@Override
public void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException {
    String fileName = key.toString();
    MultipleOutputs.addNamedOutput((JobConf) context.getConfiguration(),
            fileName, OutputFormat.class, Text.class, Text.class);
    mos = new MultipleOutputs((JobConf) context.getConfiguration());
    for (Text line : values) {
        // This is the problem line:
        mos.getCollector(fileName, <reporter goes here>).collect(key, line);
    }
    mos.close();
}
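
For reference, the Reporter that getCollector() expects is the one the old,
deprecated API hands directly to reduce(); the new API's reduce() only
receives a Context. A minimal side-by-side sketch of the two signatures
(class names are illustrative, not from this thread):

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Old (deprecated) API: the framework passes a Reporter straight into reduce(),
// which is what MultipleOutputs.getCollector(name, reporter) was designed around.
class OldApiReducer extends MapReduceBase
        implements org.apache.hadoop.mapred.Reducer<Text, Text, Text, Text> {
    public void reduce(Text key, Iterator<Text> values,
            OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // ...
    }
}

// New API: reduce() only gets a Context; there is no Reporter parameter to
// hand to getCollector(), which is the impasse described above.
class NewApiReducer extends org.apache.hadoop.mapreduce.Reducer<Text, Text, Text, Text> {
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // ...
    }
}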

On Mon, Oct 5, 2009 at 11:17 AM, Aaron Kimball <aa...@cloudera.com> wrote:

> Geoffry,
>
> The new API comes with a related OF, called MultipleOutputs
> (o.a.h.mapreduce.lib.output.MultipleOutputs). You may want to look into
> using this instead.
>
> - Aaron
>
>
> On Tue, Sep 29, 2009 at 4:44 PM, Geoffry Roberts <
> geoffry.roberts@gmail.com> wrote:
>
>> All,
>>
>> What I want to do is have my reducer output multiple files, one for each
>> key value.
>>
>> Can this still be done in the current API?
>>
>> It seems that using MultipleTextOutputFormat requires one to use
>> deprecated parts of the API.
>>
>> Is this correct?
>>
>> I would like to use the class or its equivalent and stay off anything
>> deprecated.
>>
>> Is there a workaround?
>>
>> In the current API one uses Job and a class derived from the class
>> org.apache.hadoop.mapreduce.OutputFormat.  MultipleTextOutputFormat does
>> not derive from this class.
>>
>> Job.setOutputFormatClass(Class<? extends org.apache.hadoop.mapreduce.OutputFormat>);
>>
>>
>> In the old, deprecated API, one uses JobConf and an implementation of the
>> interface org.apache.hadoop.mapred.OutputFormat.
>> MultipleTextOutputFormat is just such an implementation.
>>
>> JobConf.setOutputFormat(Class<? extends org.apache.hadoop.mapred.OutputFormat>);
>>
>
>
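
To make the type mismatch concrete, here is a minimal sketch of the two
setup paths (the job name and the TextOutputFormat choice are illustrative):
the new-API setter only accepts org.apache.hadoop.mapreduce.OutputFormat
subclasses, while MultipleTextOutputFormat lives in the
org.apache.hadoop.mapred hierarchy and therefore only fits the JobConf setter.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class OutputFormatWiring {
    public static void main(String[] args) throws IOException {
        // New API: only org.apache.hadoop.mapreduce.OutputFormat subclasses fit here.
        Job job = new Job(new Configuration(), "per-key-output");
        job.setOutputFormatClass(TextOutputFormat.class);

        // Old (deprecated) API: MultipleTextOutputFormat implements the
        // org.apache.hadoop.mapred.OutputFormat interface, so it plugs in here.
        JobConf conf = new JobConf();
        conf.setOutputFormat(MultipleTextOutputFormat.class);
    }
}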

Re: Does Using MultipleTextOutputFormat Require the Deprecated API?

Posted by Geoffry Roberts <ge...@gmail.com>.
Amogh,

Thanks for the attachment.  I'll hold on to it.

If I may press you a bit further, I noticed that the directory tree in the
distribution I downloaded is different from the paths I see in the patch.
It is different still in the svn trunk.

What I want is to apply the patch to my hadoop 0.20.1 distribution.  It
doesn't apply cleanly because of this directory-versus-path mismatch.  I
suppose I could hack on the patch, but it seems I shouldn't have to.

Why do these three differ: release, trunk, and patch?  Am I using the wrong
code base?

On Mon, Dec 14, 2009 at 9:30 PM, Amogh Vasekar <am...@yahoo-inc.com> wrote:

>  Yes. Also attached is an old thread I have kept handy with me. Hope this
> helps you.
>
>
> Thanks,
> Amogh

Re: Does Using MultipleTextOutputFormat Require the Deprecated API?

Posted by Amogh Vasekar <am...@yahoo-inc.com>.
Yes. Also attached is an old thread I have kept handy with me. Hope this helps you.


Thanks,
Amogh


On 12/11/09 10:07 PM, "Geoffry Roberts" <ge...@gmail.com> wrote:

Amogh,

I don't have experience with patches for Hadoop.

I take it that I apply this patch using the Linux patch utility.

I further assume I need only apply the latest patch, which is 5.

Am I correct?


Re: Does Using MultipleTextOutputFormat Require the Deprecated API?

Posted by Geoffry Roberts <ge...@gmail.com>.
Amogh,

I don't have experience with patches for Hadoop.

I take it that I apply this patch using the Linux patch utility.

I further assume I need only apply the latest patch, which is 5.

Am I correct?

On Wed, Dec 9, 2009 at 7:30 AM, Amogh Vasekar <am...@yahoo-inc.com> wrote:

>  http://issues.apache.org/jira/browse/MAPREDUCE-370
>
> You’ll have to work around it for now or try to apply the patch.
>
> Amogh

Re: Does Using MultipleTextOutputFormat Require the Deprecated API?

Posted by Amogh Vasekar <am...@yahoo-inc.com>.
http://issues.apache.org/jira/browse/MAPREDUCE-370

You'll have to work around it for now or try to apply the patch.

Amogh


On 12/9/09 8:54 PM, "Geoffry Roberts" <ge...@gmail.com> wrote:

Aaron,

I am using 0.20.1 and I'm not finding org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.
I'm using the download page, where the tarball is dated Sep. '09.

Sounds like I need to look at the code repository.


Re: Does Using MultipleTextOutputFormat Require the Deprecated API?

Posted by Geoffry Roberts <ge...@gmail.com>.
Aaron,

I am using 0.20.1 and I'm not finding
org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.  I'm using the
download page, where the tarball is dated Sep. '09.

Sounds like I need to look at the code repository.



Re: Does Using MultipleTextOutputFormat Require the Deprecated API?

Posted by Aaron Kimball <aa...@cloudera.com>.
Geoffry,

There are two MultipleOutputs implementations; one for the new API, one for
the old one.

The new API (org.apache.hadoop.mapreduce.lib.output.MultipleOutputs) does
not have a getCollector() method. This is intended to work with
org.apache.hadoop.mapreduce.Mapper and its associated Context object.
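
For what it's worth, a rough sketch of how that new-API class is meant to be
used once it is available, assuming the interface in the unreleased 0.21 code
(MAPREDUCE-370); the named output "perkey", the class name, and the per-key
baseOutputPath are illustrative assumptions, not something confirmed in this
thread:

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class PerKeyReducer extends Reducer<Text, Text, Text, Text> {
    private MultipleOutputs<Text, Text> mos;

    // Declared once at job-setup time, e.g.:
    //   MultipleOutputs.addNamedOutput(job, "perkey", TextOutputFormat.class,
    //                                  Text.class, Text.class);

    @Override
    protected void setup(Context context) {
        // The Context replaces the old Reporter/OutputCollector pair.
        mos = new MultipleOutputs<Text, Text>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text line : values) {
            // write() takes the named output plus an optional base output path,
            // so the file name can vary per key without needing a Reporter.
            mos.write("perkey", key, line, key.toString());
        }
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        mos.close();
    }
}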

The old API implementation of MO
(org.apache.hadoop.mapred.lib.MultipleOutputs) is intended to work with
org.apache.hadoop.mapred.Mapper, Reporter, and friends.

If you're going to use the new org.apache.hadoop.mapreduce-based code, you
should not need to import anything in the mapred package. That having been
said -- I just realized that the new-API-compatible MultipleOutputs
implementation is not in Hadoop 0.20. It's only in the unreleased 0.21. If
you're using 0.20, you should probably stick with the old API for your
process.

Cheers,
- Aaron
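
Sticking with the old API on 0.20, a minimal sketch of the pattern being
suggested might look like the following. The named output "perkey" and the
class name are illustrative assumptions; note that addNamedOutput belongs in
the job-setup code (on the JobConf, before the job is submitted) rather than
inside reduce(), and that, at least in 0.20, named-output names are restricted
to letters and digits, so a raw key string will often not be accepted as the
name.

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

public class OldApiPerKeyReducer extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {

    private MultipleOutputs mos;

    // At job-setup time (not shown here):
    //   MultipleOutputs.addNamedOutput(conf, "perkey",
    //       org.apache.hadoop.mapred.TextOutputFormat.class, Text.class, Text.class);

    @Override
    public void configure(JobConf job) {
        mos = new MultipleOutputs(job);
    }

    public void reduce(Text key, Iterator<Text> values,
            OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        while (values.hasNext()) {
            // The Reporter handed to reduce() is exactly what getCollector() wants.
            mos.getCollector("perkey", reporter).collect(key, values.next());
        }
    }

    @Override
    public void close() throws IOException {
        mos.close();
    }
}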
