Posted to common-user@hadoop.apache.org by Paul Smith <ps...@aconex.com> on 2009/11/02 02:12:02 UTC

Confused by new API & MultipleOutputFormats using Hadoop 0.20.1

I'm totally stuck here: I can't find a way to use the new API _and_
the multiple-output classes (MultipleOutputs, MultipleTextOutputFormat).

I found this thread which is related, but doesn't seem to help me (or  
I missed something completely, certainly possible):

http://markmail.org/message/u4wz5nbcn5rawydq#query:hadoop%20MultipleTextOutputFormat%20OutputFormat%20Job%20JobConf+page:1+mid:5wy63oqa2vs6bj7b+state:results

My controller Job class is simple, but I get a compile error trying to  
add the new MultipleOutputs:

public class ControllerMetricGrinder {

     public static class MetricNameMultipleTextOutputFormat extends
             MultipleTextOutputFormat<String, ControllerMetric> {

         @Override
         protected String generateFileNameForKeyValue(String key,
                 ControllerMetric value, String name) {
             return key;
         }

     }
     public static void main(String[] args) throws Exception {

         Job job = new Job();
         job.setJarByClass(ControllerMetricGrinder.class);

         job.setOutputKeyClass(Text.class);
         job.setOutputValueClass(ControllerMetric.class);

         job.setMapperClass(ControllerMetricMapper.class);

         job.setCombinerClass(ControllerMetricReducer.class);
         job.setReducerClass(ControllerMetricReducer.class);

         // COMPILE ERROR HERE
         MultipleOutputs.addMultiNamedOutput(job, "metrics",
                 MetricNameMultipleTextOutputFormat.class,
                 Text.class, ControllerMetric.class);

         job.setNumReduceTasks(5);

         FileInputFormat.addInputPath(job, new Path(args[0]));
         FileOutputFormat.setOutputPath(job, new Path(args[1]));

         System.exit(job.waitForCompletion(true) ? 0 : 1);
     }
}

(mappers and reducers are using the new API, and are in separate  
classes).

MultipleOutputs doesn't take a Job; it only takes a JobConf.  Any
ideas?  I'd prefer to use the new API (because that's how I've written
everything), but I'm guessing I'll now have to rework everything to
the OLD API to get this to work.

I'm trying to create one file per metric name (there are only 5).

Thoughts?

Paul

Re: Confused by new API & MultipleOutputFormats using Hadoop 0.20.1

Posted by Paul Smith <ps...@aconex.com>.
OK, great. Thanks for replying, Tom. I'm still relatively new to Hadoop,
so I wasn't sure if I had missed something.


On 09/11/2009, at 2:41 PM, Tom White wrote:

> Multiple outputs has been ported to the new API in 0.21. See
> https://issues.apache.org/jira/browse/MAPREDUCE-370.
>
> Cheers,
> Tom
>
> On Sat, Nov 7, 2009 at 6:45 AM, Xiance SI(司宪策) <ad...@gmail.com> wrote:
>> I just fell back to the old mapred.* APIs; it seems MultipleOutputs only
>> works with the old API.
>>
>> wishes,
>> Xiance


Re: Confused by new API & MultipleOutputFormats using Hadoop 0.20.1

Posted by Tom White <to...@cloudera.com>.
Multiple outputs has been ported to the new API in 0.21. See
https://issues.apache.org/jira/browse/MAPREDUCE-370.

Cheers,
Tom
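
For readers on 0.21 or later, here is a minimal sketch of how the ported class might be used. It assumes org.apache.hadoop.mapreduce.lib.output.MultipleOutputs (the class added by MAPREDUCE-370) is on the classpath. The names MetricGrinderNewApi, MetricMapper, MetricReducer and baseNameFor, and the tab-separated "metricName<TAB>value" input layout, are assumptions made for illustration; plain Text values stand in for ControllerMetric. None of this is code from the thread.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MetricGrinderNewApi {

    /** Hypothetical helper: map a metric name to a safe base file name. */
    static String baseNameFor(String metricName) {
        return metricName.replaceAll("[^A-Za-z0-9._-]", "_");
    }

    /** Assumed input layout: one "metricName<TAB>value" record per line. */
    public static class MetricMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] parts = line.toString().split("\t", 2);
            if (parts.length == 2) {
                context.write(new Text(parts[0]), new Text(parts[1]));
            }
        }
    }

    /** Routes every record to a file whose base name is derived from its key. */
    public static class MetricReducer extends Reducer<Text, Text, Text, Text> {
        private MultipleOutputs<Text, Text> outputs;

        @Override
        protected void setup(Context context) {
            outputs = new MultipleOutputs<Text, Text>(context);
        }

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            String baseName = baseNameFor(key.toString());
            for (Text value : values) {
                // "metrics" must match the named output registered in main().
                outputs.write("metrics", key, value, baseName);
            }
        }

        @Override
        protected void cleanup(Context context)
                throws IOException, InterruptedException {
            outputs.close(); // flushes and closes the per-metric writers
        }
    }

    public static void main(String[] args) throws Exception {
        // new Job(...) matches the code in this thread; it is deprecated
        // in later Hadoop releases.
        Job job = new Job(new Configuration(), "metric grinder");
        job.setJarByClass(MetricGrinderNewApi.class);
        job.setMapperClass(MetricMapper.class);
        job.setReducerClass(MetricReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // The new-API MultipleOutputs accepts a Job rather than a JobConf.
        MultipleOutputs.addNamedOutput(job, "metrics",
                TextOutputFormat.class, Text.class, Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

In this sketch each metric ends up in files named after its base name with a "-r-nnnnn" suffix in the job output directory. The regular part-r-* files are still created, but stay empty because nothing is written through context.write. The reducer is deliberately not reused as a combiner, since a combiner should not write to named outputs.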

On Sat, Nov 7, 2009 at 6:45 AM, Xiance SI(司宪策) <ad...@gmail.com> wrote:
> I just fell back to the old mapred.* APIs; it seems MultipleOutputs only
> works with the old API.
>
> wishes,
> Xiance

Re: Confused by new API & MultipleOutputFormats using Hadoop 0.20.1

Posted by "Xiance SI (司宪策)" <ad...@gmail.com>.
I just fell back to the old mapred.* APIs; it seems MultipleOutputs only
works with the old API.

wishes,
Xiance
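
For reference, a minimal sketch of what such a fallback to the old mapred.* API could look like on 0.20.x, using MultipleTextOutputFormat with a JobConf. The names MetricGrinderOldApi, MetricNameOutputFormat and MetricMapper, and the tab-separated "metricName<TAB>value" input layout, are assumptions made for illustration; plain Text values stand in for ControllerMetric. The output format is parameterized on Text rather than String so that it matches the job's output key class. This is not the code Xiance or Paul actually ran.

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.IdentityReducer;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

public class MetricGrinderOldApi {

    /** Names each output file after the record's key (the metric name). */
    public static class MetricNameOutputFormat
            extends MultipleTextOutputFormat<Text, Text> {
        @Override
        protected String generateFileNameForKeyValue(Text key, Text value,
                String name) {
            // 'name' is the default leaf name (e.g. part-00000); it is
            // ignored here so that each metric gets a single file.
            return key.toString();
        }
    }

    /** Assumed input layout: one "metricName<TAB>value" record per line. */
    public static class MetricMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {
        public void map(LongWritable offset, Text line,
                OutputCollector<Text, Text> out, Reporter reporter)
                throws IOException {
            String[] parts = line.toString().split("\t", 2);
            if (parts.length == 2) {
                out.collect(new Text(parts[0]), new Text(parts[1]));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // The old API is configured through JobConf instead of Job.
        JobConf conf = new JobConf(MetricGrinderOldApi.class);
        conf.setJobName("metric grinder (old API)");
        conf.setMapperClass(MetricMapper.class);
        conf.setReducerClass(IdentityReducer.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);
        conf.setOutputFormat(MetricNameOutputFormat.class);
        conf.setNumReduceTasks(5);

        FileInputFormat.addInputPath(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
```

Because all records for one key go to the same reducer, returning only the key as the file name should not cause two reducers to write the same file in this sketch.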
