Posted to mapreduce-user@hadoop.apache.org by Mohammad Tariq <do...@gmail.com> on 2012/07/10 13:15:06 UTC
Emitting Java Collection as mapper output
Hello list,
Is it possible to emit Java collections from a mapper?
My code looks like this -
public class UKOOAMapper extends Mapper<LongWritable, Text, LongWritable, List<Text>> {

    public static Text CDPX = new Text();
    public static Text CDPY = new Text();
    public static List<Text> vals = new ArrayList<Text>();
    public static LongWritable count = new LongWritable(1);

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        if (line.startsWith("Q")) {
            CDPX.set(line.substring(2, 13).trim());
            CDPY.set(line.substring(20, 25).trim());
            vals.add(CDPX);
            vals.add(CDPY);
            context.write(count, vals);
        }
    }
}
And the driver class is -
public static void main(String[] args) throws IOException,
        InterruptedException, ClassNotFoundException {

    Path filePath = new Path("/ukooa/UKOOAP190.0026_FAZENDA_JUERANA_1.ukooa");
    Configuration conf = new Configuration();
    Job job = new Job(conf, "SupportFileValidation");
    conf.set("mapreduce.output.key.field.separator", " ");
    job.setMapOutputValueClass(List.class);
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);
    job.setMapperClass(UKOOAMapper.class);
    job.setReducerClass(ValidationReducer.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileInputFormat.addInputPath(job, filePath);
    FileOutputFormat.setOutputPath(job, new Path("/mapout/" + filePath));
    job.waitForCompletion(true);
}
When I try to execute the program, I get the following error -
12/07/10 16:41:46 WARN util.NativeCodeLoader: Unable to load
native-hadoop library for your platform... using builtin-java classes
where applicable
12/07/10 16:41:46 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the
same.
12/07/10 16:41:46 INFO input.FileInputFormat: Total input paths to process : 1
12/07/10 16:41:46 INFO mapred.JobClient: Running job: job_local_0001
12/07/10 16:41:46 INFO util.ProcessTree: setsid exited with exit code 0
12/07/10 16:41:46 INFO mapred.Task: Using ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@456dfa45
12/07/10 16:41:46 INFO mapred.MapTask: io.sort.mb = 100
12/07/10 16:41:46 INFO mapred.MapTask: data buffer = 79691776/99614720
12/07/10 16:41:46 INFO mapred.MapTask: record buffer = 262144/327680
12/07/10 16:41:46 WARN mapred.LocalJobRunner: job_local_0001
java.lang.NullPointerException
at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:965)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:674)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
12/07/10 16:41:47 INFO mapred.JobClient: map 0% reduce 0%
12/07/10 16:41:47 INFO mapred.JobClient: Job complete: job_local_0001
12/07/10 16:41:47 INFO mapred.JobClient: Counters: 0
Need some guidance from the experts. Please let me know where I am
going wrong. Many thanks.
Regards,
Mohammad Tariq
Re: Emitting Java Collection as mapper output
Posted by Mohammad Tariq <do...@gmail.com>.
Hello Harsh,
Thank you so much for the valuable response. I'll proceed as you
suggested.
Regards,
Mohammad Tariq
On Tue, Jul 10, 2012 at 5:05 PM, Harsh J <ha...@cloudera.com> wrote:
> [snip]
Re: Emitting Java Collection as mapper output
Posted by Harsh J <ha...@cloudera.com>.
Short answer: Yes.
With Writable serialization, there's *some* support for collection
structures in the form of MapWritable and ArrayWritable. You can make
use of these classes.
However, I suggest using Apache Avro for these things; it's much better
to use its schema-/reflection-oriented serialization than Writables.
See http://avro.apache.org
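To make the suggestion above concrete: the NullPointerException in the original post comes from SerializationFactory finding no registered serializer for java.util.List, because List does not implement Writable's write/readFields contract. Below is a minimal, Hadoop-free sketch of that contract (the class and method names TextArrayDemo, write, and readFields are illustrative stand-ins, not Hadoop API): the element count is written first, then each element, and reading reverses the steps.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TextArrayDemo {

    // Mimics what an ArrayWritable of Text does: length-prefixed elements.
    static void write(DataOutput out, List<String> vals) throws IOException {
        out.writeInt(vals.size());      // element count first
        for (String v : vals) {
            out.writeUTF(v);            // then each element in order
        }
    }

    // The inverse: read the count, then that many elements.
    static List<String> readFields(DataInput in) throws IOException {
        int n = in.readInt();
        List<String> vals = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            vals.add(in.readUTF());
        }
        return vals;
    }

    public static void main(String[] args) throws IOException {
        List<String> vals = Arrays.asList("423511.3", "65788.0");
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        write(new DataOutputStream(buf), vals);
        List<String> back = readFields(new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(back.equals(vals));  // round-trips intact
    }
}
```

In Hadoop itself, ArrayWritable implements exactly this pattern, but it must be subclassed with a no-argument constructor that fixes the value class, e.g. `class TextArrayWritable extends ArrayWritable { public TextArrayWritable() { super(Text.class); } }`, and that concrete subclass (not List.class) is what you would pass to job.setMapOutputValueClass(...).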
On Tue, Jul 10, 2012 at 4:45 PM, Mohammad Tariq <do...@gmail.com> wrote:
> [snip]
--
Harsh J