Posted to user@hadoop.apache.org by Inder Pall <in...@gmail.com> on 2013/10/31 17:08:58 UTC
How to specify delimiters in MultipleInputPaths
I want to use MultipleInputs with a different mapper for each input file.
Let's say every mapper should read its input with KeyValueTextInputFormat. The
challenge is that the separator for this input format seems to be set at the
job level.
So if I have two files, one COMMA-separated and the other TAB-separated, can
this be handled?
Here is example code showing what I am trying to do:
Configuration configuration = new Configuration();
// This separator applies to the whole job, not to a single input path
configuration.set("key.value.separator.in.input.line", ",");
Job job = new Job(configuration, "multiple-inputs-mapper");
// TODO: how to set different delimiters for KeyValueTextInputFormat for different Mappers
MultipleInputs.addInputPath(job, new Path("src/main/resources/multiinput/input1"),
    KeyValueTextInputFormat.class, Mapper1.class);
MultipleInputs.addInputPath(job, new Path("src/main/resources/multiinput/input2"),
    KeyValueTextInputFormat.class, Mapper2.class);
job.setReducerClass(ExampleReducer.class);
job.setNumReduceTasks(2);
// TODO: How to set delimiter between key and values in the textinputFormat
job.setOutputFormatClass(TextOutputFormat.class);
// Set the mapper output types for keys and values, as we have used TextOutputFormat
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
FileOutputFormat.setOutputPath(job, new Path("/tmp/multi-input-tweet-join"));
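One possible workaround, sketched here only as an assumption and not something the job above already does: register both paths with TextInputFormat and let each mapper split its own lines, so the separator becomes a per-mapper detail instead of a job-level setting. The helper below is hypothetical (the class name FirstSeparatorSplitter and its split method are not part of Hadoop). It uses plain Java to split at the first occurrence of the separator and, when no separator is present, treats the whole line as the key with an empty value, which is how KeyValueTextInputFormat treats a line.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical helper for illustration; it is not a Hadoop class.
final class FirstSeparatorSplitter {
    private FirstSeparatorSplitter() {}

    // Splits a line into (key, value) at the FIRST occurrence of the separator.
    // If the separator is absent, the whole line is the key and the value is empty.
    static Map.Entry<String, String> split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            return new SimpleEntry<>(line, "");
        }
        return new SimpleEntry<>(line.substring(0, pos), line.substring(pos + 1));
    }

    public static void main(String[] args) {
        // A mapper for the comma-separated file would pass ','
        Map.Entry<String, String> comma = split("user1,hello,world", ',');
        // A mapper for the tab-separated file would pass '\t'
        Map.Entry<String, String> tab = split("user1\thello world", '\t');
        System.out.println(comma.getKey() + " -> " + comma.getValue());
        System.out.println(tab.getKey() + " -> " + tab.getValue());
    }
}
```

Under this assumption, Mapper1 would call split(line, ',') and Mapper2 would call split(line, '\t') on the text of each input line, and the job-level separator property would no longer matter.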
--
Thanks,
- Inder
"You are average of the 5 people you spend the most time with"