Posted to user@hadoop.apache.org by Inder Pall <in...@gmail.com> on 2013/10/31 17:08:58 UTC

How to specify delimiters in MultipleInputPaths

I want to use MultipleInputs with multiple mappers to process different
files.
Let's say I want to use KeyValueTextInputFormat in all mappers. The
challenge is that the separator for this input format seems to be set at the
job level.

So if I have two files, one COMMA separated and the other TAB separated,
can this be handled?

Example code for what I am trying to do:

        Configuration configuration = new Configuration();
        configuration.set("key.value.separator.in.input.line", ",");

        Job job = new Job(configuration, "multiple-inputs-mapper");

        //TODO: how to set different delimiters for KeyValueTextInputFormat for different Mappers
        MultipleInputs.addInputPath(job, new Path("src/main/resources/multiinput/input1"),
                KeyValueTextInputFormat.class, Mapper1.class);
        MultipleInputs.addInputPath(job, new Path("src/main/resources/multiinput/input2"),
                KeyValueTextInputFormat.class, Mapper2.class);


        job.setReducerClass(ExampleReducer.class);
        job.setNumReduceTasks(2);
        //TODO: How to set delimiter between key and values in the textinputFormat
        job.setOutputFormatClass(TextOutputFormat.class);

        //set the mapper output types for keys and values as we have used TextOutputFormat
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        FileOutputFormat.setOutputPath(job, new Path("/tmp/multi-input-tweet-join"));
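
One possible workaround, offered as a rough sketch rather than a confirmed answer: if it is acceptable to read both inputs with TextInputFormat (which hands each mapper the whole line as its value), then each mapper could split the line itself using its own delimiter. The helper below is plain Java with no Hadoop dependency; the class name DelimiterSketch and the method splitKeyValue are hypothetical names made up for illustration. It splits at the first occurrence of the separator and, when the separator is absent, treats the whole line as the key with an empty value.

```java
import java.util.Arrays;

public class DelimiterSketch {

    // Hypothetical helper (not part of Hadoop): splits a line into
    // {key, value} at the FIRST occurrence of the separator.
    // If the separator is missing, the whole line becomes the key
    // and the value is the empty string.
    static String[] splitKeyValue(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        // What a mapper for the comma-separated file might do per line.
        String[] commaRecord = splitKeyValue("user1,hello,world", ',');
        // What a mapper for the tab-separated file might do per line.
        String[] tabRecord = splitKeyValue("user2\thello world", '\t');

        System.out.println(Arrays.toString(commaRecord));
        System.out.println(Arrays.toString(tabRecord));
    }
}
```

Under this assumption, Mapper1 would call the helper with ',' and Mapper2 with '\t' on the line text it receives, so no job-level separator setting is involved.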


-- 
Thanks,
- Inder
"You are average of the 5 people you spend the most time with"