Posted to common-user@hadoop.apache.org by Ajay Srivastava <Aj...@guavus.com> on 2012/05/01 10:02:29 UTC
Different input streams
Hi,
If a Hadoop job has two inputs, one text and the other binary (a sequence file), is there a way to set a different InputFormatClass for each of the two inputs?
job.setInputFormatClass sets only one input format for the whole job. Does that mean a Hadoop job cannot take input in two different formats?
Thanks.
Ajay Srivastava
Re: Different input streams
Posted by 黄 山 <th...@gmail.com>.
I ran into the same problem while using streaming with a sequence file.
My solution is to use 'org.apache.hadoop.streaming.AutoInputFormat' as the input format
and add '-D stream.map.input=rawbytes'.
huangs,
thuhuangs09@gmail.com
On 2012-5-1, at 4:02 PM, Ajay Srivastava wrote:
> Hi,
>
> If there are two inputs to a hadoop job one is text and another is binary (Sequence file), is there a way to set InputFormatClass to these two different streams ?
> job.setInputFormatClass will set to one type of input. Does that mean a hadoop job can not take input in two different formats?
>
>
>
> Thanks.
> Ajay Srivastava
Re: Different input streams
Posted by Harsh J <ha...@cloudera.com>.
Ajay,
Take a look at MultipleInputs: see page 214 (Chapter 7, "MapReduce
Types and Formats") of Hadoop: The Definitive Guide (2nd edition) by
Tom White (O'Reilly), and also
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/lib/input/MultipleInputs.html
This class should meet your need; just use a common mapper with both inputs.
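
As a rough illustration of how MultipleInputs can be wired up, here is a minimal driver sketch. It is not taken from the book or from the original poster's job: the class names (MixedInputDriver, TextSideMapper, SequenceSideMapper), the job name, and the assumption that the sequence file holds Text keys and Text values are all hypothetical. It also uses the addInputPath overload that assigns a separate mapper to each path, rather than a common mapper, because TextInputFormat supplies (LongWritable, Text) records while the sequence file supplies whatever types it was written with. Job.getInstance assumes a reasonably recent Hadoop release; older releases construct the job with new Job(conf, name) instead.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver: one job, two input paths, two input formats.
public class MixedInputDriver {

    // Hypothetical mapper for the text input.
    // TextInputFormat supplies (byte offset, line) pairs.
    public static class TextSideMapper
            extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Tag each line so the reducer can tell where it came from.
            context.write(new Text("text"), line);
        }
    }

    // Hypothetical mapper for the sequence file input.
    // ASSUMPTION: the sequence file was written with Text keys and Text values.
    public static class SequenceSideMapper
            extends Mapper<Text, Text, Text, Text> {
        @Override
        protected void map(Text key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(key, value);
        }
    }

    // Builds the job without submitting it.
    public static Job buildJob(Configuration conf, String textDir,
                               String seqDir, String outDir) throws IOException {
        Job job = Job.getInstance(conf, "mixed-input-example");
        job.setJarByClass(MixedInputDriver.class);

        // Each call registers a path together with its own input format
        // and mapper; no job.setInputFormatClass call is needed.
        MultipleInputs.addInputPath(job, new Path(textDir),
                TextInputFormat.class, TextSideMapper.class);
        MultipleInputs.addInputPath(job, new Path(seqDir),
                SequenceFileInputFormat.class, SequenceSideMapper.class);

        // Both mappers must emit the same key/value types.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileOutputFormat.setOutputPath(job, new Path(outDir));
        return job;
    }

    public static void main(String[] args) throws Exception {
        // args: <text input dir> <sequence file input dir> <output dir>
        Job job = buildJob(new Configuration(), args[0], args[1], args[2]);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

If a single mapper really can handle both inputs (for example, when both formats yield the same key and value types), the three-argument overload MultipleInputs.addInputPath(job, path, inputFormatClass) can be used together with job.setMapperClass instead.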
On Tue, May 1, 2012 at 1:32 PM, Ajay Srivastava
<Aj...@guavus.com> wrote:
> Hi,
>
> If there are two inputs to a hadoop job one is text and another is binary (Sequence file), is there a way to set InputFormatClass to these two different streams ?
> job.setInputFormatClass will set to one type of input. Does that mean a hadoop job can not take input in two different formats?
>
>
>
> Thanks.
> Ajay Srivastava
--
Harsh J