Posted to common-user@hadoop.apache.org by Qiong Zhang <ja...@yahoo-inc.com> on 2008/07/01 18:24:09 UTC

one input file per map

Hi,

Is there an existing input format/split which supports one input file
(e.g. plain text) per map task?

Thanks,

James


Re: one input file per map

Posted by Yang Chen <ch...@gmail.com>.
Maybe consider a hierarchy: the first level is one map per file, and the
second (parent) level is a map/reduce job over the first level's output.

YC


On 7/3/08, Jason Venner <ja...@attributor.com> wrote:
>
> You could also set your minimum input split size (mapred.min.split.size) to Long.MAX_VALUE.

Re: one input file per map

Posted by Jason Venner <ja...@attributor.com>.
You could also set your minimum input split size (mapred.min.split.size) to Long.MAX_VALUE.
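To see why this works, here is a simplified, stdlib-only Java model of how the old org.apache.hadoop.mapred.FileInputFormat sizes splits. It is not Hadoop code, and details such as the 1.1 "slop" factor on the last split are assumptions that may differ between versions. The split size is max(minSize, min(goalSize, blockSize)), so once the minimum split size is Long.MAX_VALUE every splittable file ends up as a single split. In that API the minimum is read from the mapred.min.split.size property, e.g. conf.setLong("mapred.min.split.size", Long.MAX_VALUE) on your JobConf. The file, block and goal sizes in main() are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model (not Hadoop code) of how the old mapred FileInputFormat
// cuts one splittable file into splits.
final class MinSplitSizeModel {

    // Assumed slop factor: the last split may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    // Split size rule: max(minSize, min(goalSize, blockSize)).
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Lengths of the splits a file of fileLength bytes would be cut into.
    static List<Long> splitLengths(long fileLength, long goalSize, long minSize, long blockSize) {
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);
        List<Long> lengths = new ArrayList<>();
        long remaining = fileLength;
        // Emit full-size splits while more than SPLIT_SLOP splits' worth remains.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            lengths.add(splitSize);
            remaining -= splitSize;
        }
        // Whatever is left becomes the final split.
        if (remaining != 0) {
            lengths.add(remaining);
        }
        return lengths;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileLength = 300 * mb; // hypothetical input file
        long blockSize = 64 * mb;   // hypothetical HDFS block size
        long goalSize = 300 * mb;   // hypothetical total size / requested maps

        // Default minimum split size of 1 byte: split at block boundaries.
        System.out.println(splitLengths(fileLength, goalSize, 1L, blockSize).size());             // prints 5
        // Minimum split size of Long.MAX_VALUE: the whole file is one split.
        System.out.println(splitLengths(fileLength, goalSize, Long.MAX_VALUE, blockSize).size()); // prints 1
    }
}
```

With the default minimum the hypothetical 300 MB file becomes five splits (four 64 MB blocks plus a 44 MB tail); with Long.MAX_VALUE it stays whole, so one map task reads it.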


RE: one input file per map

Posted by "Goel, Ankur" <An...@corp.aol.com>.
Nope. But if that is the intent, there are two ways of doing it:

1. Extend the input format of your choice and override its
isSplitable() method to return false.

2. Compress your text files with a compression format supported by
Hadoop (e.g. gzip). Compressed files are not split, so each file is
processed by exactly one map task.
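A rough sketch of why both options give one map per file, assuming the old org.apache.hadoop.mapred API. There, option 1 is a one-line subclass, e.g. a TextInputFormat subclass with protected boolean isSplitable(FileSystem fs, Path file) { return false; }, and TextInputFormat already reports a file as non-splittable when a compression codec is registered for its suffix (such as .gz), which is what option 2 relies on. The Java below is a stdlib-only model of the per-file split loop, not Hadoop code: the .gz suffix test stands in for the codec lookup, the block size is used as the split size (default minimum split size assumed), and the file names and sizes are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Stdlib-only model (not Hadoop code) of the per-file split loop, showing how
// the isSplitable() decision controls how many map tasks a file gets.
final class NonSplittableModel {

    // Assumed slop factor: the last split may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    // Number of splits for one non-empty file.
    static int countSplits(long length, long splitSize, boolean splitable) {
        if (!splitable) {
            return 1; // the whole file becomes a single split, i.e. one map task
        }
        int splits = 0;
        long remaining = length;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++;
        }
        return splits;
    }

    // Total splits over all files, with isSplitable deciding per file name.
    static int totalSplits(Map<String, Long> files, long splitSize, Predicate<String> isSplitable) {
        int total = 0;
        for (Map.Entry<String, Long> file : files.entrySet()) {
            total += countSplits(file.getValue(), splitSize, isSplitable.test(file.getKey()));
        }
        return total;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 64 * mb; // hypothetical block size, used as the split size

        Map<String, Long> text = new LinkedHashMap<>();
        text.put("a.txt", 200 * mb); // hypothetical files
        text.put("b.txt", 130 * mb);

        Map<String, Long> gzipped = new LinkedHashMap<>();
        gzipped.put("a.txt.gz", 200 * mb);
        gzipped.put("b.txt.gz", 130 * mb);

        // Stand-in for TextInputFormat: only files without a codec suffix are splittable.
        Predicate<String> codecAware = name -> !name.endsWith(".gz");
        // Option 1: a subclass whose isSplitable() always returns false.
        Predicate<String> neverSplit = name -> false;

        System.out.println(totalSplits(text, blockSize, codecAware));    // prints 6
        System.out.println(totalSplits(text, blockSize, neverSplit));    // prints 2 (option 1)
        System.out.println(totalSplits(gzipped, blockSize, codecAware)); // prints 2 (option 2)
    }
}
```

With the hypothetical 200 MB and 130 MB text files and 64 MB blocks, the default rule yields six splits; overriding isSplitable() or gzipping the files yields two, one per file.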


-----Original Message-----
From: Qiong Zhang [mailto:jamesz@yahoo-inc.com] 
Sent: Tuesday, July 01, 2008 9:54 PM
To: core-user@hadoop.apache.org
Subject: one input file per map 
