Posted to common-user@hadoop.apache.org by Sean Arietta <sa...@virginia.edu> on 2009/03/17 04:21:48 UTC

Re: 1 file per record

I have a similar issue and would like some clarification if possible. Suppose
each file is meant to be emitted as a single record to a set of map tasks.
That is, each key-value pair will contain data from one file and one file
only.

I have written custom InputFormats and RecordReaders before, so I am familiar
with the general process. Does it suffice to return an empty array from
InputFormat.getSplits() and then handle the actual record emission inside the
custom RecordReader?
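
For concreteness, the pattern I have been sketching looks like the following,
against the old org.apache.hadoop.mapred API (the class names
WholeFileInputFormat and WholeFileRecordReader are just my placeholders).
Instead of returning an empty array from getSplits(), it leaves split
generation alone and overrides isSplitable() to return false, so each file
becomes exactly one split and the reader emits that file as one record:

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.*;

public class WholeFileInputFormat
    extends FileInputFormat<NullWritable, BytesWritable> {

  // Never split an individual file, so each file maps to exactly one
  // InputSplit (and, via the reader below, exactly one record).
  @Override
  protected boolean isSplitable(FileSystem fs, Path filename) {
    return false;
  }

  @Override
  public RecordReader<NullWritable, BytesWritable> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    return new WholeFileRecordReader((FileSplit) split, job);
  }
}

class WholeFileRecordReader
    implements RecordReader<NullWritable, BytesWritable> {

  private final FileSplit split;
  private final JobConf conf;
  private boolean processed = false;

  WholeFileRecordReader(FileSplit split, JobConf conf) {
    this.split = split;
    this.conf = conf;
  }

  // Read the entire file into the value of a single record, then
  // report end-of-input on the next call.
  public boolean next(NullWritable key, BytesWritable value)
      throws IOException {
    if (processed) return false;
    byte[] contents = new byte[(int) split.getLength()];
    Path file = split.getPath();
    FileSystem fs = file.getFileSystem(conf);
    FSDataInputStream in = null;
    try {
      in = fs.open(file);
      IOUtils.readFully(in, contents, 0, contents.length);
      value.set(contents, 0, contents.length);
    } finally {
      IOUtils.closeStream(in);
    }
    processed = true;
    return true;
  }

  public NullWritable createKey() { return NullWritable.get(); }
  public BytesWritable createValue() { return new BytesWritable(); }
  public long getPos() { return processed ? split.getLength() : 0; }
  public float getProgress() { return processed ? 1.0f : 0.0f; }
  public void close() { }
}

If returning an empty array from getSplits() really is the sanctioned route,
I would be curious to hear it, but the isSplitable() approach at least gives
one map record per file without touching getSplits() at all.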

Thanks for your time!

-Sean


owen.omalley wrote:
> 
> On Oct 2, 2008, at 1:50 AM, chandravadana wrote:
> 
>> If we don't specify numSplits in getSplits(), then what is the default
>> number of splits taken?
> 
> The getSplits() method is either library or user code, so it depends on
> which class you are using as your InputFormat. The FileInputFormats
> (TextInputFormat and SequenceFileInputFormat) basically divide input
> files by blocks, unless the requested number of mappers is really high.
> 
> -- Owen
> 
> 
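
For anyone else who finds this thread: the block-based default Owen describes
corresponds, as far as I can tell from the FileInputFormat source of this
era, to a split-size computation along these lines:

// Paraphrased from the old mapred FileInputFormat.getSplits(): the
// requested number of splits only sets a goal size per split; the
// actual split size is clamped by the minimum split size and the
// HDFS block size.
long goalSize = totalSize / (numSplits == 0 ? 1 : numSplits);
long splitSize = Math.max(minSize, Math.min(goalSize, blockSize));
// Splits therefore default to one block each, and only become smaller
// than a block when numSplits is high enough that goalSize drops
// below blockSize.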
