Posted to common-user@hadoop.apache.org by yz5od2 <wo...@yahoo.com> on 2009/10/20 21:59:23 UTC

Which FileInputFormat to use for fixed length records?

Hi,
I have input files that contain no carriage returns or line feeds. Each
record is a fixed length (i.e. 202 bytes).

Which FileInputFormat should I use so that each call to my Mapper
receives one K,V pair, where the KEY is null or something (I don't care)
and the VALUE is the 202-byte record?

thanks!

Re: Which FileInputFormat to use for fixed length records?

Posted by yz5od2 <wo...@yahoo.com>.
Hi all,
I've contributed a couple of classes to support fixed length/width  
records in input files. The JIRA issue and attachments are located here:

https://issues.apache.org/jira/browse/MAPREDUCE-1176

thanks, and I hope this helps others out.

On Oct 28, 2009, at 1:58 PM, Aaron Kimball wrote:

> I think these would be good to add to mapreduce in the
> {{org.apache.hadoop.mapreduce.lib.input}} package. Please file a JIRA and
> apply a patch!
> - Aaron



Re: Which FileInputFormat to use for fixed length records?

Posted by Aaron Kimball <aa...@cloudera.com>.
I think these would be good to add to mapreduce in the
{{org.apache.hadoop.mapreduce.lib.input}} package. Please file a JIRA and
apply a patch!
- Aaron

On Wed, Oct 28, 2009 at 11:15 AM, yz5od2 <wo...@yahoo.com> wrote:

> Hi all,
> I am working on writing a FixedLengthInputFormat class and a corresponding
> FixedLengthRecordReader.
>
> Would the Hadoop commons project have interest in these? Basically these
> are for reading inputs of textual record data, where each record is a fixed
> length, (no carriage returns or separators etc)
>
> thanks

Re: Which FileInputFormat to use for fixed length records?

Posted by yz5od2 <wo...@yahoo.com>.
Hi all,
I am working on writing a FixedLengthInputFormat class and a
corresponding FixedLengthRecordReader.

Would the Hadoop Common project be interested in these? Basically
these are for reading textual record data where each record has a
fixed length (no carriage returns, separators, etc.).

thanks
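
One thing a fixed-length input format has to deal with is that file splits must not cut a record in half. Below is a minimal sketch of one way to handle that, assuming the file length is an exact multiple of the record length: round each split down to a whole number of records. The class and method names (RecordAlignedSplits, Range, compute) are hypothetical and use only the Java standard library; the thread does not say whether the classes attached to MAPREDUCE-1176 work this way.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: computes byte ranges that never cut a record in half. */
final class RecordAlignedSplits {

    /** A byte range [start, start + length) within one file. */
    static final class Range {
        final long start;
        final long length;

        Range(long start, long length) {
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + (start + length) + ")";
        }
    }

    /**
     * Splits a file of fileLength bytes into ranges of at most targetSplitSize
     * bytes, each holding a whole number of recordLength-byte records.
     */
    static List<Range> compute(long fileLength, int recordLength, long targetSplitSize) {
        if (recordLength <= 0 || targetSplitSize < recordLength) {
            throw new IllegalArgumentException("split size must hold at least one record");
        }
        if (fileLength % recordLength != 0) {
            throw new IllegalArgumentException("file length is not a multiple of the record length");
        }
        // Round the target split size down to a whole number of records.
        long splitBytes = (targetSplitSize / recordLength) * recordLength;
        List<Range> ranges = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitBytes) {
            ranges.add(new Range(start, Math.min(splitBytes, fileLength - start)));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Ten 202-byte records, with an (artificially small) 500-byte target split.
        for (Range r : compute(202L * 10, 202, 500)) {
            System.out.println(r);
        }
    }
}
```

The other common design is to leave the split boundaries alone and have the record reader skip forward to the first record boundary inside its split, similar in spirit to how LineRecordReader discards a leading partial line.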


On Oct 20, 2009, at 11:00 PM, Aaron Kimball wrote:

> You'll need to write your own, I'm afraid. You should subclass
> FileInputFormat and go from there. You may want to look at TextInputFormat /
> LineRecordReader for an example of how an IF/RR gets put together, but there
> isn't an existing fixed-len record reader.
>
> - Aaron


Re: Which FileInputFormat to use for fixed length records?

Posted by Aaron Kimball <aa...@cloudera.com>.
You'll need to write your own, I'm afraid. You should subclass
FileInputFormat and go from there. You may want to look at TextInputFormat /
LineRecordReader for an example of how an InputFormat/RecordReader (IF/RR)
pair gets put together, but there isn't an existing fixed-length record reader.

- Aaron
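
The core of such a record reader is small. Here is a minimal sketch of the read loop, written against a plain java.io.InputStream so it runs without Hadoop on the classpath. The class name FixedLengthReaderSketch and its methods are hypothetical; in a real RecordReader the same logic would presumably sit behind nextKeyValue(), reading from the stream opened on the split, with the byte offset as the key and the record bytes as the value.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: reads fixed-length records from a stream, one per call. */
final class FixedLengthReaderSketch {
    private final DataInputStream in;
    private final int recordLength;
    private long position = 0; // byte offset of the next unread record

    FixedLengthReaderSketch(InputStream in, int recordLength) {
        if (recordLength <= 0) {
            throw new IllegalArgumentException("recordLength must be positive");
        }
        this.in = new DataInputStream(in);
        this.recordLength = recordLength;
    }

    /**
     * Returns the next record, or null at a clean end of input.
     * Throws IOException if the input ends in the middle of a record.
     */
    byte[] next() throws IOException {
        byte[] record = new byte[recordLength];
        int first = in.read(record, 0, recordLength);
        if (first == -1) {
            return null; // no more bytes: clean end of input
        }
        if (first < recordLength) {
            try {
                // A single read() may return fewer bytes than asked for.
                in.readFully(record, first, recordLength - first);
            } catch (EOFException e) {
                throw new IOException("Partial record at offset " + position, e);
            }
        }
        position += recordLength;
        return record;
    }

    /** Byte offset of the next unread record; usable as the map key. */
    long getPosition() {
        return position;
    }

    public static void main(String[] args) throws IOException {
        // Three 4-byte records with no separators between them.
        byte[] data = "AAAABBBBCCCC".getBytes(StandardCharsets.US_ASCII);
        FixedLengthReaderSketch reader =
                new FixedLengthReaderSketch(new ByteArrayInputStream(data), 4);
        long offset = reader.getPosition();
        byte[] rec;
        while ((rec = reader.next()) != null) {
            System.out.println(offset + "\t" + new String(rec, StandardCharsets.US_ASCII));
            offset = reader.getPosition();
        }
    }
}
```

For the 202-byte records in the question, the reader would be constructed with a record length of 202. Failing loudly on a trailing partial record is a design choice in this sketch; silently dropping the tail would also be defensible.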

On Tue, Oct 20, 2009 at 12:59 PM, yz5od2 <wo...@yahoo.com> wrote:

> Hi,
> I have input files, that contain NO carriage returns/line feeds. Each
> record is a fixed length (i.e. 202 bytes).
>
> Which FileInputFormat should I be using? so that each call to my Mapper
> receives one K,V pair, where the KEY is null or something (I don't care) and
> the VALUE is the 202 byte record?
>
> thanks!
>