Posted to common-user@hadoop.apache.org by yz5od2 <wo...@yahoo.com> on 2009/10/20 21:59:23 UTC
Which FileInputFormat to use for fixed length records?
Hi,
I have input files that contain NO carriage returns/line feeds. Each
record is a fixed length (i.e. 202 bytes).
Which FileInputFormat should I use so that each call to my Mapper
receives one K,V pair, where the KEY is null or something (I don't
care) and the VALUE is the 202-byte record?
thanks!
Re: Which FileInputFormat to use for fixed length records?
Posted by yz5od2 <wo...@yahoo.com>.
Hi all,
I've contributed a couple of classes to support fixed-length
(fixed-width) records in input files. The JIRA issue and attachments
are located here:
https://issues.apache.org/jira/browse/MAPREDUCE-1176
Thanks, and I hope this helps others out.
On Oct 28, 2009, at 1:58 PM, Aaron Kimball wrote:
> I think these would be good to add to mapreduce in the
> {{org.apache.hadoop.mapreduce.lib.input}} package. Please file a JIRA and
> apply a patch!
> - Aaron
Re: Which FileInputFormat to use for fixed length records?
Posted by Aaron Kimball <aa...@cloudera.com>.
I think these would be good to add to mapreduce in the
{{org.apache.hadoop.mapreduce.lib.input}} package. Please file a JIRA and
apply a patch!
- Aaron
On Wed, Oct 28, 2009 at 11:15 AM, yz5od2 <wo...@yahoo.com> wrote:
> Hi all,
> I am working on writing a FixedLengthInputFormat class and a corresponding
> FixedLengthRecordReader.
>
> Would the Hadoop commons project have interest in these? Basically these
> are for reading inputs of textual record data, where each record is a fixed
> length, (no carriage returns or separators etc)
>
> thanks
Re: Which FileInputFormat to use for fixed length records?
Posted by yz5od2 <wo...@yahoo.com>.
Hi all,
I am working on writing a FixedLengthInputFormat class and a
corresponding FixedLengthRecordReader.
Would the Hadoop Common project be interested in these? Basically,
they are for reading textual record data where each record is a
fixed length (no carriage returns, separators, etc.).
Thanks
On Oct 20, 2009, at 11:00 PM, Aaron Kimball wrote:
> You'll need to write your own, I'm afraid. You should subclass
> FileInputFormat and go from there. You may want to look at TextInputFormat /
> LineRecordReader for an example of how an IF/RR gets put together, but there
> isn't an existing fixed-len record reader.
>
> - Aaron
Re: Which FileInputFormat to use for fixed length records?
Posted by Aaron Kimball <aa...@cloudera.com>.
You'll need to write your own, I'm afraid. You should subclass
FileInputFormat and go from there. You may want to look at TextInputFormat /
LineRecordReader for an example of how an IF/RR (InputFormat/RecordReader)
pair gets put together, but there isn't an existing fixed-length record reader.
- Aaron
On Tue, Oct 20, 2009 at 12:59 PM, yz5od2 <wo...@yahoo.com> wrote:
> Hi,
> I have input files, that contain NO carriage returns/line feeds. Each
> record is a fixed length (i.e. 202 bytes).
>
> Which FileInputFormat should I be using? so that each call to my Mapper
> receives one K,V pair, where the KEY is null or something (I don't care) and
> the VALUE is the 202 byte record?
>
> thanks!
>
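
As a rough illustration of the "write your own" advice in this thread, here
is a minimal Java sketch of the core logic a fixed-length record reader would
probably need. It deliberately uses only java.io, not the Hadoop API. The
class FixedLengthRecords and the methods firstRecordStart and readSplit are
hypothetical names chosen for this example; they are not the classes attached
to MAPREDUCE-1176. The sketch assumes that an input split may begin in the
middle of a record, so it skips forward to the next multiple of the record
length and then reads every record that starts inside the split.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: not part of Hadoop and not the MAPREDUCE-1176 code.
final class FixedLengthRecords {

    // Offset of the first record that starts at or after splitStart.
    static long firstRecordStart(long splitStart, int recordLength) {
        long remainder = splitStart % recordLength;
        return remainder == 0 ? splitStart : splitStart + (recordLength - remainder);
    }

    // Returns every record whose first byte lies in [splitStart, splitEnd).
    // "in" must be positioned at byte 0 of the file. A record that starts
    // inside the split is read whole even if it ends past splitEnd, so
    // adjacent splits neither drop nor duplicate a record.
    static List<byte[]> readSplit(InputStream in, long splitStart, long splitEnd,
                                  int recordLength) {
        if (recordLength <= 0) {
            throw new IllegalArgumentException("recordLength must be positive");
        }
        List<byte[]> records = new ArrayList<>();
        try {
            long pos = firstRecordStart(splitStart, recordLength);
            skipFully(in, pos);
            while (pos < splitEnd) {
                byte[] record = new byte[recordLength];
                int read = readUpTo(in, record);
                if (read == 0) {
                    break; // clean end of file
                }
                if (read < recordLength) {
                    throw new IllegalStateException("partial record at offset " + pos);
                }
                records.add(record);
                pos += recordLength;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    // InputStream.skip may skip fewer bytes than requested, so loop.
    private static void skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                if (in.read() < 0) {
                    return; // end of file reached before the target offset
                }
                skipped = 1;
            }
            remaining -= skipped;
        }
    }

    // Fills buf as far as possible; returns the number of bytes actually read.
    private static int readUpTo(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break;
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        // Four 4-byte records, cut into two splits that do not align with records.
        byte[] file = "AAAABBBBCCCCDDDD".getBytes(StandardCharsets.US_ASCII);
        long[][] splits = {{0, 6}, {6, 16}};
        for (long[] s : splits) {
            for (byte[] r : readSplit(new ByteArrayInputStream(file), s[0], s[1], 4)) {
                System.out.println("split [" + s[0] + "," + s[1] + "): "
                        + new String(r, StandardCharsets.US_ASCII));
            }
        }
    }
}
```

In a real job this logic would presumably sit inside a RecordReader returned
by a FileInputFormat subclass, with recordLength set to 202 for the files
described in the original question. Each byte array would become the VALUE
handed to the Mapper, and the KEY could be the record's byte offset or a null
placeholder.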