Posted to hdfs-user@hadoop.apache.org by Francesco Silvestri <yu...@gmail.com> on 2012/09/03 16:56:45 UTC

reading a binary file

Hello,

I have a binary file of integers and I would like an input format that
generates <key,value> pairs, where the value is an integer in the file and the
key is the position of that integer in the file. Which class should I use?
(That is, I'm looking for a kind of TextInputFormat for binary files.)

Thank you for your consideration,

Francesco

Re: reading a binary file

Posted by Bejoy Ks <be...@gmail.com>.
Hi Francesco

TextInputFormat reads line by line, splitting on '\n' by default; the key is
the byte offset of the line and the value is the line contents. In your case,
however, the file is binary and is just a sequence of integers, and you need
the offset of each integer rather than the offset of each line.
I believe you will have to write your own custom RecordReader to get this
done.
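
As a rough sketch of the per-record logic such a record reader needs, the
plain-Java helper below reads fixed-width integers and pairs each one with its
position. It deliberately uses no Hadoop classes. The class name
IntRecordSketch, the method names, and the assumption of 4-byte big-endian
integers are all illustrative, since the thread does not state the file's
encoding. The alignUp step shows one way to handle a split that starts in the
middle of an integer.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: illustrates the logic a custom
// RecordReader would need for a file of fixed-width integers.
final class IntRecordSketch {
    // Assumption: each integer is stored as 4 bytes, big-endian.
    static final int WIDTH = Integer.BYTES;

    // First record boundary at or after a split's starting byte offset.
    static long alignUp(long byteOffset) {
        return ((byteOffset + WIDTH - 1) / WIDTH) * WIDTH;
    }

    // Returns {position, value} pairs for every record that starts in
    // [start, end), where position is the integer's index in the file.
    static List<long[]> readRecords(byte[] data, long start, long end)
            throws IOException {
        List<long[]> out = new ArrayList<>();
        long pos = alignUp(start);
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(data))) {
            in.skipBytes((int) pos);
            // A record belongs to this split if it starts before `end`,
            // and it must fit entirely inside the file.
            while (pos < end && pos + WIDTH <= data.length) {
                int value = in.readInt(); // DataInputStream reads big-endian
                out.add(new long[] {pos / WIDTH, value});
                pos += WIDTH;
            }
        }
        return out;
    }
}
```

In an actual job, the same loop body would presumably sit in the reader's
nextKeyValue() method, with the position emitted as a LongWritable key and the
integer as an IntWritable value. Assigning a record to the split in which it
starts means a split boundary that falls mid-integer produces neither a
duplicate nor a gap.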

On Mon, Sep 3, 2012 at 8:38 PM, Francesco Silvestri <yu...@gmail.com> wrote:

> Hi Mohammad,
>
> SequenceFileInputFormat<http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/lib/input/SequenceFileInputFormat.html> requires
> the file to be a sequence of key/value stored in binary (i.e., the key is
> stored in the file). In my case, the key is implicitly given by the
> position of the value within the file.
>
> Thank you,
> Francesco
>
>
>
> On Mon, Sep 3, 2012 at 5:01 PM, Mohammad Tariq <do...@gmail.com> wrote:
>
>> Hello Francesco,
>>
>>         Have a look at SequenceFileInputFormat :
>> http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/lib/input/SequenceFileInputFormat.html
>>
>> Regards,
>>     Mohammad Tariq
>>
>>
>>
>> On Mon, Sep 3, 2012 at 8:26 PM, Francesco Silvestri <yu...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I have a binary file of integers and I would like an input format that
>>> generates pairs <key,value>, where value is an integer in the file and key
>>> the position of the integer in the file. Which class should I use? (i.e.
>>> I'm looking for a kind of TextinputFormat for binary files)
>>>
>>> Thank you for your consideration,
>>>
>>> Francesco
>>>
>>
>>
>

Re: reading a binary file

Posted by Francesco Silvestri <yu...@gmail.com>.
Hi Mohammad,

SequenceFileInputFormat <http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/lib/input/SequenceFileInputFormat.html>
requires the file to be a sequence of key/value pairs stored in binary (i.e.,
the key is stored in the file). In my case, the key is implicitly given by the
position of the value within the file.

Thank you,
Francesco



On Mon, Sep 3, 2012 at 5:01 PM, Mohammad Tariq <do...@gmail.com> wrote:

> Hello Francesco,
>
>         Have a look at SequenceFileInputFormat :
> http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/lib/input/SequenceFileInputFormat.html
>
> Regards,
>     Mohammad Tariq
>
>
>
> On Mon, Sep 3, 2012 at 8:26 PM, Francesco Silvestri <yu...@gmail.com> wrote:
>
>> Hello,
>>
>> I have a binary file of integers and I would like an input format that
>> generates pairs <key,value>, where value is an integer in the file and key
>> the position of the integer in the file. Which class should I use? (i.e.
>> I'm looking for a kind of TextinputFormat for binary files)
>>
>> Thank you for your consideration,
>>
>> Francesco
>>
>
>

Re: reading a binary file

Posted by Mohammad Tariq <do...@gmail.com>.
Hello Francesco,

Have a look at SequenceFileInputFormat:
http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/lib/input/SequenceFileInputFormat.html

Regards,
    Mohammad Tariq



On Mon, Sep 3, 2012 at 8:26 PM, Francesco Silvestri <yu...@gmail.com> wrote:

> Hello,
>
> I have a binary file of integers and I would like an input format that
> generates pairs <key,value>, where value is an integer in the file and key
> the position of the integer in the file. Which class should I use? (i.e.
> I'm looking for a kind of TextinputFormat for binary files)
>
> Thank you for your consideration,
>
> Francesco
>
