Posted to common-user@hadoop.apache.org by maha <ma...@umail.ucsb.edu> on 2011/03/10 01:20:39 UTC

Binary Input Files

Hello,

I found my question in the archives: http://www.mail-archive.com/core-user@hadoop.apache.org/msg01750.html

It asks how to use my binary files, which follow my own buffer protocol, with an InputFormat.

The answer there suggests base64 conversion, which I think eliminates the benefits of using binary files.

Suppose I write my own InputFormat that defines splits based on my binary protocol, along with a RecordReader that also follows that protocol.

Will that interfere with streaming, or is it doable?

Thank you,
Maha
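
A rough way to put a number on the base64 concern above: the sketch below measures how much a binary record grows when it is Base64-encoded. It uses `java.util.Base64` from the Java 8+ standard library purely for illustration (it is not a Hadoop API and is not taken from the linked answer), and the 3000-byte record size and the `Base64Overhead` class name are hypothetical.

```java
import java.util.Base64;

final class Base64Overhead {
    // Returns how many bytes the Base64 text form of `raw` occupies.
    static int encodedLength(byte[] raw) {
        return Base64.getEncoder().encode(raw).length;
    }

    public static void main(String[] args) {
        byte[] raw = new byte[3000]; // hypothetical 3000-byte binary record
        int encoded = encodedLength(raw);
        // Base64 emits 4 output bytes for every 3 input bytes.
        System.out.println(raw.length + " raw bytes -> " + encoded + " Base64 bytes");
    }
}
```

Every 3 input bytes become 4 output bytes, so the encoded data is about a third larger, before counting the CPU cost of encoding and decoding each record.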

RE: Binary Input Files

Posted by Michael Segel <mi...@hotmail.com>.
No, sorry. I meant that we are using HBase and have binary files with binary records that we want to store, so we have the same problem.
Your approach of creating a custom InputFormat is the way to go.

HTH

-Mike

> Subject: Re: Binary Input Files
> From: maha@umail.ucsb.edu
> Date: Wed, 9 Mar 2011 18:21:13 -0800
> To: common-user@hadoop.apache.org
> 
> 
> So you're suggesting that using HBase would be an alternative to creating my own stuff? By the way, why don't you use binary inputs? Do you think it won't have a great effect on performance?
> 
> Thanks Mike.
> 

Re: Binary Input Files

Posted by Maha <ma...@umail.ucsb.edu>.
So you're suggesting that using HBase would be an alternative to creating my own stuff? By the way, why don't you use binary inputs? Do you think it won't have a great effect on performance?

Thanks Mike.

On Mar 9, 2011, at 5:27 PM, Michael Segel <mi...@hotmail.com> wrote:

> 
> 
> Maha,
> 
> I haven't tried streaming, but ingesting binary data into HBase means doing exactly what you suggest. (Write your own BinaryInputFormat and define your own record splits.)
> 
> HTH
> 
> -Mike
> 

RE: Binary Input Files

Posted by Michael Segel <mi...@hotmail.com>.

Maha,

I haven't tried streaming, but ingesting binary data into HBase means doing exactly what you suggest. (Write your own BinaryInputFormat and define your own record splits.)

HTH

-Mike
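
As a minimal sketch of the record-reading half of that suggestion: the thread never describes the actual binary protocol, so the code below assumes a hypothetical format in which each record is a 4-byte big-endian length followed by the payload. It uses only the Java standard library; the class and method names (`LengthPrefixedRecords`, `readRecord`, `decode`, `encode`) are made up for illustration. In a real job, logic like `readRecord` would presumably sit inside a custom RecordReader, with the InputFormat deciding where splits begin.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LengthPrefixedRecords {
    // Assumed (hypothetical) format: 4-byte big-endian length, then the payload.

    // Reads one record. Returns null only when the stream ends exactly on a
    // record boundary; a record cut off part-way raises EOFException.
    static byte[] readRecord(DataInputStream in) throws IOException {
        int first = in.read();
        if (first < 0) {
            return null;
        }
        byte[] rest = new byte[3];
        in.readFully(rest);
        int length = (first << 24) | ((rest[0] & 0xFF) << 16)
                | ((rest[1] & 0xFF) << 8) | (rest[2] & 0xFF);
        if (length < 0) {
            throw new IOException("negative record length: " + length);
        }
        byte[] payload = new byte[length];
        in.readFully(payload);
        return payload;
    }

    // Decodes every record held in an in-memory buffer.
    static List<byte[]> decode(byte[] data) {
        List<byte[]> records = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            for (byte[] r = readRecord(in); r != null; r = readRecord(in)) {
                records.add(r);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    // Encodes payloads in the same format; used here only to build demo input.
    static byte[] encode(List<byte[]> payloads) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (byte[] p : payloads) {
                out.writeInt(p.length);
                out.write(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // The second payload contains a newline byte (0x0A), which is legal here
        // because records are delimited by length, not by line breaks.
        byte[] file = encode(Arrays.asList(new byte[] {1, 2, 3}, new byte[] {(byte) 0xFF, 0x0A}));
        for (byte[] record : decode(file)) {
            System.out.println("record of " + record.length + " bytes");
        }
    }
}
```

One design note: with a plain length-prefixed layout like this, a reader dropped at an arbitrary byte offset cannot tell where the next record starts. A simple way around that is to keep each file unsplittable (for example by overriding `isSplitable` in a `FileInputFormat` subclass to return false); splitting within a file would need something extra in the format, such as sync markers or an index.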
