Posted to user@pig.apache.org by Alan Gates <ga...@yahoo-inc.com> on 2009/09/09 22:34:23 UTC

Re: Storing Pig output into HBase tables

I do not know if there is a general HBase load/import tool.  That
would be a good question for the hbase-user list.

Right now Pig does not have a store function to write data into
HBase.  It is possible to write such a function.  If you are
interested I can send you specific details on how to do it.

Alan.

On Aug 19, 2009, at 12:49 PM, Nikhil Gupta wrote:

> Hi all,
>
> I am working on building an analytics engine which takes daily server
> logs, crunches the data using Pig scripts and (for now) outputs data to
> HDFS. Later, this data is to be stored in HBase to enable efficient
> querying from the front end.
>
> Currently, I am searching for efficient ways of moving the Pig output on
> HDFS to HBase tables. Though this seems to be a very basic task, I could
> not find any easy way of doing it, except for writing some Java code. The
> problem is that I'll have many different kinds of output formats, and
> writing Java code for loading each such file seems wrong. Probably I am
> missing something.
>
> Is there any way of storing Pig output directly in an HBase table
> (loading is possible with HBaseStorage, but that doesn't cover storing)?
> Or is there any general data load/import tool for HBase?
>
> Thanks!
> Nikhil Gupta
> Graduate Student,
> Stanford University


Re: Storing Pig output into HBase tables

Posted by Alan Gates <ga...@yahoo-inc.com>.
In general we're trying to move to a paradigm of using URIs in loads  
and stores.  So store functions should look like:

store X into 'URI' using MyStoreFunc();

I don't know if HBase has a URI scheme, and if it does, what that  
includes.  If it has a URI scheme that includes the table, then it  
would be best to use that.  If it does not, then you could pass the  
table name as a constructor argument to your store function:

store X into "http://myhbaseserver.mycompany.com" using  
HBaseStorage('mytable');
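
A minimal sketch of that constructor-argument approach (the class name here
is just illustrative and the rest of the store-function plumbing is omitted;
the only point is how the 'mytable' literal from the script reaches the Java
side):

// Sketch only: Pig passes the string literals from the "using" clause
// to the function's constructor, so 'mytable' arrives here as a String.
public class HBaseStorage /* would also implement Pig's store-function API */ {

    private final String tableName;

    // Invoked for: store X into '...' using HBaseStorage('mytable');
    public HBaseStorage(String tableName) {
        this.tableName = tableName;   // HBase table to write the records to
    }
}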

Alan.



Re: Storing Pig output into HBase tables

Posted by Vincent BARAT <vi...@ubikod.com>.
Thanks Alan,

I also definitely need this functionality, and I plan to write it
soon. I was actually in the process of doing what you explained, but
I was blocked on the best way to specify the name of the HBase table
in which to store the data (and also the associated storage schema)
using the "store A into B using C;" paradigm. Do you have any
recommendation about that?


Re: Storing Pig output into HBase tables

Posted by Alan Gates <ga...@yahoo-inc.com>.
In order to store information in HBase, you will need to use an
OutputFormat that is HBase compatible.  There exists a
TableOutputFormat in HBase that will write data.  The trick is to get
Pig to use that OutputFormat.  It is possible, but Pig does not yet do
a good job of making it easy.

You will need to write a StoreFunc that returns TableOutputFormat from
getStoragePreparationClass.  You will then need to have the putNext
call in StoreFunc write to TableOutputFormat's RecordWriter.  For an
example of how to do this, see
contrib/zebra/src/java/org/apache/hadoop/zebra/pig/TableStorer.java in
Pig's contrib directory.
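
A rough sketch of that shape, pieced together from the description above
rather than from a tested implementation (the StoreFunc method signatures and
the TableOutputFormat import are assumptions about the Pig 0.x and HBase 0.20
APIs of the time; the zebra TableStorer referenced above is the authoritative
example):

import java.io.IOException;
import java.io.OutputStream;

import org.apache.hadoop.hbase.mapred.TableOutputFormat;  // assumed location of HBase's OutputFormat
import org.apache.pig.StoreFunc;                          // assumed Pig 0.x store interface
import org.apache.pig.data.Tuple;

// Sketch of a store function that routes Pig output through HBase's
// TableOutputFormat instead of writing files to HDFS.
public class HBaseStorer implements StoreFunc {

    private final String tableName;

    public HBaseStorer(String tableName) {
        this.tableName = tableName;   // table name passed from the Pig script
    }

    // Tell Pig to prepare HBase's TableOutputFormat for this store.
    public Class getStoragePreparationClass() throws IOException {
        return TableOutputFormat.class;
    }

    public void bindTo(OutputStream os) throws IOException {
        // Unused: output goes through TableOutputFormat's RecordWriter,
        // not through a byte stream.
    }

    // Turn each Pig tuple into an HBase row update and hand it to
    // TableOutputFormat's RecordWriter, as zebra's TableStorer does.
    public void putNext(Tuple t) throws IOException {
        // ... build the row key and column values from t and write them via
        // the RecordWriter; the exact value type (BatchUpdate vs. Put)
        // depends on the HBase version in use.
    }

    public void finish() throws IOException {
        // Flush or close the underlying writer if needed.
    }
}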

Alan.



Re: Storing Pig output into HBase tables

Posted by Nikhil Gupta <gu...@gmail.com>.
Thanks for your reply, Alan. Please send me the details too.
-nikhil
http://stanford.edu/~nikgupta


Re: Storing Pig output into HBase tables

Posted by Liu Xianglong <sa...@hotmail.com>.
Hi, Alan. I am interested in this store function; would you mind sending me
some details?
