Posted to user@pig.apache.org by Alan Gates <ga...@yahoo-inc.com> on 2009/09/09 22:34:23 UTC
Re: Storing Pig output into HBase tables
I do not know if there is a general hbase load/import tool. That
would be a good question for the hbase-user list.
Right now Pig does not have a store function to write data into
hbase. It is possible to write such a function. If you are
interested I can send you specific details on how to do it.
Alan.
On Aug 19, 2009, at 12:49 PM, Nikhil Gupta wrote:
> Hi all,
>
> I am working on building an analytics kind of engine which takes
> daily server
> logs, crunches the data using Pig scripts and (for now) outputs data
> to
> HDFS. Later, this data is to be stored on HBase to enable efficient
> querying
> from front-end.
>
> Currently, I am searching for efficient ways of moving the Pig
> output on
> HDFS to the HBase tables. Though this seems to be a very basic task,
> I could
> not find any easy way of doing that, except for writing some Java
> code. The
> problem is I'll have many different kinds of output formats, and
> writing Java
> code for loading each such file seems wrong. Probably I am missing
> something.
>
> Is there any way of storing Pig output directly in an HBase table
> [loading is
> possible with HBaseStorage, but that doesn't cover storing]. Or is
> there any
> general data load/import tool for HBase?
>
> Thanks!
> Nikhil Gupta
> Graduate Student,
> Stanford University
Re: Storing Pig output into HBase tables
Posted by Alan Gates <ga...@yahoo-inc.com>.
In general we're trying to move to a paradigm of using URIs in loads
and stores. So store functions should look like:
store X into 'URI' using MyStoreFunc();
I don't know if HBase has a URI scheme, and if it does, what that
includes. If it has a URI scheme that includes the table, then it
would be best to use that. If it does not, then you could pass the
table name as a constructor argument to your store function:
store X into 'http://myhbaseserver.mycompany.com' using
HBaseStorage('mytable');
Alan.
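As a rough illustration of the constructor-argument approach described above, here is a minimal, self-contained sketch. Every class name below (`PigStoreFuncStub`, `HBaseStorageSketch`) is a hypothetical stand-in; the real Pig StoreFunc interface of that era is assumed, not reproduced:

```java
// Illustration only: PigStoreFuncStub stands in for Pig's real
// StoreFunc interface. The point is the shape: the table name arrives
// through the constructor, because the URI in the 'into' clause may
// not identify a table.
abstract class PigStoreFuncStub {
    public abstract void putNext(String tuple);
}

class HBaseStorageSketch extends PigStoreFuncStub {
    private final String tableName;

    // Corresponds to: store X into '...' using HBaseStorage('mytable');
    public HBaseStorageSketch(String tableName) {
        this.tableName = tableName;
    }

    public String targetTable() {
        return tableName;
    }

    @Override
    public void putNext(String tuple) {
        // A real implementation would convert the tuple into an HBase
        // row and hand it to the table's writer; here we just report it.
        System.out.println("would write to " + tableName + ": " + tuple);
    }
}

public class Main {
    public static void main(String[] args) {
        HBaseStorageSketch store = new HBaseStorageSketch("mytable");
        store.putNext("(row1,valueA)");
        // prints: would write to mytable: (row1,valueA)
    }
}
```

The design point is simply that anything the URI cannot carry (table name, storage schema) has to travel through the string arguments of the store function's constructor.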
On Sep 19, 2009, at 10:13 AM, Vincent BARAT wrote:
> Thanks Alan,
>
> I also definitely need this functionality, and I plan to write it
> soon. I was actually in the process of doing what you explained,
> but I was blocked on the best way to specify the name of the HBase
> table in which to store the data (and also the associated storage
> schema) using the "store A into B using C;" paradigm. Do you have
> any recommendation about that?
Re: Storing Pig output into HBase tables
Posted by Vincent BARAT <vi...@ubikod.com>.
Thanks Alan,
I also definitely need this functionality, and I plan to write it
soon. I was actually in the process of doing what you explained, but
I was blocked on the best way to specify the name of the HBase
table in which to store the data (and also the associated storage
schema) using the "store A into B using C;" paradigm. Do you have
any recommendation about that?
Alan Gates wrote:
> In order to store information in HBase, you will need to use an
> OutputFormat that is HBase compatible. There exists a TableOutputFormat
> in HBase that will write data. The trick is to get Pig to use that
> OutputFormat. It is possible, but Pig does not yet do a good job of
> making it easy.
>
> You will need to write a StoreFunc that returns TableOutputFormat from
> getStoragePreparationClass. You will then need to have the putNext call
> in StoreFunc write to TableOutputFormat's RecordWriter. For an example
> of how to do this, see
> contrib/zebra/src/java/org/apache/hadoop/zebra/pig/TableStorer.java in
> Pig's contrib directory.
>
> Alan.
Re: Storing Pig output into HBase tables
Posted by Alan Gates <ga...@yahoo-inc.com>.
In order to store information in HBase, you will need to use an
OutputFormat that is HBase compatible. There exists a
TableOutputFormat in HBase that will write data. The trick is to get
Pig to use that OutputFormat. It is possible, but Pig does not yet do
a good job of making it easy.
You will need to write a StoreFunc that returns TableOutputFormat from
getStoragePreparationClass. You will then need to have the putNext
call in StoreFunc write to TableOutputFormat's RecordWriter. For an
example of how to do this, see contrib/zebra/src/java/org/apache/
hadoop/zebra/pig/TableStorer.java in Pig's contrib directory.
Alan.
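A rough, runnable sketch of that wiring, using in-memory stub classes in place of the real Pig StoreFunc and HBase TableOutputFormat APIs (every class name below is a hypothetical stand-in, and the real Hadoop job plumbing is collapsed away):

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for TableOutputFormat's RecordWriter: collects "rows" in a
// list instead of writing to an HBase region server.
class RecordWriterStub {
    final List<String> written = new ArrayList<>();
    void write(String rowKey, String value) {
        written.add(rowKey + "=" + value);
    }
}

// Stand-in for HBase's TableOutputFormat.
class TableOutputFormatStub {
    RecordWriterStub getRecordWriter() {
        return new RecordWriterStub();
    }
}

// Sketch of the StoreFunc shape described above: report the output
// format via getStoragePreparationClass, and have putNext forward each
// converted tuple to that format's RecordWriter.
class HBaseTableStorerSketch {
    private final RecordWriterStub writer;

    HBaseTableStorerSketch(RecordWriterStub writer) {
        this.writer = writer;
    }

    // In real Pig this would return the OutputFormat class so Pig can
    // instantiate it when preparing the job.
    public Class<?> getStoragePreparationClass() {
        return TableOutputFormatStub.class;
    }

    // putNext: convert the tuple and hand it to the RecordWriter.
    public void putNext(String rowKey, String value) {
        writer.write(rowKey, value);
    }
}

public class Main {
    public static void main(String[] args) {
        RecordWriterStub writer = new TableOutputFormatStub().getRecordWriter();
        HBaseTableStorerSketch storer = new HBaseTableStorerSketch(writer);
        storer.putNext("row1", "hello");
        storer.putNext("row2", "world");
        System.out.println(writer.written);
        // prints: [row1=hello, row2=world]
    }
}
```

The stubs hide the Hadoop machinery, but the flow matches the description: Pig asks the store function which OutputFormat to prepare, then each putNext call forwards a converted tuple to that format's RecordWriter.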
On Sep 9, 2009, at 6:20 PM, Liu Xianglong wrote:
> Hi, Alan. I am interested in this store function; would you mind
> sending me some details?
Re: Storing Pig output into HBase tables
Posted by Nikhil Gupta <gu...@gmail.com>.
Thanks for your reply, Alan. Please send me the details too.
-nikhil
http://stanford.edu/~nikgupta
On Thu, Sep 10, 2009 at 6:50 AM, Liu Xianglong <sa...@hotmail.com>wrote:
> Hi, Alan. I am interested in this store function; would you mind sending me
> some details?
Re: Storing Pig output into HBase tables
Posted by Liu Xianglong <sa...@hotmail.com>.
Hi, Alan. I am interested in this store function; would you mind sending me
some details?
--------------------------------------------------
From: "Alan Gates" <ga...@yahoo-inc.com>
Sent: Thursday, September 10, 2009 4:34 AM
To: <pi...@hadoop.apache.org>
Subject: Re: Storing Pig output into HBase tables
> I do not know if there is a general hbase load/import tool. That would
> be a good question for the hbase-user list.
>
> Right now Pig does not have a store function to write data into hbase.
> It is possible to write such a function. If you are interested I can
> send you specific details on how to do it.
>
> Alan.