Posted to dev@pig.apache.org by Michael Dalton <mw...@gmail.com> on 2010/01/15 06:59:07 UTC

reading/writing HBase in Pig

Hi all,

I was looking at the current Pig code in SVN, and it seems like HBase is
supported for loading, but not for storing. If this is the case, I'd like to
add support for writing to HBase to Pig. Is there anyone else working on
this, and if not is this something that you'd like contributed? Based on a
cursory evaluation of the StoreFunc interface, it looks like the APIs there
are pretty file-centric and may need to be modified to accommodate HBase's
table-based design. For example, you aren't going to be serializing your
output to an OutputStream object in all likelihood.

I haven't contributed to Pig before, and I wanted to see if this is
something that would be beneficial to the rest of the Pig community, and if
so what next steps I should take (like starting a JIRA) to get the ball
rolling. Thanks

Best regards,

Mike

Re: reading/writing HBase in Pig

Posted by Jeff Zhang <zj...@gmail.com>.
PIG-1200 currently only supports loading through an InputFormat. The other
features, loading the row key and storing to HBase, are not supported yet.
I will continue with the remaining work.



On Mon, Jan 25, 2010 at 11:13 AM, Alan Gates <ga...@yahoo-inc.com> wrote:

>
> On Jan 18, 2010, at 10:14 PM, Michael Dalton wrote:
>
>> I took a look at the load-store branch and that definitely seems like the
>> right place to do this. So the right thing to do would be to just open up a
>> JIRA and then post a patch against the load-store rewrite tree, correct?
>>
>
> Yes.  You should take a look at PIG-1200, which seems to be going part way
> towards doing what you want to do.
>
> Alan.
>
>
>


-- 
Best Regards

Jeff Zhang

Re: reading/writing HBase in Pig

Posted by Alan Gates <ga...@yahoo-inc.com>.
On Jan 18, 2010, at 10:14 PM, Michael Dalton wrote:

> I took a look at the load-store branch and that definitely seems like the
> right place to do this. So the right thing to do would be to just open up a
> JIRA and then post a patch against the load-store rewrite tree, correct?

Yes.  You should take a look at PIG-1200, which seems to be going part  
way towards doing what you want to do.

Alan.



Re: reading/writing HBase in Pig

Posted by Michael Dalton <mw...@gmail.com>.
I took a look at the load-store branch and that definitely seems like the
right place to do this. So the right thing to do would be to just open up a
JIRA and then post a patch against the load-store rewrite tree, correct?
Also, it seems that there's no existing support for row keys, which
should also be fixed. The current HBaseStorage assumes that the user passes
a list of columns (i.e. column family/qualifier pairs). However, users may
encode data in the HBase row key as well -- empty row keys are forbidden, so
there is definitely data there.

Doing any sort of StoreFunc implementation of HBase will require row key
support, as each Put must have a row key, so it looks like what I'll be
doing is modifying HBaseStorage's LoadFunc support to support row keys in
addition to the existing support for column values, and then adding support
for StoreFunc (with row keys) to HBaseStorage. Just wanted to make sure this
sounds good. Thanks

Best regards,

Mike

On Thu, Jan 14, 2010 at 10:40 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:

> Hi Mike,
> It would be great to have a StoreFunc for HBase!
> There is a rewrite underway for the Load/Store stuff that will make
> that a lot easier -- see https://issues.apache.org/jira/browse/PIG-966.
> You may want to consider writing it for the load-store redesign
> branch. This is what's probably going to be in 0.7. The first step
> would be to open a jira and look at the existing StoreFunc
> implementations.
>
> -D
>

Re: reading/writing HBase in Pig

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Hi Mike,
It would be great to have a StoreFunc for HBase!
There is a rewrite underway for the Load/Store stuff that will make
that a lot easier -- see https://issues.apache.org/jira/browse/PIG-966.
You may want to consider writing it for the load-store redesign
branch. This is what's probably going to be in 0.7. The first step
would be to open a jira and look at the existing StoreFunc
implementations.

-D

On Thu, Jan 14, 2010 at 9:59 PM, Michael Dalton <mw...@gmail.com> wrote:
> Hi all,
>
> I was looking at the current Pig code in SVN, and it seems like HBase is
> supported for loading, but not for storing. If this is the case, I'd like to
> add support for writing to HBase to Pig. Is there anyone else working on
> this, and if not is this something that you'd like contributed? Based on a
> cursory evaluation of the StoreFunc interface, it looks like the APIs there
> are pretty file-centric and may need to be modified to accommodate HBase's
> table-based design. For example, you aren't going to be serializing your
> output to an OutputStream object in all likelihood.
>
> I haven't contributed to Pig before, and I wanted to see if this is
> something that would be beneficial to the rest of the Pig community, and if
> so what next steps I should take (like starting a JIRA) to get the ball
> rolling. Thanks
>
> Best regards,
>
> Mike
>