Posted to user@pig.apache.org by Jerry Lam <ch...@gmail.com> on 2013/11/27 17:57:49 UTC

Storing tuple into HBaseStorage

Hello Pig users,

I want to store an entire tuple into HBase from Pig using HBaseStorage.
I know that I can do something like:

output = .... as (c1:bytearray, c2:bytearray, .... cN:bytearray);
STORE output INTO 'hbase://outputtable' USING
org.apache.pig.backend.hadoop.hbase.HBaseStorage('f1:c1 f1:c2 ..... f1:cN');

Since each output tuple has 100 fields, I don't want to write them all out
manually. Additionally, I want to use the alias of each field as the
column name in HBase. Since the entire tuple goes into the same column
family, is there an easy way to express this in Pig?

Thank you,

Jerry
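
A minimal sketch of one way to avoid typing the column list by hand: build
the HBaseStorage argument from a list of field names outside of Pig. Python
is used purely for illustration. The helper names, the relation name "data",
and the idea of pasting the generated statement into the script are
assumptions, not something stated in this thread.

```python
def build_column_spec(family, field_names):
    """Return an HBaseStorage column string such as 'f1:c1 f1:c2'."""
    return " ".join("%s:%s" % (family, name) for name in field_names)


def build_store_statement(relation, table, family, field_names):
    """Return a Pig STORE statement that maps every field to family:field."""
    spec = build_column_spec(family, field_names)
    return ("STORE %s INTO 'hbase://%s' USING "
            "org.apache.pig.backend.hadoop.hbase.HBaseStorage('%s');"
            % (relation, table, spec))


# Hypothetical field names c1 .. c100, mirroring the example in the post.
fields = ["c%d" % i for i in range(1, 101)]
statement = build_store_statement("data", "outputtable", "f1", fields)
print(statement)
```

The column names come straight from the field names, so the aliases end up
as HBase column qualifiers. The list of names still has to be maintained
somewhere, but only once and not inside the STORE statement itself.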

Re: Storing tuple into HBaseStorage

Posted by Jerry Lam <ch...@gmail.com>.
Hi Shawn,

I see your point now. Thank you for your help!

Jerry



Re: Storing tuple into HBaseStorage

Posted by Shawn Hermans <sh...@gmail.com>.
That is a very good question. I am not sure there is an easy way to use
the alias of the field as the key. I looked at the Tuple class definition
(http://pig.apache.org/docs/r0.9.1/api/org/apache/pig/data/Tuple.html), and
it does not appear to offer a way to get the name associated with a
particular tuple field.

One potential workaround is to define a simple UDF. I
provided a quick, untested Jython UDF as an example at
https://gist.github.com/shawnhermans/7684660. It hard-codes the field
names as part of the UDF, but you could add a second argument to the
function so that the field names can be passed in.

-Shawn



Re: Storing tuple into HBaseStorage

Posted by Jerry Lam <ch...@gmail.com>.
Hi Shawn,

Thanks for the advice.

Can TOMAP generate a map from a tuple, using the alias of each field in the
tuple as the map key and the field value as the map value?
From the documentation, the TOMAP syntax is:

TOMAP(key-expression, value-expression [, key-expression, value-expression ...])

It does not look like it can use the alias of the field as the key... Any
further advice? Thanks!

Jerry



Re: Storing tuple into HBaseStorage

Posted by Shawn Hermans <sh...@gmail.com>.
You should be able to use a Pig map to do this. Use the column name as the
key in the map and the field value as the value. You should be able to use the
built-in TOMAP function to generate the map
(http://pig.apache.org/docs/r0.11.0/func.html#tomap). The HBaseStorage
documentation gives an example of storing maps using the wildcard column
specs friends:* and info:*, one per column family.

http://pig.apache.org/docs/r0.11.0/api/org/apache/pig/backend/hadoop/hbase/HBaseStorage.html

STORE raw INTO 'hbase://SampleTableCopy'
       USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
       'info:first_name info:last_name friends:* info:*');
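
Because TOMAP takes explicit key and value expressions, one hedged way to
apply it to many fields is to generate the expression text from the list of
aliases. The sketch below does that in Python. The helper name and the
sample aliases are hypothetical, and the generated text would still need to
be placed into a FOREACH ... GENERATE statement by hand or by a script.

```python
def build_tomap_expression(field_names):
    """Return a TOMAP call that uses each alias as both key literal and value."""
    pairs = []
    for name in field_names:
        # 'name' (quoted) is the map key; name (unquoted) is the field reference.
        pairs.append("'%s', %s" % (name, name))
    return "TOMAP(%s)" % ", ".join(pairs)


expr = build_tomap_expression(["c1", "c2", "c3"])
print(expr)  # TOMAP('c1', c1, 'c2', c2, 'c3', c3)
```

The resulting map could then be stored with a wildcard column spec such as
'f1:*', in the same way the documentation example above uses friends:* and
info:*.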



