Posted to user@pig.apache.org by shazz Ng <sh...@gmail.com> on 2011/09/06 13:58:41 UTC

HbaseStorage / OpenTSDB queries issue

Hello,

First of all, I'm new to Pig and NoSQL, so I hope you'll forgive stupid
questions ;-)

So, I'm playing with OpenTSDB (a software layer on top of HBase for handling
time series data) and now I'd like to run some data mining queries on top of
my timestamped data. I found that Pig could be a solution, so I tried to make
it work on top of the OpenTSDB data in HBase; it nearly works, but I'm still
confused.

OpenTSDB schema:
hbase(main):011:0> describe 'tsdb-uid'
DESCRIPTION                                                                 ENABLED
 {NAME => 'tsdb-uid', FAMILIES => [{NAME => 'id', BLOOMFILTER => 'NONE',    true
 REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3',
 TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
 BLOCKCACHE => 'true'}, {NAME => 'name', BLOOMFILTER => 'NONE',
 REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3',
 TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
 BLOCKCACHE => 'true'}]}

hbase(main):012:0> describe 'tsdb'
DESCRIPTION                                                                 ENABLED
 {NAME => 'tsdb', FAMILIES => [{NAME => 't', BLOOMFILTER => 'NONE',         true
 REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3',
 TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
 BLOCKCACHE => 'true'}]}

So sample uid data are:
hbase(main):014:0> scan 'tsdb-uid'
ROW              COLUMN+CELL
 \x00\x00\x01    column=name:metrics, timestamp=1314801674803, value=proc.loadavg.1m
 \x00\x00\x01    column=name:tagk, timestamp=1314801684953, value=validity
 \x00\x00\x01    column=name:tagv, timestamp=1314801685000, value=true
 \x00\x00\x02    column=name:metrics, timestamp=1314801674849, value=proc.loadavg.5m
 \x00\x00\x02    column=name:tagk, timestamp=1314801685049, value=device
 \x00\x00\x02    column=name:tagv, timestamp=1314801685096, value=Device1
 \x00\x00\x03    column=name:metrics, timestamp=1314801674898, value=Measurement_1
 \x00\x00\x03    column=name:tagk, timestamp=1314801685144, value=accuracy
 \x00\x00\x03    column=name:tagv, timestamp=1314801693030, value=Device2
 \x00\x00\x04    column=name:metrics, timestamp=1314801674947, value=Measurement_2
 \x00\x00\x05    column=name:metrics, timestamp=1314801674994, value=Measurement_3
 Device1         column=id:tagv, timestamp=1314801685097, value=\x00\x00\x02
 Device2         column=id:tagv, timestamp=1314801693031, value=\x00\x00\x03
 Measurement_1   column=id:metrics, timestamp=1314801674899, value=\x00\x00\x03
 Measurement_2   column=id:metrics, timestamp=1314801674948, value=\x00\x00\x04
 Measurement_3   column=id:metrics, timestamp=1314801674995, value=\x00\x00\x05
 accuracy        column=id:tagk, timestamp=1314801685145, value=\x00\x00\x03
 device          column=id:tagk, timestamp=1314801685050, value=\x00\x00\x02
 proc.loadavg.1m column=id:metrics, timestamp=1314801674804, value=\x00\x00\x01
 proc.loadavg.5m column=id:metrics, timestamp=1314801674850, value=\x00\x00\x02
 true            column=id:tagv, timestamp=1314801685002, value=\x00\x00\x01
 validity        column=id:tagk, timestamp=1314801684955, value=\x00\x00\x01

Here are the metrics (the time series data types, under id:metrics) and the
tags that describe the data (tagk for the tag key and tagv for the tag value,
e.g. validity = true).

So from Pig, when I want to retrieve only the metrics and their values (= the
ids used in the data table), I do:
tsd_metrics     = LOAD 'hbase://tsdb-uid' using
    org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics', '-loadKey true')
    AS (metrics:bytearray);
dump tsd_metrics;

HadoopVersion  PigVersion      UserId    StartedAt            FinishedAt           Features
0.20.2         0.8.1-SNAPSHOT  opentsdb  2011-09-06 13:39:27  2011-09-06 13:39:34  UNKNOWN
Success!
Job Stats (time in seconds):
JobId           Alias        Feature   Outputs
job_local_0004  tsd_metrics  MAP_ONLY  file:/tmp/temp-1850282462/tmp1589556736,
Input(s):
Successfully read records from: "hbase://tsdb-uid"
Output(s):
Successfully stored records in: "file:/tmp/temp-1850282462/tmp1589556736"
Job DAG:
job_local_0004

(Measurement_1,)
(Measurement_2,)
(Measurement_3,)
(proc.loadavg.1m,)
(proc.loadavg.5m,)

So that's nearly OK, except that the value (= id) displayed is null instead
of, for example, \x00\x00\x03 in the case of Measurement_1.

Any idea?

thx!

shazz

Re: HbaseStorage / OpenTSDB queries issue

Posted by shazz Ng <sh...@gmail.com>.
Thanks Dmitriy !

Indeed, it works when using the caster AND defining the value (or metrics)
column as long:
grunt> tsd_metrics2     = LOAD 'hbase://tsdb-uid' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics',
'-caster=HBaseBinaryConverter -loadKey=true') AS (key:bytearray,
metrics:long);
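
For example, with that statement in place, pulling just the UID of a single
metric could look something like this (the alias names and the FILTER are
only illustrative, not something I actually have in my script):

-- illustrative sketch: pick out the UID stored under id:metrics for one metric name
uid_rows = LOAD 'hbase://tsdb-uid'
    using org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'id:metrics', '-caster=HBaseBinaryConverter -loadKey=true')
    AS (key:chararray, metric_uid:long);
one_uid = FILTER uid_rows BY key == 'Measurement_1';
dump one_uid;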

I don't really understand why the HBaseStorage LoadFunc considers that
cf:qualifier == value, but why not... I'll look in the code :)
I'll try to set up an easy way to reproduce it and I'll open a JIRA for it.

btw, I'm not sure I understood your last comment; how did you manage to pull
bytearrays, then?

shazz


On Tue, Sep 6, 2011 at 7:10 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:

> (fwiw, HBaseStorage works fine for me when I use it to pull whole protocol
> buffer messages down as byte arrays)
>
> On Tue, Sep 6, 2011 at 10:10 AM, Dmitriy Ryaboy <dv...@gmail.com>
> wrote:
>
> > That's interesting... we should be able to return a byte array properly
> > (though this is a bit risky for people who try to later turn this
> bytearray
> > into a long using Pig, since the conversion from bytes to longs in Pig is
> > different than in HBase).
> >
> > Could you guys open a jira, preferably with an easy way to reproduce the
> > error?
> >
> > D
> >
> >
> > On Tue, Sep 6, 2011 at 10:03 AM, Bryce Poole <br...@tynt.com> wrote:
> >
> >> My load looks like this
> >>
> >> .... AS (key:chararray, value:long);
> >>
> >> and I'm able to return data.
> >>
> >> I changed the load to
> >>
> >> .... AS (key:chararray, value:bytearray);
> >>
> >> and had results that match yours.
> >>
> >> Try changing the value to long or int type and see if that helps.
> >>
> >> -bp
> >>
> >>
> >> On Tue, Sep 6, 2011 at 9:00 AM, shazz Ng <sh...@gmail.com> wrote:
> >>
> >> > the 'funny' thing is that if I look at the other CF name (from an byte
> >> id
> >> > gives the name, reverse way) :
> >> >
> >> > grunt> tsd_metrics2     = LOAD 'hbase://tsdb-uid' using
> >> > org.apache.pig.backend.hadoop.hbase.HBaseStorage('name:metrics',
> >> > '-caster=HBaseBinaryConverter -loadKey=true') AS (key:bytearray,
> >> > metrics:bytearray);
> >> >
> >> > I've got the same issue:
> >> > (,proc.loadavg.1m)
> >> > (,proc.loadavg.5m)
> >> > (,Measurement_1)
> >> > (,Measurement_2)
> >> > (,Measurement_3)
> >> >
> >> > So there is a real issue with byte array....
> >> >
> >> > On Tue, Sep 6, 2011 at 4:30 PM, shazz Ng <sh...@gmail.com> wrote:
> >> >
> >> > > Hello Bryce,
> >> > >
> >> > > not better... :-(
> >> > >
> >> > > grunt> tsd_metrics2     = LOAD 'hbase://tsdb-uid' using
> >> > > org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics',
> >> > > '-caster=HBaseBinaryConverter -loadKey=true') AS (key:bytearray,
> >> > > metrics:bytearray);
> >> > > grunt> dump tsd_metrics2;
> >> > >
> >> > > [...]
> >> > >
> >> > > (Measurement_1,)
> >> > > (Measurement_2,)
> >> > > (Measurement_3,)
> >> > > (proc.loadavg.1m,)
> >> > > (proc.loadavg.5m,)
> >> > >
> >> > >
> >> > > On Tue, Sep 6, 2011 at 4:18 PM, Bryce Poole <br...@tynt.com> wrote:
> >> > >
> >> > >> Try adding -caster=HBaseBinaryConverter along with loadKey
> >> > >>
> >> > >> '-caster=HBaseBinaryConverter -loadKey=true'
> >> > >>
> >> > >> -bp
> >> > >>
> >> > >> On Tue, Sep 6, 2011 at 7:59 AM, shazz Ng <sh...@gmail.com>
> wrote:
> >> > >>
> >> > >> > Hello Norbert,
> >> > >> >
> >> > >> > Unfortunately, same result :
> >> > >> > (Measurement_1,)
> >> > >> > (Measurement_2,)
> >> > >> > (Measurement_3,)
> >> > >> > (proc.loadavg.1m,)
> >> > >> > (proc.loadavg.5m,)
> >> > >> >
> >> > >> > the row key is well extracted (Measurement_1 for example) but the
> >> > value,
> >> > >> > the
> >> > >> > id I need for timestamp data querying, the bytearray, is not :(
> >> > >> >
> >> > >> > shazz
> >> > >> >
> >> > >> > On Tue, Sep 6, 2011 at 3:37 PM, Norbert Burger <
> >> > >> norbert.burger@gmail.com
> >> > >> > >wrote:
> >> > >> >
> >> > >> > > On Tue, Sep 6, 2011 at 7:58 AM, shazz Ng <sh...@gmail.com>
> >> > wrote:
> >> > >> > > > So from Pig when I want to retrieve only the metrics and
> their
> >> > value
> >> > >> (=
> >> > >> > > id
> >> > >> > > > for the data table) I do :
> >> > >> > > > tsd_metrics     = LOAD 'hbase://tsdb-uid' using
> >> > >> > > >
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics',
> >> > >> > '-loadKey
> >> > >> > > > true') AS (metrics:bytearray);
> >> > >> > > > dump tsd_metrics;
> >> > >> > >
> >> > >> > > Shazz -- if you use the "-loadKey" option to HbaseStorage, then
> >> your
> >> > >> > > LOAD schema includes an extra column containing the row key,
> and
> >> you
> >> > >> > > should add equivalent to your schema column mapping (the AS
> >> clause).
> >> > >> > > Try the following:
> >> > >> > >
> >> > >> > > tsd_metrics = LOAD 'hbase://tsdb-uid' using
> >> > >> > > org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics',
> >> > >> > > '-loadKey true') AS (key:bytearray, metrics:bytearray);
> >> > >> > >
> >> > >> > > Norbert
> >> > >> > >
> >> > >> >
> >> > >>
> >> > >
> >> > >
> >> >
> >>
> >
> >
>

Re: HbaseStorage / OpenTSDB queries issue

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
(fwiw, HBaseStorage works fine for me when I use it to pull whole protocol
buffer messages down as byte arrays)

On Tue, Sep 6, 2011 at 10:10 AM, Dmitriy Ryaboy <dv...@gmail.com> wrote:

> That's interesting... we should be able to return a byte array properly
> (though this is a bit risky for people who try to later turn this bytearray
> into a long using Pig, since the conversion from bytes to longs in Pig is
> different than in HBase).
>
> Could you guys open a jira, preferably with an easy way to reproduce the
> error?
>
> D
>
>
> On Tue, Sep 6, 2011 at 10:03 AM, Bryce Poole <br...@tynt.com> wrote:
>
>> My load looks like this
>>
>> .... AS (key:chararray, value:long);
>>
>> and I'm able to return data.
>>
>> I changed the load to
>>
>> .... AS (key:chararray, value:bytearray);
>>
>> and had results that match yours.
>>
>> Try changing the value to long or int type and see if that helps.
>>
>> -bp
>>
>>
>> On Tue, Sep 6, 2011 at 9:00 AM, shazz Ng <sh...@gmail.com> wrote:
>>
>> > the 'funny' thing is that if I look at the other CF name (from an byte
>> id
>> > gives the name, reverse way) :
>> >
>> > grunt> tsd_metrics2     = LOAD 'hbase://tsdb-uid' using
>> > org.apache.pig.backend.hadoop.hbase.HBaseStorage('name:metrics',
>> > '-caster=HBaseBinaryConverter -loadKey=true') AS (key:bytearray,
>> > metrics:bytearray);
>> >
>> > I've got the same issue:
>> > (,proc.loadavg.1m)
>> > (,proc.loadavg.5m)
>> > (,Measurement_1)
>> > (,Measurement_2)
>> > (,Measurement_3)
>> >
>> > So there is a real issue with byte array....
>> >
>> > On Tue, Sep 6, 2011 at 4:30 PM, shazz Ng <sh...@gmail.com> wrote:
>> >
>> > > Hello Bryce,
>> > >
>> > > not better... :-(
>> > >
>> > > grunt> tsd_metrics2     = LOAD 'hbase://tsdb-uid' using
>> > > org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics',
>> > > '-caster=HBaseBinaryConverter -loadKey=true') AS (key:bytearray,
>> > > metrics:bytearray);
>> > > grunt> dump tsd_metrics2;
>> > >
>> > > [...]
>> > >
>> > > (Measurement_1,)
>> > > (Measurement_2,)
>> > > (Measurement_3,)
>> > > (proc.loadavg.1m,)
>> > > (proc.loadavg.5m,)
>> > >
>> > >
>> > > On Tue, Sep 6, 2011 at 4:18 PM, Bryce Poole <br...@tynt.com> wrote:
>> > >
>> > >> Try adding -caster=HBaseBinaryConverter along with loadKey
>> > >>
>> > >> '-caster=HBaseBinaryConverter -loadKey=true'
>> > >>
>> > >> -bp
>> > >>
>> > >> On Tue, Sep 6, 2011 at 7:59 AM, shazz Ng <sh...@gmail.com> wrote:
>> > >>
>> > >> > Hello Norbert,
>> > >> >
>> > >> > Unfortunately, same result :
>> > >> > (Measurement_1,)
>> > >> > (Measurement_2,)
>> > >> > (Measurement_3,)
>> > >> > (proc.loadavg.1m,)
>> > >> > (proc.loadavg.5m,)
>> > >> >
>> > >> > the row key is well extracted (Measurement_1 for example) but the
>> > value,
>> > >> > the
>> > >> > id I need for timestamp data querying, the bytearray, is not :(
>> > >> >
>> > >> > shazz
>> > >> >
>> > >> > On Tue, Sep 6, 2011 at 3:37 PM, Norbert Burger <
>> > >> norbert.burger@gmail.com
>> > >> > >wrote:
>> > >> >
>> > >> > > On Tue, Sep 6, 2011 at 7:58 AM, shazz Ng <sh...@gmail.com>
>> > wrote:
>> > >> > > > So from Pig when I want to retrieve only the metrics and their
>> > value
>> > >> (=
>> > >> > > id
>> > >> > > > for the data table) I do :
>> > >> > > > tsd_metrics     = LOAD 'hbase://tsdb-uid' using
>> > >> > > > org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics',
>> > >> > '-loadKey
>> > >> > > > true') AS (metrics:bytearray);
>> > >> > > > dump tsd_metrics;
>> > >> > >
>> > >> > > Shazz -- if you use the "-loadKey" option to HbaseStorage, then
>> your
>> > >> > > LOAD schema includes an extra column containing the row key, and
>> you
>> > >> > > should add equivalent to your schema column mapping (the AS
>> clause).
>> > >> > > Try the following:
>> > >> > >
>> > >> > > tsd_metrics = LOAD 'hbase://tsdb-uid' using
>> > >> > > org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics',
>> > >> > > '-loadKey true') AS (key:bytearray, metrics:bytearray);
>> > >> > >
>> > >> > > Norbert
>> > >> > >
>> > >> >
>> > >>
>> > >
>> > >
>> >
>>
>
>

Re: HbaseStorage / OpenTSDB queries issue

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
That's interesting... we should be able to return a byte array properly
(though this is a bit risky for people who try to later turn this bytearray
into a long using Pig, since the conversion from bytes to longs in Pig is
different than in HBase).
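
As a rough illustration of that distinction (the alias names and declared
types below are just for the sake of the example):

-- (a) declare the type in the AS clause so the binary caster decodes the cell at load time
uids_typed = LOAD 'hbase://tsdb-uid'
    using org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'id:metrics', '-caster=HBaseBinaryConverter -loadKey=true')
    AS (key:chararray, metric_uid:long);

-- (b) keep the raw bytes and cast afterwards; the later cast goes through Pig's
-- own byte-to-long conversion, which is not guaranteed to match the encoding
-- HBase used when the value was written
uids_raw = LOAD 'hbase://tsdb-uid'
    using org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'id:metrics', '-loadKey=true')
    AS (key:chararray, metric_uid:bytearray);
uids_cast = FOREACH uids_raw GENERATE key, (long)metric_uid;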

Could you guys open a jira, preferably with an easy way to reproduce the
error?

D

On Tue, Sep 6, 2011 at 10:03 AM, Bryce Poole <br...@tynt.com> wrote:

> My load looks like this
>
> .... AS (key:chararray, value:long);
>
> and I'm able to return data.
>
> I changed the load to
>
> .... AS (key:chararray, value:bytearray);
>
> and had results that match yours.
>
> Try changing the value to long or int type and see if that helps.
>
> -bp
>
>
> On Tue, Sep 6, 2011 at 9:00 AM, shazz Ng <sh...@gmail.com> wrote:
>
> > the 'funny' thing is that if I look at the other CF name (from an byte id
> > gives the name, reverse way) :
> >
> > grunt> tsd_metrics2     = LOAD 'hbase://tsdb-uid' using
> > org.apache.pig.backend.hadoop.hbase.HBaseStorage('name:metrics',
> > '-caster=HBaseBinaryConverter -loadKey=true') AS (key:bytearray,
> > metrics:bytearray);
> >
> > I've got the same issue:
> > (,proc.loadavg.1m)
> > (,proc.loadavg.5m)
> > (,Measurement_1)
> > (,Measurement_2)
> > (,Measurement_3)
> >
> > So there is a real issue with byte array....
> >
> > On Tue, Sep 6, 2011 at 4:30 PM, shazz Ng <sh...@gmail.com> wrote:
> >
> > > Hello Bryce,
> > >
> > > not better... :-(
> > >
> > > grunt> tsd_metrics2     = LOAD 'hbase://tsdb-uid' using
> > > org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics',
> > > '-caster=HBaseBinaryConverter -loadKey=true') AS (key:bytearray,
> > > metrics:bytearray);
> > > grunt> dump tsd_metrics2;
> > >
> > > [...]
> > >
> > > (Measurement_1,)
> > > (Measurement_2,)
> > > (Measurement_3,)
> > > (proc.loadavg.1m,)
> > > (proc.loadavg.5m,)
> > >
> > >
> > > On Tue, Sep 6, 2011 at 4:18 PM, Bryce Poole <br...@tynt.com> wrote:
> > >
> > >> Try adding -caster=HBaseBinaryConverter along with loadKey
> > >>
> > >> '-caster=HBaseBinaryConverter -loadKey=true'
> > >>
> > >> -bp
> > >>
> > >> On Tue, Sep 6, 2011 at 7:59 AM, shazz Ng <sh...@gmail.com> wrote:
> > >>
> > >> > Hello Norbert,
> > >> >
> > >> > Unfortunately, same result :
> > >> > (Measurement_1,)
> > >> > (Measurement_2,)
> > >> > (Measurement_3,)
> > >> > (proc.loadavg.1m,)
> > >> > (proc.loadavg.5m,)
> > >> >
> > >> > the row key is well extracted (Measurement_1 for example) but the
> > value,
> > >> > the
> > >> > id I need for timestamp data querying, the bytearray, is not :(
> > >> >
> > >> > shazz
> > >> >
> > >> > On Tue, Sep 6, 2011 at 3:37 PM, Norbert Burger <
> > >> norbert.burger@gmail.com
> > >> > >wrote:
> > >> >
> > >> > > On Tue, Sep 6, 2011 at 7:58 AM, shazz Ng <sh...@gmail.com>
> > wrote:
> > >> > > > So from Pig when I want to retrieve only the metrics and their
> > value
> > >> (=
> > >> > > id
> > >> > > > for the data table) I do :
> > >> > > > tsd_metrics     = LOAD 'hbase://tsdb-uid' using
> > >> > > > org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics',
> > >> > '-loadKey
> > >> > > > true') AS (metrics:bytearray);
> > >> > > > dump tsd_metrics;
> > >> > >
> > >> > > Shazz -- if you use the "-loadKey" option to HbaseStorage, then
> your
> > >> > > LOAD schema includes an extra column containing the row key, and
> you
> > >> > > should add equivalent to your schema column mapping (the AS
> clause).
> > >> > > Try the following:
> > >> > >
> > >> > > tsd_metrics = LOAD 'hbase://tsdb-uid' using
> > >> > > org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics',
> > >> > > '-loadKey true') AS (key:bytearray, metrics:bytearray);
> > >> > >
> > >> > > Norbert
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>

Re: HbaseStorage / OpenTSDB queries issue

Posted by Bryce Poole <br...@tynt.com>.
My load looks like this

.... AS (key:chararray, value:long);

and I'm able to return data.

I changed the load to

.... AS (key:chararray, value:bytearray);

and had results that match yours.

Try changing the value to long or int type and see if that helps.

-bp


On Tue, Sep 6, 2011 at 9:00 AM, shazz Ng <sh...@gmail.com> wrote:

> the 'funny' thing is that if I look at the other CF name (from an byte id
> gives the name, reverse way) :
>
> grunt> tsd_metrics2     = LOAD 'hbase://tsdb-uid' using
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('name:metrics',
> '-caster=HBaseBinaryConverter -loadKey=true') AS (key:bytearray,
> metrics:bytearray);
>
> I've got the same issue:
> (,proc.loadavg.1m)
> (,proc.loadavg.5m)
> (,Measurement_1)
> (,Measurement_2)
> (,Measurement_3)
>
> So there is a real issue with byte array....
>
> On Tue, Sep 6, 2011 at 4:30 PM, shazz Ng <sh...@gmail.com> wrote:
>
> > Hello Bryce,
> >
> > not better... :-(
> >
> > grunt> tsd_metrics2     = LOAD 'hbase://tsdb-uid' using
> > org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics',
> > '-caster=HBaseBinaryConverter -loadKey=true') AS (key:bytearray,
> > metrics:bytearray);
> > grunt> dump tsd_metrics2;
> >
> > [...]
> >
> > (Measurement_1,)
> > (Measurement_2,)
> > (Measurement_3,)
> > (proc.loadavg.1m,)
> > (proc.loadavg.5m,)
> >
> >
> > On Tue, Sep 6, 2011 at 4:18 PM, Bryce Poole <br...@tynt.com> wrote:
> >
> >> Try adding -caster=HBaseBinaryConverter along with loadKey
> >>
> >> '-caster=HBaseBinaryConverter -loadKey=true'
> >>
> >> -bp
> >>
> >> On Tue, Sep 6, 2011 at 7:59 AM, shazz Ng <sh...@gmail.com> wrote:
> >>
> >> > Hello Norbert,
> >> >
> >> > Unfortunately, same result :
> >> > (Measurement_1,)
> >> > (Measurement_2,)
> >> > (Measurement_3,)
> >> > (proc.loadavg.1m,)
> >> > (proc.loadavg.5m,)
> >> >
> >> > the row key is well extracted (Measurement_1 for example) but the
> value,
> >> > the
> >> > id I need for timestamp data querying, the bytearray, is not :(
> >> >
> >> > shazz
> >> >
> >> > On Tue, Sep 6, 2011 at 3:37 PM, Norbert Burger <
> >> norbert.burger@gmail.com
> >> > >wrote:
> >> >
> >> > > On Tue, Sep 6, 2011 at 7:58 AM, shazz Ng <sh...@gmail.com>
> wrote:
> >> > > > So from Pig when I want to retrieve only the metrics and their
> value
> >> (=
> >> > > id
> >> > > > for the data table) I do :
> >> > > > tsd_metrics     = LOAD 'hbase://tsdb-uid' using
> >> > > > org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics',
> >> > '-loadKey
> >> > > > true') AS (metrics:bytearray);
> >> > > > dump tsd_metrics;
> >> > >
> >> > > Shazz -- if you use the "-loadKey" option to HbaseStorage, then your
> >> > > LOAD schema includes an extra column containing the row key, and you
> >> > > should add equivalent to your schema column mapping (the AS clause).
> >> > > Try the following:
> >> > >
> >> > > tsd_metrics = LOAD 'hbase://tsdb-uid' using
> >> > > org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics',
> >> > > '-loadKey true') AS (key:bytearray, metrics:bytearray);
> >> > >
> >> > > Norbert
> >> > >
> >> >
> >>
> >
> >
>

Re: HbaseStorage / OpenTSDB queries issue

Posted by shazz Ng <sh...@gmail.com>.
the 'funny' thing is that if I look at the other CF, name (which goes from a
byte id to the name, i.e. the reverse way):

grunt> tsd_metrics2     = LOAD 'hbase://tsdb-uid' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('name:metrics',
'-caster=HBaseBinaryConverter -loadKey=true') AS (key:bytearray,
metrics:bytearray);

I've got the same issue:
(,proc.loadavg.1m)
(,proc.loadavg.5m)
(,Measurement_1)
(,Measurement_2)
(,Measurement_3)

So there is a real issue with byte arrays....

On Tue, Sep 6, 2011 at 4:30 PM, shazz Ng <sh...@gmail.com> wrote:

> Hello Bryce,
>
> not better... :-(
>
> grunt> tsd_metrics2     = LOAD 'hbase://tsdb-uid' using
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics',
> '-caster=HBaseBinaryConverter -loadKey=true') AS (key:bytearray,
> metrics:bytearray);
> grunt> dump tsd_metrics2;
>
> [...]
>
> (Measurement_1,)
> (Measurement_2,)
> (Measurement_3,)
> (proc.loadavg.1m,)
> (proc.loadavg.5m,)
>
>
> On Tue, Sep 6, 2011 at 4:18 PM, Bryce Poole <br...@tynt.com> wrote:
>
>> Try adding -caster=HBaseBinaryConverter along with loadKey
>>
>> '-caster=HBaseBinaryConverter -loadKey=true'
>>
>> -bp
>>
>> On Tue, Sep 6, 2011 at 7:59 AM, shazz Ng <sh...@gmail.com> wrote:
>>
>> > Hello Norbert,
>> >
>> > Unfortunately, same result :
>> > (Measurement_1,)
>> > (Measurement_2,)
>> > (Measurement_3,)
>> > (proc.loadavg.1m,)
>> > (proc.loadavg.5m,)
>> >
>> > the row key is well extracted (Measurement_1 for example) but the value,
>> > the
>> > id I need for timestamp data querying, the bytearray, is not :(
>> >
>> > shazz
>> >
>> > On Tue, Sep 6, 2011 at 3:37 PM, Norbert Burger <
>> norbert.burger@gmail.com
>> > >wrote:
>> >
>> > > On Tue, Sep 6, 2011 at 7:58 AM, shazz Ng <sh...@gmail.com> wrote:
>> > > > So from Pig when I want to retrieve only the metrics and their value
>> (=
>> > > id
>> > > > for the data table) I do :
>> > > > tsd_metrics     = LOAD 'hbase://tsdb-uid' using
>> > > > org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics',
>> > '-loadKey
>> > > > true') AS (metrics:bytearray);
>> > > > dump tsd_metrics;
>> > >
>> > > Shazz -- if you use the "-loadKey" option to HbaseStorage, then your
>> > > LOAD schema includes an extra column containing the row key, and you
>> > > should add equivalent to your schema column mapping (the AS clause).
>> > > Try the following:
>> > >
>> > > tsd_metrics = LOAD 'hbase://tsdb-uid' using
>> > > org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics',
>> > > '-loadKey true') AS (key:bytearray, metrics:bytearray);
>> > >
>> > > Norbert
>> > >
>> >
>>
>
>

Re: HbaseStorage / OpenTSDB queries issue

Posted by shazz Ng <sh...@gmail.com>.
Hello Bryce,

not better... :-(

grunt> tsd_metrics2     = LOAD 'hbase://tsdb-uid' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics',
'-caster=HBaseBinaryConverter -loadKey=true') AS (key:bytearray,
metrics:bytearray);
grunt> dump tsd_metrics2;

[...]

(Measurement_1,)
(Measurement_2,)
(Measurement_3,)
(proc.loadavg.1m,)
(proc.loadavg.5m,)


On Tue, Sep 6, 2011 at 4:18 PM, Bryce Poole <br...@tynt.com> wrote:

> Try adding -caster=HBaseBinaryConverter along with loadKey
>
> '-caster=HBaseBinaryConverter -loadKey=true'
>
> -bp
>
> On Tue, Sep 6, 2011 at 7:59 AM, shazz Ng <sh...@gmail.com> wrote:
>
> > Hello Norbert,
> >
> > Unfortunately, same result :
> > (Measurement_1,)
> > (Measurement_2,)
> > (Measurement_3,)
> > (proc.loadavg.1m,)
> > (proc.loadavg.5m,)
> >
> > the row key is well extracted (Measurement_1 for example) but the value,
> > the
> > id I need for timestamp data querying, the bytearray, is not :(
> >
> > shazz
> >
> > On Tue, Sep 6, 2011 at 3:37 PM, Norbert Burger <norbert.burger@gmail.com
> > >wrote:
> >
> > > On Tue, Sep 6, 2011 at 7:58 AM, shazz Ng <sh...@gmail.com> wrote:
> > > > So from Pig when I want to retrieve only the metrics and their value
> (=
> > > id
> > > > for the data table) I do :
> > > > tsd_metrics     = LOAD 'hbase://tsdb-uid' using
> > > > org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics',
> > '-loadKey
> > > > true') AS (metrics:bytearray);
> > > > dump tsd_metrics;
> > >
> > > Shazz -- if you use the "-loadKey" option to HbaseStorage, then your
> > > LOAD schema includes an extra column containing the row key, and you
> > > should add equivalent to your schema column mapping (the AS clause).
> > > Try the following:
> > >
> > > tsd_metrics = LOAD 'hbase://tsdb-uid' using
> > > org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics',
> > > '-loadKey true') AS (key:bytearray, metrics:bytearray);
> > >
> > > Norbert
> > >
> >
>

Re: HbaseStorage / OpenTSDB queries issue

Posted by Bryce Poole <br...@tynt.com>.
Try adding -caster=HBaseBinaryConverter along with loadKey

'-caster=HBaseBinaryConverter -loadKey=true'

-bp

On Tue, Sep 6, 2011 at 7:59 AM, shazz Ng <sh...@gmail.com> wrote:

> Hello Norbert,
>
> Unfortunately, same result :
> (Measurement_1,)
> (Measurement_2,)
> (Measurement_3,)
> (proc.loadavg.1m,)
> (proc.loadavg.5m,)
>
> the row key is well extracted (Measurement_1 for example) but the value,
> the
> id I need for timestamp data querying, the bytearray, is not :(
>
> shazz
>
> On Tue, Sep 6, 2011 at 3:37 PM, Norbert Burger <norbert.burger@gmail.com
> >wrote:
>
> > On Tue, Sep 6, 2011 at 7:58 AM, shazz Ng <sh...@gmail.com> wrote:
> > > So from Pig when I want to retrieve only the metrics and their value (=
> > id
> > > for the data table) I do :
> > > tsd_metrics     = LOAD 'hbase://tsdb-uid' using
> > > org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics',
> '-loadKey
> > > true') AS (metrics:bytearray);
> > > dump tsd_metrics;
> >
> > Shazz -- if you use the "-loadKey" option to HbaseStorage, then your
> > LOAD schema includes an extra column containing the row key, and you
> > should add equivalent to your schema column mapping (the AS clause).
> > Try the following:
> >
> > tsd_metrics = LOAD 'hbase://tsdb-uid' using
> > org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics',
> > '-loadKey true') AS (key:bytearray, metrics:bytearray);
> >
> > Norbert
> >
>

Re: HbaseStorage / OpenTSDB queries issue

Posted by shazz Ng <sh...@gmail.com>.
Hello Norbert,

Unfortunately, same result:
(Measurement_1,)
(Measurement_2,)
(Measurement_3,)
(proc.loadavg.1m,)
(proc.loadavg.5m,)

the row key is extracted fine (Measurement_1, for example), but the value, the
id I need for querying the time series data, the bytearray, is not :(

shazz

On Tue, Sep 6, 2011 at 3:37 PM, Norbert Burger <no...@gmail.com>wrote:

> On Tue, Sep 6, 2011 at 7:58 AM, shazz Ng <sh...@gmail.com> wrote:
> > So from Pig when I want to retrieve only the metrics and their value (=
> id
> > for the data table) I do :
> > tsd_metrics     = LOAD 'hbase://tsdb-uid' using
> > org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics', '-loadKey
> > true') AS (metrics:bytearray);
> > dump tsd_metrics;
>
> Shazz -- if you use the "-loadKey" option to HbaseStorage, then your
> LOAD schema includes an extra column containing the row key, and you
> should add equivalent to your schema column mapping (the AS clause).
> Try the following:
>
> tsd_metrics = LOAD 'hbase://tsdb-uid' using
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics',
> '-loadKey true') AS (key:bytearray, metrics:bytearray);
>
> Norbert
>

Re: HbaseStorage / OpenTSDB queries issue

Posted by Norbert Burger <no...@gmail.com>.
On Tue, Sep 6, 2011 at 7:58 AM, shazz Ng <sh...@gmail.com> wrote:
> So from Pig when I want to retrieve only the metrics and their value (= id
> for the data table) I do :
> tsd_metrics     = LOAD 'hbase://tsdb-uid' using
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics', '-loadKey
> true') AS (metrics:bytearray);
> dump tsd_metrics;

Shazz -- if you use the "-loadKey" option to HBaseStorage, then your
LOAD schema includes an extra column containing the row key, and you
should add an equivalent column to your schema mapping (the AS clause).
Try the following:

tsd_metrics = LOAD 'hbase://tsdb-uid' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics',
'-loadKey true') AS (key:bytearray, metrics:bytearray);

Norbert