Posted to user@pig.apache.org by shazz Ng <sh...@gmail.com> on 2011/09/06 13:58:41 UTC
HbaseStorage / OpenTSDB queries issue
Hello,
First of all, I'm new to Pig and NoSQL, so I hope you'll forgive stupid
questions ;-)
So, I'm playing with OpenTSDB (a software layer on top of HBase to handle
time series data) and now I'd like to run some data mining queries on top of
my timestamped data. I found that Pig could be a solution, so I tried to make
it work on top of the OpenTSDB data in HBase. It nearly works, but I'm
still confused.
OpenTSDB schema :
hbase(main):011:0> describe 'tsdb-uid'
DESCRIPTION                                                                ENABLED
{NAME => 'tsdb-uid', FAMILIES => [{NAME => 'id', BLOOMFILTER => 'NONE',    true
REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3',
TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
BLOCKCACHE => 'true'}, {NAME => 'name', BLOOMFILTER => 'NONE',
REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL =>
'2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
BLOCKCACHE => 'true'}]}
hbase(main):012:0> describe 'tsdb'
DESCRIPTION                                                                ENABLED
{NAME => 'tsdb', FAMILIES => [{NAME => 't', BLOOMFILTER => 'NONE',         true
REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3',
TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE
=> 'true'}]}
So sample uid data are :
hbase(main):014:0> scan 'tsdb-uid'
ROW COLUMN+CELL
\x00\x00\x01 column=name:metrics,
timestamp=1314801674803, value=proc.loadavg.1m
\x00\x00\x01 column=name:tagk,
timestamp=1314801684953, value=validity
\x00\x00\x01 column=name:tagv,
timestamp=1314801685000, value=true
\x00\x00\x02 column=name:metrics,
timestamp=1314801674849, value=proc.loadavg.5m
\x00\x00\x02 column=name:tagk,
timestamp=1314801685049, value=device
\x00\x00\x02 column=name:tagv,
timestamp=1314801685096, value=Device1
\x00\x00\x03 column=name:metrics,
timestamp=1314801674898, value=Measurement_1
\x00\x00\x03 column=name:tagk,
timestamp=1314801685144, value=accuracy
\x00\x00\x03 column=name:tagv,
timestamp=1314801693030, value=Device2
\x00\x00\x04 column=name:metrics,
timestamp=1314801674947, value=Measurement_2
\x00\x00\x05 column=name:metrics,
timestamp=1314801674994, value=Measurement_3
Device1 column=id:tagv,
timestamp=1314801685097, value=\x00\x00\x02
Device2 column=id:tagv,
timestamp=1314801693031, value=\x00\x00\x03
Measurement_1 column=id:metrics,
timestamp=1314801674899, value=\x00\x00\x03
Measurement_2 column=id:metrics,
timestamp=1314801674948, value=\x00\x00\x04
Measurement_3 column=id:metrics,
timestamp=1314801674995, value=\x00\x00\x05
accuracy column=id:tagk,
timestamp=1314801685145, value=\x00\x00\x03
device column=id:tagk,
timestamp=1314801685050, value=\x00\x00\x02
proc.loadavg.1m column=id:metrics,
timestamp=1314801674804, value=\x00\x00\x01
proc.loadavg.5m column=id:metrics,
timestamp=1314801674850, value=\x00\x00\x02
true column=id:tagv,
timestamp=1314801685002, value=\x00\x00\x01
validity column=id:tagk,
timestamp=1314801684955, value=\x00\x00\x01
These are the metrics (the type of timestamped data, id:metrics) and the tags
defining the data (tagk for the key and tagv for the value, e.g. validity = true).
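To make the layout above concrete, here is a minimal Python sketch of the two-way mapping that the scan output suggests: the id family maps a name to its 3-byte UID, and the name family maps the UID back to the name. The dictionaries and the helper names (uid_for, name_for) are hypothetical and only mirror the sample rows shown above; this is not OpenTSDB code.

```python
# Forward map, mirroring the id:metrics cells in the scan above (name -> UID).
ID_METRICS = {
    b"proc.loadavg.1m": b"\x00\x00\x01",
    b"proc.loadavg.5m": b"\x00\x00\x02",
    b"Measurement_1": b"\x00\x00\x03",
    b"Measurement_2": b"\x00\x00\x04",
    b"Measurement_3": b"\x00\x00\x05",
}

# Reverse map, mirroring the name:metrics cells (UID -> name).
NAME_METRICS = {uid: name for name, uid in ID_METRICS.items()}


def uid_for(metric: bytes) -> bytes:
    """Look up the UID stored under id:metrics for a metric name."""
    return ID_METRICS[metric]


def name_for(uid: bytes) -> bytes:
    """Look up the metric name stored under name:metrics for a UID."""
    return NAME_METRICS[uid]


if __name__ == "__main__":
    print(uid_for(b"Measurement_1"))
    print(name_for(b"\x00\x00\x03"))
```

The query in this post reads the forward direction (id:metrics), where the row key is the metric name and the cell value is the UID.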
So from Pig, when I want to retrieve only the metrics and their values (= the ids
used in the data table), I do:
tsd_metrics = LOAD 'hbase://tsdb-uid' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics', '-loadKey
true') AS (metrics:bytearray);
dump tsd_metrics;
HadoopVersion PigVersion UserId StartedAt FinishedAt
Features
0.20.2 0.8.1-SNAPSHOT opentsdb 2011-09-06 13:39:27 2011-09-06
13:39:34 UNKNOWN
Success!
Job Stats (time in seconds):
JobId Alias Feature Outputs
job_local_0004 tsd_metrics MAP_ONLY
file:/tmp/temp-1850282462/tmp1589556736,
Input(s):
Successfully read records from: "hbase://tsdb-uid"
Output(s):
Successfully stored records in: "file:/tmp/temp-1850282462/tmp1589556736"
Job DAG:
job_local_0004
(Measurement_1,)
(Measurement_2,)
(Measurement_3,)
(proc.loadavg.1m,)
(proc.loadavg.5m,)
So that's nearly OK, except that the value (= id) displayed is null instead
of \x00\x00\x03, for example, in the case of Measurement_1.
Any idea ?
thx !
shazz
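As an aside on the expected value: \x00\x00\x03 is three raw bytes, not text. A minimal Python sketch, assuming the UID is an unsigned big-endian integer (an assumption; the thread does not state the encoding), shows what it would decode to and that none of its bytes is a printable character. The function names are made up for illustration.

```python
def uid_to_int(uid: bytes) -> int:
    """Interpret a raw UID as an unsigned big-endian integer (assumed encoding)."""
    return int.from_bytes(uid, byteorder="big")


def has_printable_bytes(raw: bytes) -> bool:
    """Return True if any byte is a printable ASCII character (0x20..0x7e)."""
    return any(0x20 <= b <= 0x7E for b in raw)


if __name__ == "__main__":
    uid = b"\x00\x00\x03"
    print(uid_to_int(uid))
    print(has_printable_bytes(uid))
    print(has_printable_bytes(b"Measurement_1"))
```

Because a value like this consists only of control bytes, it may be worth checking a field's length or hex form rather than relying on how a terminal renders it.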
Re: HbaseStorage / OpenTSDB queries issue
Posted by shazz Ng <sh...@gmail.com>.
Thanks Dmitriy!
Indeed, it works when using the caster AND declaring the value (or metrics)
field as long:
grunt> tsd_metrics2 = LOAD 'hbase://tsdb-uid' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics',
'-caster=HBaseBinaryConverter -loadKey=true') AS (key:bytearray,
metrics:long);
I don't really understand why the HBaseStorage LoadFunc considers that
cf:qualifier == value, but why not... I'll look in the code :)
I'll try to set up an easy way to reproduce it and I'll file a JIRA.
Btw, I'm not sure I understood your last comment: how did you pull
bytearrays, then?
shazz
On Tue, Sep 6, 2011 at 7:10 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:
> (fwiw, HBaseStorage works fine for me when I use it to pull whole protocol
> buffer messages down as byte arrays)
Re: HbaseStorage / OpenTSDB queries issue
Posted by Dmitriy Ryaboy <dv...@gmail.com>.
(fwiw, HBaseStorage works fine for me when I use it to pull whole protocol
buffer messages down as byte arrays)
On Tue, Sep 6, 2011 at 10:10 AM, Dmitriy Ryaboy <dv...@gmail.com> wrote:
> That's interesting... we should be able to return a byte array properly
> (though this is a bit risky for people who try to later turn this bytearray
> into a long using Pig, since the conversion from bytes to longs in Pig is
> different than in HBase).
>
> Could you guys open a jira, preferably with an easy way to reproduce the
> error?
>
> D
Re: HbaseStorage / OpenTSDB queries issue
Posted by Dmitriy Ryaboy <dv...@gmail.com>.
That's interesting... we should be able to return a byte array properly
(though this is a bit risky for people who try to later turn this bytearray
into a long using Pig, since the conversion from bytes to longs in Pig is
different than in HBase).
Could you guys open a jira, preferably with an easy way to reproduce the
error?
D
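The difference mentioned above can be illustrated with a small Python sketch. It assumes that HBase's binary encoding of a long is 8 bytes, big-endian (as produced by Bytes.toBytes(long)), and that Pig's default converter instead reads the bytes as the UTF-8 text of a number; both are stated from memory of those libraries rather than from this thread, so treat them as assumptions. The function names are hypothetical.

```python
import struct


def binary_long_to_int(raw: bytes) -> int:
    """Decode an 8-byte big-endian signed long (assumed HBase binary form)."""
    (value,) = struct.unpack(">q", raw)
    return value


def text_long_to_int(raw: bytes) -> int:
    """Decode a long stored as UTF-8 digits (assumed Pig text form)."""
    return int(raw.decode("utf-8"))


if __name__ == "__main__":
    binary = struct.pack(">q", 42)   # eight bytes: seven 0x00 then 0x2a
    text = b"42"                     # two bytes: the characters '4' and '2'
    print(binary_long_to_int(binary))
    print(text_long_to_int(text))
    print(binary == text)
```

The same number has two different byte representations here, which is why reading one form with the other decoder is risky.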
On Tue, Sep 6, 2011 at 10:03 AM, Bryce Poole <br...@tynt.com> wrote:
> My load looks like this
>
> .... AS (key:chararray, value:long);
>
> and I'm able to return data.
>
> I changed the load to
>
> .... AS (key:chararray, value:bytearray);
>
> and had results that match yours.
>
> Try changing the value to long or int type and see if that helps.
>
> -bp
Re: HbaseStorage / OpenTSDB queries issue
Posted by Bryce Poole <br...@tynt.com>.
My load looks like this
.... AS (key:chararray, value:long);
and I'm able to return data.
I changed the load to
.... AS (key:chararray, value:bytearray);
and had results that match yours.
Try changing the value to long or int type and see if that helps.
-bp
On Tue, Sep 6, 2011 at 9:00 AM, shazz Ng <sh...@gmail.com> wrote:
> the 'funny' thing is that if I look at the other CF name (from an byte id
> gives the name, reverse way) :
>
> grunt> tsd_metrics2 = LOAD 'hbase://tsdb-uid' using
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('name:metrics',
> '-caster=HBaseBinaryConverter -loadKey=true') AS (key:bytearray,
> metrics:bytearray);
>
> I've got the same issue:
> (,proc.loadavg.1m)
> (,proc.loadavg.5m)
> (,Measurement_1)
> (,Measurement_2)
> (,Measurement_3)
>
> So there is a real issue with byte array....
Re: HbaseStorage / OpenTSDB queries issue
Posted by shazz Ng <sh...@gmail.com>.
The 'funny' thing is that if I look at the other CF, name (which maps a byte id
back to its name, the reverse direction):
grunt> tsd_metrics2 = LOAD 'hbase://tsdb-uid' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('name:metrics',
'-caster=HBaseBinaryConverter -loadKey=true') AS (key:bytearray,
metrics:bytearray);
I've got the same issue:
(,proc.loadavg.1m)
(,proc.loadavg.5m)
(,Measurement_1)
(,Measurement_2)
(,Measurement_3)
So there is a real issue with byte arrays...
On Tue, Sep 6, 2011 at 4:30 PM, shazz Ng <sh...@gmail.com> wrote:
> Hello Bryce,
>
> not better... :-(
>
> grunt> tsd_metrics2 = LOAD 'hbase://tsdb-uid' using
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics',
> '-caster=HBaseBinaryConverter -loadKey=true') AS (key:bytearray,
> metrics:bytearray);
> grunt> dump tsd_metrics2;
>
> [...]
>
> (Measurement_1,)
> (Measurement_2,)
> (Measurement_3,)
> (proc.loadavg.1m,)
> (proc.loadavg.5m,)
Re: HbaseStorage / OpenTSDB queries issue
Posted by shazz Ng <sh...@gmail.com>.
Hello Bryce,
not better... :-(
grunt> tsd_metrics2 = LOAD 'hbase://tsdb-uid' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics',
'-caster=HBaseBinaryConverter -loadKey=true') AS (key:bytearray,
metrics:bytearray);
grunt> dump tsd_metrics2;
[...]
(Measurement_1,)
(Measurement_2,)
(Measurement_3,)
(proc.loadavg.1m,)
(proc.loadavg.5m,)
On Tue, Sep 6, 2011 at 4:18 PM, Bryce Poole <br...@tynt.com> wrote:
> Try adding -caster=HBaseBinaryConverter along with loadKey
>
> '-caster=HBaseBinaryConverter -loadKey=true'
>
> -bp
Re: HbaseStorage / OpenTSDB queries issue
Posted by Bryce Poole <br...@tynt.com>.
Try adding -caster=HBaseBinaryConverter along with loadKey
'-caster=HBaseBinaryConverter -loadKey=true'
-bp
On Tue, Sep 6, 2011 at 7:59 AM, shazz Ng <sh...@gmail.com> wrote:
> Hello Norbert,
>
> Unfortunately, same result :
> (Measurement_1,)
> (Measurement_2,)
> (Measurement_3,)
> (proc.loadavg.1m,)
> (proc.loadavg.5m,)
>
> the row key is well extracted (Measurement_1 for example) but the value,
> the
> id I need for timestamp data querying, the bytearray, is not :(
>
> shazz
Re: HbaseStorage / OpenTSDB queries issue
Posted by shazz Ng <sh...@gmail.com>.
Hello Norbert,
Unfortunately, same result :
(Measurement_1,)
(Measurement_2,)
(Measurement_3,)
(proc.loadavg.1m,)
(proc.loadavg.5m,)
The row key is extracted correctly (Measurement_1, for example), but the value,
the id I need for querying the timestamped data (the bytearray), is not :(
shazz
On Tue, Sep 6, 2011 at 3:37 PM, Norbert Burger <no...@gmail.com> wrote:
> On Tue, Sep 6, 2011 at 7:58 AM, shazz Ng <sh...@gmail.com> wrote:
> > So from Pig when I want to retrieve only the metrics and their value (=
> id
> > for the data table) I do :
> > tsd_metrics = LOAD 'hbase://tsdb-uid' using
> > org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics', '-loadKey
> > true') AS (metrics:bytearray);
> > dump tsd_metrics;
>
> Shazz -- if you use the "-loadKey" option to HbaseStorage, then your
> LOAD schema includes an extra column containing the row key, and you
> should add equivalent to your schema column mapping (the AS clause).
> Try the following:
>
> tsd_metrics = LOAD 'hbase://tsdb-uid' using
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics',
> '-loadKey true') AS (key:bytearray, metrics:bytearray);
>
> Norbert
>
Re: HbaseStorage / OpenTSDB queries issue
Posted by Norbert Burger <no...@gmail.com>.
On Tue, Sep 6, 2011 at 7:58 AM, shazz Ng <sh...@gmail.com> wrote:
> So from Pig when I want to retrieve only the metrics and their value (= id
> for the data table) I do :
> tsd_metrics = LOAD 'hbase://tsdb-uid' using
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics', '-loadKey
> true') AS (metrics:bytearray);
> dump tsd_metrics;
Shazz -- if you use the "-loadKey" option to HBaseStorage, then your
LOAD schema includes an extra column containing the row key, and you
should add a matching field to your schema column mapping (the AS clause).
Try the following:
tsd_metrics = LOAD 'hbase://tsdb-uid' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('id:metrics',
'-loadKey true') AS (key:bytearray, metrics:bytearray);
Norbert
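Norbert's point can be sketched in Python as positional naming: with -loadKey, the row key comes first in each tuple, so the names in the AS clause are applied starting from the key. The helper apply_schema below is hypothetical and only models that positional pairing; it is not how Pig is implemented.

```python
from typing import Dict, Optional, Sequence


def apply_schema(row: Sequence[bytes], names: Sequence[str]) -> Dict[str, Optional[bytes]]:
    """Pair field names with tuple values by position; missing values become None."""
    return {name: (row[i] if i < len(row) else None) for i, name in enumerate(names)}


if __name__ == "__main__":
    # With the row key loaded, each tuple is (row key, id:metrics value).
    row = (b"Measurement_1", b"\x00\x00\x03")

    # One-field schema from the original post: 'metrics' ends up naming the row key.
    print(apply_schema(row, ["metrics"]))

    # Two-field schema from the suggestion: each name lines up with its value.
    print(apply_schema(row, ["key", "metrics"]))
```

In this model, the one-field schema labels the row key as metrics and leaves the actual cell value without a name, which is the mismatch the two-field AS clause is meant to avoid.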