Posted to user@hbase.apache.org by Rita <rm...@gmail.com> on 2012/09/20 03:09:07 UTC

time series question

Yet another time series question.

I have an issue where my row key will be the same but I will have multiple
versions of the data. I don't need only the last one; I need all X
versions. Here I have 3 different versions.

sensor,time,v1,v2,v3
c04,0930001,0,0,0
c04,0930001,0,4,0
c04,0930001,0,4,3

key=sensor+time
cf=d
cf:v1
cf:v2
cf:v3

I plan to query like this: at 0930001, what was the v1 of c04? Same for v2
and v3?

I don't want to use the native HBase versioning feature because some sensors
have more than 100 different "ticks" and I would like to keep track of all
of them. Perhaps I should change my key to include sensor+time+v value?
Any thoughts or clever schemas I can use?






-- 
--- Get your facts first, then you can distort them as you please.--

Re: time series question

Posted by Rita <rm...@gmail.com>.
Yes, I was planning to store the qualifier as a serialized value.




RE: time series question

Posted by Bijieshan <bi...@huawei.com>.
Yes, I think it should be done at the application level.
"XXXXX" is a random number with a specified length in bytes :)

> CF:d:+.004_t0  (for time.004 first occurrence)
I didn't quite follow you. Do you store the serialized value as the qualifier?

Jieshan

Re: time series question

Posted by Rita <rm...@gmail.com>.
What is "XXXXX"? Is it a delimiter?

One thing I can do, similar to tsdb, is:

Key Sensor + time (only in milliseconds)
  CF:d
  CF:d:+.004_t0  (for time.004 first occurrence)
  Serialized format
  CF:d:+.004_t1  (for time.004 second occurrence)
  Serialized format

What do you think of that? Is there an automatic way of incrementing t, or
should that be done at the app level?
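
For illustration only, here is a minimal Python sketch of incrementing t at the app level. It assumes a single writer per sensor, keeps the counter in application memory, and uses a hypothetical helper named QualifierAllocator; the qualifier text simply copies the "+.004_t0" layout above. No HBase client is involved.

```python
from collections import defaultdict


class QualifierAllocator:
    """Hands out qualifiers such as '+.004_t0', '+.004_t1' per (row key, offset).

    Hypothetical helper: the counters live in application memory only, so this
    is only safe when a single process writes a given row.
    """

    def __init__(self):
        # Maps (row_key, offset_ms) to the next unused occurrence number.
        self._next = defaultdict(int)

    def next_qualifier(self, row_key: str, offset_ms: int) -> str:
        slot = (row_key, offset_ms)
        n = self._next[slot]
        self._next[slot] = n + 1
        # offset_ms=4 becomes '+.004', matching the '+.004_t0' layout above.
        return f"+.{offset_ms:03d}_t{n}"


alloc = QualifierAllocator()
first = alloc.next_qualifier("c04|0930001", 4)   # first tick at offset .004
second = alloc.next_qualifier("c04|0930001", 4)  # second tick at the same offset
other = alloc.next_qualifier("c04|0930001", 5)   # a different offset starts at t0
```

With several writers for the same sensor, the counter would have to live somewhere shared rather than in one process.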






RE: time series question

Posted by Bijieshan <bi...@huawei.com>.
Two suggestions for discussion:

[Key-Schema 1]:  Sensor + time(milliseconds) + XXXX + V1V2V3

   XXXX :- A random number M bytes long.
   V1V2V3 :- Fix the length of each metric. For example, with v1=10, v2=8, v3=9
   and a length of 3, the value should be 010008009.

[Key-Schema 2]:  Sensor + time(milliseconds) + XXXX
   XXXX :- A random number M bytes long.
   Just store the metrics in the value part.
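
A rough Python sketch of how both key schemas could be assembled, for discussion only. The function names, the 8-byte big-endian time encoding, the 4-byte suffix length, and the sample time value 930001 are all assumptions made for the example; it also assumes sensor IDs have a fixed length and that metrics are non-negative integers.

```python
import os
import struct


def fixed_width_metrics(values, width=3):
    """Zero-pad each metric to `width` digits and concatenate them."""
    parts = []
    for v in values:
        text = str(v)
        if v < 0 or len(text) > width:
            raise ValueError(f"metric {v} does not fit in {width} digits")
        parts.append(text.zfill(width))
    return "".join(parts)


def row_key(sensor, time_ms, suffix_len=4, metrics=None):
    """Build sensor + time + random suffix (+ packed metrics for Key-Schema 1)."""
    key = sensor.encode("ascii")
    key += struct.pack(">Q", time_ms)  # 8 bytes, big-endian, so always fixed width
    key += os.urandom(suffix_len)      # the "XXXX" part: M random bytes
    if metrics is not None:            # only Key-Schema 1 appends the metrics
        key += fixed_width_metrics(metrics).encode("ascii")
    return key


packed = fixed_width_metrics([10, 8, 9])           # the example from the text
key1 = row_key("c04", 930001, metrics=[10, 8, 9])  # Key-Schema 1
key2 = row_key("c04", 930001)                      # Key-Schema 2
```

A random suffix makes collisions unlikely but not impossible; a larger M lowers the odds at the cost of longer keys.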

Jieshan


Re: time series question

Posted by Rita <rm...@gmail.com>.
Jieshan,

v is not a version number; v1, v2, v3 are actually metrics. My data is
high frequency, so in one second I can have several hundred entries.

Would using a nested entity work in such a case?

key=sensor+time
  cf=d
    cf:d,t1 (v1=0,v2=4,v3=0)
    cf:d,t2 (v1=0,v2=4,v3=0)
    cf:d,t2 (v1=0,v2=4,v3=2)

But I am not sure how efficient this would be.


Also, for creating an entity like this, would I need to do my own
serialization? Would JSON work?
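
Since HBase stores cell values as plain byte arrays, any serialization the application can read back should work, JSON included. A small Python sketch of the idea follows; the names encode_tick and decode_tick and the qualifiers t1 to t3 are made up for the example, and a plain dict stands in for one HBase row.

```python
import json


def encode_tick(v1, v2, v3):
    """Serialize one tick's metrics to compact UTF-8 JSON bytes."""
    doc = {"v1": v1, "v2": v2, "v3": v3}
    return json.dumps(doc, separators=(",", ":"), sort_keys=True).encode("utf-8")


def decode_tick(raw):
    """Inverse of encode_tick: bytes back to a dict of metrics."""
    return json.loads(raw.decode("utf-8"))


# One row (sensor + time) holding several ticks, one qualifier per tick.
row = {
    b"t1": encode_tick(0, 0, 0),
    b"t2": encode_tick(0, 4, 0),
    b"t3": encode_tick(0, 4, 3),
}
latest = decode_tick(row[b"t3"])
```

JSON is easy to inspect but larger than a packed binary layout, which may matter at several hundred entries per second.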




RE: time series question

Posted by Bijieshan <bi...@huawei.com>.
I prefer the idea similar to "sensor+time+v". The problem is the "v" part. Is it the version number? Or some random number to distinguish the different versions?

Jieshan

Re: time series question

Posted by Tom Brown <to...@gmail.com>.
I have a similar situation. I have certain keys such that, if I didn't have
the timestamps as part of the key, I would have hundreds or even
thousands of duplicates.

However, I would recommend making sure the timestamp portion is fixed
width (this guarantees that your keys for a particular sensor remain in
order lexicographically as well as temporally).
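
To show why the fixed width matters, here is a short Python sketch. HBase orders row keys by comparing their bytes, which sorting Python bytes objects imitates; the make_key helper, the "|" separator, and the width of 7 are assumptions for the example.

```python
def make_key(sensor, ts, width=None):
    """Join sensor and timestamp; zero-pad the timestamp if a width is given."""
    stamp = str(ts) if width is None else str(ts).zfill(width)
    return f"{sensor}|{stamp}".encode("ascii")


timestamps = [9, 10, 100, 25]

# Variable width: byte order disagrees with time order.
loose = sorted(make_key("c04", t) for t in timestamps)

# Fixed width: byte order and time order agree.
fixed = sorted(make_key("c04", t, width=7) for t in timestamps)
```

In the variable-width case the key for timestamp 9 sorts last, after 10, 100 and 25; with padding the keys come back in time order.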

Regards,

--Tom
