Posted to user@hbase.apache.org by Mohit Anchlia <mo...@gmail.com> on 2012/08/29 02:21:13 UTC

Timeseries data

With time-series data, how do people deal with scenarios where multiple
events can arrive in the same millisecond? Using a nanosecond approach seems
tricky. Another option is to take advantage of versions or counters.

Re: Timeseries data

Posted by Amandeep Khurana <am...@gmail.com>.
Can you give an example of what you are trying to do, and of how you would
use both writes arriving at the same instant for the same cell? And why
do you say that the nanosecond approach is tricky?

On Aug 28, 2012, at 5:54 PM, Mohit Anchlia <mo...@gmail.com> wrote:

> How does OpenTSDB deal with multiple writes in the same millisecond for the
> same rowkey/column? I can't see that info.

Re: Timeseries data

Posted by Christian Schäfer <sy...@yahoo.de>.
As Mohit suggests, I would also create rows that contain all events for a given millisecond or second (as nested entities).

With this time-based grouping/aggregation/batching (aka timeboxing), each row is like an event bag for all events that occurred in a given millisecond.

Btw: grouping the puts by millisecond or second (or, better, a somewhat longer interval) would reduce pressure on HBase because it means fewer RPC requests.
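
A minimal sketch of the timeboxing idea, in plain Python with no HBase client involved. The function names, the one-second bucket width, and the offset-plus-sequence qualifier format are all assumptions made for illustration; none of them come from this thread or from an HBase API.

```python
from collections import defaultdict

BUCKET_MS = 1000  # assumed bucket width: one row per second


def timebox(events, bucket_ms=BUCKET_MS):
    """Group (timestamp_ms, payload) pairs by the start of their time bucket."""
    buckets = defaultdict(list)
    for ts_ms, payload in events:
        bucket_start = ts_ms - (ts_ms % bucket_ms)
        buckets[bucket_start].append((ts_ms, payload))
    return dict(buckets)


def to_row(bucket_start, bucket_events):
    """Turn one bucket into (row_key, columns).

    The qualifier is "<offset in ms>-<sequence number>", so two events with
    the same millisecond timestamp land in different columns instead of
    overwriting each other. The 4-digit padding assumes buckets of at most
    10 seconds and fewer than 10000 events per bucket.
    """
    columns = {}
    for seq, (ts_ms, payload) in enumerate(bucket_events):
        columns[f"{ts_ms - bucket_start:04d}-{seq:04d}"] = payload
    return str(bucket_start), columns


events = [
    (1346199673001, "login"),
    (1346199673001, "click"),  # same millisecond as the previous event
    (1346199673950, "logout"),
    (1346199674000, "login"),
]
rows = [to_row(start, evs) for start, evs in sorted(timebox(events).items())]
```

Here the four events collapse into two rows, and each row could then be sent as a single put rather than one put per event.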

kind regards,
Chris


----- Original Message -----
From: Mohit Anchlia <mo...@gmail.com>
To: user@hbase.apache.org
CC: 
Sent: 2:54 Wednesday, 29 August 2012
Subject: Re: Timeseries data

How does OpenTSDB deal with multiple writes in the same millisecond for the
same rowkey/column? I can't see that info.


Re: Timeseries data

Posted by Mohit Anchlia <mo...@gmail.com>.
How does OpenTSDB deal with multiple writes in the same millisecond for the
same rowkey/column? I can't see that info.

Re: Timeseries data

Posted by Marcos Ortiz <ml...@uci.cu>.
Take a look at OpenTSDB from StumbleUpon, described by Benoit "tsuna" Sigoure
(tsuna@stumbleupon.com) in his HBaseCon talk "Lessons Learned from OpenTSDB".
His team has done a great job working with time-series data, and he
gave a lot of great advice on handling this kind of data in HBase:
- Use wider rows to seek faster
- Use asynchbase + Netty or Finagle (an RPC library created by Twitter
engineers) = performance++
- Make writes idempotent and independent
    before: start rows at arbitrary points in time
    after: align rows on 10m (then 1h) boundaries
- Store more data per Key/Value
- Compact your data
- Use short family names
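
One way to read the "align rows on boundaries" advice, sketched in plain Python: if the row key depends only on the metric and the aligned base time, and the qualifier only on the offset inside that row, then the same data point always maps to the same cell and a retried write overwrites instead of duplicating. The key layout, the function names, and the 1-hour span below are assumptions for illustration, not OpenTSDB's actual schema.

```python
import struct

ROW_SPAN_SECONDS = 3600  # assumed row width: 1 hour (use 600 for 10 minutes)


def split_timestamp(ts_seconds, span=ROW_SPAN_SECONDS):
    """Return (row base time, offset within the row) for a Unix timestamp."""
    base = ts_seconds - (ts_seconds % span)
    return base, ts_seconds - base


def row_key(metric, ts_seconds, span=ROW_SPAN_SECONDS):
    """Hypothetical key layout: metric name, then the 4-byte big-endian base time."""
    base, _ = split_timestamp(ts_seconds, span)
    return metric.encode("ascii") + struct.pack(">I", base)


def qualifier(ts_seconds, span=ROW_SPAN_SECONDS):
    """Column qualifier: offset in seconds from the row base, 2 bytes big-endian.

    Two bytes are enough only while the span stays below 65536 seconds.
    """
    _, offset = split_timestamp(ts_seconds, span)
    return struct.pack(">H", offset)


# Every point within the same hour shares one row key.
same_row = row_key("cpu", 1346199673) == row_key("cpu", 1346198400)
```

With a 1-hour span, the timestamp 1346199673 falls in the row whose base is 1346198400, at offset 1273 seconds; no writer needs to know when the row was first created.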
Best wishes
On 28/08/2012 20:21, Mohit Anchlia wrote:
> With time-series data, how do people deal with scenarios where multiple
> events can arrive in the same millisecond? Using a nanosecond approach seems
> tricky. Another option is to take advantage of versions or counters.

10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS INFORMATICAS...
CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION

http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci