Posted to user@hbase.apache.org by Miguel Costa <mi...@telecom.pt> on 2011/04/05 11:16:14 UTC

Use Timestamp

Hi,

 

I want to have my data aggregated by day, so I would like to know which is
the best option for querying my data: putting the timestamp of the data in
my row key, or using the timestamps of the columns?

 

Thanks,

 


Miguel

Re: Use Timestamp

Posted by Jean-Daniel Cryans <jd...@apache.org>.

What I usually tell people is that if time is part of your model, then put
it in a key.
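
As an illustration of why time in the key helps with per-day aggregation,
here is a minimal sketch. It assumes a hypothetical key layout (a
fixed-width source id followed by a big-endian millisecond timestamp); the
names row_key and day_scan_range are made up for this example and are not
part of any HBase API. With such a layout, one day of data for a source is a
single contiguous key range, so it can be read with one start/stop scan.

```python
import struct
from datetime import datetime, timedelta, timezone


def row_key(source_id: bytes, ts_ms: int) -> bytes:
    # Hypothetical layout: fixed-width source id, then an 8-byte big-endian
    # timestamp, so byte-wise key order matches time order per source.
    return source_id + struct.pack(">Q", ts_ms)


def day_scan_range(source_id: bytes, day: datetime):
    # Start key is inclusive, stop key is exclusive (midnight to midnight, UTC).
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    stop = start + timedelta(days=1)
    to_ms = lambda d: int(d.timestamp() * 1000)
    return row_key(source_id, to_ms(start)), row_key(source_id, to_ms(stop))
```

The two keys returned by day_scan_range would be passed as the start and
stop rows of a scan for that day.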

J-D

On Tue, Apr 5, 2011 at 2:16 AM, Miguel Costa <mi...@telecom.pt> wrote:

> Hi,
>
>
>
> I want to have my data aggregated by day, so I would like to know which is
> the best option for querying my data: putting the timestamp of the data in
> my row key, or using the timestamps of the columns?
>
>
>
> Thanks,
>
>
>
> Miguel

Re: Use Timestamp

Posted by Ted Dunning <td...@maprtech.com>.

Have a look at OpenTSDB (again!).  They put a base time in the key and then
have many columns for samples at offsets from that base.
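
A rough sketch of that kind of layout, for illustration only: this is not
OpenTSDB's actual schema, and the one-hour bucket width, the byte widths,
and the names to_cell and from_cell are assumptions made for this example.
The row key carries the metric name and the base time; each sample becomes a
column whose qualifier is its offset in seconds from that base.

```python
import struct

BUCKET_SECONDS = 3600  # assumed bucket width: one row per metric per hour


def to_cell(metric: bytes, ts_seconds: int):
    # Split a timestamp into a base time (stored in the row key) and an
    # offset from that base (stored in the column qualifier).
    base = ts_seconds - (ts_seconds % BUCKET_SECONDS)
    offset = ts_seconds - base
    row = metric + struct.pack(">I", base)
    qualifier = struct.pack(">H", offset)  # 0..3599 fits in two bytes
    return row, qualifier


def from_cell(row: bytes, qualifier: bytes) -> int:
    # Recover the original timestamp from a row key and a qualifier.
    base = struct.unpack(">I", row[-4:])[0]
    offset = struct.unpack(">H", qualifier)[0]
    return base + offset
```

All samples from the same hour land in the same row and differ only in the
qualifier, which is the "fewer rows, more columns" trade-off asked about
below.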

On Tue, Apr 5, 2011 at 10:30 AM, Miguel Costa <mi...@telecom.pt> wrote:

> My focus here is whether I gain anything by putting the timestamp in the
> columns rather than in the row, because I will have fewer rows but a lot
> more columns with timestamps.
>

RE: Use Timestamp

Posted by Miguel Costa <mi...@telecom.pt>.

Yes, I will put something in front of the date.

 

If the date comes in milliseconds, it can be millions of rows, even with a
combined key, but I will only need this data for maybe hourly MapReduce jobs.

 

My focus here is whether I gain anything by putting the timestamp in the
columns rather than in the row, because I will have fewer rows but a lot
more columns with timestamps.


 

 

Thanks,

Miguel

From: Ted Dunning [mailto:tdunning@maprtech.com] 
Sent: Tuesday, April 5, 2011 17:02
To: user@hbase.apache.org
Cc: Miguel Costa
Subject: Re: Use Timestamp

 

Using timestamp as key will cause your scan to largely hit one region.  That
may not be so good.

 

If you add something in front of the date, you may be able to spread your
scan over several machines.

 

On the other hand, your aggregation might be very small.  In that case, the
convenience of a time key might be sufficient to make you prefer that
implementation.

 

How much data are you talking about aggregating each time you aggregate?

On Tue, Apr 5, 2011 at 2:16 AM, Miguel Costa <mi...@telecom.pt>
wrote:

I want to have my data aggregated by day, so I would like to know which is
the best option for querying my data: putting the timestamp of the data in
my row key, or using the timestamps of the columns?

Re: Use Timestamp

Posted by Ted Dunning <td...@maprtech.com>.

Using timestamp as key will cause your scan to largely hit one region.  That
may not be so good.

If you add something in front of the date, you may be able to spread your
scan over several machines.
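
One possible way to "add something in front of the date" is sketched below.
It is only an illustration under stated assumptions: the one-byte prefix
derived from a CRC of a hypothetical entity id, the bucket count of 8, and
the names salted_key and scan_ranges are all invented for this example.
Writes for the same time window are spread over several key ranges, and a
reader issues one range scan per prefix and merges the results.

```python
import struct
import zlib

NUM_BUCKETS = 8  # assumed number of prefixes to spread keys over


def salted_key(entity_id: bytes, ts_ms: int) -> bytes:
    # A stable hash of the (hypothetical) entity id picks the prefix byte;
    # the timestamp follows, so keys stay time-ordered within each prefix.
    bucket = zlib.crc32(entity_id) % NUM_BUCKETS
    return bytes([bucket]) + struct.pack(">Q", ts_ms) + entity_id


def scan_ranges(start_ms: int, stop_ms: int):
    # One (start, stop) key pair per prefix covering the same time window;
    # start is inclusive, stop is exclusive.
    return [
        (bytes([b]) + struct.pack(">Q", start_ms),
         bytes([b]) + struct.pack(">Q", stop_ms))
        for b in range(NUM_BUCKETS)
    ]
```

The cost of this design is that a time-range read becomes NUM_BUCKETS scans
instead of one, which is the trade-off against the convenience of a plain
time key.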

On the other hand, your aggregation might be very small.  In that case, the
convenience of a time key might be sufficient to make you prefer that
implementation.

How much data are you talking about aggregating each time you aggregate?

On Tue, Apr 5, 2011 at 2:16 AM, Miguel Costa <mi...@telecom.pt> wrote:

> I want to have my data aggregated by day, so I would like to know which is
> the best option for querying my data: putting the timestamp of the data in
> my row key, or using the timestamps of the columns?