Posted to user@hbase.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2010/06/03 07:52:02 UTC

Re: Using HBase for logging

Hi Viktors,

I noticed you mentioned the following two things:

> - several column families on one date/time are useful
> - and different tables for different level of aggregation (hour, date, week, month, year)

Could you please explain:
- why multiple CFs on one date/time are good (better than 1)?
- why store different levels of aggregation to separate tables instead of just 1 table?


Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



----- Original Message ----
> From: Viktors Rotanovs <vi...@gmail.com>
> To: user@hbase.apache.org
> Sent: Mon, May 24, 2010 7:32:26 PM
> Subject: Re: Using HBase for logging
> 
> I'm using HBase for similar stats, some things I've learned:
> - date/time as key is good because that way it's very easy to get
>   last N results (for a chart, for example), and it's much more
>   scalable than timestamps
> - several column families on one date/time are useful
>    - and different tables for different level of aggregation (hour,
>      date, week, month, year)
> - you can increment long values when you need to know a total:
>   http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/client/HTable.html#incrementColumnValue(byte[], byte[], byte[], long)
> - MR jobs are a good and scalable way of processing this type of data
> - data size is unlimited, so it's fine to write to multiple tables
> - optimize for reads you're going to make, not for writes.
> 
> To import some of our logs, I'm using a java program which is called
> via logrotate every 10 minutes (but be careful with that one, because
> if the hbase client freezes, like happened to me after the 0.20.4
> upgrade, memory can get filled very quickly).
> 
> There's also a Python project for analytical data:
> http://github.com/zohmg/zohmg
> 
> Hope that helps,
> -- 
> Viktors

> On Tue, May 25, 2010 at 12:44 AM, Alex Thurlow <alex@blastro.com> wrote:
> > Hi list,
> >    With HBase's great write speed, I was thinking it would be a good
> > thing to switch an app that logs to a database to logging to HBase.
> > I couldn't really find anyone else who's using it that way though.
> > Are there reasons I shouldn't?  If I should, how should I structure
> > my data?
> >
> > It's basically going to be data for an ad server, so the relevant
> > stuff would be the timestamp, the id of the ad placement, and the id
> > of the creative that showed.  Some other data would be stored, but I
> > wouldn't need to search on it.
> >
> > I would be wanting to make reports out of that data by date,
> > date/placement id, date/creative id, date/placementid/creativeid
> >
> > Should I just log with the timestamp as the key and then pull the
> > whole range and filter when I need the data, or should I log
> > everything three times so I can pull by whichever key I need?
> >
> > I'm fairly new to HBase, although I've used Cassandra some, so I have
> > an idea of how this kind of works.  I just can't quite get my head
> > around the right way to use it for this purpose.
> >
> > Thanks,
> >    -Alex



> -- 
> http://rotanovs.com - personal blog | http://www.hitgeist.com -
> fastest growing websites
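
A minimal sketch of the key layout the thread discusses: date-prefixed composite row keys keep one report's rows contiguous, so a bounded scan answers the date/placement and date/creative queries Alex describes. The key layout and the "stats"/"impressions" names below are illustrative assumptions, not taken from the thread.

```java
import java.nio.charset.StandardCharsets;

// Sketch of date-prefixed composite row keys for ad-server log rows.
// All names here (the key layout, the "stats" family, the
// "impressions" qualifier) are illustrative assumptions.
public class AdLogKeys {

    // "20100524/placement/42" -- a scan bounded by the date prefix
    // returns one day's rows for one placement, already sorted.
    static String placementKey(String yyyymmdd, long placementId) {
        return yyyymmdd + "/placement/" + placementId;
    }

    static String creativeKey(String yyyymmdd, long creativeId) {
        return yyyymmdd + "/creative/" + creativeId;
    }

    public static void main(String[] args) {
        byte[] row = placementKey("20100524", 42L)
                .getBytes(StandardCharsets.UTF_8);
        // Against a live cluster, the 0.20-era client call Viktors
        // links to would then bump a per-day counter in place, e.g.:
        //   table.incrementColumnValue(row, Bytes.toBytes("stats"),
        //       Bytes.toBytes("impressions"), 1L);
        System.out.println(new String(row, StandardCharsets.UTF_8));
        // prints 20100524/placement/42
    }
}
```

Writing the same event under several such keys is the "log everything three times" option from the thread: it trades extra writes for reads that never have to filter a full time range.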