Posted to user@hbase.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2010/06/03 07:52:02 UTC
Re: Using HBase for logging
Hi Viktors,
I noticed you mentioned the following two things:
> - several column families on one date/time are useful
> - and different tables for different level of aggregation (hour, date, week, month, year)
Could you please explain:
- why multiple CFs on one date/time are good (better than 1)?
- why store different levels of aggregation to separate tables instead of just 1 table?
Thanks
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/
----- Original Message ----
> From: Viktors Rotanovs <vi...@gmail.com>
> To: user@hbase.apache.org
> Sent: Mon, May 24, 2010 7:32:26 PM
> Subject: Re: Using HBase for logging
>
> I'm using HBase for similar stats, some things I've learned:
> - date/time as key is good because that way it's very easy to get
>   last N results (for a chart, for example), and it's much more
>   scalable than timestamps
> - several column families on one date/time are useful
> - and different tables for different level of aggregation (hour,
>   date, week, month, year)
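The row-key point above can be illustrated outside HBase: HBase keeps rows sorted lexicographically by the raw key bytes, so a fixed-width, zero-padded date/time string sorts chronologically, which is what makes "last N results for a chart" a cheap bounded range scan. A minimal sketch, assuming illustrative key formats not taken from the thread:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class RowKeys {
    // HBase stores rows sorted by row-key bytes, so a fixed-width
    // date/time string yields chronological ordering and makes
    // "last N" queries simple bounded range scans.
    static String hourlyKey(Date d) {
        SimpleDateFormat f = new SimpleDateFormat("yyyyMMddHH");
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f.format(d);
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L);             // 1970-01-01T00:00:00Z
        Date hourLater = new Date(3_600_000L); // one hour later
        System.out.println(hourlyKey(epoch));
        System.out.println(hourlyKey(hourLater));
        // Lexicographic key order matches time order:
        System.out.println(hourlyKey(epoch).compareTo(hourlyKey(hourLater)) < 0);
    }
}
```

The separate aggregation tables the thread suggests would follow the same idea with a coarser format per table ("yyyyMMdd" for daily, "yyyyMM" for monthly, and so on).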
> - you can increment long values when you need to know total:
>   http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/client/HTable.html#incrementColumnValue(byte[], byte[], byte[], long)
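The linked incrementColumnValue call atomically adds a delta to a long stored in a cell on the server side. Its semantics can be sketched locally with plain JDK classes; this is only an illustration of the read-modify-write behavior, not the HBase client API, and the key string is made up:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class Counters {
    private final ConcurrentHashMap<String, AtomicLong> totals = new ConcurrentHashMap<>();

    // Local analogue of an atomic server-side counter increment:
    // add a delta and return the new running total for the key.
    long increment(String key, long amount) {
        return totals.computeIfAbsent(key, k -> new AtomicLong()).addAndGet(amount);
    }

    public static void main(String[] args) {
        Counters c = new Counters();
        c.increment("2010052419/impressions", 1);
        c.increment("2010052419/impressions", 1);
        System.out.println(c.increment("2010052419/impressions", 1));
    }
}
```

In HBase the same pattern avoids a read-then-write race between clients, since the addition happens on the region server.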
> - MR jobs are a good and scalable way of processing this type of data
> - data size is unlimited, so it's fine to write to multiple tables
> - optimize for reads you're going to make, not for writes.
> To import some of our logs, I'm using a java program which is called via
> logrotate every 10 minutes (but be careful with that one, because if hbase
> client freezes like happened to me after 0.20.4 upgrade, memory can get
> filled very quickly).
> There's also a Python project for analytical data:
> http://github.com/zohmg/zohmg
> Hope that helps,
> --
> Viktors
On Tue, May 25, 2010 at 12:44 AM, Alex Thurlow <alex@blastro.com> wrote:
> Hi list,
> With HBase's great write speed, I was thinking it would be a good thing
> to switch an app that logs to a database to logging to HBase. I couldn't
> really find anyone else who's using it that way though. Are there
> reasons I shouldn't? If I should, how should I structure my data?
>
> It's basically going to be data for an ad server, so the relevant stuff
> would be the timestamp, the id of the ad placement, and the id of the
> creative that showed. Some other data would be stored, but I wouldn't
> need to search on it.
>
> I would be wanting to make reports out of that data by date,
> date/placement id, date/creative id, date/placementid/creativeid
>
> Should I just log with the timestamp as the key and then pull the whole
> range and filter when I need the data, or should I log everything three
> times so I can pull by whichever key I need?
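The "log everything three times" option amounts to writing each event under one composite row key per report, so every report becomes a single contiguous scan. A sketch of what those keys could look like, with a made-up key layout and example placement/creative ids:

```java
public class AdLogKeys {
    // One illustrative composite key per report mentioned in the thread:
    // by date, date+placement, date+creative, and date+placement+creative.
    // Ids are zero-padded so keys stay fixed-width and sort correctly.
    static String[] keysFor(String day, int placementId, int creativeId) {
        String p = String.format("%06d", placementId);
        String c = String.format("%06d", creativeId);
        return new String[] {
            day,                       // report by date
            day + "-" + p,             // report by date/placement id
            day + "-" + c,             // report by date/creative id
            day + "-" + p + "-" + c    // report by date/placement/creative
        };
    }

    public static void main(String[] args) {
        for (String k : keysFor("20100524", 42, 7)) {
            System.out.println(k);
        }
    }
}
```

Each key shape would typically go to its own table (or its own key prefix) so that, say, the date/placement scan never collides with the date/creative scan.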
>
> I'm fairly new to HBase, although I've used Cassandra some, so I have an
> idea of how this kind of works. I just can't quite get my head around
> the right way to use it for this purpose.
>
> Thanks,
>
> -Alex
>
>
> --
> http://rotanovs.com - personal blog | http://www.hitgeist.com - fastest growing websites