Posted to user@hbase.apache.org by Henning Blohm <he...@zfabrik.de> on 2014/04/09 11:55:32 UTC

Lighter Map/Reduce on HBase

We operate a solution that stores large amounts of data in HBase; that data
needs to be available for online access.

For efficient scanning, three pieces of data are encoded in the row keys (in
particular a time dimension), and for other reasons some columns hold
JSON-encoded data.
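
To make the shape of the data a bit more concrete, here is a rough sketch of
the per-row decoding any such job has to do. The key layout (a 16-byte entity
id followed by an 8-byte timestamp) and the column/field names are made up
for illustration and do not match our real schema:

import java.io.IOException;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

// Illustrative only: assumes row key = [16-byte entity id][8-byte timestamp]
// and a JSON-encoded column d:payload.
public class RowDecoder {

    private static final byte[] CF = Bytes.toBytes("d");
    private static final byte[] PAYLOAD = Bytes.toBytes("payload");
    private static final ObjectMapper JSON = new ObjectMapper();

    /** Entity id portion of the composite row key. */
    public static String entityOf(byte[] rowKey) {
        return Bytes.toStringBinary(rowKey, 0, 16);
    }

    /** Time dimension encoded after the entity id. */
    public static long timestampOf(byte[] rowKey) {
        return Bytes.toLong(rowKey, 16);
    }

    /** One numeric field pulled out of the JSON-encoded column. */
    public static long metricOf(Result row) throws IOException {
        byte[] raw = row.getValue(CF, PAYLOAD);
        JsonNode node = JSON.readTree(Bytes.toString(raw));
        return node.path("metric").asLong();   // "metric" is a placeholder field name
    }
}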

Currently, analytics data is created in two ways:

a) a non-trivial M/R job that computes pre-aggregated data sets and
offloads them into an analytical database for interactive reporting (a
trimmed-down sketch of such a mapper follows right after this list)
b) other M/R jobs that create specialized reports (heuristics) that cannot
be computed from the pre-aggregated data
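
Roughly, the mapper side of a) looks like this today, building on the
RowDecoder sketch above (all names are placeholders):

import java.io.IOException;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

// Emits (entity, day) -> metric so a summing reducer can produce the
// pre-aggregated data set that gets offloaded to the analytical database.
public class PreAggregateMapper extends TableMapper<Text, LongWritable> {

    private static final long MS_PER_DAY = 24L * 60 * 60 * 1000;

    private final Text outKey = new Text();
    private final LongWritable outValue = new LongWritable();

    @Override
    protected void map(ImmutableBytesWritable key, Result row, Context context)
            throws IOException, InterruptedException {
        byte[] rowKey = row.getRow();
        String entity = RowDecoder.entityOf(rowKey);
        long day = RowDecoder.timestampOf(rowKey) / MS_PER_DAY;  // bucket by day

        outKey.set(entity + "\t" + day);
        outValue.set(RowDecoder.metricOf(row));                  // field from the JSON column
        context.write(outKey, outValue);
    }
}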

In particular for b), but possibly also for variations of a), I would like
to find more "user friendly" ways than Java-implemented M/R jobs - at least
for some cases.

So this is not about interactive querying of data directly from HBase
tables. It is rather about pre-processing (large) HBase-stored data sets
into either input for interactive query engines (some other DB, Phoenix, ...)
or into some other specialized format (the driver sketch below shows how
such a scan-based job is wired up today).
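
For completeness, the driver that wires such a scan-based pre-processing job
together and writes text output for bulk loading into the analytical database
looks roughly like this (table name and output path are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.LongSumReducer;

// Scans the HBase table, pre-aggregates with PreAggregateMapper (above) and
// writes text output that a downstream database can bulk-load.
public class PreAggregateJob {

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "pre-aggregate");
        job.setJarByClass(PreAggregateJob.class);

        Scan scan = new Scan();
        scan.setCaching(500);            // bigger scanner batches for a full scan
        scan.setCacheBlocks(false);      // don't flood the region server block cache
        scan.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"));

        TableMapReduceUtil.initTableMapperJob(
                "events",                // placeholder table name
                scan,
                PreAggregateMapper.class,
                Text.class,
                LongWritable.class,
                job);

        job.setReducerClass(LongSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileOutputFormat.setOutputPath(job, new Path(args[0]));  // e.g. an HDFS staging dir

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}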

I spent some time with Hive but found that its HBase integration simply
doesn't cut it (parsing a composite row key, mapping JSON column content).
I know there are more options out there, but before spending an eternity
trying out the various approaches, I am shamelessly trying to benefit from
your expertise by asking for some good pointers.

Thanks,
Henning

Re: Lighter Map/Reduce on HBase

Posted by Koert Kuipers <ko...@tresata.com>.
We do these jobs in Cascading/Scalding.