Posted to user@hbase.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2011/02/25 22:02:14 UTC

Ad-hoc reports against HBase - any way? any tools?

Hello,

I have an HBase cluster chock-full of data and would like to run canned reports 
(i.e., reports known ahead of time), but also ad-hoc reports against that data.
Are there any open-source or commercial tools one can use?

Here's what I *think* I know so far, but please correct me wherever I'm wrong, so 
I don't spread false info:

* Use HBase-Hive Integration
  Pluses:
    - lots of tools to query Hive are available
  Minuses:
    - data duplication
    - Hive's copy of data is always behind
    - I heard the integration is fairly alpha (e.g. you can't copy deltas to 
Hive, you have to copy all data every time you want to update your Hive store)

* Use Pig 
  https://issues.apache.org/jira/browse/PIG-970
  https://issues.apache.org/jira/browse/PIG-1205
  Pluses:
    - runs directly against HBase, no need to copy data
  Minuses:
    - PigLatin learning curve - in my case people wanting ad-hoc reports are not 
techies
    - No pretty front-end with syntax highlighting or visual querying or that 
accepts SQL and translates it to PigLatin

* Use PigPen
  Pluses:
    - Visual == easy
  Minuses:
    - Looks abandoned, judging by http://search-hadoop.com/m/Noacz1MECC7 and 
https://issues.apache.org/jira/browse/PIG-366

* Use Toad for Cloud
  Pluses:
    - accepts SQL, runs, and returns data
    - runs directly against HBase, no need to copy data
  Minuses:
    - some people reported it crashes
    - it allows the person querying the data to also modify the data, which is 
bad in my environment

* Datameer DAS, Karmasphere Analyst, Pentaho, Beeswax -- they all seem to be 
able to get the data out of Hive, but not out of HBase.  More info below:

* Pentaho
    * http://www.pentaho.com/products/hadoop/ - looks like it supports only Hive
    * http://forums.pentaho.com/showthread.php?77926-HBase-and-ETL
    * http://search-hadoop.com/?q=pentaho&src=moz-search

* Datameer
    * http://wiki.datameer.com/display/DAS1/DAS+Supported+Platforms - looks like 
it supports only Hive
    * http://wiki.datameer.com/display/DAS11/Using+the+Plug-in+SDK - looks like 
one can add support for HBase by writing a plugin?

* Karmasphere Analyst
    * http://www.karmasphere.com/Products-Information/karmasphere-analyst.html - 
Hive only


Is any of the above incorrect?
Did I miss a tool, free or non-free, that I could use to run ad-hoc reports 
against data in HBase?

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop - HBase
Hadoop ecosystem search :: http://search-hadoop.com/

Re: Ad-hoc reports against HBase - any way? any tools?

Posted by Jean-Daniel Cryans <jd...@apache.org>.

We use the HBase+Hive integration here for ad-hoc queries. I don't
understand the data duplication you're talking about: when you
create an external table, you can directly query your existing HBase
tables, with nothing copied into Hive. We run with the latest patch
posted in HIVE-1634 since we have a lot of binary values, and I made a
very, very hacky patch to be able to use our binary composite row keys.
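
For illustration only, here is a minimal sketch of what such an external
table definition can look like. It is a small Python helper that just builds
the HiveQL statement as a string; it does not talk to Hive or HBase. The
storage handler class and the two property names are the ones the Hive HBase
integration uses, but the helper function, the table names (events_hive,
events), the column family (d), and the columns are all hypothetical
placeholders, not our actual schema.

```python
# Sketch: build the HiveQL DDL that maps an existing HBase table into Hive.
# All table, family, and column names used below are hypothetical.

HBASE_HANDLER = "org.apache.hadoop.hive.hbase.HBaseStorageHandler"


def external_table_ddl(hive_table, hbase_table, columns):
    """Return a CREATE EXTERNAL TABLE statement for an existing HBase table.

    columns is a list of (hive_column, hive_type, hbase_mapping) tuples, where
    hbase_mapping is ":key" for the row key or "family:qualifier" otherwise.
    """
    # Hive-side column definitions, e.g. "rowkey string, payload string"
    col_defs = ", ".join(f"{name} {hive_type}" for name, hive_type, _ in columns)
    # HBase-side mapping, in the same order as the Hive columns
    mapping = ",".join(hbase_mapping for _, _, hbase_mapping in columns)
    return (
        f"CREATE EXTERNAL TABLE {hive_table} ({col_defs})\n"
        f"STORED BY '{HBASE_HANDLER}'\n"
        f"WITH SERDEPROPERTIES ('hbase.columns.mapping' = '{mapping}')\n"
        f"TBLPROPERTIES ('hbase.table.name' = '{hbase_table}');"
    )


ddl = external_table_ddl(
    "events_hive",  # hypothetical Hive table name
    "events",       # hypothetical existing HBase table
    [("rowkey", "string", ":key"), ("payload", "string", "d:payload")],
)
print(ddl)
```

Because the table is declared EXTERNAL, Hive only stores the mapping and
reads the rows from HBase at query time, which is why there is no second
copy of the data to keep in sync.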

I'll be happy to give you more details if you want to try going down that road.

J-D
