Posted to user@hbase.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2011/02/25 22:02:14 UTC
Ad-hoc reports against HBase - any way? any tools?
Hello,
I have an HBase cluster chock-full of data and would like to run canned reports
(i.e., reports known ahead of time), but also ad-hoc reports against that data.
Are there any open-source or commercial tools one can use?
Here's what I *think* I know so far, but please correct me wherever I'm wrong, so
I don't spread false info:
* Use HBase-Hive Integration
Pluses:
- lots of tools to query Hive are available
Minuses:
- data duplication
- Hive's copy of data is always behind
- I heard the integration is fairly alpha (e.g., you can't copy deltas to
Hive; you have to copy all the data every time you want to update your Hive store)
* Use Pig
https://issues.apache.org/jira/browse/PIG-970
https://issues.apache.org/jira/browse/PIG-1205
Pluses:
- runs directly against HBase, no need to copy data
Minuses:
- PigLatin learning curve - in my case, the people wanting ad-hoc reports are
not techies
- No pretty front-end with syntax highlighting, visual querying, or one that
accepts SQL and translates it to PigLatin
* Use PigPen
Pluses:
- Visual == easy
Minuses:
- Looks abandoned, judging by http://search-hadoop.com/m/Noacz1MECC7 and
https://issues.apache.org/jira/browse/PIG-366
* Use Toad for Cloud
Pluses:
- accepts SQL, runs, and returns data
- runs directly against HBase, no need to copy data
Minuses:
- some people reported it crashes
- it allows the person querying the data to also modify the data, which is
bad in my environment
* Datameer DAS, Karmasphere Analyst, Pentaho, Beeswax: they all seem to be
able to get the data out of Hive, but not out of HBase. More info below:
* Pentaho
* http://www.pentaho.com/products/hadoop/ - looks like it supports only Hive
* http://forums.pentaho.com/showthread.php?77926-HBase-and-ETL
* http://search-hadoop.com/?q=pentaho&src=moz-search
* Datameer
* http://wiki.datameer.com/display/DAS1/DAS+Supported+Platforms - looks like
it supports only Hive
* http://wiki.datameer.com/display/DAS11/Using+the+Plug-in+SDK - looks like
one can add support for HBase by writing a plugin?
* Karmasphere Analyst
* http://www.karmasphere.com/Products-Information/karmasphere-analyst.html -
Hive only
Is any of the above incorrect?
Did I miss a tool, free or non-free, that I could use to run ad-hoc reports
against data in HBase?
Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop - HBase
Hadoop ecosystem search :: http://search-hadoop.com/
Re: Ad-hoc reports against HBase - any way? any tools?
Posted by Jean-Daniel Cryans <jd...@apache.org>.
We use the HBase+Hive integration here for ad-hoc queries. I don't
understand the data duplication you're talking about: when you
create an external table, you can directly query your existing tables.
We run with the latest patch posted in HIVE-1634 since we have a lot
of binary values and I made a very very hacky patch to be able to use
our binary composite row keys.
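For readers following the external-table suggestion, here is a minimal sketch in Python of a hypothetical helper that generates that kind of Hive DDL: an EXTERNAL table backed by Hive's HBaseStorageHandler, so Hive queries read the existing HBase table in place rather than a copy. The storage handler class and the "hbase.columns.mapping" / "hbase.table.name" properties come from Hive's HBase integration; the function name, the table names, and the column family "d" are made up for illustration. It assumes values stored as plain text; per the note above, binary values and composite row keys needed the HIVE-1634 patch at the time.

```python
def hbase_external_table_ddl(hive_table, hbase_table, columns,
                             key_column="rowkey", key_type="string"):
    """Hypothetical helper: build a Hive CREATE EXTERNAL TABLE statement
    that maps an existing HBase table, without copying any data.

    columns is a list of (hive_column_name, hive_type, "family:qualifier").
    The HBase row key is mapped to the first Hive column via ":key".
    """
    col_defs = [f"{key_column} {key_type}"]
    mapping = [":key"]
    for name, hive_type, family_qualifier in columns:
        col_defs.append(f"{name} {hive_type}")
        mapping.append(family_qualifier)
    mapping_str = ",".join(mapping)
    lines = [
        f"CREATE EXTERNAL TABLE {hive_table} (",
        "  " + ",\n  ".join(col_defs),
        ")",
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'",
        f'WITH SERDEPROPERTIES ("hbase.columns.mapping" = "{mapping_str}")',
        f'TBLPROPERTIES ("hbase.table.name" = "{hbase_table}");',
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative names only: an HBase table "page_views" with family "d".
    print(hbase_external_table_ddl(
        "page_views_hive", "page_views",
        [("url", "string", "d:url"), ("hits", "bigint", "d:hits")],
    ))
```

Because the table is EXTERNAL, dropping it in Hive leaves the underlying HBase table alone, and each query sees whatever is in HBase when it runs.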
I'll be happy to give you more details if you want to try going down that road.
J-D