Posted to user@hive.apache.org by Amey Barve <am...@gmail.com> on 2016/02/04 11:39:48 UTC
Re: Is Hive Index officially not recommended?
Hi Gopal,
As you suggested in your earlier email:
*Part #1 of using hive indexes effectively is to write your own
HiveIndexHandler, with usesIndexTable=false;*
*And then write a IndexPredicateAnalyzer, which lets you map arbitrary
lookups into other range conditions.*
Is anybody storing their index in a non-native table such as HBase?
Can you please point to implementations of HiveIndexHandler or
AbstractIndexHandler that have usesIndexTable=false?
Thanks,
Amey
On Wed, Jan 6, 2016 at 5:25 AM, Gopal Vijayaraghavan <go...@apache.org>
wrote:
>
> >So in a nutshell in Hive if "external" indexes are not used for improving
> >query response, what value they add and can we forget them for now?
>
> The builtin indexes - those that write data as smaller tables - are only
> useful in a pre-columnar world, where the indexes offer a huge reduction
> in IO.
>
> Part #1 of using hive indexes effectively is to write your own
> HiveIndexHandler, with usesIndexTable=false;
>
> And then write a IndexPredicateAnalyzer, which lets you map arbitrary
> lookups into other range conditions.
>
> Not coincidentally - we're adding a "ANALYZE TABLE ... CACHE METADATA"
> which consolidates the "internal" index into an external store (HBase).
>
> Some of the index data now lives in the HBase metastore, so that the
> inclusion/exclusion of whole partitions can be done off the consolidated
> index.
>
> https://issues.apache.org/jira/browse/HIVE-11676
>
>
> The experience from BI workloads run by customers is that in general, the
> lookup to the right "slice" of data is more of a problem than the actual
> aggregate.
>
> And that for a workhorse data warehouse, this has to survive even if
> there's a non-stop stream of updates into it.
>
> Cheers,
> Gopal
>
>
>
Re: Is Hive Index officially not recommended?
Posted by Gopal Vijayaraghavan <go...@apache.org>.
> Is anybody storing there index in a non-native table such as HBase?
...
> Can you please point to implementations of HiveIndexHandler or
> AbstractIndexHandler
> that have usesIndexTable=false
I don't think there are any publicly available implementations yet.
The Hive HBase-metastore project adds a standardized HBase instance into
the mix in hive-2.0.
We already moved the min-max indexes in ORC to the HBase metastore:
https://issues.apache.org/jira/browse/HIVE-11676
+
https://issues.apache.org/jira/browse/HIVE-12075
+
https://issues.apache.org/jira/browse/HIVE-12061
I haven't really worked out how the aggregate indexes should work, but the
goal is to produce min-max indexes (then bloom filters).
The representative query (in my mind) looks somewhat like
UPDATE txns SET reversed=true where txn_id = 1;
where txns is partitioned by date.
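
To illustrate why a min-max index helps with a point lookup like the one above, here is a minimal sketch in Python. It is not Hive or ORC code: the MinMax class, the build_index and candidate_partitions helpers, and the sample data are all hypothetical, and it assumes the index holds one (min, max) pair of txn_id per date partition. A partition is read only if the requested txn_id falls inside its recorded range.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class MinMax:
    """Smallest and largest txn_id seen in one partition (hypothetical)."""
    lo: int
    hi: int

    def may_contain(self, value: int) -> bool:
        # A min-max entry can only rule a partition out. A value inside
        # the range may still be absent, so a hit means "must be scanned".
        return self.lo <= value <= self.hi


def build_index(partitions: Dict[str, List[int]]) -> Dict[str, MinMax]:
    """Build one min-max entry per non-empty partition."""
    return {
        name: MinMax(min(ids), max(ids))
        for name, ids in partitions.items()
        if ids
    }


def candidate_partitions(index: Dict[str, MinMax], txn_id: int) -> List[str]:
    """Return the partitions that cannot be excluded for this txn_id."""
    return sorted(name for name, mm in index.items() if mm.may_contain(txn_id))


# Hypothetical data: date partition -> txn_id values stored in it.
txns = {
    "2016-01-04": [1, 2, 3],
    "2016-01-05": [4, 5, 6],
    "2016-01-06": [7, 8, 9],
}
index = build_index(txns)

# Equivalent of "... where txn_id = 1": only one partition survives.
print(candidate_partitions(index, 1))
```

When txn_id ranges overlap across partitions, several partitions survive the check. That is presumably where a second filter such as a bloom filter would help, since it can discard partitions whose range covers the value but which do not actually contain it.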
Cheers,
Gopal