Posted to user@hive.apache.org by "Ting(Goden) Yao" <ty...@pivotal.io> on 2016/01/05 19:17:36 UTC

Is Hive Index officially not recommended?

Hi,

We hit an issue while testing Hive index rebuilds on Tez.
We were told by our Hadoop distro vendor that using indexes with Hive is
not recommended (or should be avoided).

But I don't see an official statement to that effect on the Hive wiki
<https://cwiki.apache.org/confluence/display/Hive/IndexDev> or in the
documentation.
Can someone confirm this, so we can ask our users to avoid indexing?

Thanks.
-Goden

==Exceptions (if you're interested in details) ==

Exception:

2015-12-08 22:55:30,263 FATAL [AsyncDispatcher event handler] event.AsyncDispatcher: Error in dispatcher thread
org.apache.tez.dag.api.TezUncheckedException: Unable to instantiate class with 1 arguments: org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator
    at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:80)
    at org.apache.tez.common.ReflectionUtils.createClazzInstance(ReflectionUtils.java:98)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager.createInitializer(RootInputInitializerManager.java:137)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInputInitializers(RootInputInitializerManager.java:114)
    at org.apache.tez.dag.app.dag.impl.VertexImpl.setupInputInitializerManager(VertexImpl.java:3943)
    at org.apache.tez.dag.app.dag.impl.VertexImpl.access$3900(VertexImpl.java:180)
    at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.handleInitEvent(VertexImpl.java:2956)
    at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:2906)
    at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:2887)
    at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
    at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1556)
    at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:179)
    at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1764)
    at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1750)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:69)
    ... 20 more
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.tez.DynamicPartitionPruner.initialize(DynamicPartitionPruner.java:154)
    at org.apache.hadoop.hive.ql.exec.tez.DynamicPartitionPruner.<init>(DynamicPartitionPruner.java:110)
    at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:95)
    ... 25 more
2015-12-08 22:55:30,266 ERROR [AsyncDispatcher event handler] impl.VertexImpl: Can't handle Invalid event V_START on vertex Map 1 with vertexId vertex_1449613300943_0002_1_00 at current state NEW
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: V_START at NEW
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
    at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1556)
    at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:179)
    at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1764)
    at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1750)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
    at java.lang.Thread.run(Thread.java:745)
2015-12-08 22:55:30,267 ERROR [AsyncDispatcher event handler] impl.VertexImpl: Invalid event V_INTERNAL_ERROR on Vert

RE: Is Hive Index officially not recommended?

Posted by Mich Talebzadeh <mi...@peridale.co.uk>.
I don’t think an index in Hive (as a separate entity) adds any value, although you can create one.

 

You can create an ORC table with characteristics that simulate index-like behaviour:

 

-- table name and column list here are illustrative, not from the original post
CREATE TABLE t_orc (object_id BIGINT, payload STRING)
CLUSTERED BY (object_id) INTO 256 BUCKETS
STORED AS ORC
TBLPROPERTIES ( "orc.compress"="SNAPPY",
                "orc.create.index"="true",
                "orc.bloom.filter.columns"="object_id" );

 

That improves query response
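For example (a sketch only, against the illustrative table above), a point
lookup on the bucketed / bloom-filtered column lets the ORC reader skip most
stripes and row groups:

set hive.optimize.index.filter=true;            -- enables predicate pushdown into the ORC reader
SELECT * FROM t_orc WHERE object_id = 1234567;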

 

 

HTH

 

Dr Mich Talebzadeh

 

LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

Sybase ASE 15 Gold Medal Award 2008

A Winning Strategy: Running the most Critical Financial Data on ASE 15

http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf

Author of the books "A Practitioner’s Guide to Upgrading to Sybase ASE 15", ISBN 978-0-9563693-0-7. 

co-author "Sybase Transact SQL Guidelines Best Practices", ISBN 978-0-9759693-0-4

Publications due shortly:

Complex Event Processing in Heterogeneous Environments, ISBN: 978-0-9563693-3-8

Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one out shortly

 

http://talebzadehmich.wordpress.com

 

NOTE: The information in this email is proprietary and confidential. This message is for the designated recipient only, if you are not the intended recipient, you should destroy it immediately. Any information in this message shall not be understood as given or endorsed by Peridale Technology Ltd, its subsidiaries or their employees, unless expressly so stated. It is the responsibility of the recipient to ensure that this email is virus free, therefore neither Peridale Ltd, its subsidiaries nor their employees accept any responsibility.

 

From: Ting(Goden) Yao [mailto:tyao@pivotal.io] 
Sent: 05 January 2016 18:18
To: user@hive.apache.org
Subject: Is Hive Index officially not recommended?

 

Hi,

 

We hit an issue when doing Hive testing to rebuild index on Tez.

We were told by our Hadoop distro vendor that it's not recommended (or should avoid) using index with Hive.

 

But I don't see an official message on Hive wiki <https://cwiki.apache.org/confluence/display/Hive/IndexDev>  or documentation.

Can someone confirm that so we'll ask our users to avoid indexing.

 

Thanks.

-Goden

 


Re: Is Hive Index officially not recommended?

Posted by "Ting(Goden) Yao" <ty...@pivotal.io>.
Yes, we tried MR and it works fine, so it's more likely a Tez issue.
Thanks for your comments.
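
For reference, a rebuild under the MR engine amounts to roughly the following
(the index and table names here are illustrative only):

set hive.execution.engine=mr;    -- fall back to MapReduce just for the rebuild
ALTER INDEX t_ui ON t REBUILD;   -- the rebuild statement that fails under Tez
set hive.execution.engine=tez;   -- switch back afterwards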

On Tue, Jan 5, 2016 at 11:58 AM Jörn Franke <jo...@gmail.com> wrote:

> You can still use execution Engine mr for maintaining the index. Indeed
> with the ORC or parquet format there are min/max indexes and bloom filters,
> but you need to sort your data appropriately to benefit from performance.
> Alternatively you can create redundant tables sorted in different order.
> The "traditional" indexes can still make sense for data not in Orc or
> parquet format.
> Keep in mind that for warehouse scenarios there are many other
> optimization methods in Hive.
>
> On 05 Jan 2016, at 19:17, Ting(Goden) Yao <ty...@pivotal.io> wrote:
>
> Hi,
>
> We hit an issue when doing Hive testing to rebuild index on Tez.
> We were told by our Hadoop distro vendor that it's not recommended (or
> should avoid) using index with Hive.
>
> But I don't see an official message on Hive wiki
> <https://cwiki.apache.org/confluence/display/Hive/IndexDev> or
> documentation.
> Can someone confirm that so we'll ask our users to avoid indexing.
>
> Thanks.
> -Goden
>
>
>

Re: Is Hive Index officially not recommended?

Posted by Lefty Leverenz <le...@gmail.com>.
I'd like to revise the Indexing
<https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Indexing> and
IndexDev <https://cwiki.apache.org/confluence/display/Hive/IndexDev> docs
in the wiki to include this information (as well as information from a
previous thread, if I can find it) so people won't be misled into using
indexes inappropriately.

But it might be more efficient for Gopal or another expert to do the
revisions.  Otherwise I would need careful reviews to make sure I don't
garble things.

-- Lefty


On Tue, Jan 5, 2016 at 3:55 PM, Gopal Vijayaraghavan <go...@apache.org>
wrote:

>
> >So in a nutshell in Hive if "external" indexes are not used for improving
> >query response, what value they add and can we forget them for now?
>
> The builtin indexes - those that write data as smaller tables are only
> useful in a pre-columnar world, where the indexes offer a huge reduction
> in IO.
>
> Part #1 of using hive indexes effectively is to write your own
> HiveIndexHandler, with usesIndexTable=false;
>
> And then write a IndexPredicateAnalyzer, which lets you map arbitrary
> lookups into other range conditions.
>
> Not coincidentally - we're adding a "ANALYZE TABLE ... CACHE METADATA"
> which consolidates the "internal" index into an external store (HBase).
>
> Some of the index data now lives in the HBase metastore, so that the
> inclusion/exclusion of whole partitions can be done off the consolidated
> index.
>
> https://issues.apache.org/jira/browse/HIVE-11676
>
>
> The experience from BI workloads run by customers is that in general, the
> lookup to the right "slice" of data is more of a problem than the actual
> aggregate.
>
> And that for a workhorse data warehouse, this has to survive even if
> there's a non-stop stream of updates into it.
>
> Cheers,
> Gopal
>
>
>

Re: Is Hive Index officially not recommended?

Posted by Gopal Vijayaraghavan <go...@apache.org>.
> Is anybody storing there index in a non-native table such as HBase?
...
> Can you please point to implementations of HiveIndexHandler or
>AbstractIndexHandler
> that have usesIndexTable=false

I don't think there are any publically available implementations yet.

The Hive HBase-metastore project adds a standardized HBase instance into
the mixture in hive-2.0.

We already moved the min-max indexes in ORC to the HBase metastore

https://issues.apache.org/jira/browse/HIVE-11676

+
https://issues.apache.org/jira/browse/HIVE-12075
+
https://issues.apache.org/jira/browse/HIVE-12061


I haven't really worked out how the aggregate indexes should work, but the
goal is to produce min-max indexes (then bloom filters).

The representative query (in my mind) looks somewhat like

UPDATE txns SET reversed=true where txn_id = 1;

where txns is partitioned by date.
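
A table fitting that description would presumably be declared along these
lines (column names and bucket count are illustrative; ACID updates need a
bucketed, transactional ORC table):

CREATE TABLE txns (txn_id BIGINT, reversed BOOLEAN)
PARTITIONED BY (txn_date DATE)
CLUSTERED BY (txn_id) INTO 16 BUCKETS
STORED AS ORC
TBLPROPERTIES ("transactional"="true");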

Cheers,
Gopal







Re: Is Hive Index officially not recommended?

Posted by Amey Barve <am...@gmail.com>.
Hi Gopal,

As you suggested in your email above that


*Part #1 of using hive indexes effectively is to write your own
HiveIndexHandler, with usesIndexTable=false;*

*And then write an IndexPredicateAnalyzer, which lets you map arbitrary
lookups into other range conditions.*

Is anybody storing their index in a non-native table such as HBase?

Can you please point to implementations of HiveIndexHandler or
AbstractIndexHandler that have usesIndexTable=false?

Thanks,
Amey

On Wed, Jan 6, 2016 at 5:25 AM, Gopal Vijayaraghavan <go...@apache.org>
wrote:

>
> >So in a nutshell in Hive if "external" indexes are not used for improving
> >query response, what value they add and can we forget them for now?
>
> The builtin indexes - those that write data as smaller tables are only
> useful in a pre-columnar world, where the indexes offer a huge reduction
> in IO.
>
> Part #1 of using hive indexes effectively is to write your own
> HiveIndexHandler, with usesIndexTable=false;
>
> And then write a IndexPredicateAnalyzer, which lets you map arbitrary
> lookups into other range conditions.
>
> Not coincidentally - we're adding a "ANALYZE TABLE ... CACHE METADATA"
> which consolidates the "internal" index into an external store (HBase).
>
> Some of the index data now lives in the HBase metastore, so that the
> inclusion/exclusion of whole partitions can be done off the consolidated
> index.
>
> https://issues.apache.org/jira/browse/HIVE-11676
>
>
> The experience from BI workloads run by customers is that in general, the
> lookup to the right "slice" of data is more of a problem than the actual
> aggregate.
>
> And that for a workhorse data warehouse, this has to survive even if
> there's a non-stop stream of updates into it.
>
> Cheers,
> Gopal
>
>
>

Re: Is Hive Index officially not recommended?

Posted by Gopal Vijayaraghavan <go...@apache.org>.
>So in a nutshell in Hive if "external" indexes are not used for improving
>query response, what value they add and can we forget them for now?

The builtin indexes - those that write data as smaller tables - are only
useful in a pre-columnar world, where the indexes offer a huge reduction
in IO.

Part #1 of using hive indexes effectively is to write your own
HiveIndexHandler, with usesIndexTable=false;

And then write an IndexPredicateAnalyzer, which lets you map arbitrary
lookups into other range conditions.

Not coincidentally - we're adding an "ANALYZE TABLE ... CACHE METADATA"
which consolidates the "internal" index into an external store (HBase).

Some of the index data now lives in the HBase metastore, so that the
inclusion/exclusion of whole partitions can be done off the consolidated
index. 

https://issues.apache.org/jira/browse/HIVE-11676
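
Presumably, once that lands, usage is just the following (table name is
illustrative; exact syntax per the final patch):

ANALYZE TABLE txns CACHE METADATA;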


The experience from BI workloads run by customers is that in general, the
lookup to the right "slice" of data is more of a problem than the actual
aggregate.

And that for a workhorse data warehouse, this has to survive even if
there's a non-stop stream of updates into it.

Cheers,
Gopal



RE: Is Hive Index officially not recommended?

Posted by Mich Talebzadeh <mi...@peridale.co.uk>.
Thanks Gopal for a very valuable insight.

So, in a nutshell, if "external" indexes in Hive are not used for improving
query response, what value do they add, and can we forget them for now?

Regards,

Dr Mich Talebzadeh

LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw


http://talebzadehmich.wordpress.com



-----Original Message-----
From: Gopal Vijayaraghavan [mailto:gopal@hortonworks.com] On Behalf Of Gopal
Vijayaraghavan
Sent: 05 January 2016 21:49
To: user@hive.apache.org
Subject: Re: Is Hive Index officially not recommended?


 
 
> I am going to run the same query in Hive. However, I only see a table 
>scan below and no mention of that index. May be I am missing something 
>here?

Hive Indexes are an incomplete feature, because they are not maintained over
an ACID storage & demand FileSystem access to check for validity.

I'm almost sure there's a better implementation, which never made it to
Apache (read HIVE-417 & comments about HBase).


So far, in all my prod cases, they've slowed down queries more often than
speeding them up.

By default, the indexes are *not* used to answer queries.

In fact, the slowness was mostly attributed to the time spent making sure
the index was invalid.

You can flip those on if you want mostly up-to date results.

set hive.optimize.index.filter=true;
set hive.optimize.index.groupby=true;

set hive.index.compact.query.max.size=-1;

set hive.optimize.index.filter.compact.minsize=-1;

set hive.index.compact.query.max.entries=-1;

Things are going to change in Hive-2.0 though. The addition of isolated
transactions brings new light into the world of indexes.

I'll be chasing that down after LLAP, since the txn model offers
serializability markers and the LockManager + compactions offer a great way
to purge/update them per-partition. And the metastore-2.0 removes a large
number of scalability problems associated with metadata.

 
Cheers,
Gopal





Re: Is Hive Index officially not recommended?

Posted by Gopal Vijayaraghavan <go...@apache.org>.
 
 
> I am going to run the same query in Hive. However, I only see a table
>scan below and no mention of that index. May be I am missing something
>here?

Hive Indexes are an incomplete feature, because they are not maintained
over ACID storage and demand FileSystem access to check for validity.

I'm almost sure there's a better implementation, which never made it to
Apache (read HIVE-417 & comments about HBase).


So far, in all my prod cases, they've slowed down queries more often than
speeding them up.

By default, the indexes are *not* used to answer queries.

In fact, the slowness was mostly attributed to the time spent making sure
the index was invalid.

You can flip those on if you want mostly up-to-date results.

set hive.optimize.index.filter=true;                 -- let the optimizer use indexes to rewrite filters (also enables storage-level predicate pushdown)
set hive.optimize.index.groupby=true;                -- allow group-by rewrites against aggregate indexes
set hive.index.compact.query.max.size=-1;            -- no limit on the bytes a compact-index query may read
set hive.optimize.index.filter.compact.minsize=-1;   -- use the compact index regardless of input size
set hive.index.compact.query.max.entries=-1;         -- no limit on the number of index entries read

Things are going to change in Hive-2.0 though. The addition of isolated
transactions brings new light into the world of indexes.

I'll be chasing that down after LLAP, since the txn model offers
serializability markers and the LockManager + compactions offer a great
way to purge/update them per-partition. And the metastore-2.0 removes a
large number of scalability problems associated with metadata.

 
Cheers,
Gopal






RE: Is Hive Index officially not recommended?

Posted by Mich Talebzadeh <mi...@peridale.co.uk>.
Hi,

 

Your point below:

 

The "traditional" indexes can still make sense for data not in Orc or parquet format.

 

Kindly consider the following, please:

 

A traditional index in an RDBMS is normally a B-tree index with a value for that column and a pointer (row ID) to the row in the data block that keeps the data.

 

 

In an RDBMS I create a unique index on column OBJECT_ID on table ‘t’ below and run a simple query that can be covered by the index without touching the base table:

 

1> select count(1) from t where OBJECT_ID < 100

2> go

 

QUERY PLAN FOR STATEMENT 1 (at line 1).

 

 

    STEP 1

        The type of query is EXECUTE.

        Executing a newly cached statement (SSQL_ID = 312036659).

 

Total estimated I/O cost for statement 1 (at line 1): 0.

 

 

QUERY PLAN FOR STATEMENT 1 (at line 0).

 

 

    STEP 1

        The type of query is DECLARE.

 

Total estimated I/O cost for statement 1 (at line 0): 0.

 

 

QUERY PLAN FOR STATEMENT 2 (at line 1).

Optimized using Parallel Mode

 

 

    STEP 1

        The type of query is SELECT.

 

        3 operator(s) under root

 

       |ROOT:EMIT Operator (VA = 3)

       |

       |   |SCALAR AGGREGATE Operator (VA = 2)

       |   |  Evaluate Ungrouped COUNT AGGREGATE.

       |   |

       |   |   |RESTRICT Operator (VA = 1)(3)(0)(0)(0)(0)

       |   |   |

       |   |   |   |SCAN Operator (VA = 0)

       |   |   |   |  FROM TABLE

       |   |   |   |  t

       |   |   |   |  Using Clustered Index.

       |   |   |   |  Index : t_ui

       |   |   |   |  Forward Scan.

       |   |   |   |  Positioning by key.

       |   |   |   |  Index contains all needed columns. Base table will not be read.

       |   |   |   |  Keys are:

       |   |   |   |    OBJECT_ID ASC

       |   |   |   |  Using I/O Size 64 Kbytes for index leaf pages.

       |   |   |   |  With LRU Buffer Replacement Strategy for index leaf pages.

 

 

Total estimated I/O cost for statement 2 (at line 1): 322792.

 

 

 

OK so no base table is touched

 

Let us do a similar thing by creating an index on OBJECT_ID on the table ‘t’ imported from the said table and created in Hive:

 

 

create index t_ui on table t (object_id) as 'COMPACT' WITH DEFERRED REBUILD;

alter index t_ui on t rebuild;

analyze table t compute statistics;

 

 

I am going to run the same query in Hive. However, I only see a table scan below and no mention of that index. Maybe I am missing something here?

 

0: jdbc:hive2://rhes564:10010/default> explain select count(1) from t where OBJECT_ID < 100;

+------------------------------------------------------------------------------------------------------------------+--+

|                                                     Explain                                                      |

+------------------------------------------------------------------------------------------------------------------+--+

| STAGE DEPENDENCIES:                                                                                              |

|   Stage-1 is a root stage                                                                                        |

|   Stage-0 depends on stages: Stage-1                                                                             |

|                                                                                                                  |

| STAGE PLANS:                                                                                                     |

|   Stage: Stage-1                                                                                                 |

|     Spark                                                                                                        |

|       Edges:                                                                                                     |

|         Reducer 2 <- Map 1 (GROUP, 1)                                                                            |

|       DagName: hduser_20160105203204_8d987e9a-415a-476a-8bad-b9a5010e36bf:54                                     |

|       Vertices:                                                                                                  |

|         Map 1                                                                                                    |

|             Map Operator Tree:                                                                                   |

|                 TableScan                                                                                        |

|                   alias: t                                                                                       |

|                   Statistics: Num rows: 2074897 Data size: 64438212 Basic stats: COMPLETE Column stats: NONE     |

|                   Filter Operator                                                                                |

|                     predicate: (object_id < 100) (type: boolean)                                                 |

|                     Statistics: Num rows: 691632 Data size: 21479393 Basic stats: COMPLETE Column stats: NONE    |

|                     Select Operator                                                                              |

|                       Statistics: Num rows: 691632 Data size: 21479393 Basic stats: COMPLETE Column stats: NONE  |

|                       Group By Operator                                                                          |

|                         aggregations: count(1)                                                                   |

|                         mode: hash                                                                               |

|                         outputColumnNames: _col0                                                                 |

|                         Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE            |

|                         Reduce Output Operator                                                                   |

|                           sort order:                                                                            |

|                           Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE          |

|                           value expressions: _col0 (type: bigint)                                                |

|         Reducer 2                                                                                                |

|             Reduce Operator Tree:                                                                                |

|               Group By Operator                                                                                  |

|                 aggregations: count(VALUE._col0)                                                                 |

|                 mode: mergepartial                                                                               |

|                 outputColumnNames: _col0                                                                         |

|                 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE                    |

|                 File Output Operator                                                                             |

|                   compressed: false                                                                              |

|                   Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE                  |

|                   table:                                                                                         |

|                       input format: org.apache.hadoop.mapred.TextInputFormat                                     |

|                       output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat                  |

|                       serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                                  |

|                                                                                                                  |

|   Stage: Stage-0                                                                                                 |

|     Fetch Operator                                                                                               |

|       limit: -1                                                                                                  |

|       Processor Tree:                                                                                            |

|         ListSink                                                                                                 |

|                                                                                                                  |

+------------------------------------------------------------------------------------------------------------------+--+

 

 

Thanks

 

Dr Mich Talebzadeh

 

LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 


 

http://talebzadehmich.wordpress.com

 


 

From: Jörn Franke [mailto:jornfranke@gmail.com] 
Sent: 05 January 2016 19:59
To: user@hive.apache.org
Subject: Re: Is Hive Index officially not recommended?

 

Btw this is not Hive specific, but also for other relational database systems, such as Oracle Exadata.


On 05 Jan 2016, at 20:57, Jörn Franke <jornfranke@gmail.com> wrote:

You can still use execution Engine mr for maintaining the index. Indeed with the ORC or parquet format there are min/max indexes and bloom filters, but you need to sort your data appropriately to benefit from performance. Alternatively you can create redundant tables sorted in different order.

The "traditional" indexes can still make sense for data not in Orc or parquet format.

Keep in mind that for warehouse scenarios there are many other optimization methods in Hive.






Re: Is Hive Index officially not recommended?

Posted by Jörn Franke <jo...@gmail.com>.
Btw this is not Hive specific; it also applies to other relational database systems, such as Oracle Exadata.

> On 05 Jan 2016, at 20:57, Jörn Franke <jo...@gmail.com> wrote:
> 
> You can still use execution Engine mr for maintaining the index. Indeed with the ORC or parquet format there are min/max indexes and bloom filters, but you need to sort your data appropriately to benefit from performance. Alternatively you can create redundant tables sorted in different order.
> The "traditional" indexes can still make sense for data not in Orc or parquet format.
> Keep in mind that for warehouse scenarios there are many other optimization methods in Hive.
> 
>> On 05 Jan 2016, at 19:17, Ting(Goden) Yao <ty...@pivotal.io> wrote:
>> 
>> Hi,
>> 
>> We hit an issue when doing Hive testing to rebuild index on Tez.
>> We were told by our Hadoop distro vendor that it's not recommended (or should avoid) using index with Hive.
>> 
>> But I don't see an official message on Hive wiki or documentation.
>> Can someone confirm that so we'll ask our users to avoid indexing.
>> 
>> Thanks.
>> -Goden
>> 

Re: Is Hive Index officially not recommended?

Posted by Jörn Franke <jo...@gmail.com>.
You can still use the MR execution engine for maintaining the index. Indeed, the ORC and Parquet formats have min/max indexes and bloom filters, but you need to sort your data appropriately to get the performance benefit. Alternatively you can create redundant tables sorted in different orders.
The "traditional" indexes can still make sense for data not in ORC or Parquet format.
Keep in mind that for warehouse scenarios there are many other optimization methods in Hive.
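
As a rough sketch of the "redundant table sorted in a different order" idea
(table and column names are illustrative only):

CREATE TABLE events_by_id
STORED AS ORC
TBLPROPERTIES ("orc.create.index"="true", "orc.bloom.filter.columns"="event_id")
AS SELECT * FROM events CLUSTER BY event_id;   -- a sorted copy keyed for event_id lookups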

> On 05 Jan 2016, at 19:17, Ting(Goden) Yao <ty...@pivotal.io> wrote:
> 
> Hi,
> 
> We hit an issue when doing Hive testing to rebuild index on Tez.
> We were told by our Hadoop distro vendor that it's not recommended (or should avoid) using index with Hive.
> 
> But I don't see an official message on Hive wiki or documentation.
> Can someone confirm that so we'll ask our users to avoid indexing.
> 
> Thanks.
> -Goden
> 