You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Szehon Ho (JIRA)" <ji...@apache.org> on 2016/03/19 01:33:33 UTC

[jira] [Created] (HIVE-13314) Hive on spark mapjoin errors if spark.master is not set

Szehon Ho created HIVE-13314:
--------------------------------

             Summary: Hive on spark mapjoin errors if spark.master is not set
                 Key: HIVE-13314
                 URL: https://issues.apache.org/jira/browse/HIVE-13314
             Project: Hive
          Issue Type: Bug
          Components: Spark
            Reporter: Szehon Ho
            Assignee: Szehon Ho
            Priority: Minor


There are some errors that happen if spark.master is not set.

This is despite the code defaulting to yarn-cluster if spark.master is not set by user or on the config files: [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java#L51]

The funny thing is that while it works the first time due to this default, subsequent tries will fail as the hiveConf is refreshed without that default being set.

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java#L180]

Exception is follows:
{noformat}
Job aborted due to stage failure: Task 40 in stage 1.0 failed 4 times, most recent failure: Lost task 40.3 in stage 1.0 (TID 22, d2409.halxg.cloudera.com): java.lang.RuntimeException: Error processing row: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:154)
	at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:48)
	at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27)
	at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95)
	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:120)
	at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:120)
	at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2003)
	at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2003)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:117)
	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:197)
	at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:223)
	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
	at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:490)
	at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:141)
	... 16 more
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.isDedicatedCluster(SparkUtilities.java:108)
	at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:124)
	at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:114)
	... 24 more

Driver stacktrace:
{noformat}

The issue is 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)