Posted to dev@hive.apache.org by "Venki Korukanti (JIRA)" <ji...@apache.org> on 2014/08/26 19:25:59 UTC
[jira] [Commented] (HIVE-7843) orc_analyze.q fails with an assertion in FileSinkOperator [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110983#comment-14110983 ]
Venki Korukanti commented on HIVE-7843:
---------------------------------------
Looks like the assertion is wrong here.
{code}
private String getDynPartDirectory(List<String> row, List<String> dpColNames, int numDynParts) {
  assert row.size() == numDynParts && numDynParts == dpColNames.size() : "data length is different from num of DP columns";
  ...
}
{code}
The row always contains the values for the partition columns plus the bucket, but numDynParts only counts the partition columns. So row.size() is one larger than numDynParts, and the assertion always fails when we do a dynamic partition insert into a bucketed table.
I changed the assert to account for the bucket. The test now gets past this assert, but it hits a new error.
{code}
assert numDynParts == dpColNames.size() &&
    row.size() == numDynParts +
        (conf.getDpSortState().equals(DPSortState.PARTITION_BUCKET_SORTED) ? 1 : 0) :
    "data length is different from num of DP columns";
{code}
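To make the difference between the two conditions concrete, here is a minimal standalone sketch that evaluates both checks on the same input. It is not Hive code: the class DynPartAssertSketch, the methods originalCheck and revisedCheck, the sample row values, and the local DPSortState enum (in particular its NONE constant) are assumptions made for illustration. Only PARTITION_BUCKET_SORTED and the two boolean conditions come from the snippets above.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative stand-in, not the real FileSinkOperator.
class DynPartAssertSketch {

    // Hypothetical local enum standing in for Hive's DPSortState.
    // Only PARTITION_BUCKET_SORTED appears in the comment above; NONE is assumed.
    public enum DPSortState { NONE, PARTITION_BUCKET_SORTED }

    // Condition from the original assert: row size must equal the number of DP columns.
    public static boolean originalCheck(List<String> row, List<String> dpColNames, int numDynParts) {
        return row.size() == numDynParts && numDynParts == dpColNames.size();
    }

    // Condition from the revised assert: allow one extra value (the bucket)
    // when the sort state is PARTITION_BUCKET_SORTED.
    public static boolean revisedCheck(List<String> row, List<String> dpColNames,
                                       int numDynParts, DPSortState state) {
        int extra = state.equals(DPSortState.PARTITION_BUCKET_SORTED) ? 1 : 0;
        return numDynParts == dpColNames.size() && row.size() == numDynParts + extra;
    }

    public static void main(String[] args) {
        // Two partition column values followed by one bucket value (made-up data).
        List<String> row = Arrays.asList("2014", "08", "3");
        List<String> dpColNames = Arrays.asList("year", "month");
        int numDynParts = 2;

        // The original condition rejects the row because 3 != 2.
        System.out.println("original: " + originalCheck(row, dpColNames, numDynParts));
        // The revised condition accepts it because 3 == 2 + 1.
        System.out.println("revised:  " + revisedCheck(row, dpColNames, numDynParts,
                DPSortState.PARTITION_BUCKET_SORTED));
    }
}
```

With a three-element row and two dynamic partition columns, the original condition evaluates to false and the revised one to true, which matches the behavior described in this comment. Note that Java only evaluates assert statements when assertions are enabled (for example with -ea), so the sketch returns booleans instead of asserting.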
> orc_analyze.q fails with an assertion in FileSinkOperator [Spark Branch]
> ------------------------------------------------------------------------
>
> Key: HIVE-7843
> URL: https://issues.apache.org/jira/browse/HIVE-7843
> Project: Hive
> Issue Type: Sub-task
> Components: Spark
> Affects Versions: spark-branch
> Reporter: Venki Korukanti
> Assignee: Venki Korukanti
> Labels: Spark-M1
> Fix For: spark-branch
>
>
> {code}
> java.lang.AssertionError: data length is different from num of DP columns
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynPartDirectory(FileSinkOperator.java:809)
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:730)
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.startGroup(FileSinkOperator.java:829)
> org.apache.hadoop.hive.ql.exec.Operator.defaultStartGroup(Operator.java:502)
> org.apache.hadoop.hive.ql.exec.Operator.startGroup(Operator.java:525)
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:198)
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:47)
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:27)
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98)
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
> scala.collection.Iterator$class.foreach(Iterator.scala:727)
> scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759)
> org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759)
> org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
> org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
> org.apache.spark.scheduler.Task.run(Task.scala:54)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:744)
> {code}