Posted to dev@hive.apache.org by "Venki Korukanti (JIRA)" <ji...@apache.org> on 2014/08/26 19:25:59 UTC
[jira] [Commented] (HIVE-7843) orc_analyze.q fails with an assertion in FileSinkOperator [Spark Branch]

    [ https://issues.apache.org/jira/browse/HIVE-7843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110983#comment-14110983 ] 

Venki Korukanti commented on HIVE-7843:
---------------------------------------

Looks like the assertion is wrong here.

{code}
private String getDynPartDirectory(List<String> row, List<String> dpColNames, int numDynParts) {
  assert row.size() == numDynParts && numDynParts == dpColNames.size() : "data length is different from num of DP columns";
  ...
}
{code}

The row always contains the values for the partition columns plus the bucket, but numDynParts counts only the partition columns. So the assertion always fails when we do a dynamic partition insert into a bucketed table.

I changed the assert to account for the bucket. The test now gets past this assert, but it hits a new error.

{code}
    assert numDynParts == dpColNames.size() &&
        row.size() == numDynParts +
            (conf.getDpSortState().equals(DPSortState.PARTITION_BUCKET_SORTED) ? 1 : 0) :
        "data length is different from num of DP columns";
{code}
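
To make the size mismatch concrete, here is a minimal standalone sketch of the two checks. It is not the FileSinkOperator code: the class name, the helper methods, the column names, and the sample row are all made up for illustration, and the local DPSortState enum is only a stand-in for Hive's (only PARTITION_BUCKET_SORTED is taken from the snippet above; the other value is a placeholder).

```java
import java.util.Arrays;
import java.util.List;

public class DynPartAssertSketch {
    // Stand-in for Hive's DPSortState. Only PARTITION_BUCKET_SORTED comes from
    // the comment above; NONE is a placeholder for "any other state".
    enum DPSortState { NONE, PARTITION_BUCKET_SORTED }

    // Original condition: the row must hold exactly one value per dynamic partition column.
    static boolean originalCheck(List<String> row, List<String> dpColNames, int numDynParts) {
        return row.size() == numDynParts && numDynParts == dpColNames.size();
    }

    // Revised condition: allow one extra value (the bucket) when the
    // sort state is PARTITION_BUCKET_SORTED.
    static boolean revisedCheck(List<String> row, List<String> dpColNames,
                                int numDynParts, DPSortState sortState) {
        int expectedRowSize =
            numDynParts + (sortState == DPSortState.PARTITION_BUCKET_SORTED ? 1 : 0);
        return numDynParts == dpColNames.size() && row.size() == expectedRowSize;
    }

    public static void main(String[] args) {
        // Hypothetical table with two dynamic partition columns and one bucket value.
        List<String> dpColNames = Arrays.asList("year", "month");
        List<String> row = Arrays.asList("2014", "08", "3"); // last entry is the bucket

        System.out.println(originalCheck(row, dpColNames, 2)); // false
        System.out.println(revisedCheck(row, dpColNames, 2,
            DPSortState.PARTITION_BUCKET_SORTED));              // true
    }
}
```

With two partition columns and a bucket value, the row has three entries, so the original condition is false while the revised one holds. Without the bucket entry, the revised condition reduces to the original one for any other sort state.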

> orc_analyze.q fails with an assertion in FileSinkOperator [Spark Branch]
> ------------------------------------------------------------------------
>
>                 Key: HIVE-7843
>                 URL: https://issues.apache.org/jira/browse/HIVE-7843
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>    Affects Versions: spark-branch
>            Reporter: Venki Korukanti
>            Assignee: Venki Korukanti
>              Labels: Spark-M1
>             Fix For: spark-branch
>
>
> {code}
> java.lang.AssertionError: data length is different from num of DP columns
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynPartDirectory(FileSinkOperator.java:809)
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:730)
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.startGroup(FileSinkOperator.java:829)
> org.apache.hadoop.hive.ql.exec.Operator.defaultStartGroup(Operator.java:502)
> org.apache.hadoop.hive.ql.exec.Operator.startGroup(Operator.java:525)
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:198)
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:47)
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:27)
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98)
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
> scala.collection.Iterator$class.foreach(Iterator.scala:727)
> scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759)
> org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:759)
> org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
> org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
> org.apache.spark.scheduler.Task.run(Task.scala:54)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:744)
> {code}


