Posted to issues@hive.apache.org by "Alan Gates (JIRA)" <ji...@apache.org> on 2016/11/14 05:44:59 UTC

[jira] [Resolved] (HIVE-15189) No base file for ACID table

     [ https://issues.apache.org/jira/browse/HIVE-15189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates resolved HIVE-15189.
-------------------------------
    Resolution: Not A Bug

> No base file for ACID table
> ---------------------------
>
>                 Key: HIVE-15189
>                 URL: https://issues.apache.org/jira/browse/HIVE-15189
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 1.2.1
>         Environment: HDP 2.4, HDP 2.5
>            Reporter: Benjamin BONNET
>
> Hi,
> When one creates a new ACID table and inserts data into it using INSERT INTO, Hive does not write a 'base' file: it only creates a delta directory. That may lead to (at least) two issues:
> # When you try to read the table, you might get a 'serious problem' error like this:
> {noformat}
> java.lang.RuntimeException: serious problem
>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1021)
>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1048)
>         at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>         at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>         at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>         at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
>         at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>         at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
>         at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
>         at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:166)
>         at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:174)
>         at org.apache.spark.sql.hive.HiveContext$QueryExecution.stringResult(HiveContext.scala:635)
>         at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:64)
>         at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:311)
>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
>         at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:226)
>         at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: delta_0000000_0000000 does not start with base_
>         at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:998)
>         ... 44 more
> {noformat}
> I do not always get this error, but when I do, I need to drop and recreate the table.
> # Spark SQL does not see the data as long as there is no base file, so until a compaction has occurred, no data can be read through Spark SQL (see https://issues.apache.org/jira/browse/SPARK-16996). Triggering a major compaction manually produces a base file, as sketched just after this list.
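>
> As a sketch of that workaround (the table name {{acid_table}} is hypothetical): a major compaction rewrites the accumulated deltas into a base directory, after which the reads above behave normally. These are standard Hive commands:
> {noformat}
> -- Ask Hive to rewrite all deltas of the table into a base_N directory.
> ALTER TABLE acid_table COMPACT 'major';
>
> -- Monitor progress; the request should move from 'initiated' to 'succeeded'.
> SHOW COMPACTIONS;
> {noformat}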
> I know Hive always creates a base file when you INSERT OVERWRITE, but you cannot always use OVERWRITE instead of INTO: in my use case, I use a single statement that writes data both into partitions that already exist and into partitions that do not exist yet, along the lines of the sketch below.
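>
> A minimal sketch of that kind of statement (table and column names are hypothetical, and the cluster is assumed to already be configured for ACID, i.e. the transactional lock manager and concurrency support are enabled):
> {noformat}
> -- ACID in Hive 1.2.x requires a bucketed, ORC-backed, transactional table.
> CREATE TABLE acid_table (id INT, payload STRING)
> PARTITIONED BY (dt STRING)
> CLUSTERED BY (id) INTO 4 BUCKETS
> STORED AS ORC
> TBLPROPERTIES ('transactional'='true');
>
> -- Needed when every partition column is resolved dynamically.
> SET hive.exec.dynamic.partition.mode=nonstrict;
>
> -- One statement appends to existing partitions and creates missing ones.
> -- Each affected partition ends up with only delta_XXXXXXX_XXXXXXX
> -- directories and no base, which is what triggers the issues above.
> INSERT INTO TABLE acid_table PARTITION (dt)
> SELECT id, payload, dt FROM staging_table;
> {noformat}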



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)