Posted to issues@spark.apache.org by "KaiXu (JIRA)" <ji...@apache.org> on 2017/02/24 09:30:44 UTC

[jira] [Commented] (SPARK-19725) Different parquet dependencies in Spark 2.x and Hive 2.x cause HoS failures when using the parquet file format

    [ https://issues.apache.org/jira/browse/SPARK-19725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15882311#comment-15882311 ] 

KaiXu commented on SPARK-19725:
-------------------------------

Using the parquet-provided build profile works around this issue, but it would be better to keep the two versions in sync, hence the Improvement label.
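
A quick way to confirm which parquet jar actually wins on the driver or executor classpath is to ask the JVM where it loaded one of the conflicting classes from. A minimal diagnostic sketch in plain Java (the WhichParquet class name is arbitrary, and the jar paths in the comment are illustrative only; the looked-up class is the one from the stack trace):

    // Prints the jar that org.apache.parquet.schema.Types was loaded from,
    // e.g. .../spark/jars/parquet-column-1.7.0.jar versus Hive's parquet-column-1.8.1.jar
    // (example paths only).
    public class WhichParquet {
        public static void main(String[] args) throws Exception {
            Class<?> types = Class.forName("org.apache.parquet.schema.Types");
            System.out.println(
                types.getProtectionDomain().getCodeSource().getLocation());
        }
    }

Running this once under plain Spark and once inside an HoS session makes any mismatch visible immediately.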

> Different parquet dependencies in Spark 2.x and Hive 2.x cause HoS failures when using the parquet file format
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-19725
>                 URL: https://issues.apache.org/jira/browse/SPARK-19725
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.0.2
>         Environment: Spark 2.0.2
> Hive 2.2
> Hadoop 2.7.1
>            Reporter: KaiXu
>
> the parquet version in Hive 2.x is 1.8.1 while in Spark 2.x it is 1.7.0, so running HoS queries against the parquet file format hits jar conflicts such as the following (a sketch of the failing builder call follows the stack trace below):
> Starting Spark Job = d1f6825c-48ea-45b8-9614-4266f2d1f0bd
> Job failed with java.lang.NoSuchMethodError: org.apache.parquet.schema.Types$PrimitiveBuilder.length(I)Lorg/apache/parquet/schema/Types$BasePrimitiveBuilder;
> FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. java.util.concurrent.ExecutionException: Exception thrown by job
>         at org.apache.spark.JavaFutureActionWrapper.getImpl(FutureAction.scala:272)
>         at org.apache.spark.JavaFutureActionWrapper.get(FutureAction.scala:277)
>         at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:362)
>         at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:323)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 1.0 failed 4 times, most recent failure: Lost task 2.3 in stage 1.0 (TID 9, hsx-node7): java.lang.RuntimeException: Error processing row: java.lang.NoSuchMethodError: org.apache.parquet.schema.Types$PrimitiveBuilder.length(I)Lorg/apache/parquet/schema/Types$BasePrimitiveBuilder;
>         at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:149)
>         at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:48)
>         at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27)
>         at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:85)
>         at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>         at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>         at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127)
>         at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1976)
>         at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:1976)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>         at org.apache.spark.scheduler.Task.run(Task.scala:86)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NoSuchMethodError: org.apache.parquet.schema.Types$PrimitiveBuilder.length(I)Lorg/apache/parquet/schema/Types$BasePrimitiveBuilder;
>         at org.apache.hadoop.hive.ql.io.parquet.convert.HiveSchemaConverter.convertType(HiveSchemaConverter.java:100)
>         at org.apache.hadoop.hive.ql.io.parquet.convert.HiveSchemaConverter.convertType(HiveSchemaConverter.java:56)
>         at org.apache.hadoop.hive.ql.io.parquet.convert.HiveSchemaConverter.convertTypes(HiveSchemaConverter.java:50)
>         at org.apache.hadoop.hive.ql.io.parquet.convert.HiveSchemaConverter.convert(HiveSchemaConverter.java:39)
>         at org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.getHiveRecordWriter(MapredParquetOutputFormat.java:115)
>         at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:286)
>         at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:271)
>         at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:609)
>         at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:553)
>         at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:664)
>         at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:101)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
>         at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:137)
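
The failing frame is Hive's HiveSchemaConverter, which builds Parquet types through parquet-mr's fluent Types builder. A minimal sketch of the call pattern that breaks, assuming parquet-mr 1.8.x on the compile classpath (the column name and length here are hypothetical):

    import org.apache.parquet.schema.PrimitiveType;
    import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName;
    import org.apache.parquet.schema.Type.Repetition;
    import org.apache.parquet.schema.Types;

    public class PrimitiveBuilderSketch {
        public static void main(String[] args) {
            // Compiled against parquet 1.8.x, length(int) resolves to a method whose
            // erased return type is Types$BasePrimitiveBuilder. At runtime against the
            // parquet 1.7.0 jar shipped with Spark 2.0.2, no method with that exact
            // descriptor exists, so the JVM throws the NoSuchMethodError quoted above.
            PrimitiveType t = Types
                .primitive(PrimitiveTypeName.FIXED_LEN_BYTE_ARRAY, Repetition.OPTIONAL)
                .length(16)              // the call that fails at runtime
                .named("decimal_col");   // hypothetical column name
            System.out.println(t);
        }
    }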


