You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Heinrich van den Heever (JIRA)" <ji...@apache.org> on 2018/02/22 08:26:00 UTC

[jira] [Commented] (SPARK-15516) Schema merging in driver fails for parquet when merging LongType and IntegerType

    [ https://issues.apache.org/jira/browse/SPARK-15516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372536#comment-16372536 ] 

Heinrich van den Heever commented on SPARK-15516:
-------------------------------------------------

Hi All

I am fairly new to Spark so apologies in advance if I missed something.

We are currently experiencing the same issue using the mergeSchema function when trying to merge an int and long type schema from separate parquets.

Has there been any progress as to why the function does not play along when merging int and long (as far as I know the columns we are trying to merge are not Key columns), or should we use the proposed solution from Min-Fu Yang for the time being?

> Schema merging in driver fails for parquet when merging LongType and IntegerType
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-15516
>                 URL: https://issues.apache.org/jira/browse/SPARK-15516
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>         Environment: Databricks
>            Reporter: Hossein Falaki
>            Priority: Major
>
> I tried to create a table from partitioned parquet directories that requires schema merging. I get following error:
> {code}
> at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$$anonfun$24$$anonfun$apply$9.apply(ParquetRelation.scala:831)
>     at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$$anonfun$24$$anonfun$apply$9.apply(ParquetRelation.scala:826)
>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>     at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$$anonfun$24.apply(ParquetRelation.scala:826)
>     at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$$anonfun$24.apply(ParquetRelation.scala:801)
>     at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$22.apply(RDD.scala:756)
>     at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$22.apply(RDD.scala:756)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>     at org.apache.spark.scheduler.Task.run(Task.scala:85)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.spark.SparkException: Failed to merge incompatible data types LongType and IntegerType
>     at org.apache.spark.sql.types.StructType$.merge(StructType.scala:462)
>     at org.apache.spark.sql.types.StructType$$anonfun$merge$1$$anonfun$apply$3.apply(StructType.scala:420)
>     at org.apache.spark.sql.types.StructType$$anonfun$merge$1$$anonfun$apply$3.apply(StructType.scala:418)
>     at scala.Option.map(Option.scala:145)
>     at org.apache.spark.sql.types.StructType$$anonfun$merge$1.apply(StructType.scala:418)
>     at org.apache.spark.sql.types.StructType$$anonfun$merge$1.apply(StructType.scala:415)
>     at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>     at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>     at org.apache.spark.sql.types.StructType$.merge(StructType.scala:415)
>     at org.apache.spark.sql.types.StructType.merge(StructType.scala:333)
>     at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$$anonfun$24$$anonfun$apply$9.apply(ParquetRelation.scala:829)
> {code}
> cc @rxin and [~mengxr]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org