Posted to issues@spark.apache.org by "Thomas Graves (JIRA)" <ji...@apache.org> on 2016/04/01 19:29:25 UTC
[jira] [Created] (SPARK-14331) Exceptions saving to parquetFile after join from dataframes in master
Thomas Graves created SPARK-14331:
-------------------------------------
Summary: Exceptions saving to parquetFile after join from dataframes in master
Key: SPARK-14331
URL: https://issues.apache.org/jira/browse/SPARK-14331
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.0.0
Reporter: Thomas Graves
Priority: Critical
I'm trying out master and writing to a Parquet file from a DataFrame, but I'm seeing the exception below. I'm not sure of the exact state of DataFrames right now, so let me know if this is a known issue.
I read two Parquet sources, joined them, then saved the result back:
val df_pixels = sqlContext.read.parquet("data1")
val df_pixels_renamed = df_pixels.withColumnRenamed("photo_id", "pixels_photo_id")
val df_meta = sqlContext.read.parquet("data2")
val df = df_meta.as("meta").join(df_pixels_renamed, $"meta.photo_id" === $"pixels_photo_id", "inner").drop("pixels_photo_id")
df.write.parquet(args(0))
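Since the failure surfaces inside a WholeStageCodegen subtree of the Exchange, one way to narrow it down (a debugging sketch, not a fix; the config key below is the whole-stage codegen switch as of the 2.0.0 snapshot and may change) is to dump the physical plan and retry the write with whole-stage code generation disabled:

```scala
// Print the full (parsed/analyzed/optimized/physical) plans; the physical
// plan should show the same Exchange/WholeStageCodegen nodes as the error.
df.explain(true)

// Retry with whole-stage codegen off to see whether the failure is
// specific to the generated-code path or reproduces in the interpreted one.
sqlContext.setConf("spark.sql.codegen.wholeStage", "false")
df.write.parquet(args(0))
```

If the write succeeds with codegen disabled, that would point at the WholeStageCodegen execution path rather than the Parquet writer itself.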
16/04/01 17:21:34 ERROR InsertIntoHadoopFsRelation: Aborting job.
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange hashpartitioning(pixels_photo_id#3, 20000), None
+- WholeStageCodegen
: +- Filter isnotnull(pixels_photo_id#3)
: +- INPUT
+- Coalesce 0
+- WholeStageCodegen
: +- Project [img_data#0,photo_id#1 AS pixels_photo_id#3]
: +- Scan HadoopFiles[img_data#0,photo_id#1] Format: ParquetFormat, PushedFilters: [], ReadSchema: struct<img_data:binary,photo_id:string>
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)
at org.apache.spark.sql.execution.exchange.ShuffleExchange.doExecute(ShuffleExchange.scala:109)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:118)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:118)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:137)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:134)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:117)
at org.apache.spark.sql.execution.InputAdapter.upstreams(WholeStageCodegen.scala:236)
at org.apache.spark.sql.execution.Sort.upstreams(Sort.scala:104)
at org.apache.spark.sql.execution.WholeStageCodegen.doExecute(WholeStageCodegen.scala:351)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:118)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:118)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:137)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:134)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:117)
at org.apache.spark.sql.execution.InputAdapter.doExecute(WholeStageCodegen.scala:228)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:118)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:118)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:137)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:134)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:117)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org