Posted to issues@spark.apache.org by "Charlie Evans (JIRA)" <ji...@apache.org> on 2016/07/05 12:57:10 UTC

[jira] [Reopened] (SPARK-15804) Manually added metadata not saving with parquet

     [ https://issues.apache.org/jira/browse/SPARK-15804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Charlie Evans reopened SPARK-15804:
-----------------------------------

Thanks for the fix. While the example provided is resolved, a similar bug has come up when creating a DataFrame from an RDD of Rows:

{code}
import org.apache.spark.sql.Row
val x = Row(1.0) :: Row(2.0) :: Row(3.0) :: Nil
import org.apache.spark.sql.types._
val schema = StructType(StructField("A", DoubleType) :: Nil)
val df = spark.createDataFrame(sc.parallelize(x), schema)
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.MetadataBuilder
val md = new MetadataBuilder().putString("key", "value").build()
val dfWithMetadata = df.select(col("A").as("A", md))
println(dfWithMetadata.schema.json)

dfWithMetadata.write.mode("overwrite").parquet("dfWithMetadata")
val dfWithMetadata2 = spark.read.parquet("dfWithMetadata")
println(dfWithMetadata2.schema.json)
{code}
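As a workaround sketch (not a fix, and assuming the writing side still has the original schema in hand, as in the reproduction above), the field-level metadata can be re-attached after reading the Parquet file back. The helper name {{reapplyMetadata}} is hypothetical:

{code}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StructType

// Hypothetical helper: copy each field's metadata from the original schema
// onto the freshly loaded DataFrame by re-aliasing every column.
def reapplyMetadata(loaded: DataFrame, original: StructType): DataFrame =
  loaded.select(original.fields.map(f => col(f.name).as(f.name, f.metadata)): _*)

// Usage with the reproduction above:
// val restored = reapplyMetadata(dfWithMetadata2, dfWithMetadata.schema)
// println(restored.schema.json)
{code}

This only papers over the round trip when the original schema is available in the same session; it does not address the underlying loss of metadata in the Parquet footer.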

> Manually added metadata not saving with parquet
> -----------------------------------------------
>
>                 Key: SPARK-15804
>                 URL: https://issues.apache.org/jira/browse/SPARK-15804
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Charlie Evans
>            Assignee: kevin yu
>             Fix For: 2.0.0
>
>
> Adding metadata with col().as(_, metadata) and then saving the resulting DataFrame does not persist the metadata. No error is thrown. The schema contains the metadata before saving, but no longer contains it after saving and reloading the DataFrame. This was working fine with 1.6.1.
> {code}
> case class TestRow(a: String, b: Int)
> val rows = TestRow("a", 0) :: TestRow("b", 1) :: TestRow("c", 2) :: Nil
> val df = spark.createDataFrame(rows)
> import org.apache.spark.sql.functions.col
> import org.apache.spark.sql.types.MetadataBuilder
> val md = new MetadataBuilder().putString("key", "value").build()
> val dfWithMeta = df.select(col("a"), col("b").as("b", md))
> println(dfWithMeta.schema.json)
> dfWithMeta.write.parquet("dfWithMeta")
> val dfWithMeta2 = spark.read.parquet("dfWithMeta")
> println(dfWithMeta2.schema.json)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
