Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:21:27 UTC

[jira] [Updated] (SPARK-15804) Manually added metadata not saving with parquet

     [ https://issues.apache.org/jira/browse/SPARK-15804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-15804:
---------------------------------
    Labels: bulk-closed  (was: )

> Manually added metadata not saving with parquet
> -----------------------------------------------
>
>                 Key: SPARK-15804
>                 URL: https://issues.apache.org/jira/browse/SPARK-15804
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Charlie Evans
>            Assignee: kevin yu
>            Priority: Major
>              Labels: bulk-closed
>
> Adding metadata with col().as(_, metadata) and then saving the resulting DataFrame does not save the metadata. No error is thrown. The schema contains the metadata before saving, but no longer contains it after the DataFrame is saved and loaded again. This worked fine in 1.6.1.
> {code}
> import org.apache.spark.sql.functions.col
> import org.apache.spark.sql.types.MetadataBuilder
>
> case class TestRow(a: String, b: Int)
> val rows = TestRow("a", 0) :: TestRow("b", 1) :: TestRow("c", 2) :: Nil
> val df = spark.createDataFrame(rows)
> // Attach metadata to column "b" through the alias
> val md = new MetadataBuilder().putString("key", "value").build()
> val dfWithMeta = df.select(col("a"), col("b").as("b", md))
> println(dfWithMeta.schema.json)   // schema before saving: metadata present
> dfWithMeta.write.parquet("dfWithMeta")
> val dfWithMeta2 = spark.read.parquet("dfWithMeta")
> println(dfWithMeta2.schema.json)  // schema after reloading: metadata missing
> {code}
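
The report above compares the two schemas by eye from their JSON output. A minimal sketch of the same round trip with a programmatic comparison of the column metadata might look like the following. It is not part of the original report: the local session, the temporary directory, and the names `session`, `path`, `reloaded`, `before`, and `after` are assumptions made for illustration, and whether the final line reports true or false depends on the Spark version in use.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.MetadataBuilder

// Assumption: a throwaway local session; in spark-shell the existing `spark` could be used instead.
val session = SparkSession.builder()
  .master("local[1]")
  .appName("SPARK-15804-check")
  .getOrCreate()

// Same data and metadata as in the report, built from tuples instead of a case class.
val md = new MetadataBuilder().putString("key", "value").build()
val dfWithMeta = session
  .createDataFrame(Seq(("a", 0), ("b", 1), ("c", 2)))
  .toDF("a", "b")
  .select(col("a"), col("b").as("b", md))

// Assumption: a fresh temporary location so the write does not collide with an existing path.
val path = Files.createTempDirectory("spark-15804").resolve("dfWithMeta").toString
dfWithMeta.write.parquet(path)
val reloaded = session.read.parquet(path)

// Compare the metadata of column "b" before the write and after the reload.
val before = dfWithMeta.schema("b").metadata
val after = reloaded.schema("b").metadata

println(s"before write: ${before.json}")
println(s"after reload: ${after.json}")
println(s"metadata preserved: ${before == after}")
```

Comparing `StructField.metadata` directly avoids diffing the full schema JSON, which can also differ for unrelated reasons such as nullability changes on reload.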


