Posted to user@spark.apache.org by Hao Ren <in...@gmail.com> on 2015/07/21 11:09:22 UTC
DataFrame writer removes fields which are null for all rows
Consider the following code:
Seq((1, 3), (2, 3)).toDF("key", "value").registerTempTable("tbl")
sqlContext.sql("select key, null as value from tbl")
  .write.format("json").mode(SaveMode.Overwrite).save("test")
sqlContext.read.format("json").load("test").printSchema()
It shows:
root
|-- key: long (nullable = true)
The field `value` is removed from the schema when saving the DataFrame to a
JSON file, since it is null for all rows.
Saving to a Parquet file behaves the same way: the all-null fields go missing!
It seems this is the default behavior for DataFrames, but I would like to keep
the null fields for schema consistency.
Are there any options/configs for this purpose?
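One possible workaround (a sketch, not a built-in config flag): capture the DataFrame's schema on the write side and supply it explicitly on the read side via `DataFrameReader.schema`, so schema inference never gets a chance to drop the all-null column. The local Spark setup below is only there to make the snippet self-contained:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SaveMode}

// Local Spark setup just so the sketch runs on its own.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("null-fields"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

Seq((1, 3), (2, 3)).toDF("key", "value").registerTempTable("tbl")
val df = sqlContext.sql("select key, null as value from tbl")
val schema = df.schema  // still contains `value`, even though every row is null

df.write.format("json").mode(SaveMode.Overwrite).save("test")

// Supplying the schema explicitly on read keeps `value` in the round trip:
sqlContext.read.schema(schema).format("json").load("test").printSchema()
```

If the NullType of a bare `null` causes trouble downstream, selecting `cast(null as int) as value` instead gives the column a concrete type while keeping every row null.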
Thx.
--
Hao Ren
Data Engineer @ leboncoin
Paris, France