Posted to user@spark.apache.org by Hao Ren <in...@gmail.com> on 2015/07/21 11:09:22 UTC

DataFrame writer removes fields which are null for all rows

Consider the following code:

import org.apache.spark.sql.SaveMode
import sqlContext.implicits._  // for toDF on local Seqs

val df = Seq((1, 3), (2, 3)).toDF("key", "value")
df.registerTempTable("tbl")  // registerTempTable returns Unit, so don't assign it to df

sqlContext.sql("select key, null as value from tbl")
  .write.format("json").mode(SaveMode.Overwrite).save("test")

sqlContext.read.format("json").load("test").printSchema()

It shows:

root
 |-- key: long (nullable = true)

The field `value` is removed from the schema when the DataFrame is saved
as JSON, since it is null for all rows.
Saving to a Parquet file behaves the same way: the all-null field is
simply missing!
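
If it helps to narrow things down: printing the schema before the write
suggests a cause (my guess, not verified against the source). The literal
null column is typed as NullType, which neither writer seems able to encode:

sqlContext.sql("select key, null as value from tbl").printSchema()
// `value` should be reported here with type "null" (NullType),
// while `key` keeps its concrete integer type.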

It seems to be the default behavior for DataFrames, but I would like to
keep the null fields for schema consistency.

Are there any options/configs for this purpose?
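
A sketch of one possible workaround (untested, and assuming the expected
schema is known in advance): pass an explicit schema to the reader, so the
missing field at least comes back as an all-null column. Casting the
literal to a concrete type before writing, e.g. cast(null as int), might
also help for Parquet, since Parquet stores the schema in the file footer.

import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Hypothetical expected schema: `key` as the JSON reader would infer it,
// plus `value` declared explicitly so it survives the read.
val expected = StructType(Seq(
  StructField("key", LongType, nullable = true),
  StructField("value", LongType, nullable = true)))

// With an explicit schema, fields absent from the JSON records are
// filled with null instead of being dropped from the schema.
sqlContext.read.schema(expected).format("json").load("test").printSchema()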

Thanks.

-- 
Hao Ren

Data Engineer @ leboncoin

Paris, France