Posted to issues@spark.apache.org by "Jonathan (Jira)" <ji...@apache.org> on 2019/10/26 18:43:00 UTC

[jira] [Updated] (SPARK-29610) Keys with Null values are discarded when using to_json function

     [ https://issues.apache.org/jira/browse/SPARK-29610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan updated SPARK-29610:
-----------------------------
    Description: 
When to_json is called on a struct column, any field whose value is null is dropped from the resulting JSON string instead of being written out.
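{code:python}
import pyspark.sql.functions as F
from pyspark.sql import SparkSession

# Reuse the active session (e.g. in the pyspark shell) or start a new one.
spark = SparkSession.builder.getOrCreate()

# The second row carries a null in the "data" column.
l = [("a", "foo"), ("b", None)]

df = spark.createDataFrame(l, ["id", "data"])
(
  df.select(F.struct("*").alias("payload"))
    .withColumn("payload", 
      F.to_json(F.col("payload"))
    ).select("payload")
    .show()
){code}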
Produces the following output:
{noformat}
+--------------------+
|             payload|
+--------------------+
|{"id":"a","data":...|
|          {"id":"b"}|
+--------------------+{noformat}
The `data` key in the second row has just been silently deleted.
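
To make the difference concrete without a Spark session, here is a minimal stdlib-only sketch. The `rows` list and the two helper functions are hypothetical stand-ins written for this illustration; they are not Spark APIs and say nothing about how to_json is implemented. The sketch only contrasts a serialization that keeps null-valued keys with one that drops them, as in the output above.

```python
import json

# Hypothetical stand-ins for the two rows in the report (plain dicts, not Spark Rows).
rows = [{"id": "a", "data": "foo"}, {"id": "b", "data": None}]


def dumps_keep_nulls(row):
    """Serialize every key, writing None as JSON null."""
    return json.dumps(row, separators=(",", ":"))


def dumps_drop_nulls(row):
    """Mimic the reported output: omit keys whose value is None."""
    non_null = {k: v for k, v in row.items() if v is not None}
    return json.dumps(non_null, separators=(",", ":"))


kept = [dumps_keep_nulls(r) for r in rows]
dropped = [dumps_drop_nulls(r) for r in rows]

for k, d in zip(kept, dropped):
    print(k, "|", d)

# After parsing, a consumer of the dropped form has no "data" key at all,
# so "value is null" and "field never existed" look the same.
print("data" in json.loads(kept[1]), "data" in json.loads(dropped[1]))
```

The practical consequence is that the two rows end up with different sets of keys, so anything downstream that reads the JSON has to treat `data` as optional.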


> Keys with Null values are discarded when using to_json function
> ---------------------------------------------------------------
>
>                 Key: SPARK-29610
>                 URL: https://issues.apache.org/jira/browse/SPARK-29610
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 2.4.4
>            Reporter: Jonathan
>            Priority: Major


