Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2020/10/24 01:43:00 UTC
[jira] [Resolved] (SPARK-18180) pyspark.sql.Row does not serialize well to json
[ https://issues.apache.org/jira/browse/SPARK-18180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-18180.
----------------------------------
Resolution: Not A Bug
> pyspark.sql.Row does not serialize well to json
> -----------------------------------------------
>
> Key: SPARK-18180
> URL: https://issues.apache.org/jira/browse/SPARK-18180
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 2.0.1
> Environment: HDP 2.3.4, Spark 2.0.1,
> Reporter: Miguel Cabrera
> Priority: Major
>
> {{Row}} does not serialize well automatically. Although it is dict-like in Python, the json module does not seem to be able to serialize it.
> {noformat}
> from pyspark.sql import Row
> import json
> r = Row(field1='hello', field2='world')
> json.dumps(r)
> {noformat}
> Results:
> {noformat}
> '["hello", "world"]'
> {noformat}
> Expected:
> {noformat}
> {"field1": "hello", "field2": "world"}
> {noformat}
> The workaround is to call the {{asDict()}} method of {{Row}}. However, this makes custom serialization of nested objects painful, because the caller has to be aware that it is serializing a {{Row}} object. In particular, with SPARK-17695, you cannot easily serialize DataFrames that have empty or null fields, so you have to customize the serialization process.
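The behavior above follows from {{Row}} subclassing {{tuple}}: {{json.dumps}} serializes any tuple subclass as a JSON array and never consults {{JSONEncoder.default()}} for it, so an encoder override cannot intercept a {{Row}}. A minimal sketch of the mechanism and the {{asDict()}} workaround, using a simplified tuple-subclass stand-in ({{FakeRow}} is hypothetical, not the real {{pyspark.sql.Row}}):

```python
import json

# Simplified stand-in for pyspark.sql.Row, which subclasses tuple.
# (Assumption: the real Row also carries field names and an asDict() method.)
class FakeRow(tuple):
    def __new__(cls, **kwargs):
        obj = super().__new__(cls, kwargs.values())
        obj._fields = list(kwargs)
        return obj

    def asDict(self):
        return dict(zip(self._fields, self))

r = FakeRow(field1='hello', field2='world')

# json.dumps treats any tuple subclass as a JSON array and never calls
# JSONEncoder.default() for it -- hence the reported behavior.
print(json.dumps(r))           # ["hello", "world"]

# Workaround: convert to a plain dict before dumping.
print(json.dumps(r.asDict()))  # {"field1": "hello", "field2": "world"}
```

With the real {{Row}}, {{asDict(recursive=True)}} also converts nested {{Row}} values, which is what makes custom serialization of nested structures workable.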
--
This message was sent by Atlassian Jira
(v8.3.4#803005)