Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2020/10/24 01:43:00 UTC
[jira] [Resolved] (SPARK-18180) pyspark.sql.Row does not serialize well to json
[ https://issues.apache.org/jira/browse/SPARK-18180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-18180.
----------------------------------
Resolution: Not A Bug
> pyspark.sql.Row does not serialize well to json
> -----------------------------------------------
>
> Key: SPARK-18180
> URL: https://issues.apache.org/jira/browse/SPARK-18180
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 2.0.1
> Environment: HDP 2.3.4, Spark 2.0.1,
> Reporter: Miguel Cabrera
> Priority: Major
>
> {{Row}} does not serialize well automatically. Although it is dict-like in Python, the json module does not seem to be able to serialize it.
> {noformat}
> from pyspark.sql import Row
> import json
> r = Row(field1='hello', field2='world')
> json.dumps(r)
> {noformat}
> Results:
> {noformat}
> '["hello", "world"]'
> {noformat}
> Expected:
> {noformat}
> {"field1": "hello", "field2": "world"}
> {noformat}
> The workaround is to call the {{asDict()}} method of {{Row}}. However, this makes custom serialization of nested objects painful, because the caller has to be aware that it is serializing a {{Row}} object. In particular, with SPARK-17695, you cannot easily serialize DataFrames that have empty or null fields, so you have to customize the serialization process.
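The behavior above follows from {{Row}} subclassing {{tuple}}: {{json.dumps}} serializes any tuple subclass as a JSON array and never consults {{JSONEncoder.default()}} for it, so an encoder override cannot intercept a {{Row}}. A minimal sketch of the mechanism and the {{asDict()}} workaround, using a simplified tuple-subclass stand-in ({{FakeRow}} is hypothetical, not the real {{pyspark.sql.Row}}):

```python
import json

# Simplified stand-in for pyspark.sql.Row, which subclasses tuple.
# (Assumption: the real Row also carries field names and an asDict() method.)
class FakeRow(tuple):
    def __new__(cls, **kwargs):
        obj = super().__new__(cls, kwargs.values())
        obj._fields = list(kwargs)
        return obj

    def asDict(self):
        return dict(zip(self._fields, self))

r = FakeRow(field1='hello', field2='world')

# json.dumps treats any tuple subclass as a JSON array and never calls
# JSONEncoder.default() for it -- hence the reported behavior.
print(json.dumps(r))           # ["hello", "world"]

# Workaround: convert to a plain dict before dumping.
print(json.dumps(r.asDict()))  # {"field1": "hello", "field2": "world"}
```

With the real {{Row}}, {{asDict(recursive=True)}} also converts nested {{Row}} values, which is what makes custom serialization of nested structures workable.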
--
This message was sent by Atlassian Jira
(v8.3.4#803005)