Posted to issues@spark.apache.org by "Josh Rosen (JIRA)" <ji...@apache.org> on 2014/11/23 05:14:12 UTC
[jira] [Updated] (SPARK-4561) PySparkSQL's Row.asDict() should convert nested rows to dictionaries
[ https://issues.apache.org/jira/browse/SPARK-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh Rosen updated SPARK-4561:
------------------------------
Description:
In PySpark, you can call {{.asDict()}} on a SparkSQL {{Row}} to convert it to a dictionary. Unfortunately, this does not convert nested rows to dictionaries. For example:
{code}
>>> sqlContext.sql("select results from results").first()
Row(results=[Row(time=3.762), Row(time=3.47), Row(time=3.559), Row(time=3.458), Row(time=3.229), Row(time=3.21), Row(time=3.166), Row(time=3.276), Row(time=3.239), Row(time=3.149)])
>>> sqlContext.sql("select results from results").first().asDict()
{u'results': [(3.762,),
(3.47,),
(3.559,),
(3.458,),
(3.229,),
(3.21,),
(3.166,),
(3.276,),
(3.239,),
(3.149,)]}
{code}
Actually, it looks like the nested fields are just left as Rows (IPython's fancy display logic obscured this in my first example):
{code}
>>> Row(results=[Row(time=1), Row(time=2)]).asDict()
{'results': [Row(time=1), Row(time=2)]}
{code}
I ran into this issue when trying to use Pandas dataframes to display nested data that I queried from Spark SQL.
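A possible workaround sketch (a hypothetical helper, not part of Spark at the time of this report) is to recursively walk the value returned by {{asDict()}} and convert any nested Rows, including Rows inside lists. The {{Row}} class below is a simplified tuple-based stand-in for {{pyspark.sql.Row}} so the example runs without a Spark installation; the helper itself only relies on objects exposing an {{asDict()}} method.

```python
class Row(tuple):
    """Simplified stand-in for pyspark.sql.Row (illustration only)."""

    def __new__(cls, **kwargs):
        row = super().__new__(cls, kwargs.values())
        row.__fields__ = list(kwargs)
        return row

    def asDict(self):
        # Shallow conversion, mirroring the behavior described above:
        # nested Rows are left as Rows.
        return dict(zip(self.__fields__, self))


def as_dict_recursive(obj):
    """Convert a Row, and any Rows nested in it (including in lists), to dicts."""
    if hasattr(obj, "asDict"):
        return {k: as_dict_recursive(v) for k, v in obj.asDict().items()}
    if isinstance(obj, (list, tuple)):
        return [as_dict_recursive(v) for v in obj]
    return obj


nested = Row(results=[Row(time=1), Row(time=2)])
print(as_dict_recursive(nested))  # {'results': [{'time': 1}, {'time': 2}]}
```

As far as I know, later PySpark releases eventually added an optional recursive flag to {{Row.asDict()}} itself; the helper above is only a sketch of that idea.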
was:
In PySpark, you can call {{.asDict()}} on a SparkSQL {{Row}} to convert it to a dictionary. Unfortunately, this does not convert nested rows to dictionaries. For example:
{code}
>>> sqlContext.sql("select results from results").first()
Row(results=[Row(time=3.762), Row(time=3.47), Row(time=3.559), Row(time=3.458), Row(time=3.229), Row(time=3.21), Row(time=3.166), Row(time=3.276), Row(time=3.239), Row(time=3.149)])
>>> sqlContext.sql("select results from results").first().asDict()
{u'results': [(3.762,),
(3.47,),
(3.559,),
(3.458,),
(3.229,),
(3.21,),
(3.166,),
(3.276,),
(3.239,),
(3.149,)]}
{code}
I ran into this issue when trying to use Pandas dataframes to display nested data that I queried from Spark SQL.
> PySparkSQL's Row.asDict() should convert nested rows to dictionaries
> --------------------------------------------------------------------
>
> Key: SPARK-4561
> URL: https://issues.apache.org/jira/browse/SPARK-4561
> Project: Spark
> Issue Type: Improvement
> Components: PySpark, SQL
> Affects Versions: 1.2.0
> Reporter: Josh Rosen
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org