Posted to dev@spark.apache.org by Davies Liu <da...@databricks.com> on 2014/12/31 02:40:14 UTC

Re: Help, pyspark.sql.List flatMap results become tuple

This should be fixed in 1.2. Could you try it?
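
If upgrading is not immediately possible, one possible stopgap is to re-wrap the plain tuples so their fields can be read by name again. The sketch below is plain Python and does not touch Spark. The `Interest` type and the `to_interest` helper are hypothetical names introduced for illustration, and the field order (INFO, then INTEREST_NO) is an assumption read off the Row output in the quoted message.

```python
from collections import namedtuple

# Hypothetical helper type: field names and order are assumed from the Row
# repr in the quoted message (INFO first, then INTEREST_NO).
Interest = namedtuple("Interest", ["INFO", "INTEREST_NO"])


def to_interest(t):
    """Re-wrap a plain 2-tuple so its fields can be read by name."""
    return Interest(*t)


# Plain tuples shaped like the flatMap output reported on Spark 1.1.
flattened = [(u"x", 1), (u"y", 2), (u"x", 2), (u"y", 3)]

named = [to_interest(t) for t in flattened]

# Fields are now reachable by name as well as by index.
print(named[0].INFO, named[0].INTEREST_NO)
```

In the pyspark shell the same helper could presumably be applied with something like X.map(to_interest), but that step has not been tested against Spark 1.1 here.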

On Mon, Dec 29, 2014 at 8:04 PM, guoxu1231 <gu...@gmail.com> wrote:
> Hi pyspark guys,
>
> I have a JSON file whose structure looks like this:
>
> {"NAME":"George", "AGE":35, "ADD_ID":1212, "POSTAL_AREA":1,
> "TIME_ZONE_ID":1, "INTEREST":[{"INTEREST_NO":1, "INFO":"x"},
> {"INTEREST_NO":2, "INFO":"y"}]}
> {"NAME":"John", "AGE":45, "ADD_ID":1213, "POSTAL_AREA":1, "TIME_ZONE_ID":1,
> "INTEREST":[{"INTEREST_NO":2, "INFO":"x"}, {"INTEREST_NO":3, "INFO":"y"}]}
>
> I'm using the Spark SQL API to manipulate the JSON data in the pyspark shell:
>
> sqlContext = SQLContext(sc)
> A400 = sqlContext.jsonFile('jason_file_path')
> Row(ADD_ID=1212, AGE=35, INTEREST=[Row(INFO=u'x', INTEREST_NO=1),
> Row(INFO=u'y', INTEREST_NO=2)], NAME=u'George', POSTAL_AREA=1,
> TIME_ZONE_ID=1)
> Row(ADD_ID=1213, AGE=45, INTEREST=[Row(INFO=u'x', INTEREST_NO=2),
> Row(INFO=u'y', INTEREST_NO=3)], NAME=u'John', POSTAL_AREA=1,
> TIME_ZONE_ID=1)
> X = A400.flatMap(lambda i: i.INTEREST)
> The flatMap results are shown below. Each element of the JSON array was
> flattened to a plain tuple, not the pyspark.sql.Row I expected, so I can
> only access the flattened results by index. They are supposed to be
> flattened to Row (a namedtuple) and support access by name.
> (u'x', 1)
> (u'y', 2)
> (u'x', 2)
> (u'y', 3)
>
> My Spark version is 1.1.
>
> --
> View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Help-pyspark-sql-List-flatMap-results-become-tuple-tp9961.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
>


Re: Help, pyspark.sql.List flatMap results become tuple

Posted by guoxu1231 <gu...@gmail.com>.
Thanks, Davies. It works in 1.2.



--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Help-pyspark-sql-List-flatMap-results-become-tuple-tp9961p9975.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
