Posted to issues@spark.apache.org by "Semet (JIRA)" <ji...@apache.org> on 2016/09/01 11:51:20 UTC
[jira] [Created] (SPARK-17360) PySpark can create dataframe from a Python generator
Semet created SPARK-17360:
-----------------------------
Summary: PySpark can create dataframe from a Python generator
Key: SPARK-17360
URL: https://issues.apache.org/jira/browse/SPARK-17360
Project: Spark
Issue Type: Improvement
Reporter: Semet
Priority: Trivial
It looks like one can create a DataFrame from a Python generator, which might be more efficient than first building a list of rows and passing it to createDataFrame:
{code}
>>> # On Python 3, you want to use "range" on the following line
>>> d = ({'name': 'Alice-{}'.format(i), 'age': i} for i in xrange(0, 10000000))
>>> d # Please note that 'd' is a generator and not a structure with the 10000000 elements.
<generator object <genexpr> at 0x7f1234b92af0>
>>> sqlContext.createDataFrame(d).take(5)
[Row(age=0, name=u'Alice-0'), Row(age=1, name=u'Alice-1'), Row(age=2, name=u'Alice-2'), Row(age=3, name=u'Alice-3'), Row(age=4, name=u'Alice-4')]
{code}
Looking at the code, nothing substantial needs to change in the implementation; only the documentation and unit tests need updating.
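
To illustrate the difference between a generator and a fully built list of rows, here is a minimal sketch in plain Python that does not use Spark at all. The helper name make_rows and the row count are made up for the example. It only shows that the generator object stays small until it is consumed; whether createDataFrame itself consumes the generator lazily or materializes it on the driver is not something this sketch demonstrates.

```python
import sys
from itertools import islice


def make_rows(n):
    """Yield one dict per row, lazily (hypothetical helper for this sketch)."""
    for i in range(n):
        yield {'name': 'Alice-{}'.format(i), 'age': i}


# A generator holds only its own state, not the rows it will produce.
rows_gen = make_rows(100000)
gen_size = sys.getsizeof(rows_gen)

# A list holds a reference to every row up front.
rows_list = list(make_rows(100000))
list_size = sys.getsizeof(rows_list)

# Pull the first five rows from the generator without building the rest.
first_five = list(islice(rows_gen, 5))

print(gen_size < list_size)
print(first_five[0])
```

Note that sys.getsizeof reports only the size of the container object itself, so the real gap is larger once the row dicts are counted.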