Posted to issues@spark.apache.org by "Nicholas Chammas (JIRA)" <ji...@apache.org> on 2015/05/09 19:48:59 UTC

[jira] [Created] (SPARK-7505) Update PySpark DataFrame docs: encourage __getitem__, mark as experimental, etc.

Nicholas Chammas created SPARK-7505:
---------------------------------------

             Summary: Update PySpark DataFrame docs: encourage __getitem__, mark as experimental, etc.
                 Key: SPARK-7505
                 URL: https://issues.apache.org/jira/browse/SPARK-7505
             Project: Spark
          Issue Type: Improvement
          Components: Documentation
    Affects Versions: 1.3.1
            Reporter: Nicholas Chammas
            Priority: Minor


The PySpark docs for DataFrame need the following fixes and improvements:

# Per [SPARK-7035], we should encourage the use of {{\_\_getitem\_\_}} over {{\_\_getattr\_\_}} and change all our examples accordingly.
# We should say clearly that the API is experimental. (The PySpark docs do not currently say so.)
# We should provide an example of how to join two DataFrames that have identically named columns and then select from the result, because it is not obvious:
  {code}
>>> df1 = sqlContext.jsonRDD(sc.parallelize(['{"a": 4, "other": "I know"}']))
>>> df2 = sqlContext.jsonRDD(sc.parallelize(['{"a": 4, "other": "I dunno"}']))
>>> df12 = df1.join(df2, df1['a'] == df2['a'])
>>> df12.select(df1['a'], df2['other']).show()
a other
4 I dunno
  {code}
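
On the first item above, a small illustration of why bracket access may be the safer default. This is only a sketch in plain Python: {{ToyFrame}} is a hypothetical stand-in, not PySpark's DataFrame, and the column names are made up. It relies on one general Python rule: {{\_\_getattr\_\_}} is consulted only after normal attribute lookup fails, so a column whose name matches an existing method cannot be reached with dot syntax. That is presumably part of the motivation behind [SPARK-7035], though the details live in that ticket.

```python
class ToyFrame:
    """Hypothetical minimal frame exposing columns two ways (not PySpark)."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def count(self):
        # An ordinary method, analogous to a row-count method on a real frame.
        return len(next(iter(self._columns.values()), []))

    def __getitem__(self, name):
        # Bracket access always looks the name up among the columns.
        return self._columns[name]

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup has failed,
        # so existing methods and attributes always take precedence.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._columns[name]
        except KeyError:
            raise AttributeError(name)


# One column name ("count") deliberately collides with a method name.
df = ToyFrame({"a": [4, 5], "count": [10, 20]})

by_item = df["count"]  # the column data
by_attr = df.count     # the bound method, not the column
```

In this sketch {{df["count"]}} and {{df.count}} refer to different things, while {{df["a"]}} and {{df.a}} agree. Examples written with brackets stay correct regardless of what the column is called.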



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org