Posted to commits@spark.apache.org by rx...@apache.org on 2015/02/18 10:00:57 UTC
spark git commit: [SPARK-5878] fix DataFrame.repartition() in Python
Repository: spark
Updated Branches:
refs/heads/master de0dd6de2 -> c1b6fa983
[SPARK-5878] fix DataFrame.repartition() in Python
Also add tests for distinct()
Author: Davies Liu <da...@databricks.com>
Closes #4667 from davies/repartition and squashes the following commits:
79059fd [Davies Liu] add test
cb4915e [Davies Liu] fix repartition
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c1b6fa98
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c1b6fa98
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c1b6fa98
Branch: refs/heads/master
Commit: c1b6fa9838f9d26d60fab3b05a96649882e3dd5b
Parents: de0dd6d
Author: Davies Liu <da...@databricks.com>
Authored: Wed Feb 18 01:00:54 2015 -0800
Committer: Reynold Xin <rx...@databricks.com>
Committed: Wed Feb 18 01:00:54 2015 -0800
----------------------------------------------------------------------
python/pyspark/sql/dataframe.py | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/c1b6fa98/python/pyspark/sql/dataframe.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 388033d..52bd75b 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -434,12 +434,18 @@ class DataFrame(object):
     def repartition(self, numPartitions):
         """ Return a new :class:`DataFrame` that has exactly `numPartitions`
         partitions.
+
+        >>> df.repartition(10).rdd.getNumPartitions()
+        10
         """
-        return DataFrame(self._jdf.repartition(numPartitions, None), self.sql_ctx)
+        return DataFrame(self._jdf.repartition(numPartitions), self.sql_ctx)

     def distinct(self):
         """
         Return a new :class:`DataFrame` containing the distinct rows in this DataFrame.
+
+        >>> df.distinct().count()
+        2L
         """
         return DataFrame(self._jdf.distinct(), self.sql_ctx)
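
The change above drops the extra `None` argument from the delegated `repartition` call. The following is a minimal sketch of that delegation pattern, not the actual pyspark code: `FakeJavaDataFrame` and `DataFrameSketch` are hypothetical stand-ins, the sample rows are invented, and it assumes the wrapped object's `repartition` accepts exactly one argument. In real pyspark the mismatch would surface through Py4J rather than as a Python `TypeError`.

```python
# Hypothetical stand-ins for illustration; NOT the real pyspark or JVM classes.
class FakeJavaDataFrame(object):
    """Mimics a wrapped object whose repartition() takes a single argument."""

    def __init__(self, rows, num_partitions=1):
        self.rows = list(rows)
        self.num_partitions = num_partitions

    def repartition(self, numPartitions):
        # Same rows, new partition count.
        return FakeJavaDataFrame(self.rows, numPartitions)

    def distinct(self):
        # Keep the first occurrence of each row, preserving order.
        seen = []
        for row in self.rows:
            if row not in seen:
                seen.append(row)
        return FakeJavaDataFrame(seen, self.num_partitions)


class DataFrameSketch(object):
    """Thin wrapper that delegates to the wrapped object, as the patched methods do."""

    def __init__(self, jdf):
        self._jdf = jdf

    def repartition(self, numPartitions):
        # Patched form: pass only numPartitions.
        return DataFrameSketch(self._jdf.repartition(numPartitions))

    def distinct(self):
        return DataFrameSketch(self._jdf.distinct())

    def count(self):
        return len(self._jdf.rows)

    def num_partitions(self):
        return self._jdf.num_partitions


# Invented sample data: three rows, two of them identical.
df = DataFrameSketch(FakeJavaDataFrame([(2, "Alice"), (5, "Bob"), (2, "Alice")]))
repartitioned = df.repartition(10)
distinct_count = df.distinct().count()

# The pre-patch call shape (an extra None) does not match the one-argument method.
try:
    df._jdf.repartition(10, None)
    old_call_failed = False
except TypeError:
    old_call_failed = True
```

In this sketch the one-argument call succeeds and the two-argument call is rejected, which mirrors the shape of the fix; the added doctests in the patch exercise the same two methods against a real DataFrame.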