Posted to commits@spark.apache.org by rx...@apache.org on 2015/02/18 10:00:57 UTC
spark git commit: [SPARK-5878] fix DataFrame.repartition() in Python

Repository: spark
Updated Branches:
  refs/heads/master de0dd6de2 -> c1b6fa983


[SPARK-5878] fix DataFrame.repartition() in Python

Also add doctests for repartition() and distinct().

Author: Davies Liu <da...@databricks.com>

Closes #4667 from davies/repartition and squashes the following commits:

79059fd [Davies Liu] add test
cb4915e [Davies Liu] fix repartition


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c1b6fa98
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c1b6fa98
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c1b6fa98

Branch: refs/heads/master
Commit: c1b6fa9838f9d26d60fab3b05a96649882e3dd5b
Parents: de0dd6d
Author: Davies Liu <da...@databricks.com>
Authored: Wed Feb 18 01:00:54 2015 -0800
Committer: Reynold Xin <rx...@databricks.com>
Committed: Wed Feb 18 01:00:54 2015 -0800

----------------------------------------------------------------------
 python/pyspark/sql/dataframe.py | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/c1b6fa98/python/pyspark/sql/dataframe.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 388033d..52bd75b 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -434,12 +434,18 @@ class DataFrame(object):
     def repartition(self, numPartitions):
         """ Return a new :class:`DataFrame` that has exactly `numPartitions`
         partitions.
+
+        >>> df.repartition(10).rdd.getNumPartitions()
+        10
         """
-        return DataFrame(self._jdf.repartition(numPartitions, None), self.sql_ctx)
+        return DataFrame(self._jdf.repartition(numPartitions), self.sql_ctx)
 
     def distinct(self):
         """
         Return a new :class:`DataFrame` containing the distinct rows in this DataFrame.
+
+        >>> df.distinct().count()
+        2L
         """
         return DataFrame(self._jdf.distinct(), self.sql_ctx)
 

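The one-line fix above drops a trailing None from the call into the JVM-side
repartition(). A minimal sketch of why the extra argument matters, using a
hypothetical pure-Python stand-in (FakeJavaDataFrame is invented for
illustration and is not part of PySpark or Py4J). It assumes the JVM method
accepts exactly one argument, as the fixed call suggests; the real failure
would come from Py4J's method lookup rather than a Python TypeError.

```python
# Hypothetical stand-in for the JVM-side DataFrame; not a PySpark or Py4J class.
class FakeJavaDataFrame(object):
    """Mimics a JVM object whose repartition() accepts exactly one argument."""

    def __init__(self, num_partitions=1):
        self.num_partitions = num_partitions

    def repartition(self, *args):
        # A JVM method is resolved by its argument list; an extra argument
        # means no matching method, which is modelled here with TypeError.
        if len(args) != 1:
            raise TypeError(
                "no repartition() method takes %d arguments" % len(args))
        return FakeJavaDataFrame(args[0])


jdf = FakeJavaDataFrame()

# Call shape before the fix: a trailing None that the callee does not accept.
try:
    jdf.repartition(10, None)
    old_call_failed = False
except TypeError:
    old_call_failed = True

# Call shape after the fix: only the partition count is passed.
new_partitions = jdf.repartition(10).num_partitions

print(old_call_failed, new_partitions)
```

The doctest added in the commit checks the same property against a real
DataFrame: after repartition(10), the underlying RDD reports 10 partitions.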
