Posted to user@spark.apache.org by Joe L <se...@yahoo.com> on 2014/04/16 06:58:37 UTC

groupByKey(None) returns partitions according to the keys?

I was wondering: does groupByKey return 2 partitions (one per key) in the example below?

>>> x = sc.parallelize([("a", 1), ("b", 1), ("a", 1)])
>>> sorted(x.groupByKey().collect())
[('a', [1, 1]), ('b', [1])]
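
A quick way to check this directly in the PySpark shell (a minimal sketch, assuming sc is already set up as in the snippet above) is to ask the grouped RDD for its partition count and to look at each partition's contents with glom():

>>> grouped = x.groupByKey()
>>> grouped.getNumPartitions()   # number of partitions of the grouped RDD (the value depends on your setup)
>>> grouped.glom().collect()     # one list per partition, showing which (key, values) pairs landed where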



Re: groupByKey(None) returns partitions according to the keys?

Posted by wxhsdp <wx...@gmail.com>.
No, the number of partitions is determined by the numPartitions argument you pass to groupByKey (if you pass nothing, or None, Spark falls back to its default partitioner); see
http://spark.apache.org/docs/latest/api/core/index.html#org.apache.spark.rdd.PairRDDFunctions
for details. I suggest reading the docs before asking questions.
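
For example (a minimal sketch in the PySpark shell, reusing the RDD from your snippet), the partition count of the grouped RDD follows the numPartitions argument rather than the number of distinct keys:

>>> x = sc.parallelize([("a", 1), ("b", 1), ("a", 1)])
>>> x.groupByKey(4).getNumPartitions()   # 4 partitions even though there are only 2 keys
4
>>> x.groupByKey(1).getNumPartitions()   # everything ends up in a single partition
1

With no argument (or None), the count comes from spark.default.parallelism or the parent RDD's partitioning, not from the keys.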


Joe L wrote
> I was wondering: does groupByKey return 2 partitions (one per key) in the example below?
>
> >>> x = sc.parallelize([("a", 1), ("b", 1), ("a", 1)])
> >>> sorted(x.groupByKey().collect())
> [('a', [1, 1]), ('b', [1])]
