Posted to user@spark.apache.org by Joe L <se...@yahoo.com> on 2014/04/16 09:29:41 UTC

What is a partition? How does it work?

I would like to know the following:

What is a partition, and how does it work?
How is it different from a Hadoop partition?

For example:
>>> sc.parallelize([1,2,3,4]).map(lambda x: (x,x)).partitionBy(2).glom().collect()
[[(2,2), (4,4)], [(1,1), (3,3)]]
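
My current guess, which may well be wrong, is that partitionBy(2) sends each pair to partition number hash(key) % 2. Here is a plain-Python sketch of that assumption. It does not use Spark at all, and partition_by is a hypothetical helper I made up for illustration, not a Spark API. It reproduces the grouping shown above:

```python
def partition_by(pairs, num_partitions, partition_func=hash):
    """Bucket (key, value) pairs by partition_func(key) % num_partitions.

    Assumption: this mirrors hash partitioning in spirit only; it says
    nothing about where Spark actually keeps each bucket in the cluster.
    """
    buckets = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        # For small integers in Python, hash(k) == k, so even keys land
        # in bucket 0 and odd keys land in bucket 1.
        buckets[partition_func(key) % num_partitions].append((key, value))
    return buckets


pairs = [(x, x) for x in [1, 2, 3, 4]]
print(partition_by(pairs, 2))
```

If that guess is right, it explains which pairs end up together, but not where each group lives.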

So partitionBy(2) gives 2 partitions, but what does that actually mean? How
do they reside in memory across the cluster?

I am sorry for such a simple question, but I couldn't find any specific
information about what happens underneath partitioning.

Thank you, Joe
