
Posted to user@spark.apache.org by Mingyu Kim <mk...@palantir.com> on 2014/01/29 10:18:20 UTC

Row order of RDDs

Here's my understanding of the row-order guarantees RDDs provide, in the
context of limit() and collect(). Can someone confirm this?
* sparkContext.parallelize(myList) returns an RDD that may have a different
row order than myList.
* Every RDD loaded from the same HDFS file (e.g.
sparkContext.textFile("hdfs://path_to_file")) will collect rows in the same
order.
* Row order of an RDD is preserved through non-shuffling operations (e.g.
map, filter).
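To make the third point concrete, here is a toy model in plain Python (not Spark code, and not how Spark implements RDDs). It represents an RDD as a list of partitions, each an ordered list of rows. The helper names narrow_map, narrow_filter, hash_shuffle and collect are hypothetical. The sketch only illustrates why per-partition operations keep order while a hash-based shuffle interleaves rows. It does not model how parallelize() or textFile() choose partitions, which is what the first two points ask about.

```python
# Toy model (not Spark code): an "RDD" is a list of partitions,
# and each partition is an ordered list of rows.

def narrow_map(partitions, f):
    # Like map(): f is applied row by row inside each partition, so neither
    # the partition boundaries nor the row order within them change.
    return [[f(row) for row in part] for part in partitions]

def narrow_filter(partitions, pred):
    # Like filter(): rows are dropped, but survivors keep their relative order.
    return [[row for row in part if pred(row)] for part in partitions]

def hash_shuffle(partitions, num_out):
    # Like a shuffle: each row goes to output partition hash(row) % num_out,
    # so rows from different input partitions end up interleaved.
    out = [[] for _ in range(num_out)]
    for part in partitions:
        for row in part:
            out[hash(row) % num_out].append(row)
    return out

def collect(partitions):
    # Concatenate the partitions in partition-index order.
    return [row for part in partitions for row in part]

rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
mapped = narrow_filter(narrow_map(rows, lambda x: x * 10), lambda x: x != 50)
print(collect(mapped))                 # [10, 20, 30, 40, 60, 70, 80, 90]
print(collect(hash_shuffle(rows, 3)))  # [3, 6, 9, 1, 4, 7, 2, 5, 8]
```

In this model the map/filter chain returns the surviving rows in their original order, while the shuffled version does not.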
Mingyu