Posted to user@spark.apache.org by Jim Carroll <ji...@gmail.com> on 2014/12/17 00:43:26 UTC

How do I stop the automatic partitioning of my RDD?

I've been trying to figure out how to use Spark to do a simple aggregation
without repartitioning and without essentially creating fully instantiated
intermediate RDDs, and it seems virtually impossible.

I've now gone as far as writing my own single-partition RDD that wraps an
Iterator[String] and calling aggregate() on it. Before any of my aggregation
code executes, the entire Iterator is unwound and multiple partitions are
created to be given to my aggregation.
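
Roughly what I mean is something like this (a simplified sketch; the names
are illustrative, not my real code):

    import org.apache.spark.{Partition, SparkContext, TaskContext}
    import org.apache.spark.rdd.RDD

    // A one-partition RDD that just hands back the wrapped Iterator[String].
    class SingleIteratorRDD(sc: SparkContext, makeIter: () => Iterator[String])
      extends RDD[String](sc, Nil) {

      override protected def getPartitions: Array[Partition] =
        Array(new Partition { override def index: Int = 0 })

      override def compute(split: Partition, context: TaskContext): Iterator[String] =
        makeIter()
    }

    // ... and then, for example, summing line lengths in a single pass:
    // val total = new SingleIteratorRDD(sc, () => io.Source.fromFile("data.txt").getLines())
    //   .aggregate(0L)((acc, line) => acc + line.length, _ + _)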

The task execution call stack includes:
   ShuffleMapTask.runTask
   SortShuffleWriter.write
   ExternalSorter.insertAll
... which iterates over my entire RDD, repartitioning it and collecting it
into spill files.
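
For what it's worth, toDebugString should show whether a shuffle has crept
into the lineage before the action ever runs. Something like this
(illustrative only, assuming an existing SparkContext named sc and a
throwaway RDD, not my actual data):

    val raw = sc.parallelize(Seq("a", "bb", "ccc"), numSlices = 1)

    println(raw.toDebugString)                 // no ShuffledRDD in the lineage
    println(raw.repartition(4).toDebugString)  // repartition adds a ShuffledRDD,
                                               // which is what drives the
                                               // ShuffleMapTask / SortShuffleWriter path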

How do I prevent this from happening? There's no need to do this.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-do-I-stop-the-automatic-partitioning-of-my-RDD-tp20732.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.



Re: How do I stop the automatic partitioning of my RDD?

Posted by Jim Carroll <ji...@gmail.com>.
Wow. I just realized what was happening, and it's all my fault. I have a
library method that I wrote that presents the RDD, and I was actually
repartitioning it myself.
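
The method was essentially doing something like this (an illustrative sketch
of the mistake; the names are made up, not the real code):

    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    def presentRdd(sc: SparkContext, path: String): RDD[String] =
      sc.textFile(path)
        .repartition(sc.defaultParallelism)   // <-- this shuffle was the culprit,
                                              //     not anything Spark did on its own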

I feel pretty dumb. Sorry about that.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-do-I-stop-the-automatic-partitioning-of-my-RDD-tp20732p20735.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
