
Posted to user@spark.apache.org by Vishnu Viswanath <vi...@gmail.com> on 2016/02/04 14:58:15 UTC

Question on RDD caching

Hello,

When we call cache() or persist(MEMORY_ONLY), how does the request flow to
the nodes?
I am assuming this will happen:

1. The driver knows which nodes hold the partitions of the given
RDD (where is this information stored?)
2. It sends a cache request to each of those nodes' executors.
3. Each executor stores its partitions in memory.
4. Therefore, each node can hold partitions of different RDDs in its cache.
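For comparison, here is a toy, single-process model of how Spark actually handles this. In Spark, RDD.cache() is shorthand for persist(StorageLevel.MEMORY_ONLY), and it is lazy. It only records the storage level on the RDD, and no request is sent to the nodes at that point. During the first action, each partition is stored by the block manager of whichever executor computes it. That executor then reports the block's location to the driver-side block manager master. Every class and function name below (BlockLocations, Executor, ToyRDD, run_action) is hypothetical and is not Spark's API. The round-robin assignment of partitions to executors is also a simplification of Spark's locality-aware scheduler.

```python
# Toy model of lazy RDD caching. All names are hypothetical, not Spark's API.
from dataclasses import dataclass, field


@dataclass
class BlockLocations:
    """Driver-side registry of block id -> executor id (a stand-in for the
    role Spark's block manager master plays)."""
    locations: dict = field(default_factory=dict)

    def register(self, block_id, executor_id):
        self.locations[block_id] = executor_id


@dataclass
class Executor:
    executor_id: str
    master: BlockLocations
    memory_store: dict = field(default_factory=dict)  # block id -> partition data

    def compute_partition(self, rdd, index):
        block_id = (rdd.rdd_id, index)
        if block_id in self.memory_store:  # cache hit: reuse the stored partition
            return self.memory_store[block_id]
        data = rdd.compute(index)  # cache miss: compute the partition
        if rdd.cached:  # only persisted RDDs keep their blocks
            self.memory_store[block_id] = data
            self.master.register(block_id, self.executor_id)  # report location to driver
        return data


@dataclass
class ToyRDD:
    rdd_id: int
    partitions: list  # one list of input records per partition
    cached: bool = False
    compute_count: int = 0  # how many partitions were actually computed

    def cache(self):
        # Lazy: only marks the RDD. Nothing is sent to any executor here.
        self.cached = True
        return self

    def compute(self, index):
        self.compute_count += 1
        return [x * 2 for x in self.partitions[index]]


def run_action(rdd, executors):
    """Simulate an action such as collect(). Partition i runs on
    executor i % n, a simplified stand-in for real task scheduling."""
    result = []
    for i in range(len(rdd.partitions)):
        executor = executors[i % len(executors)]
        result.extend(executor.compute_partition(rdd, i))
    return result


master = BlockLocations()
executors = [Executor("exec-1", master), Executor("exec-2", master)]
rdd = ToyRDD(rdd_id=0, partitions=[[1, 2], [3, 4], [5, 6]]).cache()

print(master.locations)            # {} because cache() alone stored nothing
print(run_action(rdd, executors))  # [2, 4, 6, 8, 10, 12]
print(master.locations)            # {(0, 0): 'exec-1', (0, 1): 'exec-2', (0, 2): 'exec-1'}
run_action(rdd, executors)
print(rdd.compute_count)           # 3, so the second action reused the cached blocks
```

In this model, each executor ends up holding only the blocks it computed, so point 4 above still holds. Blocks from different cached RDDs can sit side by side in the same executor's memory store.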

Can someone please tell me whether this is correct?

Thanks and Regards,
Vishnu Viswanath,