Posted to user@spark.apache.org by Ashok Kumar <as...@yahoo.com.INVALID> on 2017/06/25 07:13:15 UTC
RDD and DataFrame persistent memory usage
Gurus,
I understand that when we create an RDD in Spark it is immutable.
So I have a few questions, please:
- When an RDD is created, is it just a pointer to the data? Since most Spark operations are lazy, is the RDD not actually materialized until an action (such as collect) is performed on it?
- When a DataFrame is created from an RDD, does that consume additional memory for the DataFrame? And does an action then materialize both the RDD and the DataFrame built from it?
- I have seen references suggesting that as you chain operations and create new DataFrames, you consume more and more memory without releasing it. Is that correct?
- What happens if I call df.unpersist()? My understanding is that it removes the cached DataFrame from memory (and from disk, if it was persisted there). Will that reduce memory overhead?
- Is it a good idea to unpersist to reduce memory overhead?
Thanking you