Posted to user@spark.apache.org by prayag chandran <pr...@gmail.com> on 2016/09/14 01:25:40 UTC

Shuffle Spill (Disk) greater than Shuffle Spill (Memory)

Hello!

In my Spark job, I see that Shuffle Spill (Disk) is greater than Shuffle
Spill (Memory). The spark.shuffle.compress parameter is left at its
default (true?), so I would expect the size on disk to be smaller than
the in-memory size, which isn't the case here. I've also been having
some performance issues, and I suspect they are related to this.
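
For reference, here is how I'm reading the relevant settings on the
driver. This is only a minimal sketch: I'm assuming that spill
compression is governed by spark.shuffle.spill.compress (which also
defaults to true), in addition to spark.shuffle.compress for the shuffle
output files, and that neither is set explicitly in my job.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("spill-config-check")
  .getOrCreate()

// Print the effective values; the second argument is the fallback shown
// when the key has not been set explicitly on this SparkConf.
val conf = spark.sparkContext.getConf
println("spark.shuffle.compress       = " + conf.get("spark.shuffle.compress", "true (default)"))
println("spark.shuffle.spill.compress = " + conf.get("spark.shuffle.spill.compress", "true (default)"))
println("spark.io.compression.codec   = " + conf.get("spark.io.compression.codec", "lz4 (default)"))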

All memory configuration parameters are at their defaults, and I'm
running Spark 2.0. For example:
Shuffle Spill (Memory): 712.0 MB
Shuffle Spill (Disk): 7.9 GB

To my surprise, I also see the following for some tasks:
Shuffle Spill (Memory): 0.0 B
Shuffle Spill (Disk): 77.5 MB
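
In case it helps, this is how I'm cross-checking those numbers outside
the UI, using the same SparkSession (spark) as in the snippet above.
It's a rough sketch, and I'm assuming the UI columns correspond to
TaskMetrics.memoryBytesSpilled and diskBytesSpilled:

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Log per-task spill counters as tasks finish so they can be compared
// with the "Shuffle Spill (Memory)" / "Shuffle Spill (Disk)" columns.
spark.sparkContext.addSparkListener(new SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val m = taskEnd.taskMetrics
    if (m != null) {
      println(s"stage=${taskEnd.stageId} task=${taskEnd.taskInfo.taskId} " +
        s"memoryBytesSpilled=${m.memoryBytesSpilled} diskBytesSpilled=${m.diskBytesSpilled}")
    }
  }
})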

I would appreciate it if anyone could explain this behavior.

-Prayag