Posted to user@spark.apache.org by Prithish <pr...@gmail.com> on 2016/10/27 05:19:24 UTC
Question about In-Memory size (cache / cacheTable)
Hello,
I am trying to understand how the in-memory size changes in the following
situations. Specifically, why is the in-memory size so much higher for Avro
and Parquet? Are there any optimizations I can apply to reduce it?
I used cacheTable on each of the following:
Avro file (600 KB) - in-memory size was 12 MB
Parquet file (600 KB) - in-memory size was 12 MB
CSV file (3 MB, same data as above) - in-memory size was 600 KB
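As a rough sketch, the figures above can be expressed as expansion ratios (cached size divided by on-disk size). The snippet assumes 1 MB = 1000 KB, since the reported sizes are approximate. The names `reported_sizes_kb` and `expansion_ratio` are purely illustrative.

```python
# Sizes as reported above, in KB.
# Assumption: 1 MB = 1000 KB; the original figures are approximate anyway.
reported_sizes_kb = {
    "Avro": {"on_disk": 600, "in_memory": 12_000},
    "Parquet": {"on_disk": 600, "in_memory": 12_000},
    "CSV": {"on_disk": 3_000, "in_memory": 600},
}


def expansion_ratio(on_disk_kb: float, in_memory_kb: float) -> float:
    """Return how many times larger the cached table is than the source file."""
    return in_memory_kb / on_disk_kb


# Print one ratio per format, e.g. "Avro: 20.0x".
for fmt, sizes in reported_sizes_kb.items():
    ratio = expansion_ratio(sizes["on_disk"], sizes["in_memory"])
    print(f"{fmt}: {ratio:.1f}x")
```

With these numbers the cached Avro and Parquet tables come out at about 20 times their file size, while the cached CSV table is about one fifth of its file size.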
Because of this, we would need a cluster with much more memory if we were
to cache the Avro files.
Thanks for your help.
Prit