Posted to user@spark.apache.org by Augusto <au...@cactusglobal.com> on 2021/02/25 12:57:49 UTC

Aggregating large objects and reducing memory pressure

Hi

I am writing here because I need help/advice on how to perform aggregations
more efficiently.

In my current setup I have an Accumulator object which is used as the
zeroValue for the foldByKey function. This Accumulator object can get very
large, since the accumulations also include lists and maps. This in turn
forces us to use machines with more and more memory, or risk jobs failing
with out-of-memory errors.
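One option worth considering (an assumption about your setup, since the
actual Accumulator class is not shown here) is switching from foldByKey to
combineByKey or aggregateByKey: foldByKey reuses one large zeroValue as the
starting state for every key, whereas combineByKey builds each key's state
lazily from its first record, so keys only pay for the fields they actually
touch. A minimal pure-Python sketch of that combine pattern, with a
hypothetical accumulator holding a count and a list:

```python
# Sketch of the combineByKey pattern (no Spark dependency; the three
# functions below mirror createCombiner / mergeValue / mergeCombiners).

def create_combiner(value):
    # Build a small per-key accumulator from the first record seen,
    # instead of cloning one large zeroValue for every key.
    return {"count": 1, "items": [value]}

def merge_value(acc, value):
    # Fold one more record from the same partition into the accumulator.
    acc["count"] += 1
    acc["items"].append(value)
    return acc

def merge_combiners(a, b):
    # Merge two per-partition accumulators for the same key.
    a["count"] += b["count"]
    a["items"].extend(b["items"])
    return a

def combine_by_key(partitions):
    # Phase 1: each "task" combines its own partition locally.
    partials = []
    for part in partitions:
        local = {}
        for key, value in part:
            if key in local:
                local[key] = merge_value(local[key], value)
            else:
                local[key] = create_combiner(value)
        partials.append(local)
    # Phase 2: merge the per-partition results (the shuffle/reduce side).
    result = {}
    for local in partials:
        for key, acc in local.items():
            if key in result:
                result[key] = merge_combiners(result[key], acc)
            else:
                result[key] = acc
    return result

parts = [[("a", 1), ("b", 2)], [("a", 3)]]
out = combine_by_key(parts)
```

The same shape maps directly onto RDD.combineByKey(createCombiner,
mergeValue, mergeCombiners); the key difference from foldByKey is that no
full-size zero accumulator is ever materialized per key.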

I have been thinking about how to tackle this problem and trying to come up
with solutions, but I would like to hear what the best practices are, or how
others have dealt with this kind of problem.

Best regards, 
Augusto



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org