Posted to dev@tez.apache.org by David <da...@gmail.com> on 2021/01/28 20:02:47 UTC

TezIDCache

Hello,

In the class TezID there is a caching mechanism I can't figure out.  What
is the purpose of caching these objects? This is much like a set, since the
key and value are the same. Is there some requirement that the items in the
cache have to be globally unique? Is this some sort of memory-saving
optimization to only maintain a single instance of each value?

Thanks.

Re: TezIDCache

Posted by Jonathan Eagles <je...@gmail.com>.
The TezIDCache is a memory-saving cache, similar in function to Java's
String.intern but for objects. Tez uses an event-based, multithreaded
message-passing system where hundreds of thousands of messages may be in
flight concurrently. Because those messages can share one cached instance
of each ID instead of each carrying its own copy, the cache greatly reduces
message size and therefore runtime memory requirements. However, Tez was
also designed to allow millions of tasks per DAG and tens of thousands of
DAGs per session (perhaps more). So to protect against memory bloat, the
cache is evaporative and uses soft references that the garbage collector
can clear when entries are no longer in use or under memory pressure.

So the cache carries some extra complexity in order to balance these two
demands.
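
To make the idea concrete, here is a minimal sketch of an interning cache
of this general shape. It is an illustration only, not the actual TezID
code: the class name InternCache, the method name intern, and the choice of
a WeakHashMap with SoftReference values are all assumptions made for the
example.

```java
import java.lang.ref.SoftReference;
import java.util.WeakHashMap;

// Hypothetical illustration of an "evaporative" interning cache.
// Not the actual TezID implementation.
final class InternCache<T> {
    // The map holds keys weakly; each value refers back to its own key
    // through a SoftReference. Nothing here holds an entry strongly, so the
    // garbage collector may clear an unused entry, typically under memory
    // pressure.
    private final WeakHashMap<T, SoftReference<T>> cache = new WeakHashMap<>();

    // Returns the canonical instance equal to candidate. If none is cached
    // (or the cached one has been cleared), candidate becomes canonical.
    synchronized T intern(T candidate) {
        SoftReference<T> ref = cache.get(candidate);
        T canonical = (ref == null) ? null : ref.get();
        if (canonical != null) {
            return canonical;
        }
        // Drop any stale entry first so the new entry is keyed by candidate.
        cache.remove(candidate);
        cache.put(candidate, new SoftReference<>(candidate));
        return candidate;
    }
}
```

With a cache like this, calling intern on two equal but distinct objects
returns the same instance both times, so the duplicate can be garbage
collected. This is also why the key and the value are the same object:
java.util.Set can report that an equal element is present, but it has no
method that hands back the instance already stored, so a map from the
object to itself is used instead.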

On Thu, Jan 28, 2021 at 2:03 PM David <da...@gmail.com> wrote:

> Hello,
>
> In the class TezID there is a caching mechanism I can't figure out.  What
> is the purpose of caching these objects? This is much like a set since the
> key and value are the same. Is there some requirement that the items in the
> cache have to be globally unique? Is this some sort of memory saving
> optimization to only maintain a single instance of each value?
>
> Thanks.
>