You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tez.apache.org by "Jonathan Turner Eagles (Jira)" <ji...@apache.org> on 2020/02/21 15:24:00 UTC

[jira] [Commented] (TEZ-4090) TezIDCache::getInstance sync lock

    [ https://issues.apache.org/jira/browse/TEZ-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041949#comment-17041949 ] 

Jonathan Turner Eagles commented on TEZ-4090:
---------------------------------------------

This is a matter of policy and is not strictly better to use the synchronizedMap.
If the policy is to improve performance, we could implement a synchronizedMap and reduce the lock held time if we assume mostly cache hits (both present and not null). The we return values without the lock held, reducing contention. In the case of a cache miss, there is need for a double lock (cache.get() and cache.put()). If the policy is to reduce memory footprint, synchronizedMap increases the chance that multiple copies of the value are persisted in memory.

My other assumption about this code is that most access are happening serially in the event thread (single threaded by default) and that the most important factor is overall speed of the function and not contention. I need to re-familiarize myself with these assumptions to make a good policy judgement.

{code}
    synchronized T getInstance(final T id) {
      final WeakReference<T> cached = cache.get(id);
      if (cached != null) {
        final T value = cached.get();
        if (value != null)
          return value;
      }
      cache.put(id, new WeakReference<T>(id));
      return id;
    }
{code}

> TezIDCache::getInstance sync lock
> ---------------------------------
>
>                 Key: TEZ-4090
>                 URL: https://issues.apache.org/jira/browse/TEZ-4090
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Assignee: László Bodor
>            Priority: Minor
>         Attachments: tez-task-id-concurrency.png
>
>
> profiler output shared by [~gopalv]. 
> Looking at the code, it is not mandatory to have sync on getInstance. 
> https://github.com/apache/tez/blob/master/tez-common/src/main/java/org/apache/tez/dag/records/TezID.java#L48



--
This message was sent by Atlassian Jira
(v8.3.4#803005)