Posted to issues@ozone.apache.org by "Stephen O'Donnell (Jira)" <ji...@apache.org> on 2022/02/28 17:13:00 UTC

[jira] [Resolved] (HDDS-6307) Improve processing and memory efficiency of container reports

     [ https://issues.apache.org/jira/browse/HDDS-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen O'Donnell resolved HDDS-6307.
-------------------------------------
    Fix Version/s: 1.3.0
       Resolution: Fixed

> Improve processing and memory efficiency of container reports
> -------------------------------------------------------------
>
>                 Key: HDDS-6307
>                 URL: https://issues.apache.org/jira/browse/HDDS-6307
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: SCM
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.3.0
>
>
> The container report handling code has some issues which this Jira intends to address.
> When handling full container reports, we make several copies of large sets in order to identify containers that are known to SCM but not reported by the DN (replicas that have somehow been lost on the DNs).
> Looking at the current code:
> {code}
>       synchronized (datanodeDetails) {
>         final List<ContainerReplicaProto> replicas =
>             containerReport.getReportsList();
>         final Set<ContainerID> containersInSCM =
>             nodeManager.getContainers(datanodeDetails);
>         final Set<ContainerID> containersInDn = replicas.parallelStream()
>             .map(ContainerReplicaProto::getContainerID)
>             .map(ContainerID::valueOf).collect(Collectors.toSet());
>         final Set<ContainerID> missingReplicas = new HashSet<>(containersInSCM);
>         missingReplicas.removeAll(containersInDn);
>         processContainerReplicas(datanodeDetails, replicas, publisher);
>         processMissingReplicas(datanodeDetails, missingReplicas);
>         /*
>          * Update the latest set of containers for this datanode in
>          * NodeManager
>          */
>         nodeManager.setContainers(datanodeDetails, containersInDn);
>         containerManager.notifyContainerReportProcessing(true, true);
>       }
> {code}
> The Set "containersInSCM" comes from NodeStateMap:
> {code}
>   public Set<ContainerID> getContainers(UUID uuid)
>       throws NodeNotFoundException {
>     lock.readLock().lock();
>     try {
>       checkIfNodeExist(uuid);
>       return Collections
>           .unmodifiableSet(new HashSet<>(nodeToContainer.get(uuid)));
>     } finally {
>       lock.readLock().unlock();
>     }
>   }
> {code}
> This returns a new copy of the set, so there is no need to also wrap it as unmodifiable. If the copy is returned directly, we can avoid copying it again in the report handler. The current code ends up with 3 copies of this potentially large set.
> Next we take the FCR (full container report) and stream it into a set of ContainerID. This set is used for two purposes: 1) to subtract from "containersInSCM" to yield the missing containers, and 2) to replace the set already held in nodeManager.
> We can avoid this second large set by simply adding each ContainerID to nodeManager if it is not already there, and then removing any "missing" entries from nodeManager at the end. This also avoids replacing the entire set of ContainerIDs, and all the tenured objects associated with it, with new but effectively identical object instances.
> Finally, we can process the replicas in the same pass and re-use the ContainerID object, which we currently create twice: one copy is stored in nodeManager and another distinct copy in each ContainerReplica.
> I checked how many ContainerID objects are present in SCM by loading 10 closed containers with 3 replicas each, then capturing a heap histogram with jmap:
> {code}
> bash-4.2$ jmap -histo:live 8 | grep ContainerID
>  262:            81           1944  org.apache.hadoop.hdds.scm.container.ContainerID
>  301:            30           1440  org.apache.hadoop.hdds.scm.container.ContainerReplica
>  419:            10            800  org.apache.hadoop.hdds.scm.container.ContainerInfo
> {code}
> There are 10 containers as expected and 30 replicas, but 81 ContainerID objects. Ideally there would be only 10, matching the number of containers, although some of the extra instances may be created by other code paths, such as pipeline creation.
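The getContainers() point above can be illustrated with a minimal sketch: since the method already builds a defensive HashSet copy, the unmodifiable wrapper adds nothing except forcing callers to copy again. This is an illustrative stand-in (the NodeMapSketch class and Long IDs are hypothetical; the real NodeStateMap also takes a read lock and checks node existence, omitted here):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Sketch: return the defensive copy directly. The new HashSet is already
// independent of the internal map, so wrapping it with
// Collections.unmodifiableSet only forces callers to make a third copy.
public class NodeMapSketch {
  private final Map<UUID, Set<Long>> nodeToContainer = new HashMap<>();

  public void addNode(UUID id) {
    nodeToContainer.put(id, new HashSet<>());
  }

  public void addContainer(UUID id, long container) {
    nodeToContainer.get(id).add(container);
  }

  /** Returns a mutable copy; mutating it cannot affect internal state. */
  public Set<Long> getContainers(UUID id) {
    return new HashSet<>(nodeToContainer.get(id));
  }
}
```

The caller can now mutate the returned set (e.g. removeAll to find missing replicas) without any risk to the internal map and without another copy.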
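The single-pass approach described above (add newly reported IDs in place, then drop the unreported ones) could be sketched roughly as follows. All names here (ReportSketch, NodeContainers, Long IDs) are hypothetical stand-ins for the real NodeManager/ContainerID types, and locking is omitted:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of single-pass container report processing: rather than
// materialising a second large Set from the report and replacing the
// node's set wholesale, add each reported ID into the existing set and
// track which known IDs went unreported.
public class ReportSketch {

  /** Stand-in for the per-datanode container set held by NodeManager. */
  static class NodeContainers {
    final Set<Long> containers = new HashSet<>();
  }

  /**
   * Processes one full container report against the node's current set.
   * Returns the IDs that SCM knew about but the DN did not report.
   */
  static Set<Long> processReport(NodeContainers node, List<Long> reported) {
    // Start by assuming every known container is missing, then remove
    // each one the DN actually reported. No separate Set of the
    // reported IDs is ever materialised.
    Set<Long> missing = new HashSet<>(node.containers);
    for (Long id : reported) {
      missing.remove(id);
      // Add any newly reported container in the same pass; entries that
      // already exist are left untouched, so long-lived (tenured)
      // objects survive instead of being replaced by identical copies.
      node.containers.add(id);
    }
    // Finally drop the containers the DN no longer holds.
    node.containers.removeAll(missing);
    return missing;
  }

  public static void main(String[] args) {
    NodeContainers node = new NodeContainers();
    node.containers.add(1L);
    node.containers.add(2L);
    node.containers.add(3L);
    Set<Long> missing = processReport(node, List.of(2L, 3L, 4L));
    System.out.println("missing=" + missing);                             // [1]
    System.out.println("containers=" + new java.util.TreeSet<>(node.containers)); // [2, 3, 4]
  }
}
```

The design point is that only one auxiliary set (the shrinking "missing" set) exists at any time, versus the three copies of the per-node set plus the streamed report set in the current code.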



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
