Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/07/09 18:55:49 UTC

[GitHub] [ozone] avijayanhwx commented on a change in pull request #2389: HDDS-5386. Add a NSSummaryTask to write NSSummary info into RDB

avijayanhwx commented on a change in pull request #2389:
URL: https://github.com/apache/ozone/pull/2389#discussion_r667140102



##########
File path: hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/NSSummaryTask.java
##########
@@ -0,0 +1,304 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.tasks;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.hadoop.hdds.utils.db.Table;
+import org.apache.hadoop.hdds.utils.db.TableIterator;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmDirectoryInfo;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.ozone.om.helpers.WithParentObjectId;
+import org.apache.hadoop.ozone.recon.ReconConstants;
+import org.apache.hadoop.ozone.recon.ReconUtils;
+import org.apache.hadoop.ozone.recon.api.types.NSSummary;
+import org.apache.hadoop.ozone.recon.spi.ReconNamespaceSummaryManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.inject.Inject;
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.List;
+
+import static org.apache.hadoop.ozone.om.OmMetadataManagerImpl.DIRECTORY_TABLE;
+import static org.apache.hadoop.ozone.om.OmMetadataManagerImpl.FILE_TABLE;
+
+/**
+ * Task to query data from OMDB and write into Recon RocksDB.

Review comment:
       Maybe a more specific Javadoc?
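       For instance, something along these lines (a sketch only; exact
       wording is the author's call):

           /**
            * Task that tracks the FSO-enabled FileTable and DirectoryTable
            * in the OM DB and aggregates, per directory objectId, a namespace
            * summary (file count, total size, file-size histogram, child
            * directory list) into the NSSummary table in Recon's RocksDB.
            */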

##########
File path: hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/api/types/NSSummary.java
##########
@@ -20,51 +20,67 @@
 
 import org.apache.hadoop.ozone.recon.ReconConstants;
 
+import java.util.ArrayList;
 import java.util.Arrays;
+import java.util.List;
 
 /**
  * Class to encapsulate namespace metadata summaries from OM.
  */
 
 public class NSSummary {
   private int numOfFiles;
-  private int sizeOfFiles;
+  private long sizeOfFiles;
   private int[] fileSizeBucket;
+  private List<Long> childDir;
 
   public NSSummary() {
     this.numOfFiles = 0;
-    this.sizeOfFiles = 0;
+    this.sizeOfFiles = 0L;
     this.fileSizeBucket = new int[ReconConstants.NUM_OF_BINS];
+    this.childDir = new ArrayList<>();
   }
 
-  public NSSummary(int numOfFiles, int sizeOfFiles, int[] bucket) {
+  public NSSummary(int numOfFiles,
+                   long sizeOfFiles,
+                   int[] bucket,
+                   List<Long> childDir) {
     this.numOfFiles = numOfFiles;
     this.sizeOfFiles = sizeOfFiles;
     setFileSizeBucket(bucket);
+    setChildDir(childDir);
   }
 
   public int getNumOfFiles() {
     return numOfFiles;
   }
 
-  public int getSizeOfFiles() {
+  public long getSizeOfFiles() {
     return sizeOfFiles;
   }
 
   public int[] getFileSizeBucket() {
     return Arrays.copyOf(this.fileSizeBucket, ReconConstants.NUM_OF_BINS);
   }
 
+  public List<Long> getChildDir() {
+    return new ArrayList<>(childDir);

Review comment:
       Why do we need to create a copy of the list here? We can avoid the setter in NSSummary#setChildDir if a copy is not being created here. Is there a use case for an NSSummary object to be immutable? If yes, then there should not be a setter.
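       If immutability is the intent, a minimal sketch (assuming the
       constructor can take over the setter's role; java.util.Collections):

           private final List<Long> childDir;   // no setChildDir

           public List<Long> getChildDir() {
             // expose a read-only view instead of copying on every call
             return Collections.unmodifiableList(childDir);
           }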

##########
File path: hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/NSSummaryTask.java
##########
@@ -0,0 +1,304 @@
+/**
+ * Task to query data from OMDB and write into Recon RocksDB.
+ */
+public class NSSummaryTask implements ReconOmTask {
+  private static final Logger LOG =
+          LoggerFactory.getLogger(NSSummaryTask.class);
+  private ReconNamespaceSummaryManager reconNamespaceSummaryManager;
+
+  @Inject
+  public NSSummaryTask(ReconNamespaceSummaryManager
+                                 reconNamespaceSummaryManager) {
+    this.reconNamespaceSummaryManager = reconNamespaceSummaryManager;
+  }
+
+  @Override
+  public String getTaskName() {
+    return "NSSummaryTask";
+  }
+
+  // We only listen to updates from FSO-enabled KeyTable(FileTable) and DirTable
+  public Collection<String> getTaskTables() {
+    return Arrays.asList(new String[]{FILE_TABLE, DIRECTORY_TABLE});
+  }
+
+  @Override
+  public Pair<String, Boolean> process(OMUpdateEventBatch events) {
+    Iterator<OMDBUpdateEvent> eventIterator = events.getIterator();
+    final Collection<String> taskTables = getTaskTables();
+
+    while (eventIterator.hasNext()) {
+      OMDBUpdateEvent<String, ? extends
+              WithParentObjectId> omdbUpdateEvent = eventIterator.next();
+      OMDBUpdateEvent.OMDBUpdateAction action = omdbUpdateEvent.getAction();
+
+      // we only process updates on OM's KeyTable and DirTable
+      String table = omdbUpdateEvent.getTable();
+      boolean updateOnFileTable = table.equals(FILE_TABLE);
+      if (!taskTables.contains(table)) {
+        continue;
+      }
+
+      String updatedKey = omdbUpdateEvent.getKey();
+
+      try {
+        if (updateOnFileTable) {
+          // key update on fileTable
+          OMDBUpdateEvent<String, OmKeyInfo> keyTableUpdateEvent =
+                  (OMDBUpdateEvent<String, OmKeyInfo>) omdbUpdateEvent;
+          OmKeyInfo updatedKeyInfo = keyTableUpdateEvent.getValue();
+          OmKeyInfo oldKeyInfo = keyTableUpdateEvent.getOldValue();
+
+          switch (action) {
+          case PUT:
+            writeOmKeyInfoOnNamespaceDB(updatedKeyInfo);
+            break;
+
+          case DELETE:
+            deleteOmKeyInfoOnNamespaceDB(updatedKeyInfo);
+            break;
+
+          case UPDATE:
+            if (oldKeyInfo != null) {
+              // delete first, then put
+              deleteOmKeyInfoOnNamespaceDB(oldKeyInfo);
+            } else {
+              LOG.warn("Update event does not have the old keyInfo for {}.",
+                      updatedKey);
+            }
+            writeOmKeyInfoOnNamespaceDB(updatedKeyInfo);
+            break;
+
+          default:
+            LOG.debug("Skipping DB update event : {}",
+                    omdbUpdateEvent.getAction());
+          }
+        } else {
+          // directory update on DirTable
+          OMDBUpdateEvent<String, OmDirectoryInfo> dirTableUpdateEvent =
+                  (OMDBUpdateEvent<String, OmDirectoryInfo>) omdbUpdateEvent;
+          OmDirectoryInfo updatedDirectoryInfo = dirTableUpdateEvent.getValue();
+          OmDirectoryInfo oldDirectoryInfo = dirTableUpdateEvent.getOldValue();
+
+          switch (action) {
+          case PUT:
+            writeOmDirectoryInfoOnNamespaceDB(updatedDirectoryInfo);
+            break;
+
+          case DELETE:
+            deleteOmDirectoryInfoOnNamespaceDB(updatedDirectoryInfo);
+            break;
+
+          case UPDATE:
+            // TODO: we may just want to ignore update event on table,
+            //  if objectId and parentObjectId cannot be modified.
+            if (oldDirectoryInfo != null) {
+              // delete first, then put
+              deleteOmDirectoryInfoOnNamespaceDB(oldDirectoryInfo);
+            } else {
+              LOG.warn("Update event does not have the old dirInfo for {}.",
+                      updatedKey);
+            }
+            writeOmDirectoryInfoOnNamespaceDB(updatedDirectoryInfo);
+            break;
+
+          default:
+            LOG.debug("Skipping DB update event : {}",
+                    omdbUpdateEvent.getAction());
+          }
+        }
+      } catch (IOException ioEx) {
+        LOG.error("Unable to process Namespace Summary data in Recon DB. ",
+                ioEx);
+        return new ImmutablePair<>(getTaskName(), false);
+      }
+    }
+    LOG.info("Completed a process run of NSSummaryTask");
+    return new ImmutablePair<>(getTaskName(), true);
+  }
+
+  @Override
+  public Pair<String, Boolean> reprocess(OMMetadataManager omMetadataManager) {
+    // actually fileTable with FSO
+    Table keyTable = omMetadataManager.getKeyTable();
+
+    TableIterator<String, ? extends Table.KeyValue<String, OmKeyInfo>>
+            keyTableIter = keyTable.iterator();
+
+    try {
+      // reinit Recon RocksDB's namespace CF.
+      reconNamespaceSummaryManager.initNSSummaryTable();

Review comment:
       If we are truncating the table here, can we name the method appropriately? "init" implies we are creating something new.
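       For example (clearNSSummaryTable is just a suggested name):

           // ReconNamespaceSummaryManager
           void clearNSSummaryTable() throws IOException;

           // NSSummaryTask#reprocess
           reconNamespaceSummaryManager.clearNSSummaryTable();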

##########
File path: hadoop-ozone/recon/src/test/java/org/apache/hadoop/ozone/recon/tasks/TestNSSummaryTask.java
##########
@@ -0,0 +1,475 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone.recon.tasks;
+
+import org.apache.hadoop.hdds.client.StandaloneReplicationConfig;
+import org.apache.hadoop.hdds.protocol.proto.HddsProtos;
+import org.apache.hadoop.ozone.om.OMMetadataManager;
+import org.apache.hadoop.ozone.om.helpers.OmDirectoryInfo;
+import org.apache.hadoop.ozone.om.helpers.OmKeyInfo;
+import org.apache.hadoop.ozone.om.ratis.utils.OzoneManagerRatisUtils;
+import org.apache.hadoop.ozone.recon.ReconConstants;
+import org.apache.hadoop.ozone.recon.ReconTestInjector;
+import org.apache.hadoop.ozone.recon.api.types.NSSummary;
+import org.apache.hadoop.ozone.recon.recovery.ReconOMMetadataManager;
+import org.apache.hadoop.ozone.recon.spi.ReconNamespaceSummaryManager;
+import org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+
+import static org.apache.hadoop.ozone.OzoneConsts.OM_KEY_PREFIX;
+import static org.apache.hadoop.ozone.recon.OMMetadataManagerTestUtils.getMockOzoneManagerServiceProviderWithFSO;
+import static org.apache.hadoop.ozone.recon.OMMetadataManagerTestUtils.getTestReconOmMetadataManager;
+import static org.apache.hadoop.ozone.recon.OMMetadataManagerTestUtils.initializeNewOmMetadataManager;
+import static org.apache.hadoop.ozone.recon.OMMetadataManagerTestUtils.writeDirToOm;
+import static org.apache.hadoop.ozone.recon.OMMetadataManagerTestUtils.writeKeyToOm;
+
+/**
+ * Test for NSSummaryTask.
+ */
+public class TestNSSummaryTask {
+  @Rule
+  public TemporaryFolder temporaryFolder = new TemporaryFolder();
+
+  private ReconNamespaceSummaryManager reconNamespaceSummaryManager;
+  private OMMetadataManager omMetadataManager;
+  private ReconOMMetadataManager reconOMMetadataManager;
+  private OzoneManagerServiceProviderImpl ozoneManagerServiceProvider;
+
+  // Object names in FSO-enabled format
+  private static final String VOL = "vol";
+  private static final String BUCKET_ONE = "bucket1";
+  private static final String BUCKET_TWO = "bucket2";
+  private static final String KEY_ONE = "file1";
+  private static final String KEY_TWO = "file2";
+  private static final String KEY_THREE = "dir1/dir2/file3";
+  private static final String KEY_FOUR = "file4";
+  private static final String KEY_FIVE = "file5";
+  private static final String FILE_ONE = "file1";
+  private static final String FILE_TWO = "file2";
+  private static final String FILE_THREE = "file3";
+  private static final String FILE_FOUR = "file4";
+  private static final String FILE_FIVE = "file5";
+  private static final String DIR_ONE = "dir1";
+  private static final String DIR_TWO = "dir2";
+  private static final String DIR_THREE = "dir3";
+  private static final String DIR_FOUR = "dir4";
+  private static final String DIR_FIVE = "dir5";
+
+  private static final long BUCKET_ONE_OBJECT_ID = 1L;
+  private static final long BUCKET_TWO_OBJECT_ID = 2L;
+  private static final long KEY_ONE_OBJECT_ID = 3L;
+  private static final long DIR_ONE_OBJECT_ID = 4L;
+  private static final long KEY_TWO_OBJECT_ID = 5L;
+  private static final long KEY_FOUR_OBJECT_ID = 6L;
+  private static final long DIR_TWO_OBJECT_ID = 7L;
+  private static final long KEY_THREE_OBJECT_ID = 8L;
+  private static final long KEY_FIVE_OBJECT_ID = 9L;
+  private static final long DIR_THREE_OBJECT_ID = 10L;
+  private static final long DIR_FOUR_OBJECT_ID = 11L;
+  private static final long DIR_FIVE_OBJECT_ID = 12L;
+
+  private static final long KEY_ONE_SIZE = 500L;
+  private static final long KEY_TWO_OLD_SIZE = 1025L;
+  private static final long KEY_TWO_UPDATE_SIZE = 1023L;
+  private static final long KEY_THREE_SIZE =
+          ReconConstants.MAX_FILE_SIZE_UPPER_BOUND - 100L;
+  private static final long KEY_FOUR_SIZE = 2050L;
+  private static final long KEY_FIVE_SIZE = 100L;
+
+  @Before
+  public void setUp() throws Exception {
+    omMetadataManager = initializeNewOmMetadataManager(
+            temporaryFolder.newFolder());
+    ozoneManagerServiceProvider =
+            getMockOzoneManagerServiceProviderWithFSO();
+    reconOMMetadataManager = getTestReconOmMetadataManager(omMetadataManager,
+            temporaryFolder.newFolder());
+
+    ReconTestInjector reconTestInjector =
+            new ReconTestInjector.Builder(temporaryFolder)
+                    .withReconOm(reconOMMetadataManager)
+                    .withOmServiceProvider(ozoneManagerServiceProvider)
+                    .withReconSqlDb()
+                    .withContainerDB()
+                    .build();
+    reconNamespaceSummaryManager =
+            reconTestInjector.getInstance(ReconNamespaceSummaryManager.class);
+    OzoneManagerRatisUtils.setBucketFSOptimized(true);
+  }
+
+  @Test
+  public void testReprocess() throws Exception {
+    NSSummary nonExistentSummary =
+            reconNamespaceSummaryManager.getNSSummary(BUCKET_ONE_OBJECT_ID);
+    Assert.assertNull(nonExistentSummary);
+
+    populateOMDB();
+
+    NSSummaryTask nsSummaryTask = new NSSummaryTask(

Review comment:
       Can we write an NS summary prior to reprocess and verify it gets cleaned up afterwards?
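       A sketch of that check (storeNSSummary and the dummy objectId are
       assumptions based on this test's setup):

           // seed a stale summary before reprocess
           NSSummary stale = new NSSummary();
           stale.setNumOfFiles(42);
           reconNamespaceSummaryManager.storeNSSummary(-1L, stale);
           Assert.assertNotNull(
                   reconNamespaceSummaryManager.getNSSummary(-1L));

           nsSummaryTask.reprocess(reconOMMetadataManager);

           // reprocess should have truncated the table, dropping the entry
           Assert.assertNull(reconNamespaceSummaryManager.getNSSummary(-1L));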

##########
File path: hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/NSSummaryTask.java
##########
@@ -0,0 +1,304 @@
+  @Override
+  public Pair<String, Boolean> reprocess(OMMetadataManager omMetadataManager) {
+    // actually fileTable with FSO
+    Table keyTable = omMetadataManager.getKeyTable();
+
+    TableIterator<String, ? extends Table.KeyValue<String, OmKeyInfo>>
+            keyTableIter = keyTable.iterator();
+
+    try {
+      // reinit Recon RocksDB's namespace CF.
+      reconNamespaceSummaryManager.initNSSummaryTable();
+
+      while (keyTableIter.hasNext()) {
+        Table.KeyValue<String, OmKeyInfo> kv = keyTableIter.next();
+        OmKeyInfo keyInfo = kv.getValue();
+        writeOmKeyInfoOnNamespaceDB(keyInfo);
+      }
+
+      Table dirTable = omMetadataManager.getDirectoryTable();
+      TableIterator<String, ? extends Table.KeyValue<String, OmDirectoryInfo>>
+              dirTableIter = dirTable.iterator();
+
+      while (dirTableIter.hasNext()) {
+        Table.KeyValue<String, OmDirectoryInfo> kv = dirTableIter.next();
+        OmDirectoryInfo directoryInfo = kv.getValue();
+        writeOmDirectoryInfoOnNamespaceDB(directoryInfo);

Review comment:
       Why not go down the hierarchy tree instead of starting with the leaves? (Create the directories first, and then the files.)
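       Reordered per that suggestion (the write helpers appear to upsert, so
       either order should converge to the same state; this just builds
       parents before children):

           // create directory summaries first...
           while (dirTableIter.hasNext()) {
             writeOmDirectoryInfoOnNamespaceDB(dirTableIter.next().getValue());
           }
           // ...then fold files into their (already created) parents
           while (keyTableIter.hasNext()) {
             writeOmKeyInfoOnNamespaceDB(keyTableIter.next().getValue());
           }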

##########
File path: hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/NSSummaryTask.java
##########
@@ -0,0 +1,304 @@
+    } catch (IOException ioEx) {
+      LOG.error("Unable to reprocess Namespace Summary data in Recon DB. ",
+              ioEx);
+      return new ImmutablePair<>(getTaskName(), false);
+    }
+
+    LOG.info("Completed a reprocess run of NSSummaryTask");
+    return new ImmutablePair<>(getTaskName(), true);
+  }
+
+  private void writeOmKeyInfoOnNamespaceDB(OmKeyInfo keyInfo)
+          throws IOException {
+    long parentObjectId = keyInfo.getParentObjectID();
+    NSSummary nsSummary = reconNamespaceSummaryManager
+            .getNSSummary(parentObjectId);
+    if (nsSummary == null) {
+      nsSummary = new NSSummary();
+    }
+    int numOfFile = nsSummary.getNumOfFiles();
+    long sizeOfFile = nsSummary.getSizeOfFiles();
+    int[] fileBucket = nsSummary.getFileSizeBucket();
+    nsSummary.setNumOfFiles(numOfFile + 1);
+    long dataSize = keyInfo.getDataSize();
+    nsSummary.setSizeOfFiles(sizeOfFile + dataSize);
+    int binIndex = ReconUtils.getBinIndex(dataSize);
+
+    // make sure the file is within our scope of tracking.
+    if (binIndex >= 0 && binIndex < ReconConstants.NUM_OF_BINS) {

Review comment:
       The FileSizeCountTask was supposed to map all files >1PB to the last bucket, so conceptually there is no overflow of the bin index. Is that not the case?
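       If the bins are meant to clamp, the last bucket absorbs everything
       above the upper bound and the upper guard becomes redundant; a sketch
       of that behavior (hypothetical, not necessarily ReconUtils.getBinIndex's
       actual body):

           static int getBinIndex(long dataSize) {
             for (int i = 0; i < ReconConstants.NUM_OF_BINS - 1; i++) {
               if (dataSize < binUpperBound(i)) { // binUpperBound: hypothetical
                 return i;
               }
             }
             return ReconConstants.NUM_OF_BINS - 1; // files >1PB land here
           }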




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


