You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/01/29 04:00:37 UTC

[GitHub] [hudi] vinothchandar commented on a change in pull request #2494: [HUDI-1552] Improve performance of key lookups from base file in Metadata Table.

vinothchandar commented on a change in pull request #2494:
URL: https://github.com/apache/hudi/pull/2494#discussion_r566564182



##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java
##########
@@ -188,41 +196,51 @@ private synchronized void openFileSliceIfNeeded() throws IOException {
 
     // Load the schema
     Schema schema = HoodieAvroUtils.addMetadataFields(HoodieMetadataRecord.getClassSchema());
-    logRecordScanner = new HoodieMetadataMergedLogRecordScanner(metaClient.getFs(), metadataBasePath,
-            logFilePaths, schema, latestMetaInstantTimestamp, MAX_MEMORY_SIZE_IN_BYTES, BUFFER_SIZE,
+    HoodieMetadataMergedLogRecordScanner logRecordScanner = new HoodieMetadataMergedLogRecordScanner(metaClient.getFs(),
+            metadataBasePath, logFilePaths, schema, latestMetaInstantTimestamp, MAX_MEMORY_SIZE_IN_BYTES, BUFFER_SIZE,
             spillableMapDirectory, null);
 
     LOG.info("Opened metadata log files from " + logFilePaths + " at instant " + latestInstantTime
         + "(dataset instant=" + latestInstantTime + ", metadata instant=" + latestMetaInstantTimestamp + ")");
 
     metrics.ifPresent(metrics -> metrics.updateMetrics(HoodieMetadataMetrics.SCAN_STR, timer.endTimer()));
+
+    if (metadataConfig.enableReuse()) {
+      // cache for later reuse
+      cachedBaseFileReader = baseFileReader;
+      cachedLogRecordScanner = logRecordScanner;
+    }
+
+    return Pair.of(baseFileReader, logRecordScanner);
   }
 
-  private void closeIfNeeded() {
+  private void closeIfNeeded(Pair<HoodieFileReader, HoodieMetadataMergedLogRecordScanner> readers) {

Review comment:
       I prefer the previous approach for readability.  i.e have it just close the member variables. If you disagree, may be we can chat about why you think this is better for reading. I did face this issue when I originally read the code here

##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java
##########
@@ -188,41 +196,51 @@ private synchronized void openFileSliceIfNeeded() throws IOException {
 
     // Load the schema
     Schema schema = HoodieAvroUtils.addMetadataFields(HoodieMetadataRecord.getClassSchema());
-    logRecordScanner = new HoodieMetadataMergedLogRecordScanner(metaClient.getFs(), metadataBasePath,
-            logFilePaths, schema, latestMetaInstantTimestamp, MAX_MEMORY_SIZE_IN_BYTES, BUFFER_SIZE,
+    HoodieMetadataMergedLogRecordScanner logRecordScanner = new HoodieMetadataMergedLogRecordScanner(metaClient.getFs(),
+            metadataBasePath, logFilePaths, schema, latestMetaInstantTimestamp, MAX_MEMORY_SIZE_IN_BYTES, BUFFER_SIZE,
             spillableMapDirectory, null);
 
     LOG.info("Opened metadata log files from " + logFilePaths + " at instant " + latestInstantTime
         + "(dataset instant=" + latestInstantTime + ", metadata instant=" + latestMetaInstantTimestamp + ")");
 
     metrics.ifPresent(metrics -> metrics.updateMetrics(HoodieMetadataMetrics.SCAN_STR, timer.endTimer()));
+
+    if (metadataConfig.enableReuse()) {

Review comment:
       We are sort of creating two code paths again for reuse and non-reuse. Can we please go back to just always initializing the member variable here always? 
   then `closeIfNeeded()` can continue to work with members alone. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org