Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/01/27 18:55:54 UTC

[GitHub] [hudi] prashantwason commented on a change in pull request #2494: [HUDI-1552] Improve performance of key lookups from base file in Metadata Table.

prashantwason commented on a change in pull request #2494:
URL: https://github.com/apache/hudi/pull/2494#discussion_r565555570



##########
File path: hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileReader.java
##########
@@ -209,12 +212,23 @@ public R next() {
 
   @Override
   public Option getRecordByKey(String key, Schema readerSchema) throws IOException {
-    HFileScanner scanner = reader.getScanner(false, true);
+    byte[] value = null;
     KeyValue kv = new KeyValue(key.getBytes(), null, null, null);
-    if (scanner.seekTo(kv) == 0) {
-      Cell c = scanner.getKeyValue();
-      byte[] keyBytes = Arrays.copyOfRange(c.getRowArray(), c.getRowOffset(), c.getRowOffset() + c.getRowLength());
-      R record = getRecordFromCell(c, getSchema(), readerSchema);
+
+    synchronized (this) {

Review comment:
       The concurrent calls happen when TimelineService is in use. TimelineService is based on Javalin/Jetty, which uses a synchronous thread-per-request model, so multiple threads call into the app's router in parallel to handle HTTP requests.
   
   Within org.apache.hudi.timeline.service.FileSystemViewHandler, we handle the remote calls with code similar to: viewManager.getFileSystemView(basePath).getLatestXXX(...)
   
   When the Metadata Table is enabled, the file system view should be a HoodieMetadataFileSystemView, which uses a HoodieTableMetadata (class variable). Hence, HoodieTableMetadata.getAllFilesInPartition() will be called concurrently on multiple threads. This function internally calls getRecordByKey().
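   
   To make the scenario concrete, here is a minimal, self-contained sketch of the pattern described above. It is not Hudi code: SharedCursorReader and TimelineRequestSketch are hypothetical stand-ins, a TreeMap replaces the HFile, and a fixed thread pool stands in for Jetty's request threads. It only illustrates why a seek-then-read lookup on one shared reader instance needs to run as an atomic unit when several request threads call it at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a reader whose lookups move a shared cursor.
// Loosely modelled on a scanner-backed getRecordByKey(); not the Hudi class.
class SharedCursorReader {
    private final TreeMap<String, String> sorted = new TreeMap<>();
    private String cursorKey; // shared mutable state, like a reused scanner position

    SharedCursorReader(Map<String, String> data) {
        sorted.putAll(data);
    }

    Optional<String> getRecordByKey(String key) {
        // Seek and read are two steps over shared state, so they are
        // guarded together; otherwise another thread could move the cursor
        // between the seek and the read.
        synchronized (this) {
            cursorKey = sorted.ceilingKey(key);            // step 1: seek
            if (key.equals(cursorKey)) {
                return Optional.of(sorted.get(cursorKey)); // step 2: read at cursor
            }
            return Optional.empty();
        }
    }
}

class TimelineRequestSketch {
    // Simulates thread-per-request handlers that all use one reader instance.
    // Returns how many lookups came back with a missing or wrong value.
    static int countMismatches(int threads, int lookupsPerThread) throws Exception {
        Map<String, String> data = new TreeMap<>();
        for (int i = 0; i < 100; i++) {
            data.put("partition-" + i, "files-" + i);
        }
        SharedCursorReader reader = new SharedCursorReader(data);

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Integer>> results = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int offset = t;
            results.add(pool.submit(() -> {
                int bad = 0;
                for (int i = 0; i < lookupsPerThread; i++) {
                    int n = (i + offset) % 100;
                    Optional<String> v = reader.getRecordByKey("partition-" + n);
                    if (!v.isPresent() || !v.get().equals("files-" + n)) {
                        bad++;
                    }
                }
                return bad;
            }));
        }

        int total = 0;
        for (Future<Integer> f : results) {
            total += f.get();
        }
        pool.shutdown();
        return total;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("mismatches: " + countMismatches(8, 10_000));
    }
}
```

   In this sketch the lock is held for the whole seek-and-read pair, which keeps the shared cursor consistent but serializes lookups across request threads. Whether that trade-off is acceptable for the real reader is a separate question from the one this sketch answers.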




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org