Posted to common-issues@hadoop.apache.org by "virajjasani (via GitHub)" <gi...@apache.org> on 2023/06/17 06:59:47 UTC

[GitHub] [hadoop] virajjasani opened a new pull request, #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

virajjasani opened a new pull request, #5754:
URL: https://github.com/apache/hadoop/pull/5754

   Jira: HADOOP-18291



[GitHub] [hadoop] mehakmeet commented on a diff in pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "mehakmeet (via GitHub)" <gi...@apache.org>.
mehakmeet commented on code in PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#discussion_r1247471235


##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java:
##########
@@ -247,9 +305,46 @@ private Entry getEntry(int blockNumber) {
       throw new IllegalStateException(String.format("block %d not found in cache", blockNumber));
     }
     numGets++;
+    addToHeadOfLinkedList(entry);
     return entry;
   }
 
+  /**
+   * Add the given entry to the head of the linked list.
+   *
+   * @param entry Block entry to add.
+   */
+  private void addToHeadOfLinkedList(Entry entry) {
+    blocksLock.writeLock().lock();
+    try {
+      if (head == null) {
+        head = entry;
+        tail = entry;
+      }
+      if (entry != head) {
+        Entry prev = entry.getPrevious();
+        Entry nxt = entry.getNext();
+        if (prev != null) {
+          prev.setNext(nxt);
+        }
+        if (nxt != null) {
+          nxt.setPrevious(prev);
+        }
+        entry.setPrevious(null);
+        entry.setNext(head);
+        head.setPrevious(entry);
+        head = entry;
+      }
+      if (tail != null) {
+        while (tail.getNext() != null) {
+          tail = tail.getNext();
+        }
+      }

Review Comment:
   True. I was also thinking: would it be possible via a simple UT as well, where we pass in entries as we desire and access them in whatever order we prefer to test the functionality? It might be easier to test the LRU logic directly than via the stream.
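   
   A rough sketch of how such a unit test could exercise the LRU invariants directly, using a self-contained toy model of the doubly-linked list (all names here are hypothetical; the real test would target `SingleFilePerBlockCache` once its internals are observable):
   
   ```
   /** Toy stand-in for the cache's doubly-linked LRU list. */
   final class LruModel {
     static final class Node {
       final int block;
       Node prev, next;
       Node(int block) { this.block = block; }
     }
     private Node head, tail;
   
     /** Move a node to the head, mirroring addToHeadOfLinkedList. */
     void addToHead(Node n) {
       if (n == head) { return; }        // already most-recently-used
       // detach n from its current position, if linked
       if (n.prev != null) { n.prev.next = n.next; }
       if (n.next != null) { n.next.prev = n.prev; }
       if (n == tail) { tail = n.prev; } // O(1) tail update, no traversal
       n.prev = null;
       n.next = head;
       if (head != null) { head.prev = n; }
       head = n;
       if (tail == null) { tail = n; }
     }
     int headBlock() { return head.block; }
     int tailBlock() { return tail.block; }
   }
   
   // usage in a test body:
   LruModel m = new LruModel();
   LruModel.Node b1 = new LruModel.Node(1);
   LruModel.Node b2 = new LruModel.Node(2);
   m.addToHead(b1);                      // head=1, tail=1
   m.addToHead(b2);                      // head=2, tail=1
   m.addToHead(b1);                      // re-access 1: head=1, tail=2
   assert m.headBlock() == 1 && m.tailBlock() == 2;
   ```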




[GitHub] [hadoop] virajjasani commented on pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "virajjasani (via GitHub)" <gi...@apache.org>.
virajjasani commented on PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#issuecomment-1635203629

   `us-west-2`:
   
   ```
   $ mvn clean verify -Dparallel-tests -DtestsThreadCount=8 -Dscale -Dprefetch
   
   $ mvn clean verify -Dparallel-tests -DtestsThreadCount=8 -Dscale
   ```
   
   test results look good



[GitHub] [hadoop] virajjasani commented on a diff in pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "virajjasani (via GitHub)" <gi...@apache.org>.
virajjasani commented on code in PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#discussion_r1240904151


##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java:
##########
@@ -247,9 +305,46 @@ private Entry getEntry(int blockNumber) {
       throw new IllegalStateException(String.format("block %d not found in cache", blockNumber));
     }
     numGets++;
+    addToHeadOfLinkedList(entry);
     return entry;
   }
 
+  /**
+   * Add the given entry to the head of the linked list.
+   *
+   * @param entry Block entry to add.
+   */
+  private void addToHeadOfLinkedList(Entry entry) {
+    blocksLock.writeLock().lock();
+    try {
+      if (head == null) {
+        head = entry;
+        tail = entry;
+      }
+      if (entry != head) {
+        Entry prev = entry.getPrevious();
+        Entry nxt = entry.getNext();
+        if (prev != null) {
+          prev.setNext(nxt);
+        }
+        if (nxt != null) {
+          nxt.setPrevious(prev);
+        }
+        entry.setPrevious(null);
+        entry.setNext(head);
+        head.setPrevious(entry);
+        head = entry;
+      }
+      if (tail != null) {
+        while (tail.getNext() != null) {
+          tail = tail.getNext();
+        }
+      }

Review Comment:
   I applied this patch temporarily to debug further, but somehow the head and tail are getting mixed up:
   
   ```
   diff --git a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java
   index ef685b54d30..1aad82ff9c2 100644
   --- a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java
   +++ b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java
   @@ -306,6 +306,7 @@ private void addToHeadOfLinkedList(Entry entry) {
              "Block num {} to be added to the head. Current head block num: {} and tail block num: {}",
              entry.blockNumber, head.blockNumber, tail.blockNumber);
          if (entry != head) {
   +        boolean isEntryTail = entry == tail;
            Entry prev = entry.getPrevious();
            Entry nxt = entry.getNext();
            if (prev != null) {
   @@ -318,10 +319,8 @@ private void addToHeadOfLinkedList(Entry entry) {
            entry.setNext(head);
            head.setPrevious(entry);
            head = entry;
   -      }
   -      if (tail != null) {
   -        while (tail.getNext() != null) {
   -          tail = tail.getNext();
   +        if (isEntryTail) {
   +          tail = prev;
            }
          }
        } finally {
   
   ```
   
   However, somehow after eviction the head and tail are getting mixed up; I am still trying to understand what is going wrong and why this patch would not work.
   But I hope you were suggesting a change like this one, correct?




[GitHub] [hadoop] virajjasani commented on a diff in pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "virajjasani (via GitHub)" <gi...@apache.org>.
virajjasani commented on code in PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#discussion_r1240905788


##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java:
##########
@@ -247,9 +305,46 @@ private Entry getEntry(int blockNumber) {
       throw new IllegalStateException(String.format("block %d not found in cache", blockNumber));
     }
     numGets++;
+    addToHeadOfLinkedList(entry);
     return entry;
   }
 
+  /**
+   * Add the given entry to the head of the linked list.
+   *
+   * @param entry Block entry to add.
+   */
+  private void addToHeadOfLinkedList(Entry entry) {
+    blocksLock.writeLock().lock();
+    try {
+      if (head == null) {
+        head = entry;
+        tail = entry;
+      }
+      if (entry != head) {
+        Entry prev = entry.getPrevious();
+        Entry nxt = entry.getNext();
+        if (prev != null) {
+          prev.setNext(nxt);
+        }
+        if (nxt != null) {
+          nxt.setPrevious(prev);
+        }
+        entry.setPrevious(null);
+        entry.setNext(head);
+        head.setPrevious(entry);
+        head = entry;
+      }
+      if (tail != null) {
+        while (tail.getNext() != null) {
+          tail = tail.getNext();
+        }
+      }

Review Comment:
   It works with a slight modification; let me push the change.




[GitHub] [hadoop] virajjasani commented on a diff in pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "virajjasani (via GitHub)" <gi...@apache.org>.
virajjasani commented on code in PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#discussion_r1247450167


##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java:
##########
@@ -299,9 +395,62 @@ public void put(int blockNumber, ByteBuffer buffer, Configuration conf,
     // Update stream_read_blocks_in_cache stats only after blocks map is updated with new file
     // entry to avoid any discrepancy related to the value of stream_read_blocks_in_cache.
     // If stream_read_blocks_in_cache is updated before updating the blocks map here, closing of
-    // the input stream can lead to the removal of the cache file even before blocks is added with
-    // the new cache file, leading to incorrect value of stream_read_blocks_in_cache.
+    // the input stream can lead to the removal of the cache file even before blocks is added
+    // with the new cache file, leading to incorrect value of stream_read_blocks_in_cache.
     prefetchingStatistics.blockAddedToFileCache();
+    addToLinkedListAndEvictIfRequired(entry);
+  }
+
+  /**
+   * Add the given entry to the head of the linked list and if the LRU cache size
+   * exceeds the max limit, evict tail of the LRU linked list.
+   *
+   * @param entry Block entry to add.
+   */
+  private void addToLinkedListAndEvictIfRequired(Entry entry) {
+    addToHeadOfLinkedList(entry);
+    blocksLock.writeLock().lock();
+    try {
+      if (blocks.size() > maxBlocksCount && !closed.get()) {
+        Entry elementToPurge = tail;
+        tail = tail.getPrevious();
+        if (tail == null) {
+          tail = head;
+        }
+        tail.setNext(null);
+        elementToPurge.setPrevious(null);
+        deleteBlockFileAndEvictCache(elementToPurge);
+      }
+    } finally {
+      blocksLock.writeLock().unlock();
+    }
+  }
+
+  /**
+   * Delete cache file as part of the block cache LRU eviction.
+   *
+   * @param elementToPurge Block entry to evict.
+   */
+  private void deleteBlockFileAndEvictCache(Entry elementToPurge) {
+    boolean lockAcquired =
+        elementToPurge.takeLock(Entry.LockType.WRITE, PREFETCH_WRITE_LOCK_TIMEOUT,
+            PREFETCH_WRITE_LOCK_TIMEOUT_UNIT);
+    if (!lockAcquired) {
+      LOG.error("Cache file {} deletion would not be attempted as write lock could not"

Review Comment:
   If eviction misses it, stream close would be able to clean it up.
   If stream close misses it too, then it stays on disk, and we might eventually come up with some "file last accessed" based check, with a cron job removing such files eventually. Not a bad idea IMO.
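   
   A minimal sketch of that cleanup idea (purely illustrative: the class name, directory layout, age threshold, and the use of last-modified time as a proxy for last access are all assumptions, not part of this patch):
   
   ```
   import java.io.IOException;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.nio.file.attribute.FileTime;
   import java.time.Instant;
   import java.time.temporal.ChronoUnit;
   import java.util.stream.Stream;
   
   /** Best-effort deletion of cache files untouched for maxAgeHours. */
   public final class StaleCacheFileCleaner {
     public static void clean(Path cacheDir, long maxAgeHours) throws IOException {
       FileTime cutoff =
           FileTime.from(Instant.now().minus(maxAgeHours, ChronoUnit.HOURS));
       try (Stream<Path> files = Files.list(cacheDir)) {
         files.filter(Files::isRegularFile)
              .filter(p -> {
                try {
                  // last-modified time as a cheap proxy for last access
                  return Files.getLastModifiedTime(p).compareTo(cutoff) < 0;
                } catch (IOException e) {
                  return false;    // skip files we cannot stat
                }
              })
              .forEach(p -> {
                try {
                  Files.deleteIfExists(p);
                } catch (IOException e) {
                  // best effort: leave the file for the next run
                }
              });
       }
     }
   }
   ```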




[GitHub] [hadoop] virajjasani commented on pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "virajjasani (via GitHub)" <gi...@apache.org>.
virajjasani commented on PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#issuecomment-1622353657

   @mehakmeet could you please take another look at the latest patch? I have done multiple rounds of testing and run the whole suite with -scale too.



[GitHub] [hadoop] hadoop-yetus commented on pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "hadoop-yetus (via GitHub)" <gi...@apache.org>.
hadoop-yetus commented on PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#issuecomment-1605731619

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 37s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 3 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  17m 43s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  20m 31s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  17m 14s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  compile  |  16m 16s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  checkstyle  |   4m 26s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 52s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   2m  8s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 52s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   4m 11s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m 37s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 32s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 30s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 21s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javac  |  16m 21s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 17s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  javac  |  16m 17s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   4m 15s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   2m 50s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   2m  3s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 56s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   4m 19s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  23m 16s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  19m 12s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  unit  |   2m 52s |  |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m 16s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 213m 51s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/3/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5754 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 31ca6914db70 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 108cb9640641b749701572fcaa5f65ba72e3b9de |
   | Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/3/testReport/ |
   | Max. process+thread count | 1263 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/3/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   



[GitHub] [hadoop] virajjasani commented on a diff in pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "virajjasani (via GitHub)" <gi...@apache.org>.
virajjasani commented on code in PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#discussion_r1240621453


##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java:
##########
@@ -247,9 +305,46 @@ private Entry getEntry(int blockNumber) {
       throw new IllegalStateException(String.format("block %d not found in cache", blockNumber));
     }
     numGets++;
+    addToHeadOfLinkedList(entry);
     return entry;
   }
 
+  /**
+   * Add the given entry to the head of the linked list.
+   *
+   * @param entry Block entry to add.
+   */
+  private void addToHeadOfLinkedList(Entry entry) {
+    blocksLock.writeLock().lock();
+    try {

Review Comment:
   Done, added on L305. And yes, I agree that it can be helpful; in fact, I added it for testing purposes anyway.



##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java:
##########
@@ -247,9 +305,46 @@ private Entry getEntry(int blockNumber) {
       throw new IllegalStateException(String.format("block %d not found in cache", blockNumber));
     }
     numGets++;
+    addToHeadOfLinkedList(entry);
     return entry;
   }
 
+  /**
+   * Add the given entry to the head of the linked list.
+   *
+   * @param entry Block entry to add.
+   */
+  private void addToHeadOfLinkedList(Entry entry) {
+    blocksLock.writeLock().lock();
+    try {
+      if (head == null) {
+        head = entry;
+        tail = entry;
+      }
+      if (entry != head) {
+        Entry prev = entry.getPrevious();
+        Entry nxt = entry.getNext();
+        if (prev != null) {
+          prev.setNext(nxt);
+        }
+        if (nxt != null) {
+          nxt.setPrevious(prev);
+        }
+        entry.setPrevious(null);
+        entry.setNext(head);
+        head.setPrevious(entry);
+        head = entry;
+      }
+      if (tail != null) {
+        while (tail.getNext() != null) {
+          tail = tail.getNext();
+        }
+      }

Review Comment:
   Sure, let's say:
   
   head -> 1, tail -> 2
   
   New block: 3
   So we need to make: 3 -> 1 -> 2
   i.e. new head -> 3, tail -> 2
   
   New block: 4
   Updated list: 4 -> 3 -> 1 -> 2
   
   Now let's say the input stream accesses block 2, hence block 2 needs to be put at the head.
   The new list should be: 2 -> 4 -> 3 -> 1
   
   We change head to 2 and we also update the next pointer of block 1;
   however, if we don't update tail (L322-L326), we will not be able to move tail to block 1.
   



##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java:
##########
@@ -299,9 +395,62 @@ public void put(int blockNumber, ByteBuffer buffer, Configuration conf,
     // Update stream_read_blocks_in_cache stats only after blocks map is updated with new file
     // entry to avoid any discrepancy related to the value of stream_read_blocks_in_cache.
     // If stream_read_blocks_in_cache is updated before updating the blocks map here, closing of
-    // the input stream can lead to the removal of the cache file even before blocks is added with
-    // the new cache file, leading to incorrect value of stream_read_blocks_in_cache.
+    // the input stream can lead to the removal of the cache file even before blocks is added
+    // with the new cache file, leading to incorrect value of stream_read_blocks_in_cache.
     prefetchingStatistics.blockAddedToFileCache();
+    addToLinkedListAndEvictIfRequired(entry);
+  }
+
+  /**
+   * Add the given entry to the head of the linked list and if the LRU cache size
+   * exceeds the max limit, evict tail of the LRU linked list.
+   *
+   * @param entry Block entry to add.
+   */
+  private void addToLinkedListAndEvictIfRequired(Entry entry) {
+    addToHeadOfLinkedList(entry);
+    blocksLock.writeLock().lock();

Review Comment:
   Yeah, that would still be okay, given that eviction would take place anyway.



##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java:
##########
@@ -299,9 +395,62 @@ public void put(int blockNumber, ByteBuffer buffer, Configuration conf,
     // Update stream_read_blocks_in_cache stats only after blocks map is updated with new file
     // entry to avoid any discrepancy related to the value of stream_read_blocks_in_cache.
     // If stream_read_blocks_in_cache is updated before updating the blocks map here, closing of
-    // the input stream can lead to the removal of the cache file even before blocks is added with
-    // the new cache file, leading to incorrect value of stream_read_blocks_in_cache.
+    // the input stream can lead to the removal of the cache file even before blocks is added
+    // with the new cache file, leading to incorrect value of stream_read_blocks_in_cache.
     prefetchingStatistics.blockAddedToFileCache();
+    addToLinkedListAndEvictIfRequired(entry);
+  }
+
+  /**
+   * Add the given entry to the head of the linked list and if the LRU cache size
+   * exceeds the max limit, evict tail of the LRU linked list.
+   *
+   * @param entry Block entry to add.
+   */
+  private void addToLinkedListAndEvictIfRequired(Entry entry) {
+    addToHeadOfLinkedList(entry);
+    blocksLock.writeLock().lock();
+    try {
+      if (blocks.size() > maxBlocksCount && !closed.get()) {
+        Entry elementToPurge = tail;
+        tail = tail.getPrevious();
+        if (tail == null) {
+          tail = head;
+        }
+        tail.setNext(null);
+        elementToPurge.setPrevious(null);
+        deleteBlockFileAndEvictCache(elementToPurge);
+      }
+    } finally {
+      blocksLock.writeLock().unlock();
+    }
+  }
+
+  /**
+   * Delete cache file as part of the block cache LRU eviction.
+   *
+   * @param elementToPurge Block entry to evict.
+   */
+  private void deleteBlockFileAndEvictCache(Entry elementToPurge) {
+    boolean lockAcquired =
+        elementToPurge.takeLock(Entry.LockType.WRITE, PREFETCH_WRITE_LOCK_TIMEOUT,
+            PREFETCH_WRITE_LOCK_TIMEOUT_UNIT);
+    if (!lockAcquired) {
+      LOG.error("Cache file {} deletion would not be attempted as write lock could not"

Review Comment:
   Since we are already using 5s elsewhere (PREFETCH_WRITE_LOCK_TIMEOUT), I used it here as well, but I am happy to change it in the future if we encounter some problem with it. Does that sound good?




[GitHub] [hadoop] virajjasani commented on a diff in pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "virajjasani (via GitHub)" <gi...@apache.org>.
virajjasani commented on code in PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#discussion_r1262051712


##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java:
##########
@@ -1276,4 +1276,16 @@ private Constants() {
   public static final String STORE_CAPABILITY_DIRECTORY_MARKER_MULTIPART_UPLOAD_ENABLED =
       "fs.s3a.capability.multipart.uploads.enabled";
 
+  /**
+   * Prefetch max blocks count config.
+   * Value = {@value PREFETCH_MAX_BLOCKS_COUNT}
+   */
+  public static final String PREFETCH_MAX_BLOCKS_COUNT = "fs.s3a.prefetch.max.blocks.count";
+
+  /**
+   * Default value for max blocks count config.
+   * Value = {@value DEFAULT_PREFETCH_MAX_BLOCKS_COUNT}
+   */
+  public static final int DEFAULT_PREFETCH_MAX_BLOCKS_COUNT = 10;

Review Comment:
   perhaps 4 is enough?




[GitHub] [hadoop] mehakmeet commented on a diff in pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "mehakmeet (via GitHub)" <gi...@apache.org>.
mehakmeet commented on code in PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#discussion_r1247458556


##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3APrefetchingInputStream.java:
##########
@@ -77,13 +78,14 @@ public ITestS3APrefetchingInputStream() {
 
   private static final int TIMEOUT_MILLIS = 5000;
   private static final int INTERVAL_MILLIS = 500;
-
+  private static final int PREFETCH_MAX_NUM_BLOCKS = 3;
 
   @Override
   public Configuration createConfiguration() {
     Configuration conf = super.createConfiguration();
     S3ATestUtils.removeBaseAndBucketOverrides(conf, PREFETCH_ENABLED_KEY);
     conf.setBoolean(PREFETCH_ENABLED_KEY, true);
+    conf.setInt(FS_PREFETCH_MAX_BLOCKS_COUNT, PREFETCH_MAX_NUM_BLOCKS);

Review Comment:
   nit: Remove the base and bucket overrides for this property in L86, just so that the test is consistent across different environments.



##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3APrefetchingInputStream.java:
##########
@@ -301,4 +303,56 @@ public void testStatusProbesAfterClosingStream() throws Throwable {
 
   }
 
+  @Test
+  public void testSeeksWithLruEviction() throws Throwable {
+    IOStatistics ioStats;
+    openFS();
+
+    try (FSDataInputStream in = largeFileFS.open(largeFile)) {
+      ioStats = in.getIOStatistics();
+
+      byte[] buffer = new byte[blockSize];
+

Review Comment:
   Add a comment here explaining what the test is doing at a high level, so that it's easier to understand how the LRU eviction flows.



##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3APrefetchingInputStream.java:
##########
@@ -301,4 +303,56 @@ public void testStatusProbesAfterClosingStream() throws Throwable {
 
   }
 
+  @Test
+  public void testSeeksWithLruEviction() throws Throwable {
+    IOStatistics ioStats;
+    openFS();
+
+    try (FSDataInputStream in = largeFileFS.open(largeFile)) {
+      ioStats = in.getIOStatistics();
+
+      byte[] buffer = new byte[blockSize];
+
+      // Don't read block 0 completely
+      in.read(buffer, 0, blockSize - S_1K * 10);
+
+      // Seek to block 1 and don't read completely
+      in.seek(blockSize);
+      in.read(buffer, 0, 2 * S_1K);
+
+      // Seek to block 2 and don't read completely
+      in.seek(blockSize * 2L);
+      in.read(buffer, 0, 2 * S_1K);
+
+      // Seek to block 3 and don't read completely
+      in.seek(blockSize * 3L);
+      in.read(buffer, 0, 2 * S_1K);
+

Review Comment:
   Add a seek to the intersection point of two blocks (e.g. 4*blockSize - 10KB) and read some bytes (>10KB) so that both blocks are read, then assert that the head of the list is the correct block.
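   
   Something like the following, perhaps (a sketch only; the offsets mirror the suggestion above, and asserting the head block would require the cache internals to be observable):
   
   ```
   // Seek just before the block 3/4 boundary and read across it, so that a
   // single read touches both block 3 and block 4. (A single read() call may
   // return fewer bytes; a real test would loop or use readFully.)
   in.seek(4L * blockSize - 10 * S_1K);
   in.read(buffer, 0, 12 * S_1K);   // 12KB read crosses into block 4
   // A head-of-list assertion would go here; no public accessor exists in
   // this patch, so it is left as a placeholder.
   ```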



##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3APrefetchingInputStream.java:
##########
@@ -301,4 +303,56 @@ public void testStatusProbesAfterClosingStream() throws Throwable {
 
   }
 
+  @Test
+  public void testSeeksWithLruEviction() throws Throwable {
+    IOStatistics ioStats;
+    openFS();
+
+    try (FSDataInputStream in = largeFileFS.open(largeFile)) {
+      ioStats = in.getIOStatistics();
+
+      byte[] buffer = new byte[blockSize];
+
+      // Don't read block 0 completely
+      in.read(buffer, 0, blockSize - S_1K * 10);
+
+      // Seek to block 1 and don't read completely
+      in.seek(blockSize);
+      in.read(buffer, 0, 2 * S_1K);
+
+      // Seek to block 2 and don't read completely
+      in.seek(blockSize * 2L);
+      in.read(buffer, 0, 2 * S_1K);
+
+      // Seek to block 3 and don't read completely
+      in.seek(blockSize * 3L);
+      in.read(buffer, 0, 2 * S_1K);
+
+      // Seek to block 4 and don't read completely
+      in.seek(blockSize * 4L);
+      in.read(buffer, 0, 2 * S_1K);
+
+      // Seek to block 5 and don't read completely
+      in.seek(blockSize * 5L);
+      in.read(buffer, 0, 2 * S_1K);
+
+      // backward seek, can't use block 0 as it is evicted
+      in.seek(S_1K * 5);
+      in.read();
+

Review Comment:
   Assert the number of evictions performed, or the number of blocks removed from the list. At certain points, also check the size of the list to verify that it stays within capacity.
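   
   For example (a sketch; the expected gauge value is an assumption based on the configured capacity, and the gauge may only settle once evictions have run):
   
   ```
   // After reading more distinct blocks than the LRU capacity, the count of
   // blocks held in the file cache should be capped at the configured
   // maximum (PREFETCH_MAX_NUM_BLOCKS in this test).
   verifyStatisticGaugeValue(ioStats, STREAM_READ_BLOCKS_IN_FILE_CACHE,
       PREFETCH_MAX_NUM_BLOCKS);
   ```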




[GitHub] [hadoop] mehakmeet commented on a diff in pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "mehakmeet (via GitHub)" <gi...@apache.org>.
mehakmeet commented on code in PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#discussion_r1240673021


##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java:
##########
@@ -247,9 +305,46 @@ private Entry getEntry(int blockNumber) {
       throw new IllegalStateException(String.format("block %d not found in cache", blockNumber));
     }
     numGets++;
+    addToHeadOfLinkedList(entry);
     return entry;
   }
 
+  /**
+   * Add the given entry to the head of the linked list.
+   *
+   * @param entry Block entry to add.
+   */
+  private void addToHeadOfLinkedList(Entry entry) {
+    blocksLock.writeLock().lock();
+    try {
+      if (head == null) {
+        head = entry;
+        tail = entry;
+      }
+      if (entry != head) {
+        Entry prev = entry.getPrevious();
+        Entry nxt = entry.getNext();
+        if (prev != null) {
+          prev.setNext(nxt);
+        }
+        if (nxt != null) {
+          nxt.setPrevious(prev);
+        }
+        entry.setPrevious(null);
+        entry.setNext(head);
+        head.setPrevious(entry);
+        head = entry;
+      }
+      if (tail != null) {
+        while (tail.getNext() != null) {
+          tail = tail.getNext();
+        }
+      }

Review Comment:
   Okay, but when we are adding the node to the head, doesn't it make more sense to check whether the current node is the tail, get its previous node, and set that as the tail? This would work; I was just interested in whether we can avoid traversing the list 🤔




[GitHub] [hadoop] virajjasani commented on a diff in pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "virajjasani (via GitHub)" <gi...@apache.org>.
virajjasani commented on code in PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#discussion_r1240622243


##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java:
##########
@@ -299,9 +395,62 @@ public void put(int blockNumber, ByteBuffer buffer, Configuration conf,
     // Update stream_read_blocks_in_cache stats only after blocks map is updated with new file
     // entry to avoid any discrepancy related to the value of stream_read_blocks_in_cache.
     // If stream_read_blocks_in_cache is updated before updating the blocks map here, closing of
-    // the input stream can lead to the removal of the cache file even before blocks is added with
-    // the new cache file, leading to incorrect value of stream_read_blocks_in_cache.
+    // the input stream can lead to the removal of the cache file even before blocks is added
+    // with the new cache file, leading to incorrect value of stream_read_blocks_in_cache.
     prefetchingStatistics.blockAddedToFileCache();
+    addToLinkedListAndEvictIfRequired(entry);
+  }
+
+  /**
+   * Add the given entry to the head of the linked list and if the LRU cache size
+   * exceeds the max limit, evict tail of the LRU linked list.
+   *
+   * @param entry Block entry to add.
+   */
+  private void addToLinkedListAndEvictIfRequired(Entry entry) {
+    addToHeadOfLinkedList(entry);
+    blocksLock.writeLock().lock();

Review Comment:
   Yeah, that should still be okay IMO, given that eviction would take place anyway (before or after another element is added to the head, whether by a cache read or by writing a new block entry).




[GitHub] [hadoop] virajjasani commented on a diff in pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "virajjasani (via GitHub)" <gi...@apache.org>.
virajjasani commented on code in PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#discussion_r1249984399


##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3APrefetchingLruEviction.java:
##########
@@ -0,0 +1,241 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a;
+
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.net.URI;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.hadoop.thirdparty.com.google.common.util.concurrent.ThreadFactoryBuilder;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.s3a.performance.AbstractS3ACostTest;
+import org.apache.hadoop.fs.statistics.IOStatistics;
+import org.apache.hadoop.test.LambdaTestUtils;
+
+import static org.apache.hadoop.fs.s3a.Constants.PREFETCH_BLOCK_DEFAULT_SIZE;
+import static org.apache.hadoop.fs.s3a.Constants.PREFETCH_BLOCK_SIZE_KEY;
+import static org.apache.hadoop.fs.s3a.Constants.PREFETCH_ENABLED_KEY;
+import static org.apache.hadoop.fs.s3a.Constants.PREFETCH_MAX_BLOCKS_COUNT;
+import static org.apache.hadoop.fs.statistics.IOStatisticAssertions.verifyStatisticGaugeValue;
+import static org.apache.hadoop.fs.statistics.StreamStatisticNames.STREAM_READ_BLOCKS_IN_FILE_CACHE;
+import static org.apache.hadoop.io.IOUtils.cleanupWithLogger;
+
+/**
+ * Test the prefetching input stream with LRU cache eviction on S3ACachingInputStream.
+ */
+@RunWith(Parameterized.class)
+public class ITestS3APrefetchingLruEviction extends AbstractS3ACostTest {
+
+  private final String maxBlocks;
+
+  @Parameterized.Parameters(name = "max-blocks-{0}")
+  public static Collection<Object[]> params() {
+    return Arrays.asList(new Object[][]{
+        {"1"},
+        {"2"},
+        {"3"}
+    });
+  }

Review Comment:
   Now we test LRU block cache max sizes 1, 2, and 3, and ensure that the results stay consistent with the size, i.e. evictions do take place to maintain the max size of the cache.




[GitHub] [hadoop] virajjasani commented on pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "virajjasani (via GitHub)" <gi...@apache.org>.
virajjasani commented on PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#issuecomment-1636598987

   Created addendum PR #5843 for dealing with small files.



[GitHub] [hadoop] virajjasani commented on pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "virajjasani (via GitHub)" <gi...@apache.org>.
virajjasani commented on PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#issuecomment-1615308837

   @mukund-thakur I tried using Guava's LoadingCache; it is not consistently able to evict cache entries. It performs the eviction asynchronously, with weak references, and hence leads to an inconsistent number of entries.
   
   For instance, even when I set the max size to 1, I can see 8 entries in the map for more than 15s. Hence, maintaining consistency under concurrency seems really problematic with this implementation.
   There is an option to set the concurrency level too, but eviction is still not frequent enough; I suspect this might be because of this (from the Guava javadoc):
   ```
   An update to the map and recording
   of reads may not be immediately reflected on the algorithm's data structures.
   ```
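   
   A minimal sketch of the behaviour described above, using Guava's `CacheBuilder` (the Integer-to-String mapping is just a stand-in for the real block cache; the transient over-capacity size is the point being illustrated):
   
   ```
   import com.google.common.cache.Cache;
   import com.google.common.cache.CacheBuilder;
   
   public class GuavaEvictionDemo {
     public static void main(String[] args) {
       // Eviction bookkeeping is amortized into other cache operations, so
       // size() can transiently exceed maximumSize until cleanUp() (or
       // further cache activity) drains the pending work.
       Cache<Integer, String> cache = CacheBuilder.newBuilder()
           .maximumSize(1)          // desired LRU capacity
           .concurrencyLevel(1)     // one segment; still no hard size guarantee
           .build();
       cache.put(1, "block-1");
       cache.put(2, "block-2");
       System.out.println("size before cleanUp: " + cache.size());
       cache.cleanUp();             // force pending eviction work to run
       System.out.println("size after cleanUp: " + cache.size());
     }
   }
   ```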



[GitHub] [hadoop] mehakmeet commented on a diff in pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "mehakmeet (via GitHub)" <gi...@apache.org>.
mehakmeet commented on code in PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#discussion_r1239686973


##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java:
##########
@@ -166,16 +201,39 @@ private boolean takeLock(LockType lockType, long timeout, TimeUnit unit) {
       }
       return false;
     }
+
+    private Entry getPrevious() {
+      return previous;
+    }
+
+    private void setPrevious(Entry previous) {
+      this.previous = previous;
+    }
+
+    private Entry getNext() {
+      return next;
+    }
+
+    private void setNext(Entry next) {
+      this.next = next;
+    }
   }
 
   /**
    * Constructs an instance of a {@code SingleFilePerBlockCache}.
    *
    * @param prefetchingStatistics statistics for this stream.
+   * @param conf the configuration object.
    */
-  public SingleFilePerBlockCache(PrefetchingStatistics prefetchingStatistics) {
+  public SingleFilePerBlockCache(PrefetchingStatistics prefetchingStatistics, Configuration conf) {
     this.prefetchingStatistics = requireNonNull(prefetchingStatistics);
     this.closed = new AtomicBoolean(false);
+    this.maxBlocksCount =
+        conf.getInt(FS_PREFETCH_MAX_BLOCKS_COUNT, DEFAULT_FS_PREFETCH_MAX_BLOCKS_COUNT);
+    Preconditions.checkArgument(this.maxBlocksCount > 0,
+        "prefetch blocks total capacity should be more than 0");

Review Comment:
   Include the property name in the error message, so that users know which setting to change to make this valid.
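   
   A minimal sketch of the suggestion (assuming the Preconditions variant in use supports %s templating; FS_PREFETCH_MAX_BLOCKS_COUNT is the config key from this diff):
   
   ```
   Preconditions.checkArgument(this.maxBlocksCount > 0,
       "%s should be more than 0", FS_PREFETCH_MAX_BLOCKS_COUNT);
   ```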



##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java:
##########
@@ -247,9 +305,46 @@ private Entry getEntry(int blockNumber) {
       throw new IllegalStateException(String.format("block %d not found in cache", blockNumber));
     }
     numGets++;
+    addToHeadOfLinkedList(entry);
     return entry;
   }
 
+  /**
+   * Add the given entry to the head of the linked list.
+   *
+   * @param entry Block entry to add.
+   */
+  private void addToHeadOfLinkedList(Entry entry) {
+    blocksLock.writeLock().lock();
+    try {
+      if (head == null) {
+        head = entry;
+        tail = entry;
+      }
+      if (entry != head) {
+        Entry prev = entry.getPrevious();
+        Entry nxt = entry.getNext();
+        if (prev != null) {
+          prev.setNext(nxt);
+        }
+        if (nxt != null) {
+          nxt.setPrevious(prev);
+        }
+        entry.setPrevious(null);
+        entry.setNext(head);
+        head.setPrevious(entry);
+        head = entry;
+      }
+      if (tail != null) {
+        while (tail.getNext() != null) {
+          tail = tail.getNext();
+        }
+      }

Review Comment:
   Can you explain a bit about this part? I am not able to see why this is needed.



##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java:
##########
@@ -89,6 +110,16 @@ public class SingleFilePerBlockCache implements BlockCache {
   private static final Set<PosixFilePermission> TEMP_FILE_ATTRS =
       ImmutableSet.of(PosixFilePermission.OWNER_READ, PosixFilePermission.OWNER_WRITE);
 
+  /**
+   * Prefetch max blocks count config.
+   */
+  public static final String FS_PREFETCH_MAX_BLOCKS_COUNT = "fs.prefetch.max.blocks.count";

Review Comment:
   Is there a Constants class where we can move this to?



##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java:
##########
@@ -299,9 +395,62 @@ public void put(int blockNumber, ByteBuffer buffer, Configuration conf,
     // Update stream_read_blocks_in_cache stats only after blocks map is updated with new file
     // entry to avoid any discrepancy related to the value of stream_read_blocks_in_cache.
     // If stream_read_blocks_in_cache is updated before updating the blocks map here, closing of
-    // the input stream can lead to the removal of the cache file even before blocks is added with
-    // the new cache file, leading to incorrect value of stream_read_blocks_in_cache.
+    // the input stream can lead to the removal of the cache file even before blocks is added
+    // with the new cache file, leading to incorrect value of stream_read_blocks_in_cache.
     prefetchingStatistics.blockAddedToFileCache();
+    addToLinkedListAndEvictIfRequired(entry);
+  }
+
+  /**
+   * Add the given entry to the head of the linked list and if the LRU cache size
+   * exceeds the max limit, evict tail of the LRU linked list.
+   *
+   * @param entry Block entry to add.
+   */
+  private void addToLinkedListAndEvictIfRequired(Entry entry) {
+    addToHeadOfLinkedList(entry);
+    blocksLock.writeLock().lock();
+    try {
+      if (blocks.size() > maxBlocksCount && !closed.get()) {
+        Entry elementToPurge = tail;
+        tail = tail.getPrevious();
+        if (tail == null) {
+          tail = head;
+        }
+        tail.setNext(null);
+        elementToPurge.setPrevious(null);
+        deleteBlockFileAndEvictCache(elementToPurge);
+      }
+    } finally {
+      blocksLock.writeLock().unlock();
+    }
+  }
+
+  /**
+   * Delete cache file as part of the block cache LRU eviction.
+   *
+   * @param elementToPurge Block entry to evict.
+   */
+  private void deleteBlockFileAndEvictCache(Entry elementToPurge) {
+    boolean lockAcquired =
+        elementToPurge.takeLock(Entry.LockType.WRITE, PREFETCH_WRITE_LOCK_TIMEOUT,
+            PREFETCH_WRITE_LOCK_TIMEOUT_UNIT);
+    if (!lockAcquired) {
+      LOG.error("Cache file {} deletion would not be attempted as write lock could not"

Review Comment:
   So, there can be a scenario where the current cache exceeds its normal capacity? Is 5 seconds enough time, or are we okay with this?



##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java:
##########
@@ -61,7 +62,27 @@ public class SingleFilePerBlockCache implements BlockCache {
   /**
    * Blocks stored in this cache.
    */
-  private final Map<Integer, Entry> blocks = new ConcurrentHashMap<>();
+  private final Map<Integer, Entry> blocks;
+
+  /**
+   * Total max blocks count, to be considered as baseline for LRU cache.
+   */
+  private final int maxBlocksCount;
+
+  /**
+   * The lock to be shared by LRU based linked list updates.
+   */
+  private final ReentrantReadWriteLock blocksLock;
+
+  /**
+   * Head of the linked list.
+   */
+  private Entry head;
+
+  /**
+   * Tail of the lined list.

Review Comment:
   typo: "linked"



##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java:
##########
@@ -247,9 +305,46 @@ private Entry getEntry(int blockNumber) {
       throw new IllegalStateException(String.format("block %d not found in cache", blockNumber));
     }
     numGets++;
+    addToHeadOfLinkedList(entry);
     return entry;
   }
 
+  /**
+   * Add the given entry to the head of the linked list.
+   *
+   * @param entry Block entry to add.
+   */
+  private void addToHeadOfLinkedList(Entry entry) {
+    blocksLock.writeLock().lock();
+    try {

Review Comment:
   Can we add a bit of logging here for debug purposes? Something like the entry's block number; the head and tail at this point would be good to see as well.



##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java:
##########
@@ -299,9 +395,62 @@ public void put(int blockNumber, ByteBuffer buffer, Configuration conf,
     // Update stream_read_blocks_in_cache stats only after blocks map is updated with new file
     // entry to avoid any discrepancy related to the value of stream_read_blocks_in_cache.
     // If stream_read_blocks_in_cache is updated before updating the blocks map here, closing of
-    // the input stream can lead to the removal of the cache file even before blocks is added with
-    // the new cache file, leading to incorrect value of stream_read_blocks_in_cache.
+    // the input stream can lead to the removal of the cache file even before blocks is added
+    // with the new cache file, leading to incorrect value of stream_read_blocks_in_cache.
     prefetchingStatistics.blockAddedToFileCache();
+    addToLinkedListAndEvictIfRequired(entry);
+  }
+
+  /**
+   * Add the given entry to the head of the linked list and if the LRU cache size
+   * exceeds the max limit, evict tail of the LRU linked list.
+   *
+   * @param entry Block entry to add.
+   */
+  private void addToLinkedListAndEvictIfRequired(Entry entry) {
+    addToHeadOfLinkedList(entry);
+    blocksLock.writeLock().lock();

Review Comment:
   There can be a scenario where we have just added the entry to the head and, before we try to evict the tail, another thread acquires the write lock inside `addToHeadOfLinkedList(entry)` and adds its entry at the head. Not sure if that would cause any issues though...




[GitHub] [hadoop] virajjasani commented on a diff in pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "virajjasani (via GitHub)" <gi...@apache.org>.
virajjasani commented on code in PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#discussion_r1262044107


##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3APrefetchingLruEviction.java:
##########
@@ -0,0 +1,241 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a;
+
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.net.URI;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.hadoop.thirdparty.com.google.common.util.concurrent.ThreadFactoryBuilder;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.s3a.performance.AbstractS3ACostTest;
+import org.apache.hadoop.fs.statistics.IOStatistics;
+import org.apache.hadoop.test.LambdaTestUtils;
+
+import static org.apache.hadoop.fs.s3a.Constants.PREFETCH_BLOCK_DEFAULT_SIZE;
+import static org.apache.hadoop.fs.s3a.Constants.PREFETCH_BLOCK_SIZE_KEY;
+import static org.apache.hadoop.fs.s3a.Constants.PREFETCH_ENABLED_KEY;
+import static org.apache.hadoop.fs.s3a.Constants.PREFETCH_MAX_BLOCKS_COUNT;
+import static org.apache.hadoop.fs.statistics.IOStatisticAssertions.verifyStatisticGaugeValue;
+import static org.apache.hadoop.fs.statistics.StreamStatisticNames.STREAM_READ_BLOCKS_IN_FILE_CACHE;
+import static org.apache.hadoop.io.IOUtils.cleanupWithLogger;
+
+/**
+ * Test the prefetching input stream with LRU cache eviction on S3ACachingInputStream.
+ */
+@RunWith(Parameterized.class)
+public class ITestS3APrefetchingLruEviction extends AbstractS3ACostTest {
+
+  private final String maxBlocks;
+
+  @Parameterized.Parameters(name = "max-blocks-{0}")
+  public static Collection<Object[]> params() {
+    return Arrays.asList(new Object[][]{
+        {"1"},
+        {"2"},
+        {"3"}
+    });
+  }
+
+  public ITestS3APrefetchingLruEviction(final String maxBlocks) {
+    super(true);
+    this.maxBlocks = maxBlocks;
+  }
+
+  private static final Logger LOG =
+      LoggerFactory.getLogger(ITestS3APrefetchingLruEviction.class);
+
+  private static final int S_1K = 1024;
+  // Path for file which should have length > block size so S3ACachingInputStream is used
+  private Path largeFile;
+  private FileSystem largeFileFS;
+  private int blockSize;
+
+  private static final int TIMEOUT_MILLIS = 5000;
+  private static final int INTERVAL_MILLIS = 500;
+
+  @Override
+  public Configuration createConfiguration() {
+    Configuration conf = super.createConfiguration();
+    conf.setBoolean(PREFETCH_ENABLED_KEY, true);

Review Comment:
   done





[GitHub] [hadoop] virajjasani commented on a diff in pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "virajjasani (via GitHub)" <gi...@apache.org>.
virajjasani commented on code in PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#discussion_r1247490918


##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3APrefetchingInputStream.java:
##########
@@ -301,4 +303,56 @@ public void testStatusProbesAfterClosingStream() throws Throwable {
 
   }
 
+  @Test
+  public void testSeeksWithLruEviction() throws Throwable {
+    IOStatistics ioStats;
+    openFS();
+
+    try (FSDataInputStream in = largeFileFS.open(largeFile)) {
+      ioStats = in.getIOStatistics();
+
+      byte[] buffer = new byte[blockSize];
+
+      // Don't read block 0 completely
+      in.read(buffer, 0, blockSize - S_1K * 10);
+
+      // Seek to block 1 and don't read completely
+      in.seek(blockSize);
+      in.read(buffer, 0, 2 * S_1K);
+
+      // Seek to block 2 and don't read completely
+      in.seek(blockSize * 2L);
+      in.read(buffer, 0, 2 * S_1K);
+
+      // Seek to block 3 and don't read completely
+      in.seek(blockSize * 3L);
+      in.read(buffer, 0, 2 * S_1K);
+
+      // Seek to block 4 and don't read completely
+      in.seek(blockSize * 4L);
+      in.read(buffer, 0, 2 * S_1K);
+
+      // Seek to block 5 and don't read completely
+      in.seek(blockSize * 5L);
+      in.read(buffer, 0, 2 * S_1K);
+
+      // backward seek, can't use block 0 as it is evicted
+      in.seek(S_1K * 5);
+      in.read();
+

Review Comment:
   sounds great, let me create a follow-up sub-task for introducing the metric, and update the test under that sub-task.
   this will likely keep the commit history clean and easy to manage :)
   
   thank you once again!





[GitHub] [hadoop] steveloughran commented on a diff in pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "steveloughran (via GitHub)" <gi...@apache.org>.
steveloughran commented on code in PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#discussion_r1247931503


##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/Constants.java:
##########
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hadoop.fs.impl.prefetch;
+
+import java.util.concurrent.TimeUnit;
+
+/**
+ * Constants used by prefetch implementations.
+ */
+public final class Constants {
+
+  private Constants() {
+  }
+
+  /**
+   * Prefetch max blocks count config.

Review Comment:
   add {@value} here and below so IDE popups show the value
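
   e.g., a hedged sketch of what that looks like on the constant above (`{@value}` inlines the literal into the generated Javadoc and IDE popups):

   ```java
   public final class PrefetchConstantsSketch {

     private PrefetchConstantsSketch() {
     }

     /**
      * Prefetch max blocks count config: {@value}.
      */
     public static final String FS_PREFETCH_MAX_BLOCKS_COUNT = "fs.prefetch.max.blocks.count";
   }
   ```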



##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java:
##########
@@ -103,13 +114,17 @@ private enum LockType {
       READ,
       WRITE
     }
+    private Entry previous;

Review Comment:
   is it really necessary to implement a doubly linked list? isn't there one already in the system?
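
   for reference, a minimal sketch of one possible JDK answer: `LinkedHashMap` with `accessOrder=true` keeps LRU ordering via its own internal doubly linked list (not thread-safe by itself, so the entry-level locking here would still be needed):

   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;

   public class LruMapSketch<K, V> extends LinkedHashMap<K, V> {

     private final int maxEntries;

     public LruMapSketch(int maxEntries) {
       super(16, 0.75f, true); // accessOrder = true: get() refreshes recency
       this.maxEntries = maxEntries;
     }

     @Override
     protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
       // invoked after each put(); returning true drops the LRU entry
       return size() > maxEntries;
     }
   }
   ```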



##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/Constants.java:
##########
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hadoop.fs.impl.prefetch;
+
+import java.util.concurrent.TimeUnit;
+
+/**
+ * Constants used by prefetch implementations.
+ */
+public final class Constants {

Review Comment:
   can we have a different name please; too confusing



##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/Constants.java:
##########
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hadoop.fs.impl.prefetch;
+
+import java.util.concurrent.TimeUnit;
+
+/**
+ * Constants used by prefetch implementations.
+ */
+public final class Constants {
+
+  private Constants() {
+  }
+
+  /**
+   * Prefetch max blocks count config.
+   */
+  public static final String FS_PREFETCH_MAX_BLOCKS_COUNT = "fs.prefetch.max.blocks.count";

Review Comment:
   I think we should use fs.s3a here for per-bucket settings.
   
   how about just passing in the count as a parameter, rather than a full Configuration object?
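
   a hypothetical sketch of that shape (class name, config key, and default are illustrative, not settled):

   ```java
   import org.apache.hadoop.conf.Configuration;

   class BlockCacheSketch {

     private final int maxBlocksCount;

     // the cache itself only sees the resolved count...
     BlockCacheSketch(int maxBlocksCount) {
       this.maxBlocksCount = maxBlocksCount;
     }

     // ...while the caller resolves the (possibly per-bucket) configuration
     static BlockCacheSketch create(Configuration conf) {
       return new BlockCacheSketch(
           conf.getInt("fs.s3a.prefetch.max.blocks.count", 4));
     }
   }
   ```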



##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3APrefetchingInputStream.java:
##########
@@ -301,4 +307,116 @@ public void testStatusProbesAfterClosingStream() throws Throwable {
 
   }
 
+  @Test
+  public void testSeeksWithLruEviction() throws Throwable {
+    IOStatistics ioStats;
+    openFS();
+
+    ExecutorService executorService = Executors.newFixedThreadPool(5,
+        new ThreadFactoryBuilder()
+            .setDaemon(true)
+            .setNameFormat("testSeeksWithLruEviction-%d")
+            .build());
+
+    try (FSDataInputStream in = largeFileFS.open(largeFile)) {
+      ioStats = in.getIOStatistics();
+      
+      // tests to add multiple blocks in the prefetch cache
+      // and let LRU eviction take place as more cache entries
+      // are added with multiple block reads.
+
+      executorService.submit(() -> {
+        byte[] buffer = new byte[blockSize];
+        // Don't read block 0 completely
+        try {
+          in.read(buffer, 0, blockSize - S_1K * 10);
+        } catch (IOException e) {
+          throw new RuntimeException(e);

Review Comment:
   it should be an UncheckedIOException, but why not just `return true` from each closure so they become Callable<Boolean> rather than Runnable, and so can throw exceptions directly
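
   a standalone sketch of the `Callable` shape (the actual read call is elided):

   ```java
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;
   import java.util.concurrent.Future;

   public class CallableClosureSketch {

     public static void main(String[] args) throws Exception {
       ExecutorService pool = Executors.newFixedThreadPool(1);
       Future<Boolean> done = pool.submit(() -> {
         // in.read(...) would go here; returning a value makes the lambda a
         // Callable<Boolean>, so a checked IOException can be thrown directly
         return true;
       });
       done.get(); // exceptions from the task resurface here as ExecutionException
       pool.shutdown();
     }
   }
   ```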
   
   





[GitHub] [hadoop] steveloughran merged pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "steveloughran (via GitHub)" <gi...@apache.org>.
steveloughran merged PR #5754:
URL: https://github.com/apache/hadoop/pull/5754




[GitHub] [hadoop] hadoop-yetus commented on pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "hadoop-yetus (via GitHub)" <gi...@apache.org>.
hadoop-yetus commented on PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#issuecomment-1595705310

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 39s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 3 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  18m  1s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  22m 54s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  18m 37s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  compile  |  16m 16s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  checkstyle  |   4m 25s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 47s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   2m 12s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 56s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   4m 17s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m 41s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 33s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 29s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 29s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javac  |  16m 29s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 21s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  javac  |  16m 21s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   4m 19s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   2m 48s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   2m  5s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 56s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   4m 21s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 55s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  19m 15s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  unit  |   2m 52s |  |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m 17s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 218m 36s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5754 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux e294eb7de530 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 282ddd19539c6d4e9f24a9b0991cc9dcb0d0d93e |
   | Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/1/testReport/ |
   | Max. process+thread count | 2580 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/1/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




[GitHub] [hadoop] hadoop-yetus commented on pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "hadoop-yetus (via GitHub)" <gi...@apache.org>.
hadoop-yetus commented on PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#issuecomment-1633984369

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 36s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 3 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  15m 51s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  31m 32s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  17m 30s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  compile  |  16m 12s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  checkstyle  |   4m 13s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 41s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 59s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 47s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   4m 12s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  37m 28s |  |  branch has no errors when building and testing our client artifacts.  |
   | -0 :warning: |  patch  |  37m 58s |  |  Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 34s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 29s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 21s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javac  |  16m 21s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 16s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  javac  |  16m 16s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   4m 20s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   2m 51s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   2m  4s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 54s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   4m 20s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  34m 48s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  19m 18s |  |  hadoop-common in the patch passed.  |
   | -1 :x: |  unit  |   8m  6s | [/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/10/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt) |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m 14s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 254m 54s |  |  |
   
   
   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | hadoop.fs.s3a.prefetch.TestS3ACachingBlockManager |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/10/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5754 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 786bfee867de 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 709163c197e99891eb98210f024f626022f3cacc |
   | Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/10/testReport/ |
   | Max. process+thread count | 1302 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/10/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




[GitHub] [hadoop] hadoop-yetus commented on pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "hadoop-yetus (via GitHub)" <gi...@apache.org>.
hadoop-yetus commented on PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#issuecomment-1616819067

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 37s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 4 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  18m 21s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  31m 10s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  17m 21s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  compile  |  16m  8s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  checkstyle  |   4m 27s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 50s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   2m 10s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 56s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   4m  4s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  34m 51s |  |  branch has no errors when building and testing our client artifacts.  |
   | -0 :warning: |  patch  |  35m 21s |  |  Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 33s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 28s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 39s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javac  |  16m 39s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m  8s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  javac  |  16m  8s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   4m 17s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   2m 43s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   2m  5s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 55s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   4m 20s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  35m 14s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  19m 19s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  unit  |   2m 51s |  |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m 15s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 249m 32s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/7/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5754 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 5ae178d25df9 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 2006290b2dd3ea1b69fb6f8b2be04a8618e61cea |
   | Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/7/testReport/ |
   | Max. process+thread count | 1263 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/7/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




[GitHub] [hadoop] mehakmeet commented on a diff in pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "mehakmeet (via GitHub)" <gi...@apache.org>.
mehakmeet commented on code in PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#discussion_r1247447055


##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java:
##########
@@ -299,9 +395,62 @@ public void put(int blockNumber, ByteBuffer buffer, Configuration conf,
     // Update stream_read_blocks_in_cache stats only after blocks map is updated with new file
     // entry to avoid any discrepancy related to the value of stream_read_blocks_in_cache.
     // If stream_read_blocks_in_cache is updated before updating the blocks map here, closing of
-    // the input stream can lead to the removal of the cache file even before blocks is added with
-    // the new cache file, leading to incorrect value of stream_read_blocks_in_cache.
+    // the input stream can lead to the removal of the cache file even before blocks is added
+    // with the new cache file, leading to incorrect value of stream_read_blocks_in_cache.
     prefetchingStatistics.blockAddedToFileCache();
+    addToLinkedListAndEvictIfRequired(entry);
+  }
+
+  /**
+   * Add the given entry to the head of the linked list and if the LRU cache size
+   * exceeds the max limit, evict tail of the LRU linked list.
+   *
+   * @param entry Block entry to add.
+   */
+  private void addToLinkedListAndEvictIfRequired(Entry entry) {
+    addToHeadOfLinkedList(entry);
+    blocksLock.writeLock().lock();
+    try {
+      if (blocks.size() > maxBlocksCount && !closed.get()) {
+        Entry elementToPurge = tail;
+        tail = tail.getPrevious();
+        if (tail == null) {
+          tail = head;
+        }
+        tail.setNext(null);
+        elementToPurge.setPrevious(null);
+        deleteBlockFileAndEvictCache(elementToPurge);
+      }
+    } finally {
+      blocksLock.writeLock().unlock();
+    }
+  }
+
+  /**
+   * Delete cache file as part of the block cache LRU eviction.
+   *
+   * @param elementToPurge Block entry to evict.
+   */
+  private void deleteBlockFileAndEvictCache(Entry elementToPurge) {
+    boolean lockAcquired =
+        elementToPurge.takeLock(Entry.LockType.WRITE, PREFETCH_WRITE_LOCK_TIMEOUT,
+            PREFETCH_WRITE_LOCK_TIMEOUT_UNIT);
+    if (!lockAcquired) {
+      LOG.error("Cache file {} deletion would not be attempted as write lock could not"

Review Comment:
   It seems like we are okay with things not blowing up if eviction is not successful. Are we okay with that? Can this hurt in the long run?
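
   for context, a minimal sketch (illustrative names, not the patch's `takeLock()`) of the timed-lock trade-off in play: a failed acquisition is logged and the cache file is left behind, rather than the eviction path blocking the reader:

   ```java
   import java.util.concurrent.TimeUnit;
   import java.util.concurrent.locks.ReentrantReadWriteLock;

   public class TimedEvictionSketch {

     private final ReentrantReadWriteLock entryLock = new ReentrantReadWriteLock();

     boolean tryEvict(Runnable deleteBlockFile) throws InterruptedException {
       // give up after a bounded wait instead of blocking eviction forever
       if (!entryLock.writeLock().tryLock(5, TimeUnit.SECONDS)) {
         return false; // caller logs a warning; the file stays on disk for now
       }
       try {
         deleteBlockFile.run();
         return true;
       } finally {
         entryLock.writeLock().unlock();
       }
     }
   }
   ```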





[GitHub] [hadoop] hadoop-yetus commented on pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "hadoop-yetus (via GitHub)" <gi...@apache.org>.
hadoop-yetus commented on PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#issuecomment-1614556557

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 46s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 3 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  16m 15s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  32m 20s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  18m 51s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  compile  |  17m  6s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  checkstyle  |   5m  2s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 35s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 59s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 43s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   3m 58s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  34m 25s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 30s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 56s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  18m  8s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javac  |  18m  8s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 44s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  javac  |  16m 44s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/5/artifact/out/blanks-eol.txt) |  The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  checkstyle  |   5m  1s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   2m 53s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 56s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 44s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   4m 19s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  34m 31s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  19m 39s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  unit  |   2m 53s |  |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m 15s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 253m 17s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/5/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5754 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 12427714771b 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 87ea6042a44a89114f70120a84c53e6e3c034984 |
   | Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/5/testReport/ |
   | Max. process+thread count | 1263 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/5/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




[GitHub] [hadoop] virajjasani commented on pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "virajjasani (via GitHub)" <gi...@apache.org>.
virajjasani commented on PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#issuecomment-1615259963

   > Can we not use some already inbuilt cache rather than us implementing it from scratch ( this part is interesting for sure :)) https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/LocalCache.java guava cache supports maximumSize for eviction and it internally uses LRU. ( see the Java docs of the class. )
   
   Interesting,
   
   from the javadoc:
   ```
      * The page replacement algorithm's data structures are kept casually consistent with the map. The
      * ordering of writes to a segment is sequentially consistent. An update to the map and recording
      * of reads may not be immediately reflected on the algorithm's data structures.
   ```
   
   for a not-so-heavily loaded cache, perhaps our own implementation might be better, given that even reads are immediately reflected in the doubly linked list data structure in our case.
   let me explore a bit more though. thanks for the reference!
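
   for a feel of that API, a minimal sketch using plain (unshaded) Guava coordinates; whether the cache package is usable via the hadoop-thirdparty shaded artifact is a separate question:

   ```java
   import java.nio.file.Path;
   import java.nio.file.Paths;

   import com.google.common.cache.Cache;
   import com.google.common.cache.CacheBuilder;
   import com.google.common.cache.RemovalListener;

   public class GuavaLruCacheSketch {

     public static void main(String[] args) {
       RemovalListener<Integer, Path> onEvict =
           notification -> System.out.println("evicted block " + notification.getKey());
       Cache<Integer, Path> blocks = CacheBuilder.newBuilder()
           .maximumSize(4) // entry-count limit, analogous to the max blocks config
           .removalListener(onEvict)
           .build();
       blocks.put(0, Paths.get("block-0000"));
       blocks.getIfPresent(0); // a read; per the javadoc, recency recording may lag
     }
   }
   ```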




[GitHub] [hadoop] mehakmeet commented on a diff in pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "mehakmeet (via GitHub)" <gi...@apache.org>.
mehakmeet commented on code in PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#discussion_r1247474198


##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3APrefetchingInputStream.java:
##########
@@ -301,4 +303,56 @@ public void testStatusProbesAfterClosingStream() throws Throwable {
 
   }
 
+  @Test
+  public void testSeeksWithLruEviction() throws Throwable {
+    IOStatistics ioStats;
+    openFS();
+
+    try (FSDataInputStream in = largeFileFS.open(largeFile)) {
+      ioStats = in.getIOStatistics();
+
+      byte[] buffer = new byte[blockSize];
+
+      // Don't read block 0 completely
+      in.read(buffer, 0, blockSize - S_1K * 10);
+
+      // Seek to block 1 and don't read completely
+      in.seek(blockSize);
+      in.read(buffer, 0, 2 * S_1K);
+
+      // Seek to block 2 and don't read completely
+      in.seek(blockSize * 2L);
+      in.read(buffer, 0, 2 * S_1K);
+
+      // Seek to block 3 and don't read completely
+      in.seek(blockSize * 3L);
+      in.read(buffer, 0, 2 * S_1K);
+
+      // Seek to block 4 and don't read completely
+      in.seek(blockSize * 4L);
+      in.read(buffer, 0, 2 * S_1K);
+
+      // Seek to block 5 and don't read completely
+      in.seek(blockSize * 5L);
+      in.read(buffer, 0, 2 * S_1K);
+
+      // backward seek, can't use block 0 as it is evicted
+      in.seek(S_1K * 5);
+      in.read();
+

Review Comment:
   Yes, that should be fine as well. I saw this method, so I thought we were already recording that metric via `blockRemovedFromFileCache()`.





[GitHub] [hadoop] hadoop-yetus commented on pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "hadoop-yetus (via GitHub)" <gi...@apache.org>.
hadoop-yetus commented on PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#issuecomment-1615543634

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 38s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 4 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  16m 34s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  31m 31s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  17m 19s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  compile  |  16m  8s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  checkstyle  |   4m 27s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 50s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   2m 11s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 56s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   4m  5s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  34m 26s |  |  branch has no errors when building and testing our client artifacts.  |
   | -0 :warning: |  patch  |  34m 55s |  |  Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 32s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 28s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 11s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javac  |  16m 11s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 14s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  javac  |  16m 14s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   4m 17s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   2m 47s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   2m  6s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 56s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   4m 23s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  34m 45s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  19m 11s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  unit  |   2m 52s |  |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m 13s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 246m 48s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/6/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5754 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux a7e8d6fb0e5a 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / c72a55616dfe9925258cc5057a18529905457482 |
   | Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/6/testReport/ |
   | Max. process+thread count | 1263 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/6/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




[GitHub] [hadoop] virajjasani commented on pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "virajjasani (via GitHub)" <gi...@apache.org>.
virajjasani commented on PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#issuecomment-1615268043

   > tests: I want to see this working with a block size of 1; head==tail and adding a new block replaces the entire list.
   
   this will need a new test class because if we change the block size config to 1 for ITestS3APrefetchingInputStream, the rest of the tests fail since we do a lot of IOStats assertions on the number of blocks prefetched etc.




[GitHub] [hadoop] virajjasani commented on pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "virajjasani (via GitHub)" <gi...@apache.org>.
virajjasani commented on PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#issuecomment-1631058598

   > readFile() needs to always close the file channel, even if a read fails.
   
   sounds good for both read and write, will file a jira unless one is already filed.
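
   a minimal sketch of that pattern (illustrative names, not the actual `readFile()`): try-with-resources closes the channel on both the success and the failure path:

   ```java
   import java.io.IOException;
   import java.nio.ByteBuffer;
   import java.nio.channels.FileChannel;
   import java.nio.file.Path;
   import java.nio.file.StandardOpenOption;

   public final class ChannelCloseSketch {

     private ChannelCloseSketch() {
     }

     static int readFully(Path file, ByteBuffer buffer) throws IOException {
       try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
         int total = 0;
         int bytesRead;
         while ((bytesRead = channel.read(buffer)) > 0) {
           total += bytesRead;
         }
         return total; // the channel is closed here even if read() threw
       }
     }
   }
   ```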




[GitHub] [hadoop] virajjasani commented on a diff in pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "virajjasani (via GitHub)" <gi...@apache.org>.
virajjasani commented on code in PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#discussion_r1247449167


##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java:
##########
@@ -299,9 +395,62 @@ public void put(int blockNumber, ByteBuffer buffer, Configuration conf,
     // Update stream_read_blocks_in_cache stats only after blocks map is updated with new file
     // entry to avoid any discrepancy related to the value of stream_read_blocks_in_cache.
     // If stream_read_blocks_in_cache is updated before updating the blocks map here, closing of
-    // the input stream can lead to the removal of the cache file even before blocks is added with
-    // the new cache file, leading to incorrect value of stream_read_blocks_in_cache.
+    // the input stream can lead to the removal of the cache file even before blocks is added
+    // with the new cache file, leading to incorrect value of stream_read_blocks_in_cache.
     prefetchingStatistics.blockAddedToFileCache();
+    addToLinkedListAndEvictIfRequired(entry);
+  }
+
+  /**
+   * Add the given entry to the head of the linked list and if the LRU cache size
+   * exceeds the max limit, evict tail of the LRU linked list.
+   *
+   * @param entry Block entry to add.
+   */
+  private void addToLinkedListAndEvictIfRequired(Entry entry) {
+    addToHeadOfLinkedList(entry);
+    blocksLock.writeLock().lock();
+    try {
+      if (blocks.size() > maxBlocksCount && !closed.get()) {
+        Entry elementToPurge = tail;
+        tail = tail.getPrevious();
+        if (tail == null) {
+          tail = head;
+        }
+        tail.setNext(null);
+        elementToPurge.setPrevious(null);
+        deleteBlockFileAndEvictCache(elementToPurge);
+      }
+    } finally {
+      blocksLock.writeLock().unlock();
+    }
+  }
+
+  /**
+   * Delete cache file as part of the block cache LRU eviction.
+   *
+   * @param elementToPurge Block entry to evict.
+   */
+  private void deleteBlockFileAndEvictCache(Entry elementToPurge) {
+    boolean lockAcquired =
+        elementToPurge.takeLock(Entry.LockType.WRITE, PREFETCH_WRITE_LOCK_TIMEOUT,
+            PREFETCH_WRITE_LOCK_TIMEOUT_UNIT);
+    if (!lockAcquired) {
+      LOG.error("Cache file {} deletion would not be attempted as write lock could not"

Review Comment:
   it should be okay, in fact we have the same logic for input stream close as well: if eviction or removal of a disk block is unsuccessful, we leave it behind with a fat warning.





[GitHub] [hadoop] hadoop-yetus commented on pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "hadoop-yetus (via GitHub)" <gi...@apache.org>.
hadoop-yetus commented on PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#issuecomment-1605366358

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 42s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 3 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  17m 53s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  20m 55s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  17m 15s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  compile  |  16m 10s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  checkstyle  |   4m 27s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 52s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   2m 15s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 54s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   4m 13s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m 51s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 33s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 27s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m  9s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javac  |  16m  9s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m  7s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  javac  |  16m  7s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   4m 21s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   2m 43s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   2m  5s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 55s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | -1 :x: |  spotbugs  |   2m 52s | [/new-spotbugs-hadoop-common-project_hadoop-common.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/2/artifact/out/new-spotbugs-hadoop-common-project_hadoop-common.html) |  hadoop-common-project/hadoop-common generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)  |
   | +1 :green_heart: |  shadedclient  |  22m 33s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  19m 11s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  unit  |   2m 53s |  |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m 16s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 213m 51s |  |  |
   
   
   | Reason | Tests |
   |-------:|:------|
   | SpotBugs | module:hadoop-common-project/hadoop-common |
   |  |  Nullcheck of SingleFilePerBlockCache.tail at line 322 of value previously dereferenced in org.apache.hadoop.fs.impl.prefetch.SingleFilePerBlockCache.addToHeadOfLinkedList(SingleFilePerBlockCache$Entry)  At SingleFilePerBlockCache.java:322 of value previously dereferenced in org.apache.hadoop.fs.impl.prefetch.SingleFilePerBlockCache.addToHeadOfLinkedList(SingleFilePerBlockCache$Entry)  At SingleFilePerBlockCache.java:[line 307] |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/2/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5754 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux e5f152550d4e 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 96b483c4a35bf9416d8a4845882ad9d1cb7ef27f |
   | Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/2/testReport/ |
   | Max. process+thread count | 3011 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/2/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




[GitHub] [hadoop] virajjasani commented on pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "virajjasani (via GitHub)" <gi...@apache.org>.
virajjasani commented on PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#issuecomment-1595661932

   `us-west-2`:
   
   ```
   $ mvn clean verify -Dparallel-tests -DtestsThreadCount=8 -Dscale -Dprefetch
   
   $ mvn clean verify -Dparallel-tests -DtestsThreadCount=8 -Dscale
   ```




[GitHub] [hadoop] virajjasani commented on pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "virajjasani (via GitHub)" <gi...@apache.org>.
virajjasani commented on PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#issuecomment-1603372337

   @mukund-thakur @mehakmeet could you please review this PR?




[GitHub] [hadoop] virajjasani commented on a diff in pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "virajjasani (via GitHub)" <gi...@apache.org>.
virajjasani commented on code in PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#discussion_r1247476231


##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java:
##########
@@ -247,9 +305,46 @@ private Entry getEntry(int blockNumber) {
       throw new IllegalStateException(String.format("block %d not found in cache", blockNumber));
     }
     numGets++;
+    addToHeadOfLinkedList(entry);
     return entry;
   }
 
+  /**
+   * Add the given entry to the head of the linked list.
+   *
+   * @param entry Block entry to add.
+   */
+  private void addToHeadOfLinkedList(Entry entry) {
+    blocksLock.writeLock().lock();
+    try {
+      if (head == null) {
+        head = entry;
+        tail = entry;
+      }
+      if (entry != head) {
+        Entry prev = entry.getPrevious();
+        Entry nxt = entry.getNext();
+        if (prev != null) {
+          prev.setNext(nxt);
+        }
+        if (nxt != null) {
+          nxt.setPrevious(prev);
+        }
+        entry.setPrevious(null);
+        entry.setNext(head);
+        head.setPrevious(entry);
+        head = entry;
+      }
+      if (tail != null) {
+        while (tail.getNext() != null) {
+          tail = tail.getNext();
+        }
+      }

Review Comment:
   Nice idea; it might be more beneficial to test this with a UT.
   
   I am also planning to refactor Entry into its own class, rather than an inner class of SingleFilePerBlockCache, as part of the next follow-up sub-task. Once we do that, it will be even easier to write a UT that directly accesses the head and tail pointers.
   
   Sorry, I was thinking of this as a sub-task, so maybe adding the UT can also be done with that sub-task. Does that sound good?
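   
   For reference, once Entry is standalone, such a UT could look roughly like this (a sketch against a hypothetical `LruCacheUnderTest` with test-only `head()`/`tail()` accessors; none of these names are from the PR):
   
   ```java
   import static org.junit.Assert.assertEquals;
   
   import org.junit.Test;
   
   public class TestLruPointersSketch {
     @Test
     public void testHeadAndTailAfterAccessesAndEviction() {
       // Hypothetical cache capped at 2 blocks; put/get drive the LRU order.
       LruCacheUnderTest cache = new LruCacheUnderTest(2);
       cache.put(0);        // list: 0
       cache.put(1);        // list: 1 -> 0
       cache.get(0);        // access moves block 0 to head: 0 -> 1
       cache.put(2);        // over capacity: tail (block 1) evicted: 2 -> 0
       assertEquals(2, cache.head().blockNumber());
       assertEquals(0, cache.tail().blockNumber());
     }
   }
   ```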





[GitHub] [hadoop] virajjasani commented on a diff in pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "virajjasani (via GitHub)" <gi...@apache.org>.
virajjasani commented on code in PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#discussion_r1247448062


##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java:
##########
@@ -247,9 +305,46 @@ private Entry getEntry(int blockNumber) {
       throw new IllegalStateException(String.format("block %d not found in cache", blockNumber));
     }
     numGets++;
+    addToHeadOfLinkedList(entry);
     return entry;
   }
 
+  /**
+   * Add the given entry to the head of the linked list.
+   *
+   * @param entry Block entry to add.
+   */
+  private void addToHeadOfLinkedList(Entry entry) {
+    blocksLock.writeLock().lock();
+    try {
+      if (head == null) {
+        head = entry;
+        tail = entry;
+      }
+      if (entry != head) {
+        Entry prev = entry.getPrevious();
+        Entry nxt = entry.getNext();
+        if (prev != null) {
+          prev.setNext(nxt);
+        }
+        if (nxt != null) {
+          nxt.setPrevious(prev);
+        }
+        entry.setPrevious(null);
+        entry.setNext(head);
+        head.setPrevious(entry);
+        head = entry;
+      }
+      if (tail != null) {
+        while (tail.getNext() != null) {
+          tail = tail.getNext();
+        }
+      }

Review Comment:
   Thanks, I was actually thinking about it, but there is just no clean way of doing so. What I have been able to do so far is "log" the head and tail nodes (as you also mentioned earlier) along with all other nodes, so that I could track the exact node sequence. That's the best way I could find so far, and extracting that info in an IT is really difficult (if we were to do it in a clean way).





[GitHub] [hadoop] virajjasani commented on pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "virajjasani (via GitHub)" <gi...@apache.org>.
virajjasani commented on PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#issuecomment-1614267547

   > Looks good. Some changes in the test. What would really be good is to test this in a multi-threaded test too; nothing major, but creating an ExecutorService with 5 threads, then trying to read and evict 3 blocks and asserting the correct IOStats and read values, should be enough. This would test the locking as well.
   
   thanks for the review @mehakmeet
   Updated the test to perform seeks and reads in parallel using a thread pool.
   
   so we have two follow-up sub-tasks to be created later:
   1. refactor Entry class to its own independent class and create UT to test LRU cache head and tail pointers
   2. add `blockEvictedFromFileCache()` metric
   
   does this sound good?
   
   i will re-run the whole test-suite against `us-west-2`
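   
   A rough shape of that parallel seek/read exercise, for anyone following along (a sketch only, not the exact code added to the PR):
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;
   import java.util.concurrent.Future;
   
   import org.apache.hadoop.fs.FSDataInputStream;
   
   public class ParallelReadSketch {
     // Each task seeks to a different block and reads a little, driving
     // blocks in and out of the LRU cache from multiple threads.
     static void readBlocksInParallel(FSDataInputStream in, int blockSize) throws Exception {
       ExecutorService pool = Executors.newFixedThreadPool(5);
       try {
         List<Future<Object>> futures = new ArrayList<>();
         for (int i = 0; i < 5; i++) {
           final long offset = (long) i * blockSize;
           futures.add(pool.submit(() -> {
             byte[] buf = new byte[2 * 1024];
             // seek+read kept atomic: the stream position is not thread-safe,
             // but interleaving across blocks still exercises eviction and locking.
             synchronized (in) {
               in.seek(offset);
               in.read(buf, 0, buf.length);
             }
             return null;
           }));
         }
         for (Future<Object> f : futures) {
           f.get(); // propagate any worker failure
         }
       } finally {
         pool.shutdown();
       }
     }
   }
   ```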




[GitHub] [hadoop] hadoop-yetus commented on pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "hadoop-yetus (via GitHub)" <gi...@apache.org>.
hadoop-yetus commented on PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#issuecomment-1635020426

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 39s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 4 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  16m 28s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  31m  7s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  17m 24s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  compile  |  16m 19s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  checkstyle  |   4m 23s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 49s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   2m 11s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 55s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   4m  4s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  34m 22s |  |  branch has no errors when building and testing our client artifacts.  |
   | -0 :warning: |  patch  |  34m 52s |  |  Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 33s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 26s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 18s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javac  |  16m 18s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 20s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  javac  |  16m 20s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   4m 16s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   2m 50s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   2m  5s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 57s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   4m 22s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  35m 13s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  19m 11s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  unit  |   2m 56s |  |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m 11s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 247m 23s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/11/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5754 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 749d07bca684 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 796a3d4dd3113fcf2f2c6b4ef3433d9bf711aa35 |
   | Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/11/testReport/ |
   | Max. process+thread count | 2330 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/11/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




[GitHub] [hadoop] hadoop-yetus commented on pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "hadoop-yetus (via GitHub)" <gi...@apache.org>.
hadoop-yetus commented on PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#issuecomment-1633997580

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 48s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 3 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  15m 55s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  38m  5s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  18m 45s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  compile  |  17m 49s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  checkstyle  |   4m 50s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 35s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 46s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 32s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   3m 49s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  38m 49s |  |  branch has no errors when building and testing our client artifacts.  |
   | -0 :warning: |  patch  |  39m 15s |  |  Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 28s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 28s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  17m 46s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javac  |  17m 46s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  17m  1s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  javac  |  17m  1s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   4m 40s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   2m 29s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 40s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 32s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   4m  7s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  39m 28s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  19m  0s |  |  hadoop-common in the patch passed.  |
   | -1 :x: |  unit  |   7m  4s | [/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/9/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt) |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 59s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 268m 46s |  |  |
   
   
   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | hadoop.fs.s3a.prefetch.TestS3ACachingBlockManager |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/9/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5754 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux abe9c31f0de3 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 709163c197e99891eb98210f024f626022f3cacc |
   | Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/9/testReport/ |
   | Max. process+thread count | 2497 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/9/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




[GitHub] [hadoop] mehakmeet commented on a diff in pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "mehakmeet (via GitHub)" <gi...@apache.org>.
mehakmeet commented on code in PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#discussion_r1247470283


##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java:
##########
@@ -299,9 +395,62 @@ public void put(int blockNumber, ByteBuffer buffer, Configuration conf,
     // Update stream_read_blocks_in_cache stats only after blocks map is updated with new file
     // entry to avoid any discrepancy related to the value of stream_read_blocks_in_cache.
     // If stream_read_blocks_in_cache is updated before updating the blocks map here, closing of
-    // the input stream can lead to the removal of the cache file even before blocks is added with
-    // the new cache file, leading to incorrect value of stream_read_blocks_in_cache.
+    // the input stream can lead to the removal of the cache file even before blocks is added
+    // with the new cache file, leading to incorrect value of stream_read_blocks_in_cache.
     prefetchingStatistics.blockAddedToFileCache();
+    addToLinkedListAndEvictIfRequired(entry);
+  }
+
+  /**
+   * Add the given entry to the head of the linked list and if the LRU cache size
+   * exceeds the max limit, evict tail of the LRU linked list.
+   *
+   * @param entry Block entry to add.
+   */
+  private void addToLinkedListAndEvictIfRequired(Entry entry) {
+    addToHeadOfLinkedList(entry);
+    blocksLock.writeLock().lock();
+    try {
+      if (blocks.size() > maxBlocksCount && !closed.get()) {
+        Entry elementToPurge = tail;
+        tail = tail.getPrevious();
+        if (tail == null) {
+          tail = head;
+        }
+        tail.setNext(null);
+        elementToPurge.setPrevious(null);
+        deleteBlockFileAndEvictCache(elementToPurge);
+      }
+    } finally {
+      blocksLock.writeLock().unlock();
+    }
+  }
+
+  /**
+   * Delete cache file as part of the block cache LRU eviction.
+   *
+   * @param elementToPurge Block entry to evict.
+   */
+  private void deleteBlockFileAndEvictCache(Entry elementToPurge) {
+    boolean lockAcquired =
+        elementToPurge.takeLock(Entry.LockType.WRITE, PREFETCH_WRITE_LOCK_TIMEOUT,
+            PREFETCH_WRITE_LOCK_TIMEOUT_UNIT);
+    if (!lockAcquired) {
+      LOG.error("Cache file {} deletion would not be attempted as write lock could not"

Review Comment:
   Okay, sounds good





[GitHub] [hadoop] steveloughran commented on pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "steveloughran (via GitHub)" <gi...@apache.org>.
steveloughran commented on PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#issuecomment-1636070773

   Merged, though now that I'm using it, that new test is way too slow. In my rebased unbuffered PR I have moved it to -Dscale, but really we can just set the block size down to something minimal and then work with a small file; see the sketch below.
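   
   For example, something like this in the test setup (a sketch; `fs.s3a.prefetch.enabled` and `fs.s3a.prefetch.block.size` are the existing prefetch keys, while the max-blocks key below is an assumption of the new option's name):
   
   ```java
   import org.apache.hadoop.conf.Configuration;
   
   // Shrink the prefetch block size so a small test file spans enough
   // blocks to trigger LRU eviction quickly, instead of reading a huge file.
   Configuration conf = new Configuration();
   conf.setBoolean("fs.s3a.prefetch.enabled", true);
   conf.setLong("fs.s3a.prefetch.block.size", 4 * 1024); // 4 KiB blocks
   // assumed name of the option introduced by this PR:
   conf.setInt("fs.s3a.prefetch.max.blocks.count", 1);   // evict beyond one cached block
   ```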




[GitHub] [hadoop] virajjasani commented on a diff in pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "virajjasani (via GitHub)" <gi...@apache.org>.
virajjasani commented on code in PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#discussion_r1240890150


##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java:
##########
@@ -247,9 +305,46 @@ private Entry getEntry(int blockNumber) {
       throw new IllegalStateException(String.format("block %d not found in cache", blockNumber));
     }
     numGets++;
+    addToHeadOfLinkedList(entry);
     return entry;
   }
 
+  /**
+   * Add the given entry to the head of the linked list.
+   *
+   * @param entry Block entry to add.
+   */
+  private void addToHeadOfLinkedList(Entry entry) {
+    blocksLock.writeLock().lock();
+    try {
+      if (head == null) {
+        head = entry;
+        tail = entry;
+      }
+      if (entry != head) {
+        Entry prev = entry.getPrevious();
+        Entry nxt = entry.getNext();
+        if (prev != null) {
+          prev.setNext(nxt);
+        }
+        if (nxt != null) {
+          nxt.setPrevious(prev);
+        }
+        entry.setPrevious(null);
+        entry.setNext(head);
+        head.setPrevious(entry);
+        head = entry;
+      }
+      if (tail != null) {
+        while (tail.getNext() != null) {
+          tail = tail.getNext();
+        }
+      }

Review Comment:
   Yes, nice optimization for sure; let me double-check this again. I ran through multiple test iterations and found that it works, but let me verify the optimization holds (I think it should; I am just wondering if I am missing some cases).
   
   Thank you btw @mehakmeet !!





[GitHub] [hadoop] hadoop-yetus commented on pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "hadoop-yetus (via GitHub)" <gi...@apache.org>.
hadoop-yetus commented on PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#issuecomment-1634009019

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 37s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 3 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  16m 12s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  38m  2s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  19m 19s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  compile  |  18m 55s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  checkstyle  |   5m 46s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 52s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   2m  0s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 48s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   4m 13s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  40m 52s |  |  branch has no errors when building and testing our client artifacts.  |
   | -0 :warning: |  patch  |  41m 20s |  |  Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 33s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 28s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 21s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javac  |  16m 21s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 21s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  javac  |  16m 21s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   4m 23s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   2m 47s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   2m  5s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 56s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   4m 22s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  35m 34s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  19m 12s |  |  hadoop-common in the patch passed.  |
   | -1 :x: |  unit  |   8m 50s | [/patch-unit-hadoop-tools_hadoop-aws.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/8/artifact/out/patch-unit-hadoop-tools_hadoop-aws.txt) |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m 14s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 273m 23s |  |  |
   
   
   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | hadoop.fs.s3a.prefetch.TestS3ACachingBlockManager |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/8/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5754 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux e55254152ab5 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 709163c197e99891eb98210f024f626022f3cacc |
   | Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/8/testReport/ |
   | Max. process+thread count | 1263 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/8/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




[GitHub] [hadoop] virajjasani commented on pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "virajjasani (via GitHub)" <gi...@apache.org>.
virajjasani commented on PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#issuecomment-1634715531

   re-running the whole test suite with the `-Dprefetch` and `-Dscale` combinations




[GitHub] [hadoop] mukund-thakur commented on pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "mukund-thakur (via GitHub)" <gi...@apache.org>.
mukund-thakur commented on PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#issuecomment-1615218649

   Can we not use an already-built cache rather than implementing one from scratch? (This part is interesting for sure :))
   https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/LocalCache.java
   Guava's cache supports maximumSize for eviction and internally uses LRU (see the Javadocs of the class).
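   
   i.e. roughly this shape (a sketch of the suggested alternative, reusing this PR's `Entry` and `deleteBlockFileAndEvictCache` names; not a drop-in replacement):
   
   ```java
   import com.google.common.cache.Cache;
   import com.google.common.cache.CacheBuilder;
   import com.google.common.cache.RemovalListener;
   
   // Size-capped cache with eviction delegated to Guava; the removal
   // listener deletes the backing block file when an entry is dropped.
   RemovalListener<Integer, Entry> onRemoval =
       notification -> deleteBlockFileAndEvictCache(notification.getValue());
   Cache<Integer, Entry> blocks = CacheBuilder.newBuilder()
       .maximumSize(maxBlocksCount)
       .removalListener(onRemoval)
       .build();
   // Caveat: Guava's eviction is only approximately LRU (per segment), and
   // the listener also fires for explicit invalidations, not just evictions.
   ```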




[GitHub] [hadoop] virajjasani commented on a diff in pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "virajjasani (via GitHub)" <gi...@apache.org>.
virajjasani commented on code in PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#discussion_r1248329607


##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java:
##########
@@ -103,13 +114,17 @@ private enum LockType {
       READ,
       WRITE
     }
+    private Entry previous;

Review Comment:
   LinkedHashMap does have a hook that can be overridden to get eviction (removeEldestEntry), but it is not much help for a thread-safe implementation: the only alternative it leaves us is to synchronize all read/write operations on the LinkedHashMap.
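   
   i.e. the best it offers is something like this (a sketch; `Entry` and `maxBlocksCount` stand in for this class's fields):
   
   ```java
   import java.util.Collections;
   import java.util.LinkedHashMap;
   import java.util.Map;
   
   // An access-ordered LinkedHashMap with removeEldestEntry gives LRU
   // eviction, but thread safety needs one lock around every read and
   // write, serializing all cache access.
   Map<Integer, Entry> lru = Collections.synchronizedMap(
       new LinkedHashMap<Integer, Entry>(16, 0.75f, true /* access order */) {
         @Override
         protected boolean removeEldestEntry(Map.Entry<Integer, Entry> eldest) {
           return size() > maxBlocksCount;
         }
       });
   ```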





[GitHub] [hadoop] mehakmeet commented on a diff in pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "mehakmeet (via GitHub)" <gi...@apache.org>.
mehakmeet commented on code in PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#discussion_r1247486332


##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3APrefetchingInputStream.java:
##########
@@ -301,4 +303,56 @@ public void testStatusProbesAfterClosingStream() throws Throwable {
 
   }
 
+  @Test
+  public void testSeeksWithLruEviction() throws Throwable {
+    IOStatistics ioStats;
+    openFS();
+
+    try (FSDataInputStream in = largeFileFS.open(largeFile)) {
+      ioStats = in.getIOStatistics();
+
+      byte[] buffer = new byte[blockSize];
+
+      // Don't read block 0 completely
+      in.read(buffer, 0, blockSize - S_1K * 10);
+
+      // Seek to block 1 and don't read completely
+      in.seek(blockSize);
+      in.read(buffer, 0, 2 * S_1K);
+
+      // Seek to block 2 and don't read completely
+      in.seek(blockSize * 2L);
+      in.read(buffer, 0, 2 * S_1K);
+
+      // Seek to block 3 and don't read completely
+      in.seek(blockSize * 3L);
+      in.read(buffer, 0, 2 * S_1K);
+
+      // Seek to block 4 and don't read completely
+      in.seek(blockSize * 4L);
+      in.read(buffer, 0, 2 * S_1K);
+
+      // Seek to block 5 and don't read completely
+      in.seek(blockSize * 5L);
+      in.read(buffer, 0, 2 * S_1K);
+
+      // backward seek, can't use block 0 as it is evicted
+      in.seek(S_1K * 5);
+      in.read();
+

Review Comment:
   Yeah, why not; it'll be good for debugging purposes: if there's any difference between them, we'd know there's an issue with the proper deletion of the files from the cache. A bit of an overkill, but it never hurts 😄





[GitHub] [hadoop] mehakmeet commented on a diff in pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "mehakmeet (via GitHub)" <gi...@apache.org>.
mehakmeet commented on code in PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#discussion_r1247446174


##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java:
##########
@@ -247,9 +305,46 @@ private Entry getEntry(int blockNumber) {
       throw new IllegalStateException(String.format("block %d not found in cache", blockNumber));
     }
     numGets++;
+    addToHeadOfLinkedList(entry);
     return entry;
   }
 
+  /**
+   * Add the given entry to the head of the linked list.
+   *
+   * @param entry Block entry to add.
+   */
+  private void addToHeadOfLinkedList(Entry entry) {
+    blocksLock.writeLock().lock();
+    try {
+      if (head == null) {
+        head = entry;
+        tail = entry;
+      }
+      if (entry != head) {
+        Entry prev = entry.getPrevious();
+        Entry nxt = entry.getNext();
+        if (prev != null) {
+          prev.setNext(nxt);
+        }
+        if (nxt != null) {
+          nxt.setPrevious(prev);
+        }
+        entry.setPrevious(null);
+        entry.setNext(head);
+        head.setPrevious(entry);
+        head = entry;
+      }
+      if (tail != null) {
+        while (tail.getNext() != null) {
+          tail = tail.getNext();
+        }
+      }

Review Comment:
   Yes, happy with the new change, I think. It would be good to explicitly test certain tail-changing scenarios in the IT, like you mentioned above, if we are not already doing so.





[GitHub] [hadoop] virajjasani commented on a diff in pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "virajjasani (via GitHub)" <gi...@apache.org>.
virajjasani commented on code in PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#discussion_r1247477275


##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3APrefetchingInputStream.java:
##########
@@ -301,4 +303,56 @@ public void testStatusProbesAfterClosingStream() throws Throwable {
 
   }
 
+  @Test
+  public void testSeeksWithLruEviction() throws Throwable {
+    IOStatistics ioStats;
+    openFS();
+
+    try (FSDataInputStream in = largeFileFS.open(largeFile)) {
+      ioStats = in.getIOStatistics();
+
+      byte[] buffer = new byte[blockSize];
+
+      // Don't read block 0 completely
+      in.read(buffer, 0, blockSize - S_1K * 10);
+
+      // Seek to block 1 and don't read completely
+      in.seek(blockSize);
+      in.read(buffer, 0, 2 * S_1K);
+
+      // Seek to block 2 and don't read completely
+      in.seek(blockSize * 2L);
+      in.read(buffer, 0, 2 * S_1K);
+
+      // Seek to block 3 and don't read completely
+      in.seek(blockSize * 3L);
+      in.read(buffer, 0, 2 * S_1K);
+
+      // Seek to block 4 and don't read completely
+      in.seek(blockSize * 4L);
+      in.read(buffer, 0, 2 * S_1K);
+
+      // Seek to block 5 and don't read completely
+      in.seek(blockSize * 5L);
+      in.read(buffer, 0, 2 * S_1K);
+
+      // backward seek, can't use block 0 as it is evicted
+      in.seek(S_1K * 5);
+      in.read();
+

Review Comment:
   Got it; shall we introduce a new metric that would help differentiate `blockRemovedFromFileCache()` vs `blockEvictedFromFileCache()`?





[GitHub] [hadoop] virajjasani commented on a diff in pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "virajjasani (via GitHub)" <gi...@apache.org>.
virajjasani commented on code in PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#discussion_r1247472501


##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3APrefetchingInputStream.java:
##########
@@ -301,4 +303,56 @@ public void testStatusProbesAfterClosingStream() throws Throwable {
 
   }
 
+  @Test
+  public void testSeeksWithLruEviction() throws Throwable {
+    IOStatistics ioStats;
+    openFS();
+
+    try (FSDataInputStream in = largeFileFS.open(largeFile)) {
+      ioStats = in.getIOStatistics();
+
+      byte[] buffer = new byte[blockSize];
+
+      // Don't read block 0 completely
+      in.read(buffer, 0, blockSize - S_1K * 10);
+
+      // Seek to block 1 and don't read completely
+      in.seek(blockSize);
+      in.read(buffer, 0, 2 * S_1K);
+
+      // Seek to block 2 and don't read completely
+      in.seek(blockSize * 2L);
+      in.read(buffer, 0, 2 * S_1K);
+
+      // Seek to block 3 and don't read completely
+      in.seek(blockSize * 3L);
+      in.read(buffer, 0, 2 * S_1K);
+
+      // Seek to block 4 and don't read completely
+      in.seek(blockSize * 4L);
+      in.read(buffer, 0, 2 * S_1K);
+
+      // Seek to block 5 and don't read completely
+      in.seek(blockSize * 5L);
+      in.read(buffer, 0, 2 * S_1K);
+
+      // backward seek, can't use block 0 as it is evicted
+      in.seek(S_1K * 5);
+      in.read();
+

Review Comment:
   For the number of assertions, I was thinking of adding the metric as the next sub-task so that this patch doesn't become too complicated to review. Is that fine with you?





[GitHub] [hadoop] hadoop-yetus commented on pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "hadoop-yetus (via GitHub)" <gi...@apache.org>.
hadoop-yetus commented on PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#issuecomment-1614551275

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 35s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 3 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  17m 50s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  33m  4s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  18m 40s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  compile  |  17m 28s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  checkstyle  |   4m 30s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 48s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 55s |  |  trunk passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 43s |  |  trunk passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   4m 22s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  35m 35s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 33s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 31s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  17m 49s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javac  |  17m 49s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  17m 25s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  javac  |  17m 25s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/4/artifact/out/blanks-eol.txt) |  The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  checkstyle  |   4m 37s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   2m 40s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 54s |  |  the patch passed with JDK Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 44s |  |  the patch passed with JDK Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   4m 41s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  36m  2s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  19m 29s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  unit  |   2m 53s |  |  hadoop-aws in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m  7s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 257m 18s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/4/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5754 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 3e9ea299b868 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / ab428291150c72671162cff6c4cf7bf89f0e76d2 |
   | Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.19+7-post-Ubuntu-0ubuntu120.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~20.04-b09 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/4/testReport/ |
   | Max. process+thread count | 2148 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5754/4/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




[GitHub] [hadoop] virajjasani commented on a diff in pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "virajjasani (via GitHub)" <gi...@apache.org>.
virajjasani commented on code in PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#discussion_r1248909177


##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java:
##########
@@ -103,13 +114,17 @@ private enum LockType {
       READ,
       WRITE
     }
+    private Entry previous;

Review Comment:
   we could have also used the LinkedList that Java provides, which internally is a doubly linked list. however, the catch is that the only way to remove an element from the middle of that list is the `remove(Object o)` method, which traverses the list from the head and therefore has O(n) time complexity. by maintaining the pointers ourselves, we can remove any element from anywhere in the list in O(1).
   
   LinkedList https://docs.oracle.com/javase/8/docs/api/java/util/LinkedList.html
   ```
       /**
        * Removes the first occurrence of the specified element from this list,
        * if it is present.  If this list does not contain the element, it is
        * unchanged.  More formally, removes the element with the lowest index
        * {@code i} such that
        * <tt>(o==null&nbsp;?&nbsp;get(i)==null&nbsp;:&nbsp;o.equals(get(i)))</tt>
        * (if such an element exists).  Returns {@code true} if this list
        * contained the specified element (or equivalently, if this list
        * changed as a result of the call).
        *
        * @param o element to be removed from this list, if present
        * @return {@code true} if this list contained the specified element
        */
       public boolean remove(Object o) {
           if (o == null) {
               for (Node<E> x = first; x != null; x = x.next) {
                   if (x.item == null) {
                       unlink(x);
                       return true;
                   }
               }
           } else {
               for (Node<E> x = first; x != null; x = x.next) {
                   if (o.equals(x.item)) {
                       unlink(x);
                       return true;
                   }
               }
           }
           return false;
       }
   ```
   
   we could use `remove(int index)`, but then we would end up maintaining indexes, which is even more complicated: whenever any element is added to or removed from the list, we would have to update our entire map with the indexes of all affected elements, so it ends up being an O(n) operation anyway.
   
   hence, i decided to maintain the doubly linked list ourselves; the algorithm is far less error-prone (based on a good amount of concurrency testing that i did).
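   
   for reference, a minimal sketch of the O(1) unlink this relies on (field and accessor names are illustrative, not the exact PR code):
   
   ```
   // unlink 'entry' from the doubly linked list in O(1): no traversal is
   // needed because the entry already references both of its neighbours
   private void unlink(Entry entry) {
     Entry prev = entry.getPrevious();
     Entry next = entry.getNext();
     if (prev != null) {
       prev.setNext(next);
     } else {
       head = next;           // entry was the head
     }
     if (next != null) {
       next.setPrevious(prev);
     } else {
       tail = prev;           // entry was the tail
     }
     entry.setPrevious(null);
     entry.setNext(null);
   }
   ```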





[GitHub] [hadoop] virajjasani commented on a diff in pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "virajjasani (via GitHub)" <gi...@apache.org>.
virajjasani commented on code in PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#discussion_r1249624106


##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/prefetch/SingleFilePerBlockCache.java:
##########
@@ -299,9 +395,62 @@ public void put(int blockNumber, ByteBuffer buffer, Configuration conf,
     // Update stream_read_blocks_in_cache stats only after blocks map is updated with new file
     // entry to avoid any discrepancy related to the value of stream_read_blocks_in_cache.
     // If stream_read_blocks_in_cache is updated before updating the blocks map here, closing of
-    // the input stream can lead to the removal of the cache file even before blocks is added with
-    // the new cache file, leading to incorrect value of stream_read_blocks_in_cache.
+    // the input stream can lead to the removal of the cache file even before blocks is added
+    // with the new cache file, leading to incorrect value of stream_read_blocks_in_cache.
     prefetchingStatistics.blockAddedToFileCache();
+    addToLinkedListAndEvictIfRequired(entry);
+  }
+
+  /**
+   * Add the given entry to the head of the linked list and if the LRU cache size
+   * exceeds the max limit, evict tail of the LRU linked list.
+   *
+   * @param entry Block entry to add.
+   */
+  private void addToLinkedListAndEvictIfRequired(Entry entry) {
+    addToHeadOfLinkedList(entry);
+    blocksLock.writeLock().lock();

Review Comment:
   this is now taken care of in the latest revision





[GitHub] [hadoop] steveloughran commented on a diff in pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "steveloughran (via GitHub)" <gi...@apache.org>.
steveloughran commented on code in PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#discussion_r1261160595


##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java:
##########
@@ -1276,4 +1276,16 @@ private Constants() {
   public static final String STORE_CAPABILITY_DIRECTORY_MARKER_MULTIPART_UPLOAD_ENABLED =
       "fs.s3a.capability.multipart.uploads.enabled";
 
+  /**
+   * Prefetch max blocks count config.
+   * Value = {@value PREFETCH_MAX_BLOCKS_COUNT}
+   */
+  public static final String PREFETCH_MAX_BLOCKS_COUNT = "fs.s3a.prefetch.max.blocks.count";
+
+  /**
+   * Default value for max blocks count config.
+   * Value = {@value DEFAULT_PREFETCH_MAX_BLOCKS_COUNT}
+   */
+  public static final int DEFAULT_PREFETCH_MAX_BLOCKS_COUNT = 10;

Review Comment:
   this is per stream? it's big. we may want to see what happens in apps with many threads - the risk of running out of disk is high.
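   
   for scale (a rough back-of-envelope, assuming the default `fs.s3a.prefetch.block.size` of 8 MB, which deployments may override): 10 cached blocks is up to ~80 MB of local disk per open stream, so 100 concurrent cached streams could pin ~8 GB.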



##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3APrefetchingInputStream.java:
##########
@@ -77,13 +78,13 @@ public ITestS3APrefetchingInputStream() {
 
   private static final int TIMEOUT_MILLIS = 5000;
   private static final int INTERVAL_MILLIS = 500;
-
+  private static final int PREFETCH_MAX_BLOCKS = 3;
 
   @Override
   public Configuration createConfiguration() {
     Configuration conf = super.createConfiguration();
-    S3ATestUtils.removeBaseAndBucketOverrides(conf, PREFETCH_ENABLED_KEY);

Review Comment:
   this needs to be restored, and PREFETCH_MAX_BLOCKS_COUNT added as another option to cut.
   
   in all ITests, assume that any option set in `createConfiguration()` MAY have a per-bucket override set by someone, so you MUST explicitly remove it.
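   
   a minimal sketch of that pattern (using the existing `S3ATestUtils.removeBaseAndBucketOverrides` helper; the exact option list is whatever the test sets):
   
   ```
   @Override
   public Configuration createConfiguration() {
     Configuration conf = super.createConfiguration();
     // strip base and per-bucket overrides before setting test-specific values
     S3ATestUtils.removeBaseAndBucketOverrides(conf,
         PREFETCH_ENABLED_KEY,
         PREFETCH_MAX_BLOCKS_COUNT);
     conf.setBoolean(PREFETCH_ENABLED_KEY, true);
     conf.setInt(PREFETCH_MAX_BLOCKS_COUNT, Integer.parseInt(maxBlocks));
     return conf;
   }
   ```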



##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3APrefetchingLruEviction.java:
##########
@@ -0,0 +1,241 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a;
+
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.net.URI;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.hadoop.thirdparty.com.google.common.util.concurrent.ThreadFactoryBuilder;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.s3a.performance.AbstractS3ACostTest;
+import org.apache.hadoop.fs.statistics.IOStatistics;
+import org.apache.hadoop.test.LambdaTestUtils;
+
+import static org.apache.hadoop.fs.s3a.Constants.PREFETCH_BLOCK_DEFAULT_SIZE;
+import static org.apache.hadoop.fs.s3a.Constants.PREFETCH_BLOCK_SIZE_KEY;
+import static org.apache.hadoop.fs.s3a.Constants.PREFETCH_ENABLED_KEY;
+import static org.apache.hadoop.fs.s3a.Constants.PREFETCH_MAX_BLOCKS_COUNT;
+import static org.apache.hadoop.fs.statistics.IOStatisticAssertions.verifyStatisticGaugeValue;
+import static org.apache.hadoop.fs.statistics.StreamStatisticNames.STREAM_READ_BLOCKS_IN_FILE_CACHE;
+import static org.apache.hadoop.io.IOUtils.cleanupWithLogger;
+
+/**
+ * Test the prefetching input stream with LRU cache eviction on S3ACachingInputStream.
+ */
+@RunWith(Parameterized.class)
+public class ITestS3APrefetchingLruEviction extends AbstractS3ACostTest {
+
+  private final String maxBlocks;
+
+  @Parameterized.Parameters(name = "max-blocks-{0}")
+  public static Collection<Object[]> params() {
+    return Arrays.asList(new Object[][]{
+        {"1"},
+        {"2"},
+        {"3"}
+    });
+  }
+
+  public ITestS3APrefetchingLruEviction(final String maxBlocks) {
+    super(true);
+    this.maxBlocks = maxBlocks;
+  }
+
+  private static final Logger LOG =
+      LoggerFactory.getLogger(ITestS3APrefetchingLruEviction.class);
+
+  private static final int S_1K = 1024;
+  // Path for file which should have length > block size so S3ACachingInputStream is used
+  private Path largeFile;
+  private FileSystem largeFileFS;
+  private int blockSize;
+
+  private static final int TIMEOUT_MILLIS = 5000;
+  private static final int INTERVAL_MILLIS = 500;
+
+  @Override
+  public Configuration createConfiguration() {
+    Configuration conf = super.createConfiguration();
+    conf.setBoolean(PREFETCH_ENABLED_KEY, true);

Review Comment:
   again, remove base+bucket overrides of the options



##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3APrefetchingLruEviction.java:
##########
@@ -0,0 +1,241 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a;
+
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.net.URI;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.hadoop.thirdparty.com.google.common.util.concurrent.ThreadFactoryBuilder;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.s3a.performance.AbstractS3ACostTest;
+import org.apache.hadoop.fs.statistics.IOStatistics;
+import org.apache.hadoop.test.LambdaTestUtils;
+
+import static org.apache.hadoop.fs.s3a.Constants.PREFETCH_BLOCK_DEFAULT_SIZE;
+import static org.apache.hadoop.fs.s3a.Constants.PREFETCH_BLOCK_SIZE_KEY;
+import static org.apache.hadoop.fs.s3a.Constants.PREFETCH_ENABLED_KEY;
+import static org.apache.hadoop.fs.s3a.Constants.PREFETCH_MAX_BLOCKS_COUNT;
+import static org.apache.hadoop.fs.statistics.IOStatisticAssertions.verifyStatisticGaugeValue;
+import static org.apache.hadoop.fs.statistics.StreamStatisticNames.STREAM_READ_BLOCKS_IN_FILE_CACHE;
+import static org.apache.hadoop.io.IOUtils.cleanupWithLogger;
+
+/**
+ * Test the prefetching input stream with LRU cache eviction on S3ACachingInputStream.
+ */
+@RunWith(Parameterized.class)
+public class ITestS3APrefetchingLruEviction extends AbstractS3ACostTest {
+
+  private final String maxBlocks;
+
+  @Parameterized.Parameters(name = "max-blocks-{0}")
+  public static Collection<Object[]> params() {
+    return Arrays.asList(new Object[][]{
+        {"1"},
+        {"2"},
+        {"3"}
+    });
+  }
+
+  public ITestS3APrefetchingLruEviction(final String maxBlocks) {
+    super(true);
+    this.maxBlocks = maxBlocks;
+  }
+
+  private static final Logger LOG =
+      LoggerFactory.getLogger(ITestS3APrefetchingLruEviction.class);
+
+  private static final int S_1K = 1024;
+  // Path for file which should have length > block size so S3ACachingInputStream is used
+  private Path largeFile;
+  private FileSystem largeFileFS;
+  private int blockSize;
+
+  private static final int TIMEOUT_MILLIS = 5000;
+  private static final int INTERVAL_MILLIS = 500;
+
+  @Override
+  public Configuration createConfiguration() {
+    Configuration conf = super.createConfiguration();
+    conf.setBoolean(PREFETCH_ENABLED_KEY, true);
+    conf.setInt(PREFETCH_MAX_BLOCKS_COUNT, Integer.parseInt(maxBlocks));
+    return conf;
+  }
+
+  @Override
+  public void teardown() throws Exception {
+    super.teardown();
+    cleanupWithLogger(LOG, largeFileFS);
+    largeFileFS = null;
+  }
+
+  private void openFS() throws Exception {
+    Configuration conf = getConfiguration();
+    String largeFileUri = S3ATestUtils.getCSVTestFile(conf);
+
+    largeFile = new Path(largeFileUri);
+    blockSize = conf.getInt(PREFETCH_BLOCK_SIZE_KEY, PREFETCH_BLOCK_DEFAULT_SIZE);
+    largeFileFS = new S3AFileSystem();
+    largeFileFS.initialize(new URI(largeFileUri), getConfiguration());
+  }
+
+  @Test
+  public void testSeeksWithLruEviction() throws Throwable {
+    IOStatistics ioStats;
+    openFS();
+
+    ExecutorService executorService = Executors.newFixedThreadPool(5,
+        new ThreadFactoryBuilder()
+            .setDaemon(true)
+            .setNameFormat("testSeeksWithLruEviction-%d")
+            .build());
+    CountDownLatch countDownLatch = new CountDownLatch(7);
+
+    try (FSDataInputStream in = largeFileFS.open(largeFile)) {
+      ioStats = in.getIOStatistics();
+      // tests to add multiple blocks in the prefetch cache
+      // and let LRU eviction take place as more cache entries
+      // are added with multiple block reads.
+
+      executorService.submit(() -> {

Review Comment:
   these submitted closures are almost identical. why not define one for readFully(), one for PositionedReadable.readFully(), and submit them; maybe as some function
   
   ```
   Callable<Boolean> readFully(CountDownLatch countDownLatch, boolean positionedReadable, long offset, long size) {
     // return one of these with seek(offset) and read to a buffer of [size]
   }
   ```
   then submit these
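   
   a fuller sketch of that factoring (the `in` stream reference and exact signature are illustrative, not from the PR):
   
   ```
   private Callable<Boolean> readFully(FSDataInputStream in, CountDownLatch latch,
       boolean positionedReadable, long offset, int size) {
     return () -> {
       byte[] buffer = new byte[size];
       if (positionedReadable) {
         // PositionedReadable path: position is passed explicitly, no seek
         in.readFully(offset, buffer, 0, size);
       } else {
         // stream path: seek, then read the whole buffer
         in.seek(offset);
         in.readFully(buffer, 0, size);
       }
       latch.countDown();
       return true;
     };
   }
   
   // usage: executorService.submit(readFully(in, countDownLatch, true, blockSize * 4L, 2 * S_1K));
   ```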
   



##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3APrefetchingLruEviction.java:
##########
@@ -0,0 +1,241 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a;
+
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.net.URI;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.hadoop.thirdparty.com.google.common.util.concurrent.ThreadFactoryBuilder;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.s3a.performance.AbstractS3ACostTest;
+import org.apache.hadoop.fs.statistics.IOStatistics;
+import org.apache.hadoop.test.LambdaTestUtils;
+
+import static org.apache.hadoop.fs.s3a.Constants.PREFETCH_BLOCK_DEFAULT_SIZE;
+import static org.apache.hadoop.fs.s3a.Constants.PREFETCH_BLOCK_SIZE_KEY;
+import static org.apache.hadoop.fs.s3a.Constants.PREFETCH_ENABLED_KEY;
+import static org.apache.hadoop.fs.s3a.Constants.PREFETCH_MAX_BLOCKS_COUNT;
+import static org.apache.hadoop.fs.statistics.IOStatisticAssertions.verifyStatisticGaugeValue;
+import static org.apache.hadoop.fs.statistics.StreamStatisticNames.STREAM_READ_BLOCKS_IN_FILE_CACHE;
+import static org.apache.hadoop.io.IOUtils.cleanupWithLogger;
+
+/**
+ * Test the prefetching input stream with LRU cache eviction on S3ACachingInputStream.
+ */
+@RunWith(Parameterized.class)
+public class ITestS3APrefetchingLruEviction extends AbstractS3ACostTest {
+
+  private final String maxBlocks;
+
+  @Parameterized.Parameters(name = "max-blocks-{0}")
+  public static Collection<Object[]> params() {
+    return Arrays.asList(new Object[][]{
+        {"1"},
+        {"2"},
+        {"3"}
+    });
+  }
+
+  public ITestS3APrefetchingLruEviction(final String maxBlocks) {
+    super(true);
+    this.maxBlocks = maxBlocks;
+  }
+
+  private static final Logger LOG =
+      LoggerFactory.getLogger(ITestS3APrefetchingLruEviction.class);
+
+  private static final int S_1K = 1024;
+  // Path for file which should have length > block size so S3ACachingInputStream is used
+  private Path largeFile;
+  private FileSystem largeFileFS;
+  private int blockSize;
+
+  private static final int TIMEOUT_MILLIS = 5000;
+  private static final int INTERVAL_MILLIS = 500;
+
+  @Override
+  public Configuration createConfiguration() {
+    Configuration conf = super.createConfiguration();
+    conf.setBoolean(PREFETCH_ENABLED_KEY, true);
+    conf.setInt(PREFETCH_MAX_BLOCKS_COUNT, Integer.parseInt(maxBlocks));
+    return conf;
+  }
+
+  @Override
+  public void teardown() throws Exception {
+    super.teardown();
+    cleanupWithLogger(LOG, largeFileFS);
+    largeFileFS = null;
+  }
+
+  private void openFS() throws Exception {
+    Configuration conf = getConfiguration();
+    String largeFileUri = S3ATestUtils.getCSVTestFile(conf);
+
+    largeFile = new Path(largeFileUri);
+    blockSize = conf.getInt(PREFETCH_BLOCK_SIZE_KEY, PREFETCH_BLOCK_DEFAULT_SIZE);
+    largeFileFS = new S3AFileSystem();
+    largeFileFS.initialize(new URI(largeFileUri), getConfiguration());
+  }
+
+  @Test
+  public void testSeeksWithLruEviction() throws Throwable {
+    IOStatistics ioStats;
+    openFS();
+
+    ExecutorService executorService = Executors.newFixedThreadPool(5,
+        new ThreadFactoryBuilder()
+            .setDaemon(true)
+            .setNameFormat("testSeeksWithLruEviction-%d")
+            .build());
+    CountDownLatch countDownLatch = new CountDownLatch(7);
+
+    try (FSDataInputStream in = largeFileFS.open(largeFile)) {
+      ioStats = in.getIOStatistics();
+      // tests to add multiple blocks in the prefetch cache
+      // and let LRU eviction take place as more cache entries
+      // are added with multiple block reads.
+
+      executorService.submit(() -> {
+        byte[] buffer = new byte[blockSize];
+        // Don't read block 0 completely
+        try {
+          in.read(buffer, 0, blockSize - S_1K * 10);
+          countDownLatch.countDown();
+          return true;
+        } catch (IOException e) {
+          throw new UncheckedIOException(e);
+        }
+      });
+
+      executorService.submit(() -> {
+        byte[] buffer = new byte[blockSize];
+        // Seek to block 1 and don't read completely
+        try {
+          in.seek(blockSize);

Review Comment:
   use PositionedReadable to include it in the test coverage
   ```
   readFully(blockSize * 4L, buffer, 0, 2 * S_1K);
   ```



##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3APrefetchingLruEviction.java:
##########
@@ -0,0 +1,241 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a;
+
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.net.URI;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.hadoop.thirdparty.com.google.common.util.concurrent.ThreadFactoryBuilder;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.s3a.performance.AbstractS3ACostTest;
+import org.apache.hadoop.fs.statistics.IOStatistics;
+import org.apache.hadoop.test.LambdaTestUtils;
+
+import static org.apache.hadoop.fs.s3a.Constants.PREFETCH_BLOCK_DEFAULT_SIZE;
+import static org.apache.hadoop.fs.s3a.Constants.PREFETCH_BLOCK_SIZE_KEY;
+import static org.apache.hadoop.fs.s3a.Constants.PREFETCH_ENABLED_KEY;
+import static org.apache.hadoop.fs.s3a.Constants.PREFETCH_MAX_BLOCKS_COUNT;
+import static org.apache.hadoop.fs.statistics.IOStatisticAssertions.verifyStatisticGaugeValue;
+import static org.apache.hadoop.fs.statistics.StreamStatisticNames.STREAM_READ_BLOCKS_IN_FILE_CACHE;
+import static org.apache.hadoop.io.IOUtils.cleanupWithLogger;
+
+/**
+ * Test the prefetching input stream with LRU cache eviction on S3ACachingInputStream.
+ */
+@RunWith(Parameterized.class)
+public class ITestS3APrefetchingLruEviction extends AbstractS3ACostTest {
+
+  private final String maxBlocks;
+
+  @Parameterized.Parameters(name = "max-blocks-{0}")
+  public static Collection<Object[]> params() {
+    return Arrays.asList(new Object[][]{
+        {"1"},
+        {"2"},
+        {"3"}
+    });
+  }
+
+  public ITestS3APrefetchingLruEviction(final String maxBlocks) {
+    super(true);
+    this.maxBlocks = maxBlocks;
+  }
+
+  private static final Logger LOG =
+      LoggerFactory.getLogger(ITestS3APrefetchingLruEviction.class);
+
+  private static final int S_1K = 1024;
+  // Path for file which should have length > block size so S3ACachingInputStream is used
+  private Path largeFile;
+  private FileSystem largeFileFS;
+  private int blockSize;
+
+  private static final int TIMEOUT_MILLIS = 5000;
+  private static final int INTERVAL_MILLIS = 500;
+
+  @Override
+  public Configuration createConfiguration() {
+    Configuration conf = super.createConfiguration();
+    conf.setBoolean(PREFETCH_ENABLED_KEY, true);
+    conf.setInt(PREFETCH_MAX_BLOCKS_COUNT, Integer.parseInt(maxBlocks));
+    return conf;
+  }
+
+  @Override
+  public void teardown() throws Exception {
+    super.teardown();
+    cleanupWithLogger(LOG, largeFileFS);
+    largeFileFS = null;
+  }
+
+  private void openFS() throws Exception {
+    Configuration conf = getConfiguration();
+    String largeFileUri = S3ATestUtils.getCSVTestFile(conf);
+
+    largeFile = new Path(largeFileUri);
+    blockSize = conf.getInt(PREFETCH_BLOCK_SIZE_KEY, PREFETCH_BLOCK_DEFAULT_SIZE);
+    largeFileFS = new S3AFileSystem();
+    largeFileFS.initialize(new URI(largeFileUri), getConfiguration());
+  }
+
+  @Test
+  public void testSeeksWithLruEviction() throws Throwable {
+    IOStatistics ioStats;
+    openFS();
+
+    ExecutorService executorService = Executors.newFixedThreadPool(5,
+        new ThreadFactoryBuilder()
+            .setDaemon(true)
+            .setNameFormat("testSeeksWithLruEviction-%d")
+            .build());
+    CountDownLatch countDownLatch = new CountDownLatch(7);
+
+    try (FSDataInputStream in = largeFileFS.open(largeFile)) {
+      ioStats = in.getIOStatistics();
+      // tests to add multiple blocks in the prefetch cache
+      // and let LRU eviction take place as more cache entries
+      // are added with multiple block reads.
+
+      executorService.submit(() -> {
+        byte[] buffer = new byte[blockSize];
+        // Don't read block 0 completely
+        try {
+          in.read(buffer, 0, blockSize - S_1K * 10);

Review Comment:
   use readFully(); always.
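   
   i.e. something like this (keeping the original offsets):
   
   ```
   in.readFully(buffer, 0, blockSize - S_1K * 10);
   ```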





[GitHub] [hadoop] virajjasani commented on pull request #5754: HADOOP-18291. S3A prefetch - Implement thread-safe LRU cache for SingleFilePerBlockCache

Posted by "virajjasani (via GitHub)" <gi...@apache.org>.
virajjasani commented on PR #5754:
URL: https://github.com/apache/hadoop/pull/5754#issuecomment-1636141106

   i see, i was thinking that perhaps a small file with prefetch of multiple blocks might be too small for the test?
   
   but on the other hand, i agree that we don't need such a long-running test either. let me at least remove "3" and "4" from the block size array and halve the test duration for now, in an addendum?

