You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2022/11/15 16:26:43 UTC

[GitHub] [accumulo] dlmarion opened a new pull request, #3079: Wire up FSDataInputStream.setDropBehind, set to true in FileCompactor

dlmarion opened a new pull request, #3079:
URL: https://github.com/apache/accumulo/pull/3079

   Closes #3078 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on a diff in pull request #3079: Wire up FSDataInputStream.setDropBehind, set to true in FileCompactor

Posted by GitBox <gi...@apache.org>.
keith-turner commented on code in PR #3079:
URL: https://github.com/apache/accumulo/pull/3079#discussion_r1023248061


##########
core/src/main/java/org/apache/accumulo/core/file/blockfile/impl/CachableBlockFile.java:
##########
@@ -83,8 +84,27 @@ public CachableBuilder conf(Configuration hadoopConf) {
     }
 
     public CachableBuilder fsPath(FileSystem fs, Path dataFile) {
+      return fsPath(fs, dataFile, false);
+    }
+
+    public CachableBuilder fsPath(FileSystem fs, Path dataFile, boolean dropCacheBehind) {
       this.cacheId = pathToCacheId(dataFile);
-      this.inputSupplier = () -> fs.open(dataFile);
+      this.inputSupplier = () -> {
+        FSDataInputStream is = fs.open(dataFile);
+        if (dropCacheBehind) {
+          // Tell the DataNode that the write ahead log does not need to be cached in the OS page
+          // cache
+          try {
+            is.setDropBehind(Boolean.TRUE);

Review Comment:
   I was wondering how I could know if this changes was working as intended, thought adding some trace logging that could be flipped on might be useful.
   
   ```suggestion
               is.setDropBehind(Boolean.TRUE);
               log.trace("Called setDropBehind(TRUE) for stream reading file {}", dataFile);
   ```



##########
server/base/src/main/java/org/apache/accumulo/server/compaction/FileCompactor.java:
##########
@@ -313,7 +313,8 @@ public CompactionStats call() throws IOException, CompactionCanceledException {
 
         reader = fileFactory.newReaderBuilder()
             .forFile(mapFile.getPathStr(), fs, fs.getConf(), cryptoService)
-            .withTableConfiguration(acuTableConf).withRateLimiter(env.getReadLimiter()).build();
+            .withTableConfiguration(acuTableConf).withRateLimiter(env.getReadLimiter())
+            .dropCachesBehind().build();

Review Comment:
   @dlmarion  this is setting the compaction input files to drop behind.  Did you consider setting it for the compaction output file as well?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on pull request #3079: Wire up FSDataInputStream.setDropBehind, set to true in FileCompactor

Posted by GitBox <gi...@apache.org>.
dlmarion commented on PR #3079:
URL: https://github.com/apache/accumulo/pull/3079#issuecomment-1315560777

   Be sure to look at commit msg for explanation of changes


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] ctubbsii commented on a diff in pull request #3079: Wire up FSDataInputStream.setDropBehind, set to true in FileCompactor

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on code in PR #3079:
URL: https://github.com/apache/accumulo/pull/3079#discussion_r1023144825


##########
core/src/main/java/org/apache/accumulo/core/file/blockfile/impl/CachableBlockFile.java:
##########
@@ -83,8 +84,24 @@ public CachableBuilder conf(Configuration hadoopConf) {
     }
 
     public CachableBuilder fsPath(FileSystem fs, Path dataFile) {
+      return fsPath(fs, dataFile, false);
+    }
+
+    public CachableBuilder fsPath(FileSystem fs, Path dataFile, boolean dropCacheBehind) {
       this.cacheId = pathToCacheId(dataFile);
-      this.inputSupplier = () -> fs.open(dataFile);
+      this.inputSupplier = () -> {
+        FSDataInputStream is = fs.open(dataFile);
+        if (dropCacheBehind) {
+          // Tell the DataNode that the write ahead log does not need to be cached in the OS page
+          // cache
+          try {
+            is.setDropBehind(Boolean.TRUE);
+          } catch (IOException | UnsupportedOperationException e) {
+            log.debug("setDropBehind not enabled for wal file: {}", dataFile);

Review Comment:
   If you decide to include the extra log information, it'd be good to update the other two that were added in #3077 as well, to be consistent.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] ctubbsii commented on a diff in pull request #3079: Wire up FSDataInputStream.setDropBehind, set to true in FileCompactor

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on code in PR #3079:
URL: https://github.com/apache/accumulo/pull/3079#discussion_r1023132794


##########
core/src/main/java/org/apache/accumulo/core/file/blockfile/impl/CachableBlockFile.java:
##########
@@ -83,8 +84,24 @@ public CachableBuilder conf(Configuration hadoopConf) {
     }
 
     public CachableBuilder fsPath(FileSystem fs, Path dataFile) {
+      return fsPath(fs, dataFile, false);
+    }
+
+    public CachableBuilder fsPath(FileSystem fs, Path dataFile, boolean dropCacheBehind) {
       this.cacheId = pathToCacheId(dataFile);
-      this.inputSupplier = () -> fs.open(dataFile);
+      this.inputSupplier = () -> {
+        FSDataInputStream is = fs.open(dataFile);
+        if (dropCacheBehind) {
+          // Tell the DataNode that the write ahead log does not need to be cached in the OS page
+          // cache
+          try {
+            is.setDropBehind(Boolean.TRUE);
+          } catch (IOException | UnsupportedOperationException e) {
+            log.debug("setDropBehind not enabled for wal file: {}", dataFile);

Review Comment:
   I'm wondering if it's useful to show the message from the exception in this debug message, at least in the case of IOException.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on a diff in pull request #3079: Wire up FSDataInputStream.setDropBehind, set to true in FileCompactor

Posted by GitBox <gi...@apache.org>.
keith-turner commented on code in PR #3079:
URL: https://github.com/apache/accumulo/pull/3079#discussion_r1023239585


##########
core/src/main/java/org/apache/accumulo/core/file/blockfile/impl/CachableBlockFile.java:
##########
@@ -83,8 +84,24 @@ public CachableBuilder conf(Configuration hadoopConf) {
     }
 
     public CachableBuilder fsPath(FileSystem fs, Path dataFile) {
+      return fsPath(fs, dataFile, false);
+    }
+
+    public CachableBuilder fsPath(FileSystem fs, Path dataFile, boolean dropCacheBehind) {
       this.cacheId = pathToCacheId(dataFile);
-      this.inputSupplier = () -> fs.open(dataFile);
+      this.inputSupplier = () -> {
+        FSDataInputStream is = fs.open(dataFile);
+        if (dropCacheBehind) {
+          // Tell the DataNode that the write ahead log does not need to be cached in the OS page
+          // cache
+          try {
+            is.setDropBehind(Boolean.TRUE);
+          } catch (IOException | UnsupportedOperationException e) {
+            log.debug("setDropBehind not enabled for wal file: {}", dataFile);

Review Comment:
   > I can do that, do you think in the case of IOException, that it should be higher than debug? warn maybe?
   
   Hopefully if the stream is in a bad state and it throws an IOE here then it will throw an IOE later when trying to use it.  Those later throws should log something, so maybe the debug here is ok?  Would be nice to understand when it throws an IOE.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on a diff in pull request #3079: Wire up FSDataInputStream.setDropBehind, set to true in FileCompactor

Posted by GitBox <gi...@apache.org>.
dlmarion commented on code in PR #3079:
URL: https://github.com/apache/accumulo/pull/3079#discussion_r1023265950


##########
server/base/src/main/java/org/apache/accumulo/server/compaction/FileCompactor.java:
##########
@@ -313,7 +313,8 @@ public CompactionStats call() throws IOException, CompactionCanceledException {
 
         reader = fileFactory.newReaderBuilder()
             .forFile(mapFile.getPathStr(), fs, fs.getConf(), cryptoService)
-            .withTableConfiguration(acuTableConf).withRateLimiter(env.getReadLimiter()).build();
+            .withTableConfiguration(acuTableConf).withRateLimiter(env.getReadLimiter())
+            .dropCachesBehind().build();

Review Comment:
   > this is setting the compaction input files to drop behind
   
   Yes
   
   > Did you consider setting it for the compaction output file as well?
   
   I put my thoughts [here](https://github.com/apache/accumulo/issues/3078#issuecomment-1315293006). Maybe you know the answer...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] ctubbsii commented on a diff in pull request #3079: Wire up FSDataInputStream.setDropBehind, set to true in FileCompactor

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on code in PR #3079:
URL: https://github.com/apache/accumulo/pull/3079#discussion_r1023143586


##########
core/src/main/java/org/apache/accumulo/core/file/blockfile/impl/CachableBlockFile.java:
##########
@@ -83,8 +84,24 @@ public CachableBuilder conf(Configuration hadoopConf) {
     }
 
     public CachableBuilder fsPath(FileSystem fs, Path dataFile) {
+      return fsPath(fs, dataFile, false);
+    }
+
+    public CachableBuilder fsPath(FileSystem fs, Path dataFile, boolean dropCacheBehind) {
       this.cacheId = pathToCacheId(dataFile);
-      this.inputSupplier = () -> fs.open(dataFile);
+      this.inputSupplier = () -> {
+        FSDataInputStream is = fs.open(dataFile);
+        if (dropCacheBehind) {
+          // Tell the DataNode that the write ahead log does not need to be cached in the OS page
+          // cache
+          try {
+            is.setDropBehind(Boolean.TRUE);
+          } catch (IOException | UnsupportedOperationException e) {
+            log.debug("setDropBehind not enabled for wal file: {}", dataFile);

Review Comment:
   I'm not really sure what circumstances could trigger this. It's probably not worth logging higher than debug, since this is really an internal performance optimization, and not a scenario the user really needs to concern themselves with unless they're trying to eek out every nanosecond of performance, in which case they can just look at the debug logs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on a diff in pull request #3079: Wire up FSDataInputStream.setDropBehind, set to true in FileCompactor

Posted by GitBox <gi...@apache.org>.
dlmarion commented on code in PR #3079:
URL: https://github.com/apache/accumulo/pull/3079#discussion_r1023140976


##########
core/src/main/java/org/apache/accumulo/core/file/blockfile/impl/CachableBlockFile.java:
##########
@@ -83,8 +84,24 @@ public CachableBuilder conf(Configuration hadoopConf) {
     }
 
     public CachableBuilder fsPath(FileSystem fs, Path dataFile) {
+      return fsPath(fs, dataFile, false);
+    }
+
+    public CachableBuilder fsPath(FileSystem fs, Path dataFile, boolean dropCacheBehind) {
       this.cacheId = pathToCacheId(dataFile);
-      this.inputSupplier = () -> fs.open(dataFile);
+      this.inputSupplier = () -> {
+        FSDataInputStream is = fs.open(dataFile);
+        if (dropCacheBehind) {
+          // Tell the DataNode that the write ahead log does not need to be cached in the OS page
+          // cache
+          try {
+            is.setDropBehind(Boolean.TRUE);
+          } catch (IOException | UnsupportedOperationException e) {
+            log.debug("setDropBehind not enabled for wal file: {}", dataFile);

Review Comment:
   I can do that, do you think in the case of IOException, that it should be higher than debug? warn maybe?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on a diff in pull request #3079: Wire up FSDataInputStream.setDropBehind, set to true in FileCompactor

Posted by GitBox <gi...@apache.org>.
keith-turner commented on code in PR #3079:
URL: https://github.com/apache/accumulo/pull/3079#discussion_r1023271507


##########
server/base/src/main/java/org/apache/accumulo/server/compaction/FileCompactor.java:
##########
@@ -313,7 +313,8 @@ public CompactionStats call() throws IOException, CompactionCanceledException {
 
         reader = fileFactory.newReaderBuilder()
             .forFile(mapFile.getPathStr(), fs, fs.getConf(), cryptoService)
-            .withTableConfiguration(acuTableConf).withRateLimiter(env.getReadLimiter()).build();
+            .withTableConfiguration(acuTableConf).withRateLimiter(env.getReadLimiter())
+            .dropCachesBehind().build();

Review Comment:
   I just saw that comment,  I didn't see it before asking here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion merged pull request #3079: Wire up FSDataInputStream.setDropBehind, set to true in FileCompactor

Posted by GitBox <gi...@apache.org>.
dlmarion merged PR #3079:
URL: https://github.com/apache/accumulo/pull/3079


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org