You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2022/03/25 18:11:00 UTC

[GitHub] [accumulo] milleruntime opened a new pull request #2592: Configure block cache for Recovery

milleruntime opened a new pull request #2592:
URL: https://github.com/apache/accumulo/pull/2592


   * Closes #882


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] keith-turner commented on a change in pull request #2592: Configure block cache for Recovery

Posted by GitBox <gi...@apache.org>.
keith-turner commented on a change in pull request #2592:
URL: https://github.com/apache/accumulo/pull/2592#discussion_r835616582



##########
File path: server/tserver/src/main/java/org/apache/accumulo/tserver/log/RecoveryLogsIterator.java
##########
@@ -79,7 +86,8 @@ public RecoveryLogsIterator(ServerContext context, List<Path> recoveryLogDirs, L
 
       for (Path log : logFiles) {
         var scanner = RFile.newScanner().from(log.toString()).withFileSystem(fs)
-            .withTableProperties(context.getConfiguration()).build();
+            .withTableProperties(context.getConfiguration()).withDataCache(dataCacheSize)

Review comment:
       This cache is per recovery log iterator.  This does not help with the case where tablet server has 100 tablets that need to recover data from the same walogs.  To help with this case would need to create dedicated block caches for recovery at the tsever level and pass them into this iterator.  Probably would want their own config so they can be size differently than the block caches for scans.  Also probably would not want to use the scan block cache for this as it will always be disjoint files and we probably do not want recovery to impact the caches used for scans by causing evictions.

##########
File path: server/tserver/src/main/java/org/apache/accumulo/tserver/log/RecoveryLogsIterator.java
##########
@@ -61,6 +65,9 @@
    */
   public RecoveryLogsIterator(ServerContext context, List<Path> recoveryLogDirs, LogFileKey start,
       LogFileKey end, boolean checkFirstKey) throws IOException {
+    var conf = context.getConfiguration();
+    var indexCacheSize = getMemoryAsBytes(conf.get(Property.TSERV_INDEXCACHE_SIZE));

Review comment:
       May want dedicated props for the recovery cache size.  For example I may want a 10G cache for scans and 256M cache for recovery.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] milleruntime commented on pull request #2592: Configure block cache for Recovery

Posted by GitBox <gi...@apache.org>.
milleruntime commented on pull request #2592:
URL: https://github.com/apache/accumulo/pull/2592#issuecomment-1080833006


   @keith-turner I came up with some ideas for how this can be done in [a7c51c6](https://github.com/apache/accumulo/pull/2592/commits/a7c51c613631b3e9bf779680b45414c042278c00). It is not finished so let me know what you think before I go any further.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] milleruntime commented on pull request #2592: Configure block cache for Recovery

Posted by GitBox <gi...@apache.org>.
milleruntime commented on pull request #2592:
URL: https://github.com/apache/accumulo/pull/2592#issuecomment-1083011404


   @keith-turner I think I may have found what I need with `IteratorAdapter`. I am going to rework this change so please stand by.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] milleruntime commented on a change in pull request #2592: Configure block cache for Recovery

Posted by GitBox <gi...@apache.org>.
milleruntime commented on a change in pull request #2592:
URL: https://github.com/apache/accumulo/pull/2592#discussion_r836380755



##########
File path: server/tserver/src/main/java/org/apache/accumulo/tserver/log/RecoveryLogsIterator.java
##########
@@ -79,7 +86,8 @@ public RecoveryLogsIterator(ServerContext context, List<Path> recoveryLogDirs, L
 
       for (Path log : logFiles) {
         var scanner = RFile.newScanner().from(log.toString()).withFileSystem(fs)
-            .withTableProperties(context.getConfiguration()).build();
+            .withTableProperties(context.getConfiguration()).withDataCache(dataCacheSize)

Review comment:
       OK that is what I thought. I thought this was at least better than what we have now.
   
   > would not want to use the scan block cache for this
   
   What do you mean?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] keith-turner commented on a change in pull request #2592: Configure block cache for Recovery

Posted by GitBox <gi...@apache.org>.
keith-turner commented on a change in pull request #2592:
URL: https://github.com/apache/accumulo/pull/2592#discussion_r836977794



##########
File path: server/tserver/src/main/java/org/apache/accumulo/tserver/log/RecoveryLogsIterator.java
##########
@@ -79,7 +86,8 @@ public RecoveryLogsIterator(ServerContext context, List<Path> recoveryLogDirs, L
 
       for (Path log : logFiles) {
         var scanner = RFile.newScanner().from(log.toString()).withFileSystem(fs)
-            .withTableProperties(context.getConfiguration()).build();
+            .withTableProperties(context.getConfiguration()).withDataCache(dataCacheSize)

Review comment:
       We currently create 4 caches here : https://github.com/apache/accumulo/blob/d093c5f850954b192d5bafa3bca3a931535937a8/server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServerResourceManager.java#L272-L278
   
   I was thinking we would add a 4th type of cache there for recovery.  Following the pattern of the other it would be shared among all recoveries in the tserver and have its own configurable size.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] milleruntime commented on a change in pull request #2592: Configure block cache for Recovery

Posted by GitBox <gi...@apache.org>.
milleruntime commented on a change in pull request #2592:
URL: https://github.com/apache/accumulo/pull/2592#discussion_r837594040



##########
File path: server/tserver/src/main/java/org/apache/accumulo/tserver/log/RecoveryLogsIterator.java
##########
@@ -79,7 +86,8 @@ public RecoveryLogsIterator(ServerContext context, List<Path> recoveryLogDirs, L
 
       for (Path log : logFiles) {
         var scanner = RFile.newScanner().from(log.toString()).withFileSystem(fs)
-            .withTableProperties(context.getConfiguration()).build();
+            .withTableProperties(context.getConfiguration()).withDataCache(dataCacheSize)

Review comment:
       So I created a separate BlockCache for Recovery in the TabletServer but once I pass it down to the Recovery code, I am having trouble creating an Iterator from the `FileSKVIterator` type. The `FileSKVIterator` type allows passing in a `CacheProvider` but then why I try to convert the iterator to a generic java iterator I get stuck. For the new Recovery code we are using Scanners to iterate over the R-files and keep track of this by creating a `Iterator<Entry<Key,Value>> iter` from calling the `iterator()` method.
   https://github.com/apache/accumulo/blob/1c0aa8f54642d42c5c4fbb6970141b8acd7a63d9/core/src/main/java/org/apache/accumulo/core/client/ScannerBase.java#L198
   
   That method does the work of creating caches but I don't have access to them in `RFileScanner`.
   https://github.com/apache/accumulo/blob/7d176b60fc6a2ce28fcc3cfdc9a3b68684c76f87/core/src/main/java/org/apache/accumulo/core/client/rfile/RFileScanner.java#L340
   
   Any ideas?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] EdColeman commented on pull request #2592: Configure block cache for Recovery

Posted by GitBox <gi...@apache.org>.
EdColeman commented on pull request #2592:
URL: https://github.com/apache/accumulo/pull/2592#issuecomment-1083036895


   Do you think setting this to draft while you look at IteratorAdapter would be appropriate?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] milleruntime commented on a change in pull request #2592: Configure block cache for Recovery

Posted by GitBox <gi...@apache.org>.
milleruntime commented on a change in pull request #2592:
URL: https://github.com/apache/accumulo/pull/2592#discussion_r837386274



##########
File path: server/tserver/src/main/java/org/apache/accumulo/tserver/log/RecoveryLogsIterator.java
##########
@@ -79,7 +86,8 @@ public RecoveryLogsIterator(ServerContext context, List<Path> recoveryLogDirs, L
 
       for (Path log : logFiles) {
         var scanner = RFile.newScanner().from(log.toString()).withFileSystem(fs)
-            .withTableProperties(context.getConfiguration()).build();
+            .withTableProperties(context.getConfiguration()).withDataCache(dataCacheSize)

Review comment:
       So do you think we would need a separate Data cache and index cache dedicated to Recovery? Then have the TabletServerResourceManager return a dedicated ScanCacheProvider for recovery?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org