Posted to commits@samza.apache.org by GitBox <gi...@apache.org> on 2021/05/13 00:37:38 UTC

[GitHub] [samza] li-afaris opened a new pull request #1500: [SAMZA-2655] Expose RocksDB maxOpenFiles setting as Samza parameter

li-afaris opened a new pull request #1500:
URL: https://github.com/apache/samza/pull/1500


   This PR exposes the RocksDB options maxOpenFiles and maxFileOpeningThreads as Samza job parameters.
   
   The changes build and pass tests:
   
   ```
   $ ./gradlew clean build
   ...
   > Task :samza-yarn_2.11:compileTestScala
   Pruning sources from previous analysis, due to incompatible CompileSetup.
   Note: Some input files use or override a deprecated API.
   Note: Recompile with -Xlint:deprecation for details.
   Note: Some input files use unchecked or unsafe operations.
   Note: Recompile with -Xlint:unchecked for details.
   
   BUILD SUCCESSFUL in 28m 9s
   209 actionable tasks: 208 executed, 1 up-to-date
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [samza] li-afaris commented on a change in pull request #1500: [SAMZA-2655] Expose RocksDB maxOpenFiles setting as Samza parameter

Posted by GitBox <gi...@apache.org>.
li-afaris commented on a change in pull request #1500:
URL: https://github.com/apache/samza/pull/1500#discussion_r631495076



##########
File path: samza-kv-rocksdb/src/main/java/org/apache/samza/storage/kv/RocksDbOptionsHelper.java
##########
@@ -109,6 +111,8 @@ public static Options options(Config storeConfig, int numTasksForContainer, File
     options.setMaxLogFileSize(storeConfig.getLong(ROCKSDB_MAX_LOG_FILE_SIZE_BYTES, 64 * 1024 * 1024L));
     options.setKeepLogFileNum(storeConfig.getLong(ROCKSDB_KEEP_LOG_FILE_NUM, 2));
     options.setDeleteObsoleteFilesPeriodMicros(storeConfig.getLong(ROCKSDB_DELETE_OBSOLETE_FILES_PERIOD_MICROS, 21600000000L));
+    options.setMaxOpenFiles(storeConfig.getInt(ROCKSDB_MAX_OPEN_FILES, -1));
+    options.setMaxOpenFiles(storeConfig.getInt(ROCKSDB_MAX_FILE_OPENING_THREADS, 16));

Review comment:
       This is not a typo. We want to limit the number of SST files that are open at once so that we don't exceed ulimit values.
   
   Here's the Javadoc description for the method:
   ```
   Number of open files that can be used by the DB. You may need to increase this if your database has a large working set. Value -1 means files opened are always kept open. You can estimate number of files based on target_file_size_base and target_file_size_multiplier for level-based compaction. For universal-style compaction, you can usually set it to -1.
   ```
   https://javadoc.io/doc/org.rocksdb/rocksdbjni/6.6.4/org/rocksdb/Options.html#maxOpenFiles--
   
   This matches the corresponding RocksDB option:
   ```
     // Number of open files that can be used by the DB.  You may need to
     // increase this if your database has a large working set. Value -1 means
     // files opened are always kept open. You can estimate number of files based
     // on target_file_size_base and target_file_size_multiplier for level-based
     // compaction. For universal-style compaction, you can usually set it to -1.
   ```
   https://github.com/facebook/rocksdb/blob/master/include/rocksdb/options.h#L460
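
As a rough illustration of the ulimit concern above, here is a minimal sketch of how a bounded value for maxOpenFiles might be chosen. Everything in it is an assumption made for illustration: the class and method names are hypothetical, the even split across store instances is just one possible policy, and the example numbers are not taken from this PR.

```java
// Hypothetical helper, not part of Samza or RocksDB.
class OpenFileBudgetSketch {

    // Splits the file descriptors that remain after a reserve evenly across
    // store instances. fdLimit would typically come from `ulimit -n`;
    // reservedFds covers sockets, log files, jars and so on (assumed figure).
    static int perStoreMaxOpenFiles(long fdLimit, int reservedFds, int numStoreInstances) {
        if (numStoreInstances <= 0) {
            throw new IllegalArgumentException("numStoreInstances must be positive");
        }
        long available = Math.max(0L, fdLimit - reservedFds);
        long perStore = available / numStoreInstances;
        return (int) Math.min(perStore, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        // Example: limit of 4096 descriptors, 1024 reserved, 4 store instances.
        System.out.println(perStoreMaxOpenFiles(4096, 1024, 4)); // prints 768
    }
}
```

A value computed this way would be a starting point only. The default of -1 in the diff keeps every opened file open, which is what makes an explicit cap useful when the descriptor limit is tight.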







[GitHub] [samza] prateekm commented on a change in pull request #1500: [SAMZA-2655] Expose RocksDB maxOpenFiles setting as Samza parameter

Posted by GitBox <gi...@apache.org>.
prateekm commented on a change in pull request #1500:
URL: https://github.com/apache/samza/pull/1500#discussion_r631490013



##########
File path: docs/learn/documentation/versioned/jobs/configuration-table.html
##########
@@ -1911,6 +1911,22 @@ <h1>Samza Configuration Reference</h1>
                     </td>
                 </tr>
 
+                <tr>
+                    <td class="property" id="stores-rocksdb-max-open-files">stores.<span class="store">store-name</span>.<br>rocksdb.max.open.files</td>
+                    <td class="default">-1</td>
+                    <td class="description">
+                        Limits the number of open LOG files that RocksDB can have open at one time.

Review comment:
       Same for config below.

##########
File path: docs/learn/documentation/versioned/jobs/configuration-table.html
##########
@@ -1911,6 +1911,22 @@ <h1>Samza Configuration Reference</h1>
                     </td>
                 </tr>
 
+                <tr>
+                    <td class="property" id="stores-rocksdb-max-open-files">stores.<span class="store">store-name</span>.<br>rocksdb.max.open.files</td>
+                    <td class="default">-1</td>
+                    <td class="description">
+                        Limits the number of open LOG files that RocksDB can have open at one time.

Review comment:
       s/LOG files/files?

##########
File path: samza-kv-rocksdb/src/main/java/org/apache/samza/storage/kv/RocksDbOptionsHelper.java
##########
@@ -109,6 +111,8 @@ public static Options options(Config storeConfig, int numTasksForContainer, File
     options.setMaxLogFileSize(storeConfig.getLong(ROCKSDB_MAX_LOG_FILE_SIZE_BYTES, 64 * 1024 * 1024L));
     options.setKeepLogFileNum(storeConfig.getLong(ROCKSDB_KEEP_LOG_FILE_NUM, 2));
     options.setDeleteObsoleteFilesPeriodMicros(storeConfig.getLong(ROCKSDB_DELETE_OBSOLETE_FILES_PERIOD_MICROS, 21600000000L));
+    options.setMaxOpenFiles(storeConfig.getInt(ROCKSDB_MAX_OPEN_FILES, -1));
+    options.setMaxOpenFiles(storeConfig.getInt(ROCKSDB_MAX_FILE_OPENING_THREADS, 16));

Review comment:
       This is setting maxOpenFiles, typo?







[GitHub] [samza] li-afaris commented on a change in pull request #1500: [SAMZA-2655] Expose RocksDB maxOpenFiles setting as Samza parameter

Posted by GitBox <gi...@apache.org>.
li-afaris commented on a change in pull request #1500:
URL: https://github.com/apache/samza/pull/1500#discussion_r631499397



##########
File path: samza-kv-rocksdb/src/main/java/org/apache/samza/storage/kv/RocksDbOptionsHelper.java
##########
@@ -109,6 +111,8 @@ public static Options options(Config storeConfig, int numTasksForContainer, File
     options.setMaxLogFileSize(storeConfig.getLong(ROCKSDB_MAX_LOG_FILE_SIZE_BYTES, 64 * 1024 * 1024L));
     options.setKeepLogFileNum(storeConfig.getLong(ROCKSDB_KEEP_LOG_FILE_NUM, 2));
     options.setDeleteObsoleteFilesPeriodMicros(storeConfig.getLong(ROCKSDB_DELETE_OBSOLETE_FILES_PERIOD_MICROS, 21600000000L));
+    options.setMaxOpenFiles(storeConfig.getInt(ROCKSDB_MAX_OPEN_FILES, -1));
+    options.setMaxOpenFiles(storeConfig.getInt(ROCKSDB_MAX_FILE_OPENING_THREADS, 16));

Review comment:
       Ooof, great catch on that copy/paste error. Thanks, updated.







[GitHub] [samza] prateekm merged pull request #1500: [SAMZA-2655] Expose RocksDB maxOpenFiles setting as Samza parameter

Posted by GitBox <gi...@apache.org>.
prateekm merged pull request #1500:
URL: https://github.com/apache/samza/pull/1500


   





[GitHub] [samza] prateekm commented on a change in pull request #1500: [SAMZA-2655] Expose RocksDB maxOpenFiles setting as Samza parameter

Posted by GitBox <gi...@apache.org>.
prateekm commented on a change in pull request #1500:
URL: https://github.com/apache/samza/pull/1500#discussion_r631496361



##########
File path: samza-kv-rocksdb/src/main/java/org/apache/samza/storage/kv/RocksDbOptionsHelper.java
##########
@@ -109,6 +111,8 @@ public static Options options(Config storeConfig, int numTasksForContainer, File
     options.setMaxLogFileSize(storeConfig.getLong(ROCKSDB_MAX_LOG_FILE_SIZE_BYTES, 64 * 1024 * 1024L));
     options.setKeepLogFileNum(storeConfig.getLong(ROCKSDB_KEEP_LOG_FILE_NUM, 2));
     options.setDeleteObsoleteFilesPeriodMicros(storeConfig.getLong(ROCKSDB_DELETE_OBSOLETE_FILES_PERIOD_MICROS, 21600000000L));
+    options.setMaxOpenFiles(storeConfig.getInt(ROCKSDB_MAX_OPEN_FILES, -1));
+    options.setMaxOpenFiles(storeConfig.getInt(ROCKSDB_MAX_FILE_OPENING_THREADS, 16));

Review comment:
       Sorry, I meant that this is calling options.setMaxOpenFiles for ROCKSDB_MAX_FILE_OPENING_THREADS as well.







[GitHub] [samza] prateekm commented on a change in pull request #1500: [SAMZA-2655] Expose RocksDB maxOpenFiles setting as Samza parameter

Posted by GitBox <gi...@apache.org>.
prateekm commented on a change in pull request #1500:
URL: https://github.com/apache/samza/pull/1500#discussion_r631496361



##########
File path: samza-kv-rocksdb/src/main/java/org/apache/samza/storage/kv/RocksDbOptionsHelper.java
##########
@@ -109,6 +111,8 @@ public static Options options(Config storeConfig, int numTasksForContainer, File
     options.setMaxLogFileSize(storeConfig.getLong(ROCKSDB_MAX_LOG_FILE_SIZE_BYTES, 64 * 1024 * 1024L));
     options.setKeepLogFileNum(storeConfig.getLong(ROCKSDB_KEEP_LOG_FILE_NUM, 2));
     options.setDeleteObsoleteFilesPeriodMicros(storeConfig.getLong(ROCKSDB_DELETE_OBSOLETE_FILES_PERIOD_MICROS, 21600000000L));
+    options.setMaxOpenFiles(storeConfig.getInt(ROCKSDB_MAX_OPEN_FILES, -1));
+    options.setMaxOpenFiles(storeConfig.getInt(ROCKSDB_MAX_FILE_OPENING_THREADS, 16));

Review comment:
       Sorry, I meant that this is calling options.setMaxOpenFiles for ROCKSDB_MAX_FILE_OPENING_THREADS as well. Should be calling setMaxFileOpeningThreads instead.
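
The correction described in this comment can be sketched as follows. This is a self-contained illustration rather than the merged code: `SketchOptions` is a hypothetical stand-in for `org.rocksdb.Options` so the example runs without rocksdbjni, the `Map`-based config stands in for Samza's `Config`, and the key string used for `ROCKSDB_MAX_FILE_OPENING_THREADS` is an assumed name (only `rocksdb.max.open.files` appears in the diff above).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.rocksdb.Options, exposing only the two setters discussed.
class SketchOptions {
    int maxOpenFiles;
    int maxFileOpeningThreads;

    SketchOptions setMaxOpenFiles(int value) {
        this.maxOpenFiles = value;
        return this;
    }

    SketchOptions setMaxFileOpeningThreads(int value) {
        this.maxFileOpeningThreads = value;
        return this;
    }
}

class RocksDbOptionsSketch {
    static final String ROCKSDB_MAX_OPEN_FILES = "rocksdb.max.open.files";
    // Assumed key name; the actual constant value is not shown in this thread.
    static final String ROCKSDB_MAX_FILE_OPENING_THREADS = "rocksdb.max.file.opening.threads";

    // Minimal replacement for Config.getInt(key, default).
    static int getInt(Map<String, String> config, String key, int defaultValue) {
        String raw = config.get(key);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }

    static SketchOptions options(Map<String, String> storeConfig) {
        SketchOptions options = new SketchOptions();
        options.setMaxOpenFiles(getInt(storeConfig, ROCKSDB_MAX_OPEN_FILES, -1));
        // Each config key now feeds its own setter, so the second call no longer
        // overwrites maxOpenFiles with the thread count.
        options.setMaxFileOpeningThreads(getInt(storeConfig, ROCKSDB_MAX_FILE_OPENING_THREADS, 16));
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> storeConfig = new HashMap<>();
        storeConfig.put(ROCKSDB_MAX_OPEN_FILES, "512");
        SketchOptions options = options(storeConfig);
        System.out.println(options.maxOpenFiles + " " + options.maxFileOpeningThreads); // prints "512 16"
    }
}
```

With the original pair of calls, the second `setMaxOpenFiles` would have replaced any configured limit with 16 (or with the configured thread count), which is the behavior the reviewer flagged.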







[GitHub] [samza] li-afaris commented on a change in pull request #1500: [SAMZA-2655] Expose RocksDB maxOpenFiles setting as Samza parameter

Posted by GitBox <gi...@apache.org>.
li-afaris commented on a change in pull request #1500:
URL: https://github.com/apache/samza/pull/1500#discussion_r631496704



##########
File path: docs/learn/documentation/versioned/jobs/configuration-table.html
##########
@@ -1911,6 +1911,22 @@ <h1>Samza Configuration Reference</h1>
                     </td>
                 </tr>
 
+                <tr>
+                    <td class="property" id="stores-rocksdb-max-open-files">stores.<span class="store">store-name</span>.<br>rocksdb.max.open.files</td>
+                    <td class="default">-1</td>
+                    <td class="description">
+                        Limits the number of open LOG files that RocksDB can have open at one time.

Review comment:
       Fixed



