You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2020/03/23 23:36:08 UTC

[GitHub] [accumulo] arvindshmicrosoft opened a new pull request #1569: Use SimpleThreadPool for importtable file renames

arvindshmicrosoft opened a new pull request #1569: Use SimpleThreadPool for importtable file renames
URL: https://github.com/apache/accumulo/pull/1569
 
 
   Improves performance as calls to fs.rename are now made in parallel
   from multiple threads within a SimpleThreadPool. Thread pool size is
   controlled by new property master.importtable.rename.threadpool.size

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [accumulo] keith-turner commented on a change in pull request #1569: Fixes #1464 - SimpleThreadPool for importtable

Posted by GitBox <gi...@apache.org>.
keith-turner commented on a change in pull request #1569: Fixes #1464 - SimpleThreadPool for importtable
URL: https://github.com/apache/accumulo/pull/1569#discussion_r397901620
 
 

 ##########
 File path: core/src/main/java/org/apache/accumulo/core/conf/Property.java
 ##########
 @@ -248,9 +248,16 @@
       "The number of threads to use when coordinating a bulk import."),
   MASTER_BULK_TIMEOUT("master.bulk.timeout", "5m", PropertyType.TIMEDURATION,
       "The time to wait for a tablet server to process a bulk import request"),
+  MASTER_RENAME_THREADS("master.rename.threadpool.size", "20", PropertyType.COUNT,
+      "The number of threads to use when renaming user files during table import or bulk ingest."),
+  @Deprecated
+  @ReplacedBy(property = MASTER_RENAME_THREADS)
   MASTER_BULK_RENAME_THREADS("master.bulk.rename.threadpool.size", "20", PropertyType.COUNT,
       "The number of threads to use when moving user files to bulk ingest "
           + "directories under accumulo control"),
+  MASTER_IMPORTTABLE_RENAME_THREADS("master.importtable.rename.threadpool.size", "20",
 
 Review comment:
   Can this be removed?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [accumulo] arvindshmicrosoft commented on a change in pull request #1569: Fixes #1464 - SimpleThreadPool for importtable

Posted by GitBox <gi...@apache.org>.
arvindshmicrosoft commented on a change in pull request #1569: Fixes #1464 - SimpleThreadPool for importtable
URL: https://github.com/apache/accumulo/pull/1569#discussion_r397619369
 
 

 ##########
 File path: core/src/main/java/org/apache/accumulo/core/conf/Property.java
 ##########
 @@ -251,6 +251,9 @@
   MASTER_BULK_RENAME_THREADS("master.bulk.rename.threadpool.size", "20", PropertyType.COUNT,
       "The number of threads to use when moving user files to bulk ingest "
           + "directories under accumulo control"),
+  MASTER_IMPORTTABLE_RENAME_THREADS("master.importtable.rename.threadpool.size", "20",
+      PropertyType.COUNT,
+      "The number of threads to use when renaming user files when importing a table."),
 
 Review comment:
   The changes are implemented. Please take a look. I was also curious about what you meant by "other FATE operations that perform renames". I think bulk ingest (importdirectory) is one, which is addressed in this PR now, but would like to know where else (of course, I'll leave those changes to a later change as you suggested).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [accumulo] milleruntime commented on a change in pull request #1569: Fixes #1464 - SimpleThreadPool for importtable

Posted by GitBox <gi...@apache.org>.
milleruntime commented on a change in pull request #1569: Fixes #1464 - SimpleThreadPool for importtable
URL: https://github.com/apache/accumulo/pull/1569#discussion_r397293188
 
 

 ##########
 File path: core/src/main/java/org/apache/accumulo/core/conf/Property.java
 ##########
 @@ -251,6 +251,9 @@
   MASTER_BULK_RENAME_THREADS("master.bulk.rename.threadpool.size", "20", PropertyType.COUNT,
       "The number of threads to use when moving user files to bulk ingest "
           + "directories under accumulo control"),
+  MASTER_IMPORTTABLE_RENAME_THREADS("master.importtable.rename.threadpool.size", "20",
+      PropertyType.COUNT,
+      "The number of threads to use when renaming user files when importing a table."),
 
 Review comment:
   Sounds good.  After you finish this PR, I can create a follow on ticket to utilize the new property across the other FATE operations that perform renames.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [accumulo] arvindshmicrosoft commented on a change in pull request #1569: Fixes #1464 - SimpleThreadPool for importtable

Posted by GitBox <gi...@apache.org>.
arvindshmicrosoft commented on a change in pull request #1569: Fixes #1464 - SimpleThreadPool for importtable
URL: https://github.com/apache/accumulo/pull/1569#discussion_r397919457
 
 

 ##########
 File path: core/src/main/java/org/apache/accumulo/core/conf/Property.java
 ##########
 @@ -248,9 +248,16 @@
       "The number of threads to use when coordinating a bulk import."),
   MASTER_BULK_TIMEOUT("master.bulk.timeout", "5m", PropertyType.TIMEDURATION,
       "The time to wait for a tablet server to process a bulk import request"),
+  MASTER_RENAME_THREADS("master.rename.threadpool.size", "20", PropertyType.COUNT,
+      "The number of threads to use when renaming user files during table import or bulk ingest."),
+  @Deprecated
+  @ReplacedBy(property = MASTER_RENAME_THREADS)
   MASTER_BULK_RENAME_THREADS("master.bulk.rename.threadpool.size", "20", PropertyType.COUNT,
       "The number of threads to use when moving user files to bulk ingest "
           + "directories under accumulo control"),
+  MASTER_IMPORTTABLE_RENAME_THREADS("master.importtable.rename.threadpool.size", "20",
 
 Review comment:
   So sorry, I could have sworn I took care of that. Will address. Sorry again!

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [accumulo] milleruntime merged pull request #1569: Fixes #1464 - SimpleThreadPool for importtable

Posted by GitBox <gi...@apache.org>.
milleruntime merged pull request #1569: Fixes #1464 - SimpleThreadPool for importtable
URL: https://github.com/apache/accumulo/pull/1569
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [accumulo] keith-turner commented on a change in pull request #1569: Fixes #1464 - SimpleThreadPool for importtable

Posted by GitBox <gi...@apache.org>.
keith-turner commented on a change in pull request #1569: Fixes #1464 - SimpleThreadPool for importtable
URL: https://github.com/apache/accumulo/pull/1569#discussion_r397903548
 
 

 ##########
 File path: core/src/main/java/org/apache/accumulo/core/conf/Property.java
 ##########
 @@ -248,9 +248,16 @@
       "The number of threads to use when coordinating a bulk import."),
   MASTER_BULK_TIMEOUT("master.bulk.timeout", "5m", PropertyType.TIMEDURATION,
       "The time to wait for a tablet server to process a bulk import request"),
+  MASTER_RENAME_THREADS("master.rename.threadpool.size", "20", PropertyType.COUNT,
+      "The number of threads to use when renaming user files during table import or bulk ingest."),
+  @Deprecated
+  @ReplacedBy(property = MASTER_RENAME_THREADS)
   MASTER_BULK_RENAME_THREADS("master.bulk.rename.threadpool.size", "20", PropertyType.COUNT,
       "The number of threads to use when moving user files to bulk ingest "
 
 Review comment:
   Some of the deprecated properties have the following in their description.  I think this is nice for the generated documentation.
   
   ```suggestion
         "This property is deprecated since 2.1.0. The number of threads to use when moving user files to bulk ingest "
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [accumulo] milleruntime commented on a change in pull request #1569: Fixes #1464 - SimpleThreadPool for importtable

Posted by GitBox <gi...@apache.org>.
milleruntime commented on a change in pull request #1569: Fixes #1464 - SimpleThreadPool for importtable
URL: https://github.com/apache/accumulo/pull/1569#discussion_r397204780
 
 

 ##########
 File path: core/src/main/java/org/apache/accumulo/core/conf/Property.java
 ##########
 @@ -251,6 +251,9 @@
   MASTER_BULK_RENAME_THREADS("master.bulk.rename.threadpool.size", "20", PropertyType.COUNT,
       "The number of threads to use when moving user files to bulk ingest "
           + "directories under accumulo control"),
+  MASTER_IMPORTTABLE_RENAME_THREADS("master.importtable.rename.threadpool.size", "20",
+      PropertyType.COUNT,
+      "The number of threads to use when renaming user files when importing a table."),
 
 Review comment:
   I was thinking deprecate ```master.bulk.rename.threadpool.size``` and create something more generic like ```master.rename.threadpool.size``` that could be used anywhere in the Master where we want to optimize renames.  Just doing a quick search for "fs.rename" the new and old Bulk imports could use this property as well.  My point is I don't think it need to have a property for each fate operation.  If a user is having issues with NN renames, they would benefit in the other operations as well. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [accumulo] arvindshmicrosoft commented on a change in pull request #1569: Fixes #1464 - SimpleThreadPool for importtable

Posted by GitBox <gi...@apache.org>.
arvindshmicrosoft commented on a change in pull request #1569: Fixes #1464 - SimpleThreadPool for importtable
URL: https://github.com/apache/accumulo/pull/1569#discussion_r397259497
 
 

 ##########
 File path: core/src/main/java/org/apache/accumulo/core/conf/Property.java
 ##########
 @@ -251,6 +251,9 @@
   MASTER_BULK_RENAME_THREADS("master.bulk.rename.threadpool.size", "20", PropertyType.COUNT,
       "The number of threads to use when moving user files to bulk ingest "
           + "directories under accumulo control"),
+  MASTER_IMPORTTABLE_RENAME_THREADS("master.importtable.rename.threadpool.size", "20",
+      PropertyType.COUNT,
+      "The number of threads to use when renaming user files when importing a table."),
 
 Review comment:
   Can I proceed to revise this PR with the following:
   1. deprecate master.bulk.rename.threadpool.size and recommend master.rename.threadpool.size as the new property
   2. update the importdirectory (bulk) code with the appropriate code to resolve the new property and warn about deprecation
   3. update the changed importtable code to leverage the new property
   
   Please let me know.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [accumulo] milleruntime commented on a change in pull request #1569: Fixes #1464 - SimpleThreadPool for importtable

Posted by GitBox <gi...@apache.org>.
milleruntime commented on a change in pull request #1569: Fixes #1464 - SimpleThreadPool for importtable
URL: https://github.com/apache/accumulo/pull/1569#discussion_r397131895
 
 

 ##########
 File path: core/src/main/java/org/apache/accumulo/core/conf/Property.java
 ##########
 @@ -251,6 +251,9 @@
   MASTER_BULK_RENAME_THREADS("master.bulk.rename.threadpool.size", "20", PropertyType.COUNT,
       "The number of threads to use when moving user files to bulk ingest "
           + "directories under accumulo control"),
+  MASTER_IMPORTTABLE_RENAME_THREADS("master.importtable.rename.threadpool.size", "20",
+      PropertyType.COUNT,
+      "The number of threads to use when renaming user files when importing a table."),
 
 Review comment:
   I wonder if we can combine this with ```MASTER_BULK_RENAME_THREADS``` and just have one common value for renames in the Master.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [accumulo] arvindshmicrosoft commented on a change in pull request #1569: Fixes #1464 - SimpleThreadPool for importtable

Posted by GitBox <gi...@apache.org>.
arvindshmicrosoft commented on a change in pull request #1569: Fixes #1464 - SimpleThreadPool for importtable
URL: https://github.com/apache/accumulo/pull/1569#discussion_r397989734
 
 

 ##########
 File path: core/src/main/java/org/apache/accumulo/core/conf/Property.java
 ##########
 @@ -248,9 +248,16 @@
       "The number of threads to use when coordinating a bulk import."),
   MASTER_BULK_TIMEOUT("master.bulk.timeout", "5m", PropertyType.TIMEDURATION,
       "The time to wait for a tablet server to process a bulk import request"),
+  MASTER_RENAME_THREADS("master.rename.threadpool.size", "20", PropertyType.COUNT,
+      "The number of threads to use when renaming user files during table import or bulk ingest."),
+  @Deprecated
+  @ReplacedBy(property = MASTER_RENAME_THREADS)
   MASTER_BULK_RENAME_THREADS("master.bulk.rename.threadpool.size", "20", PropertyType.COUNT,
       "The number of threads to use when moving user files to bulk ingest "
           + "directories under accumulo control"),
+  MASTER_IMPORTTABLE_RENAME_THREADS("master.importtable.rename.threadpool.size", "20",
 
 Review comment:
   Done.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [accumulo] arvindshmicrosoft commented on a change in pull request #1569: Fixes #1464 - SimpleThreadPool for importtable

Posted by GitBox <gi...@apache.org>.
arvindshmicrosoft commented on a change in pull request #1569: Fixes #1464 - SimpleThreadPool for importtable
URL: https://github.com/apache/accumulo/pull/1569#discussion_r397259497
 
 

 ##########
 File path: core/src/main/java/org/apache/accumulo/core/conf/Property.java
 ##########
 @@ -251,6 +251,9 @@
   MASTER_BULK_RENAME_THREADS("master.bulk.rename.threadpool.size", "20", PropertyType.COUNT,
       "The number of threads to use when moving user files to bulk ingest "
           + "directories under accumulo control"),
+  MASTER_IMPORTTABLE_RENAME_THREADS("master.importtable.rename.threadpool.size", "20",
+      PropertyType.COUNT,
+      "The number of threads to use when renaming user files when importing a table."),
 
 Review comment:
   Can I proceed to revise this PR with the following:
   1. deprecate master.bulk.rename.threadpool.size
   2. recommend master.rename.threadpool.size as the new property
   3. update the changed importtable code to leverage the new property
   
   Please let me know.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [accumulo] keith-turner commented on a change in pull request #1569: Fixes #1464 - SimpleThreadPool for importtable

Posted by GitBox <gi...@apache.org>.
keith-turner commented on a change in pull request #1569: Fixes #1464 - SimpleThreadPool for importtable
URL: https://github.com/apache/accumulo/pull/1569#discussion_r397182330
 
 

 ##########
 File path: core/src/main/java/org/apache/accumulo/core/conf/Property.java
 ##########
 @@ -251,6 +251,9 @@
   MASTER_BULK_RENAME_THREADS("master.bulk.rename.threadpool.size", "20", PropertyType.COUNT,
       "The number of threads to use when moving user files to bulk ingest "
           + "directories under accumulo control"),
+  MASTER_IMPORTTABLE_RENAME_THREADS("master.importtable.rename.threadpool.size", "20",
+      PropertyType.COUNT,
+      "The number of threads to use when renaming user files when importing a table."),
 
 Review comment:
   Could deprecate the bulk.rename prop and have a new `master.import.rename.threadpool.size`.   Could use the AccumuloConfiguration.resolve() method in the bulk import code to get the correct property. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [accumulo] arvindshmicrosoft commented on a change in pull request #1569: Fixes #1464 - SimpleThreadPool for importtable

Posted by GitBox <gi...@apache.org>.
arvindshmicrosoft commented on a change in pull request #1569: Fixes #1464 - SimpleThreadPool for importtable
URL: https://github.com/apache/accumulo/pull/1569#discussion_r397993887
 
 

 ##########
 File path: core/src/main/java/org/apache/accumulo/core/conf/Property.java
 ##########
 @@ -248,9 +248,16 @@
       "The number of threads to use when coordinating a bulk import."),
   MASTER_BULK_TIMEOUT("master.bulk.timeout", "5m", PropertyType.TIMEDURATION,
       "The time to wait for a tablet server to process a bulk import request"),
+  MASTER_RENAME_THREADS("master.rename.threadpool.size", "20", PropertyType.COUNT,
+      "The number of threads to use when renaming user files during table import or bulk ingest."),
+  @Deprecated
+  @ReplacedBy(property = MASTER_RENAME_THREADS)
   MASTER_BULK_RENAME_THREADS("master.bulk.rename.threadpool.size", "20", PropertyType.COUNT,
       "The number of threads to use when moving user files to bulk ingest "
 
 Review comment:
   Done.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [accumulo] arvindshmicrosoft commented on issue #1569: Fixes #1464 - SimpleThreadPool for importtable

Posted by GitBox <gi...@apache.org>.
arvindshmicrosoft commented on issue #1569: Fixes #1464 - SimpleThreadPool for importtable
URL: https://github.com/apache/accumulo/pull/1569#issuecomment-604552940
 
 
   Thanks @milleruntime. In a single-node Uno test the duration of an importtable of 12,800 RFiles went from around 80 seconds to a few seconds (< 5 seconds for sure) with the default 20 threads. In my test, each RFile had just 1 entry but IMHO that should not affect the outcome.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [accumulo] arvindshmicrosoft commented on a change in pull request #1569: Fixes #1464 - SimpleThreadPool for importtable

Posted by GitBox <gi...@apache.org>.
arvindshmicrosoft commented on a change in pull request #1569: Fixes #1464 - SimpleThreadPool for importtable
URL: https://github.com/apache/accumulo/pull/1569#discussion_r397225033
 
 

 ##########
 File path: core/src/main/java/org/apache/accumulo/core/conf/Property.java
 ##########
 @@ -251,6 +251,9 @@
   MASTER_BULK_RENAME_THREADS("master.bulk.rename.threadpool.size", "20", PropertyType.COUNT,
       "The number of threads to use when moving user files to bulk ingest "
           + "directories under accumulo control"),
+  MASTER_IMPORTTABLE_RENAME_THREADS("master.importtable.rename.threadpool.size", "20",
+      PropertyType.COUNT,
+      "The number of threads to use when renaming user files when importing a table."),
 
 Review comment:
   Thank you Mike and Keith - I'll lean on your experience to guide me accordingly. I conceptually agree having one property would be better.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services