Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/11/05 14:04:59 UTC

[GitHub] [druid] yohemyang opened a new pull request #11878: Load cache files in parallel to speed up historical startup

yohemyang opened a new pull request #11878:
URL: https://github.com/apache/druid/pull/11878


   <hr>
   
   ##### Key changed/added classes in this PR
    * `SegmentLoadDropHandler`
   
   <hr>
   
   This PR has:
   - [x] been self-reviewed.
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [x] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] FrankChen021 commented on a change in pull request #11878: Load cache files in parallel to speed up historical startup

Posted by GitBox <gi...@apache.org>.
FrankChen021 commented on a change in pull request #11878:
URL: https://github.com/apache/druid/pull/11878#discussion_r743751178



##########
File path: server/src/main/java/org/apache/druid/segment/loading/SegmentLoaderConfig.java
##########
@@ -56,6 +56,9 @@
   @JsonProperty("numBootstrapThreads")
   private Integer numBootstrapThreads = null;
 
+  @JsonProperty("numCacheLoadThreads")
+  private int numCacheLoadThreads = 5;

Review comment:
       What is the purpose of making this a configuration item? Could we default it to the number of CPU cores? Or could we reuse `numBootstrapThreads`, since it is defined as:
   > How many segments to load concurrently during historical startup.
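
A minimal sketch of the fallback order suggested in this comment, assuming the new setting is made nullable like `numBootstrapThreads`. The class `CacheLoadThreadsSketch` and the method `effectiveCacheLoadThreads` are hypothetical names used only for illustration; this is not the code in the PR or in `SegmentLoaderConfig`.

```java
// Hypothetical stand-in for the config fields shown in the diff; not Druid's actual class.
final class CacheLoadThreadsSketch
{
  // Nullable so that "not configured" can be told apart from an explicit value.
  private final Integer numCacheLoadThreads;
  private final Integer numBootstrapThreads;

  CacheLoadThreadsSketch(Integer numCacheLoadThreads, Integer numBootstrapThreads)
  {
    this.numCacheLoadThreads = numCacheLoadThreads;
    this.numBootstrapThreads = numBootstrapThreads;
  }

  // Resolution order: explicit setting, then the bootstrap thread count,
  // then the number of CPU cores visible to the JVM. Never returns less than 1.
  int effectiveCacheLoadThreads()
  {
    if (numCacheLoadThreads != null) {
      return Math.max(1, numCacheLoadThreads);
    }
    if (numBootstrapThreads != null) {
      return Math.max(1, numBootstrapThreads);
    }
    return Math.max(1, Runtime.getRuntime().availableProcessors());
  }
}
```

With this shape, an operator who sets nothing gets a core-based default, and the hard-coded `5` from the diff is no longer needed.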






[GitHub] [druid] FrankChen021 commented on pull request #11878: Load cache files in parallel to speed up historical startup

Posted by GitBox <gi...@apache.org>.
FrankChen021 commented on pull request #11878:
URL: https://github.com/apache/druid/pull/11878#issuecomment-961977309


   Thanks for your contribution and performance report.  
   
   128688 segments is not a very large number, and the cache loading is not very complicated: it reads a file, deserializes it to a `DataSegment`, and then checks whether the segment is intact. I don't understand where the bottleneck is or why it takes 42 seconds.
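
The three steps named above (read the file, deserialize it, check that it is intact) can be sketched with a fixed thread pool as follows. This is only an illustration under stated assumptions, not the change made in `SegmentLoadDropHandler`: the class `ParallelCacheLoadSketch`, the method `loadAll`, and the `parse` and `isIntact` callbacks are hypothetical stand-ins for the real deserialization and integrity check.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; names and signatures are assumptions for illustration.
final class ParallelCacheLoadSketch
{
  static <T> List<T> loadAll(
      Path cacheDir,
      int numThreads,
      Function<String, T> parse,   // stand-in for deserializing a cache file
      Predicate<T> isIntact        // stand-in for the integrity check
  ) throws IOException, InterruptedException, ExecutionException
  {
    // List the cache files once, in a stable order.
    final List<Path> files;
    try (Stream<Path> listing = Files.list(cacheDir)) {
      files = listing.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
    }

    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, numThreads));
    try {
      // Reading and parsing run on the pool, one task per file.
      List<Future<T>> futures = new ArrayList<>();
      for (Path file : files) {
        futures.add(pool.submit(() -> {
          String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
          return parse.apply(content);
        }));
      }

      // Results are collected in file order; the integrity check runs here,
      // on the calling thread, and items that fail it are skipped.
      List<T> loaded = new ArrayList<>();
      for (Future<T> future : futures) {
        T item = future.get();
        if (isIntact.test(item)) {
          loaded.add(item);
        }
      }
      return loaded;
    }
    finally {
      pool.shutdownNow();
    }
  }
}
```

Timing each step separately in a harness like this (listing, reading, parsing, checking) is one way to find which of them accounts for the reported 42 seconds before choosing a thread count.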



