You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/05/26 20:35:14 UTC

[GitHub] [druid] wjhypo opened a new pull request #11309: Optionally load segment index files into page cache on bootstrap and new segment download

wjhypo opened a new pull request #11309:
URL: https://github.com/apache/druid/pull/11309


   
   
   <!-- Thanks for trying to help us make Apache Druid be the best it can be! Please fill out as much of the following information as is possible (where relevant, and remove it when irrelevant) to help make the intention and scope of this PR clear in order to ease review. -->
   
   <!-- Please read the doc for contribution (https://github.com/apache/druid/blob/master/CONTRIBUTING.md) before making this PR. Also, once you open a PR, please _avoid using force pushes and rebasing_ since these make it difficult for reviewers to see what you've changed in response to their reviews. See [the 'If your pull request shows conflicts with master' section](https://github.com/apache/druid/blob/master/CONTRIBUTING.md#if-your-pull-request-shows-conflicts-with-master) for more details. -->
   
   Fixes #7374
   
   <!-- Replace XXXX with the id of the issue fixed in this PR. Remove this section if there is no corresponding issue. Don't reference the issue in the title of this pull-request. -->
   
   <!-- If you are a committer, follow the PR action item checklist for committers:
   https://github.com/apache/druid/blob/master/dev/committer-instructions.md#pr-and-issue-action-item-checklist-for-committers. -->
   
   ### Description
   
   <!-- Describe the goal of this PR, what problem are you fixing. If there is a corresponding issue (referenced above), it's not necessary to repeat the description here, however, you may choose to keep one summary sentence. -->
   
   <!-- Describe your patch: what did you change in code? How did you fix the problem? -->
   
   <!-- If there are several relatively logically separate changes in this PR, create a mini-section for each of them. For example: -->
   
   <!--
   #### Fixed the bug ...
   #### Renamed the class ...
   #### Added a forbidden-apis entry ...
   
   
   In each section, please describe design decisions made, including:
   -->
    - Choice of algorithms
   
   This PR avoids the more complicated design from #7374 to address the use case of a cluster setup with the total segment size on local disk smaller than the page cache usable in the RAM by adopting a simpler design to read the index files of all segments for all data sources in a host blindly into null output stream to force OS to load the segment files into page cache during historical process bootstrap and on new segment download after bootstrap. Other setups with a smaller RAM to disk ratio are left to undefined behavior (won't crash but just not the best performance as the intention of the PR).
   
   When the total segment size on local disk is larger than the page cache usable in the RAM, a lot of complications will happen and many of them are out of application's control so this PR tries to avoid this use case:
   1) Page cache management is at the discretion of operating system so the OS may evict any existing segments already in page cache based on whatever policy it holds (LRU, LFU, etc.). From the application perspective, we can only assume any segment may be evicted when the page cache is full.
   2) If we make it configurable to load only certain data sources into page cache, let's say the host is hosting data source X and Y and only X is configured to load into page cache, unless data source Y is completely not queried, we can't prevent Y from polluting the page cache used by X because when Y is queried, the portion of the bytes needed to complete the query in the segment index file will be mmaped and has to be loaded into page cache by OS and some segments of X in the page cache will have to be evicted first if the page cache is full. A better design to address different data source's need in my opinion is to configure different server pools for X and Y where for a latency sensitive data source X, put it in a cluster of 1 to 1 mapping of available page cache RAM to segment size and enable this feature; for a non latency sensitive data source Y, put it in another cluster of historical nodes of smaller page cache RAM to disk ratio and disable this feature so on demand page
  in and page out during mmap is performed. Although the user can still choose to enable this feature for Y but it just won't be as effective.
   
   
   This PR trades availability off performance, so the whole loading segments into page cache is done in a thread pool asynchronously to not delay historical process startup and to not impact data readiness for query ETA when available for download.
   
   
   
   <!-- 
   
    - Behavioral aspects. What configuration values are acceptable? How are corner cases and error conditions handled, such as when there are insufficient resources?
   
   
   
    - Class organization and design (how the logic is split between classes, inheritance, composition, design patterns)
    - Method organization and design (how the logic is split between methods, parameters and return types)
    - Naming (class, method, API, configuration, HTTP endpoint, names of emitted metrics)
   
   
   
   It's good to describe an alternative design (or mention an alternative name) for every design (or naming) decision point and compare the alternatives with the designs that you've implemented (or the names you've chosen) to highlight the advantages of the chosen designs and names. -->
   
   <!-- If there was a discussion of the design of the feature implemented in this PR elsewhere (e. g. a "Proposal" issue, any other issue, or a thread in the development mailing list), link to that discussion from this PR description and explain what have changed in your final design compared to your original proposal or the consensus version in the end of the discussion. If something hasn't changed since the original discussion, you can omit a detailed discussion of those aspects of the design here, perhaps apart from brief mentioning for the sake of readability of this PR description. -->
   
   <!-- Some of the aspects mentioned above may be omitted for simple and small changes. 
   
   <hr>
   
   ##### Key changed/added classes in this PR
    * `MyFoo`
    * `OurBar`
    * `TheirBaz`
   
   <hr>
   
   -->
   
   <!-- Check the items by putting "x" in the brackets for the done things. Not all of these items apply to every PR. Remove the items which are not done or not relevant to the PR. None of the items from the checklist below are strictly necessary, but it would be very helpful if you at least self-review the PR. -->
   
   This PR has:
   - [x] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [x] added documentation for new or modified features or behaviors.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [x] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on pull request #11309: Optionally load segment index files into page cache on bootstrap and new segment download

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #11309:
URL: https://github.com/apache/druid/pull/11309#issuecomment-955133046


   This pull request **introduces 1 alert** when merging 3e53b21d5c1d16473953ee9496b773516842db60 into 33d9d9bd74ade384ef5feb31748b989122deb160 - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-de390e40df7225f3197642f74e081aee3ad68287)
   
   **new alerts:**
   
   * 1 for Uncontrolled data used in path expression


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] pjain1 commented on a change in pull request #11309: Optionally load segment index files into page cache on bootstrap and new segment download

Posted by GitBox <gi...@apache.org>.
pjain1 commented on a change in pull request #11309:
URL: https://github.com/apache/druid/pull/11309#discussion_r832557071



##########
File path: server/src/main/java/org/apache/druid/segment/loading/SegmentLocalCacheManager.java
##########
@@ -436,6 +453,58 @@ public void cleanup(DataSegment segment)
     }
   }
 
+  @Override
+  public void loadSegmentIntoPageCache(DataSegment segment, ExecutorService exec)
+  {
+    ExecutorService execToUse = exec != null ? exec : loadSegmentsIntoPageCacheOnDownloadExec;
+    if (execToUse == null) {
+      return;
+    }
+
+    execToUse.submit(
+        () -> {
+          final ReferenceCountingLock lock = createOrGetLock(segment);

Review comment:
       I have fixed some of the docs related spelling check failures




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] pjain1 commented on a change in pull request #11309: Optionally load segment index files into page cache on bootstrap and new segment download

Posted by GitBox <gi...@apache.org>.
pjain1 commented on a change in pull request #11309:
URL: https://github.com/apache/druid/pull/11309#discussion_r783866038



##########
File path: server/src/main/java/org/apache/druid/segment/loading/SegmentLocalCacheManager.java
##########
@@ -95,6 +104,14 @@ public SegmentLocalCacheManager(
     this.locations = locations;
     this.strategy = strategy;
     log.info("Using storage location strategy: [%s]", this.strategy.getClass().getSimpleName());
+
+    if (this.config.getNumThreadsToLoadSegmentsIntoPageCacheOnDownload() != 0) {

Review comment:
       nit: Should add validation for negative values




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] pjain1 commented on a change in pull request #11309: Optionally load segment index files into page cache on bootstrap and new segment download

Posted by GitBox <gi...@apache.org>.
pjain1 commented on a change in pull request #11309:
URL: https://github.com/apache/druid/pull/11309#discussion_r784705966



##########
File path: server/src/main/java/org/apache/druid/segment/loading/SegmentLocalCacheManager.java
##########
@@ -95,6 +104,14 @@ public SegmentLocalCacheManager(
     this.locations = locations;
     this.strategy = strategy;
     log.info("Using storage location strategy: [%s]", this.strategy.getClass().getSimpleName());
+
+    if (this.config.getNumThreadsToLoadSegmentsIntoPageCacheOnDownload() != 0) {

Review comment:
       Without validation historical start up will fail with IllegalArgumentException, users will have to figure out the cause for it which is also fine.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] wjhypo commented on a change in pull request #11309: Optionally load segment index files into page cache on bootstrap and new segment download

Posted by GitBox <gi...@apache.org>.
wjhypo commented on a change in pull request #11309:
URL: https://github.com/apache/druid/pull/11309#discussion_r739540220



##########
File path: server/src/main/java/org/apache/druid/server/coordination/SegmentLoadDropHandler.java
##########
@@ -369,7 +392,7 @@ private void addSegments(Collection<DataSegment> segments, final DataSegmentChan
                     numSegments,
                     segment.getId()
                 );
-                loadSegment(segment, callback, config.isLazyLoadOnStart());
+                loadSegment(segment, callback, config.isLazyLoadOnStart(), loadSegmentsIntoPageCacheOnBootstrapExec);

Review comment:
       Hi Parag, thanks for the comment! To clarify, the point of the PR is to provide best effort precache of segments but not 100% guaranteed precache of segments. The reason is we don't want to sacrifice availability.
   
   Image we have a data source with one replica configured and the host serving it dies, then we want the missing segments to be available ASAP on other replacement host. This is the case of loading segments into a complete new host with its page cache empty of the segments to load. Copying segments to null stream can take some time and if you have a lot segments, it can easily take more than 10 mins to complete the full read into page cache which is too long period of unavailability. In production, we tried the synchronous loading before announcing the segment, and it was indeed too slow. Another case is if we just restart the Druid historical process on the same host, then the process of reading segments into null stream is pretty fast as the segments are already cached in page cache by OS but it will still take some time compared with announcing segments directly after download.
   
   That said, the strategy of announce segment immediately after download and asynchronously read into null stream afterwards still have value: without the change, since OS will only mmap the portion of segments a query needs, even after days of serving a segment in a historical, a different query hitting a different portion of segment will still need to trigger disk reads. With the change, we can ensure after the first 10 mins or so after downloading the segments into a historical (depending on the number of segments), all subsequent queries won't hit disk.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] wjhypo commented on a change in pull request #11309: Optionally load segment index files into page cache on bootstrap and new segment download

Posted by GitBox <gi...@apache.org>.
wjhypo commented on a change in pull request #11309:
URL: https://github.com/apache/druid/pull/11309#discussion_r741991578



##########
File path: server/src/main/java/org/apache/druid/server/coordination/SegmentLoadDropHandler.java
##########
@@ -369,7 +392,7 @@ private void addSegments(Collection<DataSegment> segments, final DataSegmentChan
                     numSegments,
                     segment.getId()
                 );
-                loadSegment(segment, callback, config.isLazyLoadOnStart());
+                loadSegment(segment, callback, config.isLazyLoadOnStart(), loadSegmentsIntoPageCacheOnBootstrapExec);

Review comment:
       Sg, thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] pjain1 commented on a change in pull request #11309: Optionally load segment index files into page cache on bootstrap and new segment download

Posted by GitBox <gi...@apache.org>.
pjain1 commented on a change in pull request #11309:
URL: https://github.com/apache/druid/pull/11309#discussion_r736200323



##########
File path: server/src/main/java/org/apache/druid/server/coordination/SegmentLoadDropHandler.java
##########
@@ -369,7 +392,7 @@ private void addSegments(Collection<DataSegment> segments, final DataSegmentChan
                     numSegments,
                     segment.getId()
                 );
-                loadSegment(segment, callback, config.isLazyLoadOnStart());
+                loadSegment(segment, callback, config.isLazyLoadOnStart(), loadSegmentsIntoPageCacheOnBootstrapExec);

Review comment:
       Since loading segment is synchronous where as loading into page cache is async then it might happen that historical has announced itself but `loadSegmentsIntoPageCacheOnBootstrapExec` is still copying segments to null stream which can take time depending on IO throughput of the disk. I thought the point of the PR is to warm up the page cache before historical announces itself ready and hence no warm up delays and performance issues after reboot. Am I missing something here ? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] wjhypo commented on a change in pull request #11309: Optionally load segment index files into page cache on bootstrap and new segment download

Posted by GitBox <gi...@apache.org>.
wjhypo commented on a change in pull request #11309:
URL: https://github.com/apache/druid/pull/11309#discussion_r754587573



##########
File path: server/src/main/java/org/apache/druid/server/coordination/SegmentLoadDropHandler.java
##########
@@ -369,7 +392,7 @@ private void addSegments(Collection<DataSegment> segments, final DataSegmentChan
                     numSegments,
                     segment.getId()
                 );
-                loadSegment(segment, callback, config.isLazyLoadOnStart());
+                loadSegment(segment, callback, config.isLazyLoadOnStart(), loadSegmentsIntoPageCacheOnBootstrapExec);

Review comment:
       @pjain1 Hi Parag, I hope I can get something by end of next week




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] pjain1 commented on pull request #11309: Optionally load segment index files into page cache on bootstrap and new segment download

Posted by GitBox <gi...@apache.org>.
pjain1 commented on pull request #11309:
URL: https://github.com/apache/druid/pull/11309#issuecomment-1075436084


   icy small does not make sense and should not be ideally used but adding for completeness of helm charts


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] pjain1 commented on a change in pull request #11309: Optionally load segment index files into page cache on bootstrap and new segment download

Posted by GitBox <gi...@apache.org>.
pjain1 commented on a change in pull request #11309:
URL: https://github.com/apache/druid/pull/11309#discussion_r753755085



##########
File path: server/src/main/java/org/apache/druid/server/coordination/SegmentLoadDropHandler.java
##########
@@ -369,7 +392,7 @@ private void addSegments(Collection<DataSegment> segments, final DataSegmentChan
                     numSegments,
                     segment.getId()
                 );
-                loadSegment(segment, callback, config.isLazyLoadOnStart());
+                loadSegment(segment, callback, config.isLazyLoadOnStart(), loadSegmentsIntoPageCacheOnBootstrapExec);

Review comment:
       @wjhypo any updates on when can you make the changes ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on pull request #11309: Optionally load segment index files into page cache on bootstrap and new segment download

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #11309:
URL: https://github.com/apache/druid/pull/11309#issuecomment-1075348473


   This pull request **introduces 1 alert** when merging cfbce443cc6584161086bd9ddd1107b347f49c18 into 0867ca75e12f20934c92a8a3ed02bad3e740684f - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-7f176233dd4471605af787e55f5038f7c3ec69bd)
   
   **new alerts:**
   
   * 1 for Uncontrolled data used in path expression


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] pjain1 commented on a change in pull request #11309: Optionally load segment index files into page cache on bootstrap and new segment download

Posted by GitBox <gi...@apache.org>.
pjain1 commented on a change in pull request #11309:
URL: https://github.com/apache/druid/pull/11309#discussion_r832556435



##########
File path: server/src/main/java/org/apache/druid/segment/loading/SegmentLocalCacheManager.java
##########
@@ -436,6 +453,58 @@ public void cleanup(DataSegment segment)
     }
   }
 
+  @Override
+  public void loadSegmentIntoPageCache(DataSegment segment, ExecutorService exec)
+  {
+    ExecutorService execToUse = exec != null ? exec : loadSegmentsIntoPageCacheOnDownloadExec;
+    if (execToUse == null) {
+      return;
+    }
+
+    execToUse.submit(
+        () -> {
+          final ReferenceCountingLock lock = createOrGetLock(segment);

Review comment:
       @wjhypo can you add test for this runnable, travis is failing because of code coverage issue. Once travis passes will merge this PR.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] pjain1 commented on a change in pull request #11309: Optionally load segment index files into page cache on bootstrap and new segment download

Posted by GitBox <gi...@apache.org>.
pjain1 commented on a change in pull request #11309:
URL: https://github.com/apache/druid/pull/11309#discussion_r762949831



##########
File path: server/src/main/java/org/apache/druid/server/coordination/SegmentLoadDropHandler.java
##########
@@ -369,7 +392,7 @@ private void addSegments(Collection<DataSegment> segments, final DataSegmentChan
                     numSegments,
                     segment.getId()
                 );
-                loadSegment(segment, callback, config.isLazyLoadOnStart());
+                loadSegment(segment, callback, config.isLazyLoadOnStart(), loadSegmentsIntoPageCacheOnBootstrapExec);

Review comment:
       gentle reminder




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] wjhypo commented on a change in pull request #11309: Optionally load segment index files into page cache on bootstrap and new segment download

Posted by GitBox <gi...@apache.org>.
wjhypo commented on a change in pull request #11309:
URL: https://github.com/apache/druid/pull/11309#discussion_r741991578



##########
File path: server/src/main/java/org/apache/druid/server/coordination/SegmentLoadDropHandler.java
##########
@@ -369,7 +392,7 @@ private void addSegments(Collection<DataSegment> segments, final DataSegmentChan
                     numSegments,
                     segment.getId()
                 );
-                loadSegment(segment, callback, config.isLazyLoadOnStart());
+                loadSegment(segment, callback, config.isLazyLoadOnStart(), loadSegmentsIntoPageCacheOnBootstrapExec);

Review comment:
       Sg, thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] pjain1 commented on a change in pull request #11309: Optionally load segment index files into page cache on bootstrap and new segment download

Posted by GitBox <gi...@apache.org>.
pjain1 commented on a change in pull request #11309:
URL: https://github.com/apache/druid/pull/11309#discussion_r739854428



##########
File path: server/src/main/java/org/apache/druid/server/coordination/SegmentLoadDropHandler.java
##########
@@ -369,7 +392,7 @@ private void addSegments(Collection<DataSegment> segments, final DataSegmentChan
                     numSegments,
                     segment.getId()
                 );
-                loadSegment(segment, callback, config.isLazyLoadOnStart());
+                loadSegment(segment, callback, config.isLazyLoadOnStart(), loadSegmentsIntoPageCacheOnBootstrapExec);

Review comment:
       I understand your point I am just worried that when a historical is warming up the query performance at that time will be worse than performance after usual boot up as its already doing disk reads for warming up and doing more disk reads for serving query. 
   
   Can we introduce a flag to enable synchronous warm up which when enabled historical only announces itself after everything is read once. When warm up is enabled default behaviour will be async as in this PR but can be changed to sync warm up when boot up time is not a concern. How does this sound to you ? Thanks
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] pjain1 commented on a change in pull request #11309: Optionally load segment index files into page cache on bootstrap and new segment download

Posted by GitBox <gi...@apache.org>.
pjain1 commented on a change in pull request #11309:
URL: https://github.com/apache/druid/pull/11309#discussion_r783872090



##########
File path: server/src/main/java/org/apache/druid/server/coordination/SegmentLoadDropHandler.java
##########
@@ -351,9 +359,19 @@ each time when addSegment() is called, it has to wait for the lock in order to m
     }
   }
 
+  /**
+   * Bulk adding segments during bootstrap
+   * @param segments A collection of segments to add
+   * @param callback Segment loading callback
+   */
   private void addSegments(Collection<DataSegment> segments, final DataSegmentChangeCallback callback)
   {
+    // Start a temporary thread pool to load segments into page cache during bootstrap
     ExecutorService loadingExecutor = null;
+    ExecutorService loadSegmentsIntoPageCacheOnBootstrapExec =
+        config.getNumThreadsToLoadSegmentsIntoPageCacheOnBootstrap() != 0 ?

Review comment:
       Should add validation for negative values 

##########
File path: server/src/main/java/org/apache/druid/segment/loading/SegmentLocalCacheManager.java
##########
@@ -95,6 +104,14 @@ public SegmentLocalCacheManager(
     this.locations = locations;
     this.strategy = strategy;
     log.info("Using storage location strategy: [%s]", this.strategy.getClass().getSimpleName());
+
+    if (this.config.getNumThreadsToLoadSegmentsIntoPageCacheOnDownload() != 0) {

Review comment:
       Should add validation for negative values




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] pjain1 commented on a change in pull request #11309: Optionally load segment index files into page cache on bootstrap and new segment download

Posted by GitBox <gi...@apache.org>.
pjain1 commented on a change in pull request #11309:
URL: https://github.com/apache/druid/pull/11309#discussion_r754593299



##########
File path: server/src/main/java/org/apache/druid/server/coordination/SegmentLoadDropHandler.java
##########
@@ -369,7 +392,7 @@ private void addSegments(Collection<DataSegment> segments, final DataSegmentChan
                     numSegments,
                     segment.getId()
                 );
-                loadSegment(segment, callback, config.isLazyLoadOnStart());
+                loadSegment(segment, callback, config.isLazyLoadOnStart(), loadSegmentsIntoPageCacheOnBootstrapExec);

Review comment:
       Thanks




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on pull request #11309: Optionally load segment index files into page cache on bootstrap and new segment download

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #11309:
URL: https://github.com/apache/druid/pull/11309#issuecomment-1069219674


   This pull request **introduces 1 alert** when merging c8ebd7eba594a75cf27cf48e0c1948eb9438a6ab into 69f928f50e849b7c97e7cd1958d9c2505acf070f - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-207ddc041397d96facc5c0914601034d1f161f47)
   
   **new alerts:**
   
   * 1 for Uncontrolled data used in path expression


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] wjhypo commented on pull request #11309: Optionally load segment index files into page cache on bootstrap and new segment download

Posted by GitBox <gi...@apache.org>.
wjhypo commented on pull request #11309:
URL: https://github.com/apache/druid/pull/11309#issuecomment-955125457


   > @wjhypo can you fix the conflicts and review the comments, thanks
   
   Addressed! Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] pjain1 commented on pull request #11309: Optionally load segment index files into page cache on bootstrap and new segment download

Posted by GitBox <gi...@apache.org>.
pjain1 commented on pull request #11309:
URL: https://github.com/apache/druid/pull/11309#issuecomment-954698804


   @wjhypo can you fix the conflicts and review the comments, thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] pjain1 commented on a change in pull request #11309: Optionally load segment index files into page cache on bootstrap and new segment download

Posted by GitBox <gi...@apache.org>.
pjain1 commented on a change in pull request #11309:
URL: https://github.com/apache/druid/pull/11309#discussion_r783872090



##########
File path: server/src/main/java/org/apache/druid/server/coordination/SegmentLoadDropHandler.java
##########
@@ -351,9 +359,19 @@ each time when addSegment() is called, it has to wait for the lock in order to m
     }
   }
 
+  /**
+   * Bulk adding segments during bootstrap
+   * @param segments A collection of segments to add
+   * @param callback Segment loading callback
+   */
   private void addSegments(Collection<DataSegment> segments, final DataSegmentChangeCallback callback)
   {
+    // Start a temporary thread pool to load segments into page cache during bootstrap
     ExecutorService loadingExecutor = null;
+    ExecutorService loadSegmentsIntoPageCacheOnBootstrapExec =
+        config.getNumThreadsToLoadSegmentsIntoPageCacheOnBootstrap() != 0 ?

Review comment:
       nit: Should add validation for negative values 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] pjain1 commented on a change in pull request #11309: Optionally load segment index files into page cache on bootstrap and new segment download

Posted by GitBox <gi...@apache.org>.
pjain1 commented on a change in pull request #11309:
URL: https://github.com/apache/druid/pull/11309#discussion_r794482490



##########
File path: server/src/main/java/org/apache/druid/server/coordination/SegmentLoadDropHandler.java
##########
@@ -369,7 +392,7 @@ private void addSegments(Collection<DataSegment> segments, final DataSegmentChan
                     numSegments,
                     segment.getId()
                 );
-                loadSegment(segment, callback, config.isLazyLoadOnStart());
+                loadSegment(segment, callback, config.isLazyLoadOnStart(), loadSegmentsIntoPageCacheOnBootstrapExec);

Review comment:
       @wjhypo any thoughts on this ? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] pjain1 removed a comment on pull request #11309: Optionally load segment index files into page cache on bootstrap and new segment download

Posted by GitBox <gi...@apache.org>.
pjain1 removed a comment on pull request #11309:
URL: https://github.com/apache/druid/pull/11309#issuecomment-1075436084


   icy small does not make sense and should not be ideally used but adding for completeness of helm charts


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on pull request #11309: Optionally load segment index files into page cache on bootstrap and new segment download

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #11309:
URL: https://github.com/apache/druid/pull/11309#issuecomment-1075622353


   This pull request **introduces 1 alert** when merging 36ddd8b2c15a5532400de8d7ba32a897d24b2f24 into d7308e929076b2170ba34fd1b731aa0309e09e90 - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-6fed01c70eda5c693df06d790e252471a9fac4b0)
   
   **new alerts:**
   
   * 1 for Uncontrolled data used in path expression


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] pjain1 commented on a change in pull request #11309: Optionally load segment index files into page cache on bootstrap and new segment download

Posted by GitBox <gi...@apache.org>.
pjain1 commented on a change in pull request #11309:
URL: https://github.com/apache/druid/pull/11309#discussion_r753755085



##########
File path: server/src/main/java/org/apache/druid/server/coordination/SegmentLoadDropHandler.java
##########
@@ -369,7 +392,7 @@ private void addSegments(Collection<DataSegment> segments, final DataSegmentChan
                     numSegments,
                     segment.getId()
                 );
-                loadSegment(segment, callback, config.isLazyLoadOnStart());
+                loadSegment(segment, callback, config.isLazyLoadOnStart(), loadSegmentsIntoPageCacheOnBootstrapExec);

Review comment:
       @wjhypo any updates on when can you make the changes ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] wjhypo commented on a change in pull request #11309: Optionally load segment index files into page cache on bootstrap and new segment download

Posted by GitBox <gi...@apache.org>.
wjhypo commented on a change in pull request #11309:
URL: https://github.com/apache/druid/pull/11309#discussion_r741991578



##########
File path: server/src/main/java/org/apache/druid/server/coordination/SegmentLoadDropHandler.java
##########
@@ -369,7 +392,7 @@ private void addSegments(Collection<DataSegment> segments, final DataSegmentChan
                     numSegments,
                     segment.getId()
                 );
-                loadSegment(segment, callback, config.isLazyLoadOnStart());
+                loadSegment(segment, callback, config.isLazyLoadOnStart(), loadSegmentsIntoPageCacheOnBootstrapExec);

Review comment:
       Sg, thanks!

##########
File path: server/src/main/java/org/apache/druid/server/coordination/SegmentLoadDropHandler.java
##########
@@ -369,7 +392,7 @@ private void addSegments(Collection<DataSegment> segments, final DataSegmentChan
                     numSegments,
                     segment.getId()
                 );
-                loadSegment(segment, callback, config.isLazyLoadOnStart());
+                loadSegment(segment, callback, config.isLazyLoadOnStart(), loadSegmentsIntoPageCacheOnBootstrapExec);

Review comment:
       Sg, thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] wjhypo commented on a change in pull request #11309: Optionally load segment index files into page cache on bootstrap and new segment download

Posted by GitBox <gi...@apache.org>.
wjhypo commented on a change in pull request #11309:
URL: https://github.com/apache/druid/pull/11309#discussion_r764462889



##########
File path: server/src/main/java/org/apache/druid/server/coordination/SegmentLoadDropHandler.java
##########
@@ -369,7 +392,7 @@ private void addSegments(Collection<DataSegment> segments, final DataSegmentChan
                     numSegments,
                     segment.getId()
                 );
-                loadSegment(segment, callback, config.isLazyLoadOnStart());
+                loadSegment(segment, callback, config.isLazyLoadOnStart(), loadSegmentsIntoPageCacheOnBootstrapExec);

Review comment:
       @pjain1 Hi Parag, sorry about the delay. Sorry I still didn't get time to make the change and verify the sync approach in a test cluster. Not sure if any user is in need of the sync approach. We only verified the async approach in production. At this point, it would be better if someone in need of the sync approach can help make the change and dedicate some time to verify the change. It currently doesn't align with our priority at work so I won't be able to make the change this year.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] pjain1 commented on a change in pull request #11309: Optionally load segment index files into page cache on bootstrap and new segment download

Posted by GitBox <gi...@apache.org>.
pjain1 commented on a change in pull request #11309:
URL: https://github.com/apache/druid/pull/11309#discussion_r784710351



##########
File path: server/src/main/java/org/apache/druid/server/coordination/SegmentLoadDropHandler.java
##########
@@ -369,7 +392,7 @@ private void addSegments(Collection<DataSegment> segments, final DataSegmentChan
                     numSegments,
                     segment.getId()
                 );
-                loadSegment(segment, callback, config.isLazyLoadOnStart());
+                loadSegment(segment, callback, config.isLazyLoadOnStart(), loadSegmentsIntoPageCacheOnBootstrapExec);

Review comment:
       @wjhypo no issues, I think we can get it in as in current condition. Can work on the sync behaviour later.
   
   Few questions - Do you run this on your prod env ? Hows the performance after starting of the historical process and before warm up is complete as process will be doing double reads for warming up as well as serving query ? Do you see more of timeouts during this time ? Is it even usable at that time ? Thanks




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org