Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2022/09/11 06:37:57 UTC

[GitHub] [druid] AmatyaAvadhanula opened a new pull request, #13070: Allocate numCorePartitions using only used segments

AmatyaAvadhanula opened a new pull request, #13070:
URL: https://github.com/apache/druid/pull/13070

   Fixes potential issues caused by pending segments that were never cleaned up, which can lead to an incorrect `numCorePartitions` during segment allocation.
   
   ### Description
   
   Queries may return no results because `numCorePartitions` is assigned incorrectly during segment allocation when stale pending segments are present.
   
   #### Steps to reproduce:
   
   1. Use streaming ingestion to create a non-empty interval in a datasource.
   2. Suspend the supervisor.
   3. Compact this datasource without appending to existing data.
   4. (Optional) Resume and then suspend the supervisor again to add data to this interval.
   5. Drop all segments in this interval.
   6. Resume the supervisor to ingest more data into this interval.
   7. Query data for this interval.
   
   
   #### Behaviour
   
   Segment allocation creates a shardSpec with (version, shardNumber, numCorePartitions).
   
   For a newly created shardSpec, the segment with the highest (version, shardNumber) is chosen from the combined set of used and pending segments. The `numCorePartitions` of the new segment is then copied from this "max segment".
   
   While version and shardNumber increase monotonically with ingestion, numCorePartitions can change with the set of used segments (for example, when all core segments are dropped as described above).
   
   When the "max segment" belongs to the pending segment set and the core partitions have been dropped, an incorrect (non-zero) value is used for numCorePartitions. The chunk is then believed to have core partitions that no longer exist, and queries over the interval can return no results.
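   
   The selection described above can be illustrated with a minimal sketch. It is not Druid's code: `MaxSegmentSketch`, `SegmentId`, and `numCorePartitionsFromOverallMax` are hypothetical names, versions are plain strings compared lexicographically, and the lists stand in for the used and pending segment sets.
   
   ```java
   import java.util.ArrayList;
   import java.util.Comparator;
   import java.util.List;
   
   final class MaxSegmentSketch {
       // Hypothetical stand-in for a segment id carrying the three shardSpec fields.
       static final class SegmentId {
           final String version;
           final int shardNumber;
           final int numCorePartitions;
   
           SegmentId(String version, int shardNumber, int numCorePartitions) {
               this.version = version;
               this.shardNumber = shardNumber;
               this.numCorePartitions = numCorePartitions;
           }
       }
   
       // Orders segments by (version, shardNumber), as in the description.
       static final Comparator<SegmentId> BY_VERSION_THEN_SHARD =
           Comparator.comparing((SegmentId s) -> s.version)
                     .thenComparingInt(s -> s.shardNumber);
   
       // Behaviour described above: the max is taken over used AND pending
       // segments, and numCorePartitions is copied from whichever one wins.
       static int numCorePartitionsFromOverallMax(List<SegmentId> used, List<SegmentId> pending) {
           List<SegmentId> all = new ArrayList<>(used);
           all.addAll(pending);
           return all.stream()
                     .max(BY_VERSION_THEN_SHARD)
                     .map(s -> s.numCorePartitions)
                     .orElse(0);
       }
   
       public static void main(String[] args) {
           // All used segments were dropped; one pending segment from the
           // compacted version (2 core partitions) is left behind.
           List<SegmentId> used = new ArrayList<>();
           List<SegmentId> pending = new ArrayList<>();
           pending.add(new SegmentId("v2", 3, 2));
   
           // Prints 2, although no core partition exists any more.
           System.out.println(numCorePartitionsFromOverallMax(used, pending));
       }
   }
   ```
   
   In this scenario the leftover pending segment decides the value, which is the stale state the patch is meant to avoid.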
   
   #### Patch
   
   The `numCorePartitions` must be chosen using the "max segment" in the set of used, non-overshadowed segments.
   This can be done by reading `numCorePartitions` early, from the set of used segments in the VersionedIntervalTimeline, before pending segments are taken into account.
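   
   A rough sketch of the patched selection follows, under stated assumptions. The names `AllocationSketch`, `SegmentId`, `allocate`, and `newVersion` are hypothetical and this is not the actual `IndexerSQLMetadataStorageCoordinator` code. It assumes the next shard number still comes from the overall max (used plus pending), while `numCorePartitions` comes only from the max used segment.
   
   ```java
   import java.util.ArrayList;
   import java.util.Comparator;
   import java.util.List;
   import java.util.Optional;
   
   final class AllocationSketch {
       // Hypothetical stand-in for a segment id carrying the three shardSpec fields.
       static final class SegmentId {
           final String version;
           final int shardNumber;
           final int numCorePartitions;
   
           SegmentId(String version, int shardNumber, int numCorePartitions) {
               this.version = version;
               this.shardNumber = shardNumber;
               this.numCorePartitions = numCorePartitions;
           }
       }
   
       static final Comparator<SegmentId> BY_VERSION_THEN_SHARD =
           Comparator.comparing((SegmentId s) -> s.version)
                     .thenComparingInt(s -> s.shardNumber);
   
       // Allocates the next segment id. newVersion is only used (assumption)
       // when neither used nor pending segments exist for the interval.
       static SegmentId allocate(List<SegmentId> used, List<SegmentId> pending, String newVersion) {
           // Max over used segments only: the source of numCorePartitions.
           Optional<SegmentId> committedMax = used.stream().max(BY_VERSION_THEN_SHARD);
   
           // Max over used and pending segments: the source of version and shardNumber.
           List<SegmentId> all = new ArrayList<>(used);
           all.addAll(pending);
           Optional<SegmentId> overallMax = all.stream().max(BY_VERSION_THEN_SHARD);
   
           int numCorePartitions = committedMax.map(s -> s.numCorePartitions).orElse(0);
           String version = overallMax.map(s -> s.version).orElse(newVersion);
           int shardNumber = overallMax.map(s -> s.shardNumber + 1).orElse(0);
           return new SegmentId(version, shardNumber, numCorePartitions);
       }
   
       public static void main(String[] args) {
           // Same scenario as the reproduction steps: used segments dropped,
           // one pending segment with 2 core partitions left behind.
           List<SegmentId> used = new ArrayList<>();
           List<SegmentId> pending = new ArrayList<>();
           pending.add(new SegmentId("v2", 3, 2));
   
           SegmentId next = allocate(used, pending, "v3");
           // Prints v2 4 0
           System.out.println(next.version + " " + next.shardNumber + " " + next.numCorePartitions);
       }
   }
   ```
   
   Keeping the two maxima in separate variables means the shard number keeps increasing past the pending segment, while the core partition count reflects only what is actually used.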
   
   
   
   <hr>
   
   ##### Key changed/added classes in this PR
    * `IndexerSQLMetadataStorageCoordinator`
   
   <hr>
   
   This PR has:
   - [x] been self-reviewed.
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [x] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] AmatyaAvadhanula commented on a diff in pull request #13070: Allocate numCorePartitions using only used segments

Posted by GitBox <gi...@apache.org>.
AmatyaAvadhanula commented on code in PR #13070:
URL: https://github.com/apache/druid/pull/13070#discussion_r971516930


##########
server/src/main/java/org/apache/druid/metadata/IndexerSQLMetadataStorageCoordinator.java:
##########
@@ -848,6 +848,11 @@ private SegmentIdWithShardSpec createNewSegment(
         versionOfExistingChunk = null;
       }
 
+      // The number of core partitions must always be chosen from the set of used segments in the VersionedIntervalTimeline.
+      // When the core partitions have been dropped, using pending segments may lead to an incorrect state
+      // where the chunk is believed to have core partitions and queries results are incorrect.
+      int numCorePartitions = maxId == null ? 0 : maxId.getShardSpec().getNumCorePartitions();

Review Comment:
   @imply-cheddar, thank you for the review. I've made the changes that you suggested.





[GitHub] [druid] abhishekagarwal87 merged pull request #13070: Allocate numCorePartitions using only used segments

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 merged PR #13070:
URL: https://github.com/apache/druid/pull/13070




[GitHub] [druid] imply-cheddar commented on a diff in pull request #13070: Allocate numCorePartitions using only used segments

Posted by GitBox <gi...@apache.org>.
imply-cheddar commented on code in PR #13070:
URL: https://github.com/apache/druid/pull/13070#discussion_r971469785


##########
server/src/main/java/org/apache/druid/metadata/IndexerSQLMetadataStorageCoordinator.java:
##########
@@ -848,6 +848,11 @@ private SegmentIdWithShardSpec createNewSegment(
         versionOfExistingChunk = null;
       }
 
+      // The number of core partitions must always be chosen from the set of used segments in the VersionedIntervalTimeline.
+      // When the core partitions have been dropped, using pending segments may lead to an incorrect state
+      // where the chunk is believed to have core partitions and queries results are incorrect.
+      int numCorePartitions = maxId == null ? 0 : maxId.getShardSpec().getNumCorePartitions();

Review Comment:
   As written, the change is only correct because this line sits before the point where the `maxId` variable is replaced, and a reader has to know that to see why its position matters. I might recommend that we maintain a `committedMaxId` and an `overallMaxId` as two separate variables and then use the correct one down at the bottom. This makes it clearer which state we are depending on.


