You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@solr.apache.org by "patsonluk (via GitHub)" <gi...@apache.org> on 2023/03/20 23:42:28 UTC

[GitHub] [solr] patsonluk opened a new pull request, #1477: SOLR-16712: Simplify DocCollection ctor for PRS enabled collection

patsonluk opened a new pull request, #1477:
URL: https://github.com/apache/solr/pull/1477

   https://issues.apache.org/jira/browse/SOLR-16712
   
   # Description
   
   The current implementation of PRS requires an extra param to the DocCollection, the `PrsSupplier`, when `get` is called, would fetch the PRS states from ZK. The implementation of such supplier `LazyPrsSupplier` would only fetch the state on first call. 
   
   While this flow does work properly, this flow might introduce some unnecessary complexity:
   1. PRS entry fetching from ZK is done either during or after the `DocCollection` construction, this could be a bit inconsistent with existing non PRS `DocCollection` design which `DocCollection` is simply a immutable container that does not fetch data after its instantiation
   2. The lazy fetching could introduce some uncertainties as to when exactly the fetching happens (and if any Zookeeper IO exceptions arises)
    
   My guess was that the lazy loading was introduced in https://issues.apache.org/jira/browse/SOLR-16580 as to avoid fetching the PRS states multiple times in the ctor of `DocCollection`, however, if we only fetch the `PerReplicaStates` once on update before calling the `DocCollection` ctor, and pass the `PerReplicaStates` object to the `DocCollection` instead, it can probably achieve similar result but with reduced uncertainty after `DocCollection` construction.
   
   There's another branch which experimented with making DocCollection, Slice and Replica immutable as well for PRS enabled collection [https://github.com/cowpaths/fullstory-solr/pull/84] but is beyond the discussion of this Jira ticket
   
   
   # Solution
   
   Changed `DocCollector` last param from `PrsSupplier` to `PerReplicaStates`, which means in order to create a PRS enabled `DocCollection` , the `PerReplicaStates` object need to be provided upfront instead of getting fetched later on, which used to happen in `DocCollection ctor -> Slice#setPrsSupplier -> Slice#findLeader() -> Replica#isLeader() -> PrsSuppler#get`
   
   In order to minimize the impact, a new builder method `buildDocCollection` is added to `DocCollection` and accept the identical signature as the `DocCollection` before this change (which accepts `PrsSupplier`)
   
   Take note that `DocCollection#copyWith(PerReplicaStates)` has also been renamed to `DocCollection#setReplicaStates(PerReplicaStates)`, this is to better describes that such method simply set the `PerReplicaStates` but does not "copy" (ie creates new `DocCollection` instance).
   
   # Tests
   
   No new test cases are introduced, all the existing unit test cases should provide sufficient coverage
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request title.
   - [ ] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `main` branch.
   - [ ] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Reference Guide](https://github.com/apache/solr/tree/main/solr/solr-ref-guide)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] noblepaul commented on pull request #1477: SOLR-16712: Simplify DocCollection ctor for PRS enabled collection

Posted by "noblepaul (via GitHub)" <gi...@apache.org>.
noblepaul commented on PR #1477:
URL: https://github.com/apache/solr/pull/1477#issuecomment-1520201259

   This change looks good now. There are no outstanding concerns now. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] patsonluk commented on pull request #1477: SOLR-16712: Simplify DocCollection ctor for PRS enabled collection

Posted by "patsonluk (via GitHub)" <gi...@apache.org>.
patsonluk commented on PR #1477:
URL: https://github.com/apache/solr/pull/1477#issuecomment-1517068641

   @noblepaul I have pushed the updates, it's using `AtomicReference<PerReplicaStates>` very similar to the `PrsSupplier`, but it does not fetch. The fetching is still done by the `PrsSupplier` but is only called before the `DocCollection` is constructed.
   
   Let me know your thoughts thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] patsonluk commented on pull request #1477: SOLR-16712: Simplify DocCollection ctor for PRS enabled collection

Posted by "patsonluk (via GitHub)" <gi...@apache.org>.
patsonluk commented on PR #1477:
URL: https://github.com/apache/solr/pull/1477#issuecomment-1516770497

   > @patsonluk as it is mutating an existing Object, there is a small period during which the state of a given replica is unpredictable . some replicas of the collection would fetch state from old PRS and some will point to new PRS
   > 
   > The question is, what did we achieve with this change?
   
   Thank you for the comments/questions @noblepaul!
   
   Yes you are correct that mutating the prs in  `DocCollection#copyWith(PerReplicaStates) with `this.prsSupplier.prs = newPerReplicaStates` would set the actual prs instance in one place! TBH, i did find such design a bit hard to trace when the PRS state mutates in `Replica` as such state is kept outside of `Replica` and modified in `DocCollection`. 
   
   I guess that is a reasonable trade-off since `DocCollection/Slice/Replica` are mutable for PRS enabled collection and if we want to keep using the same instances for PRS updates, this is likely the best we could do, so let's just keep your existing implementation! 😊 
   
   The goal of this PR is make the loading/fetching of PRS states a bit more deterministic - ie, always fetch before the construction of  `DocCollection` instead of relying on implementation. Currently it's `DocCollection ctor -> Slice#setPrsSupplier -> Slice#findLeader() -> Replica#isLeader() -> PrsSuppler#get`
   
   I will see if we can achieve that with the Supplier still in place for `DocCollection/Slice/Replica`.
   
   Will ping you again once the change is made. Thank you again!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] patsonluk commented on pull request #1477: SOLR-16712: Simplify DocCollection ctor for PRS enabled collection

Posted by "patsonluk (via GitHub)" <gi...@apache.org>.
patsonluk commented on PR #1477:
URL: https://github.com/apache/solr/pull/1477#issuecomment-1507723145

   ping @noblepaul do u have a chance to take a look? Please lemme know if u have any questions/concerns. thanks again!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] noblepaul merged pull request #1477: SOLR-16712: Simplify DocCollection ctor for PRS enabled collection

Posted by "noblepaul (via GitHub)" <gi...@apache.org>.
noblepaul merged PR #1477:
URL: https://github.com/apache/solr/pull/1477


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] noblepaul commented on a diff in pull request #1477: SOLR-16712: Simplify DocCollection ctor for PRS enabled collection

Posted by "noblepaul (via GitHub)" <gi...@apache.org>.
noblepaul commented on code in PR #1477:
URL: https://github.com/apache/solr/pull/1477#discussion_r1150090351


##########
solr/solrj/src/java/org/apache/solr/common/cloud/DocCollection.java:
##########
@@ -146,6 +143,45 @@ public DocCollection(
     assert name != null && slices != null;
   }
 
+  public static DocCollection buildDocCollection(

Review Comment:
   just `create()` would be a better name?



##########
solr/solrj/src/java/org/apache/solr/common/cloud/DocCollection.java:
##########
@@ -155,13 +191,15 @@ public static String getCollectionPathRoot(String coll) {
   }
 
   /**
-   * Update our state with a state of a {@link Replica} Used to create a new Collection State when
-   * only a replica is updated
+   * Update our state with a state of a {@link PerReplicaStates} which could override states of
+   * {@link Replica}.
+   *
+   * <p>This does not create a new DocCollection.
    */
-  public DocCollection copyWith(PerReplicaStates newPerReplicaStates) {
-    if (this.prsSupplier != null) {
-      log.debug("In-place update of PRS: {}", newPerReplicaStates);
-      this.prsSupplier.prs = newPerReplicaStates;
+  public final DocCollection setPerReplicaStates(PerReplicaStates perReplicaStates) {

Review Comment:
   @patsonluk , can you please address the above ?



##########
solr/solrj/src/java/org/apache/solr/common/cloud/DocCollection.java:
##########
@@ -155,13 +191,15 @@ public static String getCollectionPathRoot(String coll) {
   }
 
   /**
-   * Update our state with a state of a {@link Replica} Used to create a new Collection State when
-   * only a replica is updated
+   * Update our state with a state of a {@link PerReplicaStates} which could override states of
+   * {@link Replica}.
+   *
+   * <p>This does not create a new DocCollection.
    */
-  public DocCollection copyWith(PerReplicaStates newPerReplicaStates) {
-    if (this.prsSupplier != null) {
-      log.debug("In-place update of PRS: {}", newPerReplicaStates);
-      this.prsSupplier.prs = newPerReplicaStates;
+  public final DocCollection setPerReplicaStates(PerReplicaStates perReplicaStates) {

Review Comment:
   as it is mutating an existing Object, there is a small period during which the `state` of a given replica is unpredictable . some replicas of the collection would fetch state from old PRS and some will point to new PRS
   
   The question is, what did we achieve with this change?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] noblepaul commented on pull request #1477: SOLR-16712: Simplify DocCollection ctor for PRS enabled collection

Posted by "noblepaul (via GitHub)" <gi...@apache.org>.
noblepaul commented on PR #1477:
URL: https://github.com/apache/solr/pull/1477#issuecomment-1515636831

   @patsonluk 
   as it is mutating an existing Object, there is a small period during which the state of a given replica is unpredictable . some replicas of the collection would fetch state from old PRS and some will point to new PRS
   
   The question is, what did we achieve with this change?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] noblepaul commented on pull request #1477: SOLR-16712: Simplify DocCollection ctor for PRS enabled collection

Posted by "noblepaul (via GitHub)" <gi...@apache.org>.
noblepaul commented on PR #1477:
URL: https://github.com/apache/solr/pull/1477#issuecomment-1485639221

   @patsonluk 
   yes, the latest one doesn't create new Objects, I shall review it


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] noblepaul commented on pull request #1477: SOLR-16712: Simplify DocCollection ctor for PRS enabled collection

Posted by "noblepaul (via GitHub)" <gi...@apache.org>.
noblepaul commented on PR #1477:
URL: https://github.com/apache/solr/pull/1477#issuecomment-1514223539

   > ping @noblepaul do u have a chance to take a look? Please lemme know if u have any questions/concerns. thanks again!
    
   @patsonluk 
   I had posted my comments above


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] patsonluk commented on pull request #1477: SOLR-16712: Simplify DocCollection ctor for PRS enabled collection

Posted by "patsonluk (via GitHub)" <gi...@apache.org>.
patsonluk commented on PR #1477:
URL: https://github.com/apache/solr/pull/1477#issuecomment-1515058837

   @noblepaul would u mind to link it again please? i tried the only link i found https://github.com/apache/solr/pull/1477/files/6f682a2febf9e5e4a03b161f61615115d3aea60d and it did not show any comments, perhaps they were not saved? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] patsonluk commented on pull request #1477: SOLR-16712: Simplify DocCollection ctor for PRS enabled collection

Posted by "patsonluk (via GitHub)" <gi...@apache.org>.
patsonluk commented on PR #1477:
URL: https://github.com/apache/solr/pull/1477#issuecomment-1520478105

   > 
   
   Many thanks for the review @noblepaul ! I have added another small commit https://github.com/apache/solr/pull/1477/commits/4752028b0c8a741b3e1a2ba91627225200c3c9b6 that adds javadoc and make the changed DocCollection ctor private - to ensure dev calls `DocCollection#buildDocCollection` in the future. If it's all good, would you mind to merge the changes? Or do we also need to add items to the release notes? 
   
   Many thanks again!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org