You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/09/26 21:09:23 UTC

[GitHub] [lucene] stevenschlansker opened a new pull request, #11822: PrimaryNode: add configurable timeout to waitForAllRemotesToClose

stevenschlansker opened a new pull request, #11822:
URL: https://github.com/apache/lucene/pull/11822

   Adds a configurable timeout for PrimaryNode waiting for remotes to close.
   The default matches existing behavior. In the long run, maybe this wait loop goes away entirely, but I elected to make the most compatible change first.
   
   Fixes #11674
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] zhaih commented on a diff in pull request #11822: PrimaryNode: add configurable timeout to waitForAllRemotesToClose

Posted by GitBox <gi...@apache.org>.
zhaih commented on code in PR #11822:
URL: https://github.com/apache/lucene/pull/11822#discussion_r998806414


##########
lucene/CHANGES.txt:
##########
@@ -44,6 +44,8 @@ New Features
 * LUCENE-10626 Hunspell: add tools to aid dictionary editing:
   analysis introspection, stem expansion and stem/flag suggestion (Peter Gromov)
 
+* GITHUB#11822: Configure replicator PrimaryNode replia shutdown timeout. (Steven Schlansker)

Review Comment:
   We can put it under 9.5 so it can be release in next release and don't need to wait until 10?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] stevenschlansker commented on pull request #11822: PrimaryNode: add configurable timeout to waitForAllRemotesToClose

Posted by GitBox <gi...@apache.org>.
stevenschlansker commented on PR #11822:
URL: https://github.com/apache/lucene/pull/11822#issuecomment-1283082251

   I updated this PR to rename the field to include `Ms`.
   I added a test case for both no timeout (0), and 1000ms. I verified the test fails (doesn't terminate) without the new configuration option sent.
   
   I don't think this is a great candidate for a random test - randomization is wonderful for perturbing data and finding edge cases in algorithms. In this case, it is just an int we compare to a clock, so randomizing doesn't seem likely to uncover any helpful edge cases. Please let me know if this is still desired.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mikemccand commented on a diff in pull request #11822: PrimaryNode: add configurable timeout to waitForAllRemotesToClose

Posted by GitBox <gi...@apache.org>.
mikemccand commented on code in PR #11822:
URL: https://github.com/apache/lucene/pull/11822#discussion_r983427086


##########
lucene/replicator/src/java/org/apache/lucene/replicator/nrt/PrimaryNode.java:
##########
@@ -196,6 +197,21 @@ public synchronized long getLastCommitVersion() {
     throw new AssertionError("missing VERSION_KEY");
   }
 
+  /**
+   * @return the number of milliseconds to wait during shutdown for remote replicas to close
+   */
+  public int getRemoteCloseTimeout() {
+    return remoteCloseTimeout;
+  }
+
+  /**
+   * Set the number of milliseconds to wait during shutdown for remote replicas to close. {@code -1}
+   * (the default) means forever, and {@code 0} means don't wait at all.
+   */
+  public void setRemoteCloseTimeout(int remoteCloseTimeout) {
+    this.remoteCloseTimeout = remoteCloseTimeout;

Review Comment:
   Could we throw `IllegalArgumentException` if this is < -1?



##########
lucene/replicator/src/java/org/apache/lucene/replicator/nrt/PrimaryNode.java:
##########
@@ -196,6 +197,21 @@ public synchronized long getLastCommitVersion() {
     throw new AssertionError("missing VERSION_KEY");
   }
 
+  /**
+   * @return the number of milliseconds to wait during shutdown for remote replicas to close
+   */
+  public int getRemoteCloseTimeout() {
+    return remoteCloseTimeout;
+  }
+
+  /**
+   * Set the number of milliseconds to wait during shutdown for remote replicas to close. {@code -1}
+   * (the default) means forever, and {@code 0} means don't wait at all.
+   */
+  public void setRemoteCloseTimeout(int remoteCloseTimeout) {

Review Comment:
   In addition to a specific test case, we could also randomize the timeout we pass in our randomized tests.  Randomly sometimes pass 0, other times a random range or so.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] zhaih merged pull request #11822: PrimaryNode: add configurable timeout to waitForAllRemotesToClose

Posted by GitBox <gi...@apache.org>.
zhaih merged PR #11822:
URL: https://github.com/apache/lucene/pull/11822


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] zhaih commented on pull request #11822: PrimaryNode: add configurable timeout to waitForAllRemotesToClose

Posted by GitBox <gi...@apache.org>.
zhaih commented on PR #11822:
URL: https://github.com/apache/lucene/pull/11822#issuecomment-1283191722

   I merged it but seems there're test failure
   ```
   org.apache.lucene.index.TestIndexFileDeleter > test suite's output saved to /home/runner/work/lucene/lucene/lucene/core/build/test-results/test/outputs/OUTPUT-org.apache.lucene.index.TestIndexFileDeleter.txt, copied below:
      >     org.apache.lucene.store.AlreadyClosedException: ReaderPool is already closed
      >         at __randomizedtesting.SeedInfo.seed([FF62209E9305A732:16FF57ACE5CC40CF]:0)
      >         at app//org.apache.lucene.index.ReaderPool.get(ReaderPool.java:400)
      >         at app//org.apache.lucene.index.IndexWriter.writeReaderPool(IndexWriter.java:3922)
      >         at app//org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:592)
      >         at app//org.apache.lucene.index.IndexWriter$4.getReader(IndexWriter.java:6479)
      >         at app//org.apache.lucene.tests.index.RandomIndexWriter.getReader(RandomIndexWriter.java:488)
      >         at app//org.apache.lucene.tests.index.RandomIndexWriter.getReader(RandomIndexWriter.java:420)
      >         at app//org.apache.lucene.index.TestIndexFileDeleter.testExcInDecRef(TestIndexFileDeleter.java:485)
      >         at java.base@17.0.4.1/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      >         at java.base@17.0.4.1/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
      >         at java.base@17.0.4.1/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      >         at java.base@17.0.4.1/java.lang.reflect.Method.invoke(Method.java:568)
      >         at app//com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
      >         at app//com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
      >         at app//com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
      >         at app//com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
      >         at app//org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
      >         at app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
      >         at app//org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
      >         at app//org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
      >         at app//org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
      >         at app//org.junit.rules.RunRules.evaluate(RunRules.java:20)
      >         at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
      >         at app//com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
      >         at app//com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
      >         at app//com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
      >         at app//com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
      >         at app//com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
      >         at app//com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
      >         at app//com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
      >         at app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
      >         at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
      >         at app//org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
      >         at app//com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
      >         at app//com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
      >         at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
      >         at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
      >         at app//org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
      >         at app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
      >         at app//org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
      >         at app//org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
      >         at app//org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
      >         at app//org.junit.rules.RunRules.evaluate(RunRules.java:20)
      >         at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
      >         at app//com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
      >         at app//com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
      >         at java.base@17.0.4.1/java.lang.Thread.run(Thread.java:833)
     2> NOTE: reproduce with: gradlew test --tests TestIndexFileDeleter.testExcInDecRef -Dtests.seed=FF62209E9305A732 -Dtests.locale=is-IS -Dtests.timezone=Africa/Maseru -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   ```
   
   But seems not related to this PR, I'll create another issue if that failure is reproducible.
   Thanks @stevenschlansker !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] stevenschlansker commented on pull request #11822: PrimaryNode: add configurable timeout to waitForAllRemotesToClose

Posted by GitBox <gi...@apache.org>.
stevenschlansker commented on PR #11822:
URL: https://github.com/apache/lucene/pull/11822#issuecomment-1284708755

   OK - I did run `./gradlew check` so I don't think I broke anything, but please let me know if it does end up being related!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] zhaih commented on pull request #11822: PrimaryNode: add configurable timeout to waitForAllRemotesToClose

Posted by GitBox <gi...@apache.org>.
zhaih commented on PR #11822:
URL: https://github.com/apache/lucene/pull/11822#issuecomment-1284755158

   I tried to reproduce the issue but couldn't. So likely a transient or
   extremely rare test failure, and should not be related to the PR
   
   On Wed, Oct 19, 2022, 16:44 Steven Schlansker ***@***.***>
   wrote:
   
   > OK - I did run ./gradlew check so I don't think I broke anything, but
   > please let me know if it does end up being related!
   >
   > —
   > Reply to this email directly, view it on GitHub
   > <https://github.com/apache/lucene/pull/11822#issuecomment-1284708755>, or
   > unsubscribe
   > <https://github.com/notifications/unsubscribe-auth/AEFSB7EMFIIHD5IKGTMNEKLWECBXJANCNFSM6AAAAAAQWFSERU>
   > .
   > You are receiving this because you modified the open/close state.Message
   > ID: ***@***.***>
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] zhaih commented on a diff in pull request #11822: PrimaryNode: add configurable timeout to waitForAllRemotesToClose

Posted by GitBox <gi...@apache.org>.
zhaih commented on code in PR #11822:
URL: https://github.com/apache/lucene/pull/11822#discussion_r980742864


##########
lucene/replicator/src/java/org/apache/lucene/replicator/nrt/PrimaryNode.java:
##########
@@ -61,6 +61,7 @@ public abstract class PrimaryNode extends Node {
   private CopyState copyState;
 
   protected final long primaryGen;
+  private int remoteCloseTimeout = -1;

Review Comment:
   It'd better to name it `remoteCloseTimeoutMs`? And also change the getter and setter correspondingly?



##########
lucene/replicator/src/java/org/apache/lucene/replicator/nrt/PrimaryNode.java:
##########
@@ -196,6 +197,21 @@ public synchronized long getLastCommitVersion() {
     throw new AssertionError("missing VERSION_KEY");
   }
 
+  /**
+   * @return the number of milliseconds to wait during shutdown for remote replicas to close
+   */
+  public int getRemoteCloseTimeout() {
+    return remoteCloseTimeout;
+  }
+
+  /**
+   * Set the number of milliseconds to wait during shutdown for remote replicas to close. {@code -1}
+   * (the default) means forever, and {@code 0} means don't wait at all.
+   */
+  public void setRemoteCloseTimeout(int remoteCloseTimeout) {

Review Comment:
   Can we add some unit tests for at least `remoteCloseTimeout = 0` case? Like produce some error or just interrupt the replicaNode and then close the primaryNode?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] stevenschlansker commented on a diff in pull request #11822: PrimaryNode: add configurable timeout to waitForAllRemotesToClose

Posted by GitBox <gi...@apache.org>.
stevenschlansker commented on code in PR #11822:
URL: https://github.com/apache/lucene/pull/11822#discussion_r998813121


##########
lucene/CHANGES.txt:
##########
@@ -44,6 +44,8 @@ New Features
 * LUCENE-10626 Hunspell: add tools to aid dictionary editing:
   analysis introspection, stem expansion and stem/flag suggestion (Peter Gromov)
 
+* GITHUB#11822: Configure replicator PrimaryNode replia shutdown timeout. (Steven Schlansker)

Review Comment:
   Sounds great to me! I moved the CHANGES entry.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org