You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Steve Rowe (JIRA)" <ji...@apache.org> on 2016/04/14 02:34:25 UTC
[jira] [Comment Edited] (SOLR-8662) SchemaManager doesn't wait correctly for replicas to update

    [ https://issues.apache.org/jira/browse/SOLR-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240321#comment-15240321 ] 

Steve Rowe edited comment on SOLR-8662 at 4/14/16 12:33 AM:
------------------------------------------------------------

Lots of good changes in your patch, Varun!  I ran all Solr+Solrj+contrib tests and everything passes.

I found a few issues though.

This change is good, but there remains a pre-existing problem:

{code:java|title:SchemaManager.java}
-    int timeout = req.getParams().getInt(BaseSolrResource.UPDATE_TIMEOUT_SECS, -1);
-    long startTime = System.nanoTime();
-    long endTime = timeout > 0 ? System.nanoTime() + (timeout * 1000 * 1000) : Long.MAX_VALUE;
+    //The default timeout is 60s when no BaseSolrResource.UPDATE_TIMEOUT_SECS is specified
+    int timeout = req.getParams().getInt(BaseSolrResource.UPDATE_TIMEOUT_SECS, 60);
+
+    //If BaseSolrResource.UPDATE_TIMEOUT_SECS=0 or -1 then end time is unlimited.
+    long endTime = timeout > 0 ? TimeUnit.NANOSECONDS.convert(timeout, TimeUnit.SECONDS) + System.nanoTime() : Long.MAX_VALUE;
{code}

The problem is that {{System.nanoTime()}} can return *anything* in the range \[{{Long.MIN_VALUE}}, {{Long.MAX_VALUE}}\] and it intentionally overflows, so making {{endTime}} be {{Long.MAX_VALUE}} is effectively choosing a random time in the future; this is *not* the same as "unlimited".  This same situation is dealt with in the {{QueryTimeout}} implementations - used by {{ExitableDirectoryReader}} - by adding a large value to the current value of {{nanoTime()}} to arrive at an effectively unlimited end time.

----

Your changes in {{ZkSolrResourceLoader.openResource()}} introduce the possibility of retrying 10 times to fetch a resource from ZK when it's not there (according to {{zkController.pathExists(file)}}) - I think in this case the retry loop should be immediately exited, so that the classpath search can start immediately.  (I'm assuming that the intent here is *not* to wait around for another thread/process to finish uploading a resource - if that's the case then the branch for when the path doesn't exist should have a timeout, which it doesn't in your patch.)

----

bq. Without the changes to SchemaManager the test would fail.

Indeed, when I revert the SchemaManager changes, {{TestManagedSchemaAPI.testAddFieldAndDocument()}} fails.  But {{testReloadAndAddSimple()}} always succeeds, regardless of the SchemaManager changes, so I'm not sure why it's there, since it doesn't have anything to do with new field visibility - can it be removed?


was (Author: steve_rowe):
Lots of good changes in your patch, Varun!  I ran all Solr+Solrj+contrib tests and everything passes.

I found a few issues though.

This change is good, but there remains a pre-existing problem:

{code:java|title:SchemaManager.java}
-    int timeout = req.getParams().getInt(BaseSolrResource.UPDATE_TIMEOUT_SECS, -1);
-    long startTime = System.nanoTime();
-    long endTime = timeout > 0 ? System.nanoTime() + (timeout * 1000 * 1000) : Long.MAX_VALUE;
+    //The default timeout is 60s when no BaseSolrResource.UPDATE_TIMEOUT_SECS is specified
+    int timeout = req.getParams().getInt(BaseSolrResource.UPDATE_TIMEOUT_SECS, 60);
+
+    //If BaseSolrResource.UPDATE_TIMEOUT_SECS=0 or -1 then end time is unlimited.
+    long endTime = timeout > 0 ? TimeUnit.NANOSECONDS.convert(timeout, TimeUnit.SECONDS) + System.nanoTime() : Long.MAX_VALUE;
{code}

The problem is that {{System.nanoTime()}} can return *anything* in the range \[{{Long.MIN_VALUE}}, {{Long.MAX_VALUE}}\] and it intentionally overflows, so making {{endTime}} be {{Long.MAX_VALUE}} is effectively choosing a random time in the future; this is *not* the same as "unlimited".  This same situation is dealt with in the {{QueryTimeout}} implementations - used by {{ExitableDirectoryReader}} - by adding a large value to the current value of {{nanoTime()}} to arrive at an effectively unlimited end time.

----

Your changes in {{ZkSolrResourceLoader.openResource()}} introduce the possibility of retrying 10 times to fetch a resource from ZK when it's not there (according to {{zkController.pathExists(file)}) - I think in this case the retry loop should be immediately exited, so that the classpath search can start immediately.  (I'm assuming that the intent here is *not* to wait around for another thread/process to finish uploading a resource - if that's the case then the branch for when the path doesn't exist should have a timeout, which it doesn't in your patch.)

----

bq. Without the changes to SchemaManager the test would fail.

Indeed, when I revert the SchemaManager changes, {{TestManagedSchemaAPI.testAddFieldAndDocument()}} fails.  But {{testReloadAndAddSimple()}} always succeeds, regardless of the SchemaManager changes, so I'm not sure why it's there, since it doesn't have anything to do with new field visibility - can it be removed?

> SchemaManager doesn't wait correctly for replicas to update
> -----------------------------------------------------------
>
>                 Key: SOLR-8662
>                 URL: https://issues.apache.org/jira/browse/SOLR-8662
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Varun Thacker
>            Assignee: Varun Thacker
>         Attachments: SOLR-8662.patch
>
>
> Currently in SchemaManager, waitForOtherReplicasToUpdate doesn't execute since it gets passed the timeout value instead of the end time 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org