You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Steve Rowe (JIRA)" <ji...@apache.org> on 2016/04/14 02:34:25 UTC
[jira] [Comment Edited] (SOLR-8662) SchemaManager doesn't wait
correctly for replicas to update
[ https://issues.apache.org/jira/browse/SOLR-8662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240321#comment-15240321 ]
Steve Rowe edited comment on SOLR-8662 at 4/14/16 12:33 AM:
------------------------------------------------------------
Lots of good changes in your patch, Varun! I ran all Solr+Solrj+contrib tests and everything passes.
I found a few issues though.
This change is good, but there remains a pre-existing problem:
{code:java|title:SchemaManager.java}
- int timeout = req.getParams().getInt(BaseSolrResource.UPDATE_TIMEOUT_SECS, -1);
- long startTime = System.nanoTime();
- long endTime = timeout > 0 ? System.nanoTime() + (timeout * 1000 * 1000) : Long.MAX_VALUE;
+ //The default timeout is 60s when no BaseSolrResource.UPDATE_TIMEOUT_SECS is specified
+ int timeout = req.getParams().getInt(BaseSolrResource.UPDATE_TIMEOUT_SECS, 60);
+
+ //If BaseSolrResource.UPDATE_TIMEOUT_SECS=0 or -1 then end time is unlimited.
+ long endTime = timeout > 0 ? TimeUnit.NANOSECONDS.convert(timeout, TimeUnit.SECONDS) + System.nanoTime() : Long.MAX_VALUE;
{code}
The problem is that {{System.nanoTime()}} can return *anything* in the range \[{{Long.MIN_VALUE}}, {{Long.MAX_VALUE}}\] and it intentionally overflows, so making {{endTime}} be {{Long.MAX_VALUE}} is effectively choosing a random time in the future; this is *not* the same as "unlimited". This same situation is dealt with in the {{QueryTimeout}} implementations - used by {{ExitableDirectoryReader}} - by adding a large value to the current value of {{nanoTime()}} to arrive at an effectively unlimited end time.
----
Your changes in {{ZkSolrResourceLoader.openResource()}} introduce the possibility of retrying 10 times to fetch a resource from ZK when it's not there (according to {{zkController.pathExists(file)}}) - I think in this case the retry loop should be immediately exited, so that the classpath search can start immediately. (I'm assuming that the intent here is *not* to wait around for another thread/process to finish uploading a resource - if that's the case then the branch for when the path doesn't exist should have a timeout, which it doesn't in your patch.)
----
bq. Without the changes to SchemaManager the test would fail.
Indeed, when I revert the SchemaManager changes, {{TestManagedSchemaAPI.testAddFieldAndDocument()}} fails. But {{testReloadAndAddSimple()}} always succeeds, regardless of the SchemaManager changes, so I'm not sure why it's there, since it doesn't have anything to do with new field visibility - can it be removed?
was (Author: steve_rowe):
Lots of good changes in your patch, Varun! I ran all Solr+Solrj+contrib tests and everything passes.
I found a few issues though.
This change is good, but there remains a pre-existing problem:
{code:java|title:SchemaManager.java}
- int timeout = req.getParams().getInt(BaseSolrResource.UPDATE_TIMEOUT_SECS, -1);
- long startTime = System.nanoTime();
- long endTime = timeout > 0 ? System.nanoTime() + (timeout * 1000 * 1000) : Long.MAX_VALUE;
+ //The default timeout is 60s when no BaseSolrResource.UPDATE_TIMEOUT_SECS is specified
+ int timeout = req.getParams().getInt(BaseSolrResource.UPDATE_TIMEOUT_SECS, 60);
+
+ //If BaseSolrResource.UPDATE_TIMEOUT_SECS=0 or -1 then end time is unlimited.
+ long endTime = timeout > 0 ? TimeUnit.NANOSECONDS.convert(timeout, TimeUnit.SECONDS) + System.nanoTime() : Long.MAX_VALUE;
{code}
The problem is that {{System.nanoTime()}} can return *anything* in the range \[{{Long.MIN_VALUE}}, {{Long.MAX_VALUE}}\] and it intentionally overflows, so making {{endTime}} be {{Long.MAX_VALUE}} is effectively choosing a random time in the future; this is *not* the same as "unlimited". This same situation is dealt with in the {{QueryTimeout}} implementations - used by {{ExitableDirectoryReader}} - by adding a large value to the current value of {{nanoTime()}} to arrive at an effectively unlimited end time.
----
Your changes in {{ZkSolrResourceLoader.openResource()}} introduce the possibility of retrying 10 times to fetch a resource from ZK when it's not there (according to {{zkController.pathExists(file)}) - I think in this case the retry loop should be immediately exited, so that the classpath search can start immediately. (I'm assuming that the intent here is *not* to wait around for another thread/process to finish uploading a resource - if that's the case then the branch for when the path doesn't exist should have a timeout, which it doesn't in your patch.)
----
bq. Without the changes to SchemaManager the test would fail.
Indeed, when I revert the SchemaManager changes, {{TestManagedSchemaAPI.testAddFieldAndDocument()}} fails. But {{testReloadAndAddSimple()}} always succeeds, regardless of the SchemaManager changes, so I'm not sure why it's there, since it doesn't have anything to do with new field visibility - can it be removed?
> SchemaManager doesn't wait correctly for replicas to update
> -----------------------------------------------------------
>
> Key: SOLR-8662
> URL: https://issues.apache.org/jira/browse/SOLR-8662
> Project: Solr
> Issue Type: Bug
> Reporter: Varun Thacker
> Assignee: Varun Thacker
> Attachments: SOLR-8662.patch
>
>
> Currently in SchemaManager, waitForOtherReplicasToUpdate doesn't execute since it gets passed the timeout value instead of the end time
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org