Posted to dev@lucene.apache.org by "Andrey Kudryavtsev (JIRA)" <ji...@apache.org> on 2017/10/10 14:36:00 UTC
[jira] [Created] (SOLR-11459) AddUpdateCommand#prevVersion is not
cleared which may lead to problem for in-place updates for non existed
Andrey Kudryavtsev created SOLR-11459:
-----------------------------------------
Summary: AddUpdateCommand#prevVersion is not cleared which may lead to problem for in-place updates for non existed
Key: SOLR-11459
URL: https://issues.apache.org/jira/browse/SOLR-11459
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Andrey Kudryavtsev
Here is how I ran into this one.
I have a 1_shard / *m*_replicas SolrCloud cluster and run batches of 5 - 10k in-place updates from time to time.
At one point I noticed that a job was "hanging": it started but did not finish for a long while.
The logs were full of messages like:
{code} Missing update, on which current in-place update depends on, hasn't arrived. id=__, looking for version=___, last found version=0 {code}
{code}
Tried to fetch document ___ from the leader, but the leader says document has been deleted. Deleting the document here and skipping this update: Last found version: 0, was looking for: ___
{code}
Further analysis showed the following:
* Among the regular updates there are 100-500 updates for non-existent documents (something I have to deal with).
* The leader receives a batch of updates and executes them one by one. {{JavabinLoader}}, which processes the documents, reuses the same instance of {{AddUpdateCommand}} for every update and just [clears its state at the end|https://github.com/apache/lucene-solr/blob/e2521b2a8baabdaf43b92192588f51e042d21e97/solr/core/src/java/org/apache/solr/handler/loader/JavabinLoader.java#L99]. [AddUpdateCommand#prevVersion|https://github.com/apache/lucene-solr/blob/6396cb759f8c799f381b0730636fa412761030ce/solr/core/src/java/org/apache/solr/update/AddUpdateCommand.java#L76] is not cleared.
* If an update is an in-place update but the specified document does not exist, the update is processed as a regular atomic update (i.e. a new doc is created), yet {{prevVersion}} is still sent as the {{distrib.inplace.prevversion}} parameter in the subsequent calls to the replicas in {{DistributedUpdateProcessor}}. Since {{prevVersion}} wasn't cleared, it may contain the version from a previously processed update.
* Each replica checks its own version of the document, which is 0 (because the doc does not exist), concludes that some updates were missed, and spends 5 seconds in [DistributedUpdateProcessor#waitForDependentUpdates|https://github.com/apache/lucene-solr/blob/e2521b2a8baabdaf43b92192588f51e042d21e97/solr/core/src/java/org/apache/solr/handler/loader/JavabinLoader.java#L99] waiting for the missing updates (no luck). It then tries to get the "correct" version from the leader (no luck either).
* So each such update costs me *m* * 5 sec.
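The sequence above can be illustrated with a small stand-alone model. This is only a sketch and not Solr code: {{ReusableCommand}}, {{Update}} and {{forwardedPrevVersions}} are hypothetical names, and the model assumes that {{prevVersion}} is only assigned when the target document exists, which is my reading of the behaviour described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StaleCommandSketch {

    /** Hypothetical stand-in for a reusable update command; NOT Solr's AddUpdateCommand. */
    static class ReusableCommand {
        String id;
        long version = 0;
        long prevVersion = -1; // -1 is the reset value used in the proposed patch

        /** Mirrors the reported problem: prevVersion is left untouched. */
        void clear() {
            id = null;
            version = 0;
        }
    }

    /** One incoming update; dependsOnVersion only matters when the doc exists. */
    static class Update {
        final String id;
        final boolean docExists;
        final long dependsOnVersion;

        Update(String id, boolean docExists, long dependsOnVersion) {
            this.id = id;
            this.docExists = docExists;
            this.dependsOnVersion = dependsOnVersion;
        }
    }

    /** Returns the prevVersion that would be forwarded to the replicas for each update. */
    static List<Long> forwardedPrevVersions(List<Update> batch) {
        ReusableCommand cmd = new ReusableCommand(); // one instance for the whole batch
        List<Long> forwarded = new ArrayList<>();
        for (Update u : batch) {
            cmd.id = u.id;
            if (u.docExists) {
                cmd.prevVersion = u.dependsOnVersion; // genuine in-place update
            }
            // For a missing doc nothing assigns prevVersion, so the value
            // left over from the previous update is what gets forwarded.
            forwarded.add(cmd.prevVersion);
            cmd.clear();
        }
        return forwarded;
    }

    public static void main(String[] args) {
        List<Update> batch = Arrays.asList(
                new Update("a", true, 100L),   // existing doc
                new Update("b", false, 0L));   // non-existent doc
        System.out.println(forwardedPrevVersions(batch));
    }
}
```

In this model the second update, for the non-existent doc "b", is forwarded with the previous update's version 100 instead of -1, which is the kind of value that would make a replica wait for an update that never arrives.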
I worked around this with an explicit check that the doc exists, but it probably should be fixed properly.
The obvious first guess is that {{prevVersion}} should be cleared in {{AddUpdateCommand#clear}}, but I have no clue how to test it.
{code}
+++ solr/core/src/java/org/apache/solr/update/AddUpdateCommand.java (revision )
@@ -78,6 +78,7 @@
updateTerm = null;
isLastDocInBatch = false;
version = 0;
+ prevVersion = -1;
}
{code}
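One possible shape for a regression check, sketched against a hypothetical stand-in class rather than the real {{AddUpdateCommand}}: dirty the fields, call {{clear()}}, and verify that every field is back to its default. {{CommandWithFix}} and {{clearRestoresDefaults}} are illustrative names; only the three fields visible in the diff above are modelled, and their initial values are assumed to match the reset values.

```java
public class ClearResetsPrevVersionSketch {

    /** Hypothetical stand-in with the proposed fix applied; NOT Solr's AddUpdateCommand. */
    static class CommandWithFix {
        boolean isLastDocInBatch = false;
        long version = 0;
        long prevVersion = -1;

        void clear() {
            isLastDocInBatch = false;
            version = 0;
            prevVersion = -1; // the line the patch adds
        }
    }

    /** Dirties the command, clears it, and reports whether all fields are back to defaults. */
    static boolean clearRestoresDefaults() {
        CommandWithFix cmd = new CommandWithFix();
        cmd.isLastDocInBatch = true;
        cmd.version = 42L;
        cmd.prevVersion = 41L;
        cmd.clear();
        return !cmd.isLastDocInBatch && cmd.version == 0 && cmd.prevVersion == -1;
    }

    public static void main(String[] args) {
        System.out.println("clear() restores defaults: " + clearRestoresDefaults());
    }
}
```

A real test would presumably do the same against an actual {{AddUpdateCommand}} instance; without the added line, the {{prevVersion}} comparison is the one that would fail.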