Posted to dev@lucene.apache.org by "Tomás Fernández Löbbe (JIRA)" <ji...@apache.org> on 2017/12/08 20:10:00 UTC

[jira] [Commented] (SOLR-11739) Solr can accept duplicated async IDs

    [ https://issues.apache.org/jira/browse/SOLR-11739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284147#comment-16284147 ] 

Tomás Fernández Löbbe commented on SOLR-11739:
----------------------------------------------

I thought about three options:
1. Fix the actual race condition, so duplicate async IDs are never let through.
2. Fix the Overseer so that, before running each task, it checks whether one with the same ID has already completed.
3. Let the Overseer re-run the tasks (leave it as it is now). Maybe just add logging, or a way to show the error (failed tasks).

#3 can be dangerous, since the task could be something like a DELETEREPLICA. If the duplicate ID was caused by some broken retry logic on the client side, Solr could be deleting many replicas with what the client thought was a single command. 

#2 may be OK. The problem I see is that it gives the user inconsistent behavior (sometimes duplicate IDs are rejected, and sometimes not). Also, this would make the Overseer silently drop tasks (yes, we can log some sort of failure, but we can’t assume anyone is going to notice).

#1 is the correct fix from a functional standpoint. However, I can’t think of a way to really fix the race condition without adding an extra write to ZooKeeper, which we’d have to do for every collection request with an async ID. And that cost would be paid only to cover a client-misuse edge case.
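
To illustrate the kind of race option #1 would close, here is a minimal sketch, not Solr's actual code and not the patch. The class AsyncIdRegistry and its methods are hypothetical, and an in-memory concurrent set stands in for the ZooKeeper-backed record of async IDs. The atomic variant plays the role of the extra write: like a ZooKeeper create, which fails if the node already exists, it succeeds for only one caller per ID.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the ZooKeeper-backed record of async IDs.
// Illustrative only: none of these names come from Solr.
public class AsyncIdRegistry {
    private final Set<String> claimed = ConcurrentHashMap.newKeySet();

    // Racy check-then-act: two threads can both pass the contains() check
    // before either one calls add(), so both requests are accepted.
    public boolean claimRacy(String asyncId) {
        if (claimed.contains(asyncId)) {
            return false;
        }
        claimed.add(asyncId);
        return true;
    }

    // Atomic claim: add() returns true for exactly one caller per ID,
    // so a duplicate is rejected no matter how fast it arrives.
    public boolean claimAtomic(String asyncId) {
        return claimed.add(asyncId);
    }
}
```

With the atomic variant, a request would be accepted only if claimAtomic returned true for its async ID. The cost is the one described above: one extra write per request that carries an async ID.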

I think (and I discussed this offline with [~anshumg], he thinks this too) #1 is the way to go. I’ll put up a patch.

> Solr can accept duplicated async IDs
> ------------------------------------
>
>                 Key: SOLR-11739
>                 URL: https://issues.apache.org/jira/browse/SOLR-11739
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Tomás Fernández Löbbe
>            Priority: Minor
>         Attachments: SOLR-11739.patch
>
>
> Solr is supposed to reject duplicated async IDs. However, if the repeated IDs are sent fast enough, a race condition in Solr will let them through. The duplicated task is run and then silently fails to report as completed because the same async ID is already in the completed map.


