Posted to dev@lucene.apache.org by "Cao Manh Dat (JIRA)" <ji...@apache.org> on 2017/10/10 02:09:00 UTC

[jira] [Updated] (SOLR-11447) ZkStateWriter should process commands in atomic

     [ https://issues.apache.org/jira/browse/SOLR-11447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cao Manh Dat updated SOLR-11447:
--------------------------------
    Attachment: SOLR-11447.patch

My patch for this ticket, including a test. This patch:
- changes ZkStateWriter.enqueueUpdate to accept a list of ZkWriteCommand instead of a single ZkWriteCommand
- refactors enqueueUpdate to make it cleaner
- changes ZkStateWriter.maybeFlushAfter. Right now we use updates.size() to trigger the flush, but I think this logic is wrong, because updates.size() equals the number of collections that changed, not the number of ZkWriteCommands that were processed.
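
To illustrate the first and third points, here is a minimal sketch, not the code from the patch. FlushCountSketch, WriteCommand and Writer are hypothetical stand-ins for the real Solr classes, and the batch size is an arbitrary value. It only shows why a per-collection map size undercounts the commands and how a separate command counter avoids that.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these types are the real Solr classes.
final class FlushCountSketch {

    // Stand-in for ZkWriteCommand: a collection name and its new state.
    static final class WriteCommand {
        final String collection;
        final String state;

        WriteCommand(String collection, String state) {
            this.collection = collection;
            this.state = state;
        }
    }

    // Stand-in for ZkStateWriter.
    static final class Writer {
        private final int batchSize;
        // One pending entry per collection, so several commands collapse into one entry.
        private final Map<String, String> updates = new LinkedHashMap<>();
        private int commandsSinceFlush = 0;
        private int flushCount = 0;

        Writer(int batchSize) {
            this.batchSize = batchSize;
        }

        // Buffers every command of one message, then checks the flush condition once.
        void enqueueUpdate(List<WriteCommand> commands, Runnable onWrite) {
            for (WriteCommand cmd : commands) {
                updates.put(cmd.collection, cmd.state);
            }
            commandsSinceFlush += commands.size();
            // Count processed commands, not changed collections.
            if (commandsSinceFlush >= batchSize) {
                updates.clear(); // stands in for writing the states to ZooKeeper
                commandsSinceFlush = 0;
                flushCount++;
                onWrite.run();
            }
        }

        int pendingCollections() {
            return updates.size();
        }

        int flushCount() {
            return flushCount;
        }
    }

    public static void main(String[] args) {
        Writer writer = new Writer(3);
        // Three commands that all touch the same collection.
        writer.enqueueUpdate(Arrays.asList(
                new WriteCommand("c1", "down"),
                new WriteCommand("c1", "recovering"),
                new WriteCommand("c1", "active")),
                () -> System.out.println("onWrite called"));
        System.out.println("flushes: " + writer.flushCount());
    }
}
```

In this sketch the three commands leave only one entry in the map, so a check on updates.size() would not see that three commands were processed, while the command counter reaches the batch size and triggers the flush.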

I am unsure how to handle ZkStateWriter.maybeFlushBefore with multiple commands, though. Can you please take a look?
[~shalinmangar] [~noble.paul]

> ZkStateWriter should process commands in atomic
> -----------------------------------------------
>
>                 Key: SOLR-11447
>                 URL: https://issues.apache.org/jira/browse/SOLR-11447
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Cao Manh Dat
>            Assignee: Cao Manh Dat
>         Attachments: SOLR-11447.patch
>
>
> ZkStateWriter should process all the ZkWriteCommands that correspond to a message atomically (right now we process the commands one by one). Otherwise some ZkWriteCommands can get lost. Here is the case:
> 1. We process a DOWNNODE message (or any other message that produces multiple ZkWriteCommands).
> 2. We poll that message from stateUpdateQueue and push it to workQueue (as a backup).
> 3. The DOWNNODE message is converted into multiple ZkWriteCommands.
> 4. We enqueue the ZkWriteCommands into ZkStateWriter one by one. Any command can trigger a flush, which calls the onWrite() callback to empty workQueue.
> 5. The Overseer gets restarted, and the remaining ZkWriteCommands (those not processed in step 4) are lost, because workQueue is now empty (emptied by the onWrite() callback in step 4).
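
The steps above can be replayed in a small simulation. This is only a sketch under simplifying assumptions, not Overseer code: LostCommandSketch and lostAfterCrash are hypothetical names, the queues are plain in-memory collections, a flush happens once the buffer holds flushEvery commands, and on restart a message still present in workQueue is assumed to be replayed in full.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Hypothetical simulation of the scenario; not the real Overseer or ZkStateWriter.
final class LostCommandSketch {

    // Each inner list is enqueued as one unit. Returns the commands that are
    // neither persisted nor recoverable from workQueue after the restart.
    static List<String> lostAfterCrash(List<List<String>> batches, int flushEvery,
                                       int crashAfterBatches) {
        Deque<String> workQueue = new ArrayDeque<>();
        workQueue.add("DOWNNODE"); // step 2: backup copy of the message

        List<String> all = new ArrayList<>();
        for (List<String> batch : batches) {
            all.addAll(batch);
        }

        List<String> buffered = new ArrayList<>();
        List<String> persisted = new ArrayList<>();
        int processed = Math.min(crashAfterBatches, batches.size());
        for (int i = 0; i < processed; i++) {
            buffered.addAll(batches.get(i)); // step 4: enqueue
            if (buffered.size() >= flushEvery) {
                persisted.addAll(buffered);  // flush
                buffered.clear();
                workQueue.clear();           // onWrite() empties the backup
            }
        }

        // Step 5: restart. The in-memory buffer is gone.
        if (!workQueue.isEmpty()) {
            return new ArrayList<>(); // message is replayed, nothing is lost
        }
        List<String> lost = new ArrayList<>(all);
        lost.removeAll(persisted);
        return lost;
    }

    public static void main(String[] args) {
        List<String> commands = Arrays.asList("replica1=down", "replica2=down", "replica3=down");

        // One command at a time: the flush after the second command empties workQueue.
        List<List<String>> oneByOne = new ArrayList<>();
        for (String c : commands) {
            oneByOne.add(Collections.singletonList(c));
        }
        System.out.println(lostAfterCrash(oneByOne, 2, 2)); // prints [replica3=down]

        // All commands of the message as one unit: flushed together or replayed together.
        List<List<String>> atomic = Collections.singletonList(commands);
        System.out.println(lostAfterCrash(atomic, 2, 1)); // prints []
    }
}
```

With one-by-one enqueueing, the restart after the second command loses the third one, because the backup message is already gone. When the whole list is enqueued as one unit, the restart happens either before the flush (the message is still in workQueue) or after it (all commands are persisted).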


