Posted to commits@cassandra.apache.org by "Yuki Morishita (JIRA)" <ji...@apache.org> on 2013/12/02 17:57:42 UTC

[jira] [Updated] (CASSANDRA-6415) Snapshot repair blocks for ever if something happens to the "I made my snapshot" response

     [ https://issues.apache.org/jira/browse/CASSANDRA-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuki Morishita updated CASSANDRA-6415:
--------------------------------------

    Attachment: 6415-1.2.txt

I think it would be better to rethink how executors and threads are used around repair. For example, right now Differencer tasks run on the ANTI_ENTROPY stage one at a time.

But IMO, that would be too big a change for 1.2.x, so I propose just changing the snapshot response message type from REQUEST_RESPONSE, which MessagingService is allowed to drop, to INTERNAL_RESPONSE. That way, at least snapshot responses are no longer dropped by MessagingService in the next 1.2 release.
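
To illustrate why a single dropped response is enough to hang repair, here is a minimal standalone sketch. It is not the attached patch and does not use any Cassandra classes: SnapshotLatchSketch, awaitSnapshots and its parameters are hypothetical names made up for this example. It simulates some responses never arriving and uses a bounded await only so that the demo terminates; with the unbounded await() from the issue description, the second call would never return.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the snapshot handshake; nothing here is a Cassandra API.
class SnapshotLatchSketch
{
    /**
     * Pretends to send a snapshot request to `endpoints` nodes, of which only
     * `delivered` responses actually come back (the rest are "dropped").
     * Returns true if every response arrived within timeoutMillis.
     */
    static boolean awaitSnapshots(int endpoints, int delivered, long timeoutMillis)
    {
        CountDownLatch latch = new CountDownLatch(endpoints);
        ExecutorService responders = Executors.newFixedThreadPool(Math.max(1, endpoints));
        try
        {
            // Each delivered response counts the latch down once,
            // like the callback's response(MessageIn) in makeSnapshots.
            for (int i = 0; i < delivered; i++)
                responders.execute(latch::countDown);

            // Bounded wait so the sketch terminates. An unbounded latch.await()
            // would block forever whenever delivered < endpoints.
            return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
        finally
        {
            responders.shutdownNow();
        }
    }

    public static void main(String[] args)
    {
        System.out.println("all responses delivered: " + awaitSnapshots(3, 3, 500));
        System.out.println("one response dropped:    " + awaitSnapshots(3, 2, 500));
    }
}
```

In this sketch the first call should complete as soon as all three simulated responses arrive, while the second can only give up after the timeout because the latch never reaches zero. The proposal above takes a different route (making the response non-droppable) rather than adding a timeout.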

> Snapshot repair blocks for ever if something happens to the "I made my snapshot" response
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6415
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6415
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jeremiah Jordan
>         Attachments: 6415-1.2.txt
>
>
> The "snapshotLatch.await();" call can wait forever and block all repair operations indefinitely if, for any reason, another node doesn't respond.
> {noformat}
>             public void makeSnapshots(Collection<InetAddress> endpoints)
>             {
>                 try
>                 {
>                     snapshotLatch = new CountDownLatch(endpoints.size());
>                     IAsyncCallback callback = new IAsyncCallback()
>                     {
>                         public boolean isLatencyForSnitch()
>                         {
>                             return false;
>                         }
>                         public void response(MessageIn msg)
>                         {
>                             RepairJob.this.snapshotLatch.countDown();
>                         }
>                     };
>                     for (InetAddress endpoint : endpoints)
>                         MessagingService.instance().sendRR(new SnapshotCommand(tablename, cfname, sessionName, false).createMessage(), endpoint, callback);
>                     snapshotLatch.await();
>                     snapshotLatch = null;
>                 }
>                 catch (InterruptedException e)
>                 {
>                     throw new RuntimeException(e);
>                 }
>             }
> {noformat}


