Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2011/03/08 20:11:00 UTC
[jira] Created: (CASSANDRA-2290) Repair hangs if one of the
neighbor is dead
Repair hangs if one of the neighbor is dead
-------------------------------------------
Key: CASSANDRA-2290
URL: https://issues.apache.org/jira/browse/CASSANDRA-2290
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 0.7.3
Reporter: Sylvain Lebresne
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2290) Repair hangs if one of the
neighbor is dead
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021529#comment-13021529 ]
Hudson commented on CASSANDRA-2290:
-----------------------------------
Integrated in Cassandra-0.8 #19 (See [https://hudson.apache.org/hudson/job/Cassandra-0.8/19/])
Fix unit tests for CASSANDRA-2290
> Repair hangs if one of the neighbor is dead
> -------------------------------------------
>
> Key: CASSANDRA-2290
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2290
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.6
> Reporter: Sylvain Lebresne
> Assignee: Sylvain Lebresne
> Priority: Minor
> Fix For: 0.7.5, 0.8
>
> Attachments: 0001-Don-t-start-repair-if-a-neighbor-is-dead.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> Repair doesn't cope well with dead or dying neighbors. There are 2 problems:
> # Repair doesn't check whether a node is dead before sending a TreeRequest; this is easily fixable.
> # If a neighbor dies mid-repair, the repair will also hang forever.
> The second point is not easy to deal with; the best approach is probably CASSANDRA-1740. That is, if we add a way to query the state of a repair (with the query correctly checking all neighbors) and a way to cancel a repair, that would probably be enough.
[jira] [Commented] (CASSANDRA-2290) Repair hangs if one of the
neighbor is dead
Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021528#comment-13021528 ]
Sylvain Lebresne commented on CASSANDRA-2290:
---------------------------------------------
Tests fixed, sorry about that.
> Repair hangs if one of the neighbor is dead
> -------------------------------------------
>
> Key: CASSANDRA-2290
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2290
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.6
> Reporter: Sylvain Lebresne
> Assignee: Sylvain Lebresne
> Priority: Minor
> Fix For: 0.7.5, 0.8
>
> Attachments: 0001-Don-t-start-repair-if-a-neighbor-is-dead.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> Repair doesn't cope well with dead or dying neighbors. There are 2 problems:
> # Repair doesn't check whether a node is dead before sending a TreeRequest; this is easily fixable.
> # If a neighbor dies mid-repair, the repair will also hang forever.
> The second point is not easy to deal with; the best approach is probably CASSANDRA-1740. That is, if we add a way to query the state of a repair (with the query correctly checking all neighbors) and a way to cancel a repair, that would probably be enough.
[jira] [Resolved] (CASSANDRA-2290) Repair hangs if one of the
neighbor is dead
Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sylvain Lebresne resolved CASSANDRA-2290.
-----------------------------------------
Resolution: Fixed
> Repair hangs if one of the neighbor is dead
> -------------------------------------------
>
> Key: CASSANDRA-2290
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2290
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.6
> Reporter: Sylvain Lebresne
> Assignee: Sylvain Lebresne
> Priority: Minor
> Fix For: 0.7.5, 0.8
>
> Attachments: 0001-Don-t-start-repair-if-a-neighbor-is-dead.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> Repair doesn't cope well with dead or dying neighbors. There are 2 problems:
> # Repair doesn't check whether a node is dead before sending a TreeRequest; this is easily fixable.
> # If a neighbor dies mid-repair, the repair will also hang forever.
> The second point is not easy to deal with; the best approach is probably CASSANDRA-1740. That is, if we add a way to query the state of a repair (with the query correctly checking all neighbors) and a way to cancel a repair, that would probably be enough.
[jira] [Commented] (CASSANDRA-2290) Repair hangs if one of the
neighbor is dead
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021306#comment-13021306 ]
Jonathan Ellis commented on CASSANDRA-2290:
-------------------------------------------
+1 the check-neighbors patch
> Repair hangs if one of the neighbor is dead
> -------------------------------------------
>
> Key: CASSANDRA-2290
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2290
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.6
> Reporter: Sylvain Lebresne
> Assignee: Sylvain Lebresne
> Priority: Minor
> Fix For: 0.7.5
>
> Attachments: 0001-Don-t-start-repair-if-a-neighbor-is-dead.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> Repair doesn't cope well with dead or dying neighbors. There are 2 problems:
> # Repair doesn't check whether a node is dead before sending a TreeRequest; this is easily fixable.
> # If a neighbor dies mid-repair, the repair will also hang forever.
> The second point is not easy to deal with; the best approach is probably CASSANDRA-1740. That is, if we add a way to query the state of a repair (with the query correctly checking all neighbors) and a way to cancel a repair, that would probably be enough.
[jira] Issue Comment Edited: (CASSANDRA-2290) Repair hangs if one
of the neighbor is dead
Posted by "Aaron Morton (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004707#comment-13004707 ]
Aaron Morton edited comment on CASSANDRA-2290 at 3/9/11 6:45 PM:
-----------------------------------------------------------------
Not sure if this helps. I found a place where AES was hanging while testing failure during streaming transfer for CASSANDRA-2088 (against 0.7). I broke the FileStreamTask so that it only sends one range and then closes the sending channel.
The IncomingStreamReader.readFile() got stuck in an infinite loop because it does not check the return value of FileChannel.transferFrom(), which was returning 0 bytes read. Also, the FileStreamTask does not check the number of bytes sent by transferTo().
While stuck in the loop, the socket it was reading from was in this state (127.0.0.1 was in the loop, .0.2 was sending):
java 25371 aaron 73u IPv4 0xffffff8010742ff8 0t0 TCP 127.0.0.1:7000->127.0.0.2:52759 (CLOSE_WAIT)
When I was debugging, the socketChannel was still reporting that it was open.
Update: Modified FileStreamTask to call System.exit() after sending the first section and got the same result.
was (Author: amorton):
Not sure if this helps. I found a place where AES was hanging while testing failure during streaming transfer for CASSANDRA-2088 (against 0.7). I broke the FileStreamTask so that it only sends one range and then closes the sending channel.
The IncomingStreamReader.readFile() got stuck in an infinite loop because it does not check the return value of FileChannel.transferFrom(), which was returning 0 bytes read. Also, the FileStreamTask does not check the number of bytes sent by transferTo().
While stuck in the loop, the socket it was reading from was in this state (127.0.0.1 was in the loop, .0.2 was sending):
java 25371 aaron 73u IPv4 0xffffff8010742ff8 0t0 TCP 127.0.0.1:7000->127.0.0.2:52759 (CLOSE_WAIT)
When I was debugging, the socketChannel was still reporting that it was open.
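The failure mode in this comment (a read loop that never inspects the return value of FileChannel.transferFrom()) can be illustrated with a minimal sketch. This is not the Cassandra code and not a proposed patch: the class name CheckedTransfer, the method readFully, and the MAX_STALLED_ATTEMPTS threshold are all hypothetical, and treating a few consecutive zero-byte transfers as a dead peer is an assumption that would need tuning for non-blocking channels.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;

final class CheckedTransfer {
    // Hypothetical threshold: how many consecutive zero-byte transfers we
    // tolerate before assuming the sender has gone away.
    static final int MAX_STALLED_ATTEMPTS = 3;

    /**
     * Copies exactly {@code length} bytes from {@code source} into {@code target},
     * starting at position 0 of the target. Throws EOFException instead of
     * looping forever when transferFrom() keeps reporting 0 bytes.
     */
    static long readFully(ReadableByteChannel source, FileChannel target, long length)
            throws IOException {
        long received = 0;
        int stalled = 0;
        while (received < length) {
            // transferFrom() returns the number of bytes actually transferred,
            // which may be zero; the loop must look at it.
            long n = target.transferFrom(source, received, length - received);
            if (n > 0) {
                received += n;
                stalled = 0;
                continue;
            }
            if (++stalled >= MAX_STALLED_ATTEMPTS) {
                throw new EOFException("stream ended after " + received
                        + " of " + length + " bytes");
            }
        }
        return received;
    }
}
```

The same idea applies on the sending side: the value returned by transferTo() would be added to a running offset rather than assumed to equal the requested count.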
> Repair hangs if one of the neighbor is dead
> -------------------------------------------
>
> Key: CASSANDRA-2290
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2290
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.6
> Reporter: Sylvain Lebresne
> Assignee: Sylvain Lebresne
> Priority: Minor
> Fix For: 0.7.4
>
> Attachments: 0001-Don-t-start-repair-if-a-neighbor-is-dead.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> Repair doesn't cope well with dead or dying neighbors. There are 2 problems:
> # Repair doesn't check whether a node is dead before sending a TreeRequest; this is easily fixable.
> # If a neighbor dies mid-repair, the repair will also hang forever.
> The second point is not easy to deal with; the best approach is probably CASSANDRA-1740. That is, if we add a way to query the state of a repair (with the query correctly checking all neighbors) and a way to cancel a repair, that would probably be enough.
[jira] [Commented] (CASSANDRA-2290) Repair hangs if one of the
neighbor is dead
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021527#comment-13021527 ]
Hudson commented on CASSANDRA-2290:
-----------------------------------
Integrated in Cassandra #856 (See [https://hudson.apache.org/hudson/job/Cassandra/856/])
> Repair hangs if one of the neighbor is dead
> -------------------------------------------
>
> Key: CASSANDRA-2290
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2290
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.6
> Reporter: Sylvain Lebresne
> Assignee: Sylvain Lebresne
> Priority: Minor
> Fix For: 0.7.5, 0.8
>
> Attachments: 0001-Don-t-start-repair-if-a-neighbor-is-dead.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> Repair doesn't cope well with dead or dying neighbors. There are 2 problems:
> # Repair doesn't check whether a node is dead before sending a TreeRequest; this is easily fixable.
> # If a neighbor dies mid-repair, the repair will also hang forever.
> The second point is not easy to deal with; the best approach is probably CASSANDRA-1740. That is, if we add a way to query the state of a repair (with the query correctly checking all neighbors) and a way to cancel a repair, that would probably be enough.
[jira] [Reopened] (CASSANDRA-2290) Repair hangs if one of the
neighbor is dead
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis reopened CASSANDRA-2290:
---------------------------------------
oops, need to fix AESTest now
> Repair hangs if one of the neighbor is dead
> -------------------------------------------
>
> Key: CASSANDRA-2290
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2290
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.6
> Reporter: Sylvain Lebresne
> Assignee: Sylvain Lebresne
> Priority: Minor
> Fix For: 0.7.5, 0.8
>
> Attachments: 0001-Don-t-start-repair-if-a-neighbor-is-dead.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> Repair doesn't cope well with dead or dying neighbors. There are 2 problems:
> # Repair doesn't check whether a node is dead before sending a TreeRequest; this is easily fixable.
> # If a neighbor dies mid-repair, the repair will also hang forever.
> The second point is not easy to deal with; the best approach is probably CASSANDRA-1740. That is, if we add a way to query the state of a repair (with the query correctly checking all neighbors) and a way to cancel a repair, that would probably be enough.
[jira] Issue Comment Edited: (CASSANDRA-2290) Repair hangs if one
of the neighbor is dead
Posted by "Aaron Morton (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004707#comment-13004707 ]
Aaron Morton edited comment on CASSANDRA-2290 at 3/9/11 6:32 PM:
-----------------------------------------------------------------
Not sure if this helps. I found a place where AES was hanging while testing failure during streaming transfer for CASSANDRA-2088 (against 0.7). I broke the FileStreamTask so that it only sends one range and then closes the sending channel.
The IncomingStreamReader.readFile() got stuck in an infinite loop because it does not check the return value of FileChannel.transferFrom(), which was returning 0 bytes read. Also, the FileStreamTask does not check the number of bytes sent by transferTo().
While stuck in the loop, the socket it was reading from was in this state (127.0.0.1 was in the loop, .0.2 was sending):
java 25371 aaron 73u IPv4 0xffffff8010742ff8 0t0 TCP 127.0.0.1:7000->127.0.0.2:52759 (CLOSE_WAIT)
When I was debugging, the socketChannel was still reporting that it was open.
was (Author: amorton):
Not sure if this helps. I found a place where AES was hanging while testing failure during streaming transfer for CASSANDRA-2088. I broke the FileStreamTask so that it only sends one range and then closes the sending channel.
The IncomingStreamReader.readFile() got stuck in an infinite loop because it does not check the return value of FileChannel.transferFrom(), which was returning 0 bytes read. Also, the FileStreamTask does not check the number of bytes sent by transferTo().
While stuck in the loop, the socket it was reading from was in this state (127.0.0.1 was in the loop, .0.2 was sending):
java 25371 aaron 73u IPv4 0xffffff8010742ff8 0t0 TCP 127.0.0.1:7000->127.0.0.2:52759 (CLOSE_WAIT)
When I was debugging, the socketChannel was still reporting that it was open.
> Repair hangs if one of the neighbor is dead
> -------------------------------------------
>
> Key: CASSANDRA-2290
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2290
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.6
> Reporter: Sylvain Lebresne
> Assignee: Sylvain Lebresne
> Priority: Minor
> Fix For: 0.7.4
>
> Attachments: 0001-Don-t-start-repair-if-a-neighbor-is-dead.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> Repair doesn't cope well with dead or dying neighbors. There are 2 problems:
> # Repair doesn't check whether a node is dead before sending a TreeRequest; this is easily fixable.
> # If a neighbor dies mid-repair, the repair will also hang forever.
> The second point is not easy to deal with; the best approach is probably CASSANDRA-1740. That is, if we add a way to query the state of a repair (with the query correctly checking all neighbors) and a way to cancel a repair, that would probably be enough.
[jira] Updated: (CASSANDRA-2290) Repair hangs if one of the
neighbor is dead
Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sylvain Lebresne updated CASSANDRA-2290:
----------------------------------------
Attachment: 0001-Don-t-start-repair-if-a-neighbor-is-dead.patch
Attaching a patch for the first problem above. It checks that all neighbors are alive before attempting the repair; if any of them is not, it doesn't start the repair. Another option would be to still repair with whichever neighbors are alive (if any), but I think that refusing to repair is a saner default, and I'm fine waiting until someone needs the second option before considering adding it.
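As a rough illustration of the behavior described in this comment (this is not the attached patch), the pre-repair check could look like the sketch below. The class name RepairPrecheck, its methods, and the use of plain strings for endpoints are hypothetical; the liveness test is passed in as a Predicate so that the sketch makes no assumption about how the failure detector is actually queried.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class RepairPrecheck {
    /** Returns the neighbors that the supplied liveness check reports as dead. */
    static List<String> deadNeighbors(List<String> neighbors, Predicate<String> isAlive) {
        List<String> dead = new ArrayList<>();
        for (String neighbor : neighbors) {
            if (!isAlive.test(neighbor)) {
                dead.add(neighbor);
            }
        }
        return dead;
    }

    /** The repair may start only when every neighbor is alive. */
    static boolean canStartRepair(List<String> neighbors, Predicate<String> isAlive) {
        return deadNeighbors(neighbors, isAlive).isEmpty();
    }

    public static void main(String[] args) {
        // Hypothetical endpoints; 127.0.0.3 is treated as down.
        List<String> neighbors = Arrays.asList("127.0.0.2", "127.0.0.3");
        Predicate<String> isAlive = endpoint -> !endpoint.equals("127.0.0.3");

        if (canStartRepair(neighbors, isAlive)) {
            System.out.println("Starting repair");
        } else {
            System.out.println("Not starting repair, dead neighbors: "
                    + deadNeighbors(neighbors, isAlive));
        }
    }
}
```

The alternative mentioned in the comment, repairing with whichever neighbors are alive, would amount to filtering the list instead of refusing outright.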
> Repair hangs if one of the neighbor is dead
> -------------------------------------------
>
> Key: CASSANDRA-2290
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2290
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.7.3
> Reporter: Sylvain Lebresne
> Attachments: 0001-Don-t-start-repair-if-a-neighbor-is-dead.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> Repair doesn't cope well with dead or dying neighbors. There are 2 problems:
> # Repair doesn't check whether a node is dead before sending a TreeRequest; this is easily fixable.
> # If a neighbor dies mid-repair, the repair will also hang forever.
> The second point is not easy to deal with; the best approach is probably CASSANDRA-1740. That is, if we add a way to query the state of a repair (with the query correctly checking all neighbors) and a way to cancel a repair, that would probably be enough.
[jira] Commented: (CASSANDRA-2290) Repair hangs if one of the
neighbor is dead
Posted by "Aaron Morton (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004707#comment-13004707 ]
Aaron Morton commented on CASSANDRA-2290:
-----------------------------------------
Not sure if this helps. I found a place where AES was hanging while testing failure during streaming transfer for CASSANDRA-2088. I broke the FileStreamTask so that it only sends one range and then closes the sending channel.
The IncomingStreamReader.readFile() got stuck in an infinite loop because it does not check the return value of FileChannel.transferFrom(), which was returning 0 bytes read. Also, the FileStreamTask does not check the number of bytes sent by transferTo().
While stuck in the loop, the socket it was reading from was in this state (127.0.0.1 was in the loop, .0.2 was sending):
java 25371 aaron 73u IPv4 0xffffff8010742ff8 0t0 TCP 127.0.0.1:7000->127.0.0.2:52759 (CLOSE_WAIT)
When I was debugging, the socketChannel was still reporting that it was open.
> Repair hangs if one of the neighbor is dead
> -------------------------------------------
>
> Key: CASSANDRA-2290
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2290
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.6
> Reporter: Sylvain Lebresne
> Assignee: Sylvain Lebresne
> Priority: Minor
> Fix For: 0.7.4
>
> Attachments: 0001-Don-t-start-repair-if-a-neighbor-is-dead.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> Repair doesn't cope well with dead or dying neighbors. There are 2 problems:
> # Repair doesn't check whether a node is dead before sending a TreeRequest; this is easily fixable.
> # If a neighbor dies mid-repair, the repair will also hang forever.
> The second point is not easy to deal with; the best approach is probably CASSANDRA-1740. That is, if we add a way to query the state of a repair (with the query correctly checking all neighbors) and a way to cancel a repair, that would probably be enough.
[jira] Updated: (CASSANDRA-2290) Repair hangs if one of the
neighbor is dead
Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sylvain Lebresne updated CASSANDRA-2290:
----------------------------------------
Description:
Repair doesn't cope well with dead or dying neighbors. There are 2 problems:
# Repair doesn't check whether a node is dead before sending a TreeRequest; this is easily fixable.
# If a neighbor dies mid-repair, the repair will also hang forever.
The second point is not easy to deal with; the best approach is probably CASSANDRA-1740. That is, if we add a way to query the state of a repair (with the query correctly checking all neighbors) and a way to cancel a repair, that would probably be enough.
Remaining Estimate: 1h
Original Estimate: 1h
> Repair hangs if one of the neighbor is dead
> -------------------------------------------
>
> Key: CASSANDRA-2290
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2290
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.7.3
> Reporter: Sylvain Lebresne
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> Repair doesn't cope well with dead or dying neighbors. There are 2 problems:
> # Repair doesn't check whether a node is dead before sending a TreeRequest; this is easily fixable.
> # If a neighbor dies mid-repair, the repair will also hang forever.
> The second point is not easy to deal with; the best approach is probably CASSANDRA-1740. That is, if we add a way to query the state of a repair (with the query correctly checking all neighbors) and a way to cancel a repair, that would probably be enough.
[jira] [Commented] (CASSANDRA-2290) Repair hangs if one of the
neighbor is dead
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021595#comment-13021595 ]
Hudson commented on CASSANDRA-2290:
-----------------------------------
Integrated in Cassandra-0.7 #447 (See [https://hudson.apache.org/hudson/job/Cassandra-0.7/447/])
> Repair hangs if one of the neighbor is dead
> -------------------------------------------
>
> Key: CASSANDRA-2290
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2290
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.6
> Reporter: Sylvain Lebresne
> Assignee: Sylvain Lebresne
> Priority: Minor
> Fix For: 0.7.5, 0.8
>
> Attachments: 0001-Don-t-start-repair-if-a-neighbor-is-dead.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> Repair doesn't cope well with dead or dying neighbors. There are 2 problems:
> # Repair doesn't check whether a node is dead before sending a TreeRequest; this is easily fixable.
> # If a neighbor dies mid-repair, the repair will also hang forever.
> The second point is not easy to deal with; the best approach is probably CASSANDRA-1740. That is, if we add a way to query the state of a repair (with the query correctly checking all neighbors) and a way to cancel a repair, that would probably be enough.
[jira] Updated: (CASSANDRA-2290) Repair hangs if one of the
neighbor is dead
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated CASSANDRA-2290:
--------------------------------------
Reviewer: stuhood
Priority: Minor (was: Major)
Affects Version/s: (was: 0.7.3)
0.6
Fix Version/s: 0.7.4
Assignee: Sylvain Lebresne
> Repair hangs if one of the neighbor is dead
> -------------------------------------------
>
> Key: CASSANDRA-2290
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2290
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.6
> Reporter: Sylvain Lebresne
> Assignee: Sylvain Lebresne
> Priority: Minor
> Fix For: 0.7.4
>
> Attachments: 0001-Don-t-start-repair-if-a-neighbor-is-dead.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> Repair doesn't cope well with dead or dying neighbors. There are 2 problems:
> # Repair doesn't check whether a node is dead before sending a TreeRequest; this is easily fixable.
> # If a neighbor dies mid-repair, the repair will also hang forever.
> The second point is not easy to deal with; the best approach is probably CASSANDRA-1740. That is, if we add a way to query the state of a repair (with the query correctly checking all neighbors) and a way to cancel a repair, that would probably be enough.