Posted to commits@cassandra.apache.org by "Greg Katz (JIRA)" <ji...@apache.org> on 2011/06/09 21:40:59 UTC
[jira] [Created] (CASSANDRA-2755) ColumnFamilyRecordWriter fails to throw a write exception encountered after the user begins to close the writer
ColumnFamilyRecordWriter fails to throw a write exception encountered after the user begins to close the writer
---------------------------------------------------------------------------------------------------------------
Key: CASSANDRA-2755
URL: https://issues.apache.org/jira/browse/CASSANDRA-2755
Project: Cassandra
Issue Type: Bug
Components: Hadoop
Affects Versions: 0.8.0
Reporter: Greg Katz
There appears to be a race condition in {{ColumnFamilyRecordWriter}} that can result in the loss of an exception. Here is how it can happen (W stands for the {{RangeClient}}'s worker thread; U stands for the {{ColumnFamilyRecordWriter}} user's thread):
# W: {{RangeClient}}'s {{run}} method catches an exception originating in the Thrift client/socket, but doesn't get a chance to set it on the {{lastException}} field before the thread is preempted.
# U: The user calls {{close}}, which calls {{stopNicely}}. Because the {{lastException}} field is null, {{stopNicely}} does not throw anything. {{close}} then joins on the worker thread.
# W: The {{RangeClient}}'s {{run}} method sets the {{lastException}} field and exits.
# U: Although the thread in {{close}} is waiting for the worker thread to exit, it has already checked the {{lastException}} field so it doesn't detect the presence of the last exception. Instead, {{close}} returns without throwing anything.
Because of this race condition, write failures will intermittently go undetected.
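The interleaving in steps 1 to 4 can be reproduced deterministically. What follows is a minimal Java sketch under stated assumptions, not Cassandra's code: RangeClient here is a hypothetical, heavily simplified stand-in for the real class, and sendBatch, the caught/checked latches, failureIsLost and awaitQuietly are illustrative names. The latches only force the schedule described above; close() checks lastException (via stopNicely()) before it joins, which is the ordering the report describes.

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch of the reported race; not the real ColumnFamilyRecordWriter code.
final class CheckThenJoinDemo {

    static final class RangeClient extends Thread {
        // Written by the worker thread (W), read by the user thread (U).
        volatile Exception lastException;
        // Test hooks that force the schedule from steps 1-4.
        final CountDownLatch caught = new CountDownLatch(1);
        final CountDownLatch checked = new CountDownLatch(1);

        // Hypothetical stand-in for a Thrift call that fails.
        private void sendBatch() throws IOException {
            throw new IOException("simulated Thrift socket failure");
        }

        @Override
        public void run() {
            try {
                sendBatch();
            } catch (IOException e) {
                caught.countDown();      // step 1: caught, but not yet recorded
                awaitQuietly(checked);   // W is "preempted" until U has done its check
                lastException = e;       // step 3: recorded after U's check
            }
        }

        void stopNicely() throws Exception {
            Exception e = lastException; // step 2: still null at this point
            if (e != null) {
                throw e;
            }
        }

        // Buggy ordering: check first, then join.
        void close() throws Exception {
            stopNicely();
            checked.countDown();         // test hook: let W continue
            join();                      // step 4: waits for W, but the check is already done
        }
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** True if W recorded an exception but close() still returned normally. */
    static boolean failureIsLost() {
        RangeClient client = new RangeClient();
        client.start();
        awaitQuietly(client.caught);
        try {
            client.close();
        } catch (Exception e) {
            return false;                // close() reported the failure
        }
        return client.lastException != null;
    }

    public static void main(String[] args) {
        // prints "failure lost: true"
        System.out.println("failure lost: " + failureIsLost());
    }
}
```

The exception is recorded, and because join() has returned it is visible to the user's thread, but nothing looks at it any more.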
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (CASSANDRA-2755) ColumnFamilyRecordWriter fails to throw a write exception encountered after the user begins to close the writer
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis reassigned CASSANDRA-2755:
-----------------------------------------
Assignee: Mck SembWever
[jira] [Commented] (CASSANDRA-2755) ColumnFamilyRecordWriter fails to throw a write exception encountered after the user begins to close the writer
Posted by "Mck SembWever (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056446#comment-13056446 ]
Mck SembWever commented on CASSANDRA-2755:
------------------------------------------
Jonathan: Is your patch being applied?
[jira] [Commented] (CASSANDRA-2755) ColumnFamilyRecordWriter fails to throw a write exception encountered after the user begins to close the writer
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056503#comment-13056503 ]
Hudson commented on CASSANDRA-2755:
-----------------------------------
Integrated in Cassandra-0.7 #515 (See [https://builds.apache.org/job/Cassandra-0.7/515/])
fix race that could result in Hadoopwriter failing to throw exception for encountered error
patch by Mck SembWever and jbellis for CASSANDRA-2755
jbellis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1140565
Files :
* /cassandra/branches/cassandra-0.7/CHANGES.txt
* /cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/hadoop/ColumnFamilyRecordWriter.java
[jira] [Commented] (CASSANDRA-2755) ColumnFamilyRecordWriter fails to throw a write exception encountered after the user begins to close the writer
Posted by "Mck SembWever (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048678#comment-13048678 ]
Mck SembWever commented on CASSANDRA-2755:
------------------------------------------
bq. it's always possible that the last put() will happen before an exception is set; hence, the extra check on close.
Quite right.
[jira] [Commented] (CASSANDRA-2755) ColumnFamilyRecordWriter fails to throw a write exception encountered after the user begins to close the writer
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056459#comment-13056459 ]
Jonathan Ellis commented on CASSANDRA-2755:
-------------------------------------------
Waiting for a +1; it wasn't clear whether your last comment was intended that way.
[jira] [Commented] (CASSANDRA-2755) ColumnFamilyRecordWriter fails to throw a write exception encountered after the user begins to close the writer
Posted by "Mck SembWever (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056462#comment-13056462 ]
Mck SembWever commented on CASSANDRA-2755:
------------------------------------------
Yes, it was a +1.
[jira] [Commented] (CASSANDRA-2755) ColumnFamilyRecordWriter fails to throw a write exception encountered after the user begins to close the writer
Posted by "Mck SembWever (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048604#comment-13048604 ]
Mck SembWever commented on CASSANDRA-2755:
------------------------------------------
The check for the exception also occurs in ColumnFamilyRecordWriter.write(buf, value) -> RangeClient.put(pair).
Isn't it possible that put(..) is being called while the RangeClient thread is inside close()?
(Isn't write(..) called more often than close()?)
For this reason, inside RangeClient.run() I assigned lastException before calling close().
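A minimal sketch of the ordering described in this comment, again using a hypothetical, simplified RangeClient rather than the real class: put() (reached from write()) refuses new rows once lastException is set, and the hypothetical closeConnection() stands in for the worker's cleanup. Because run() records the exception before it starts cleaning up, a put() issued while the worker is still inside cleanup already sees the failure. The latches exist only to hold the worker in cleanup long enough to observe this.

```java
import java.io.IOException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch of "record lastException first, then clean up"; not Cassandra code.
final class RecordThenCleanupDemo {

    static final class RangeClient extends Thread {
        final BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);
        volatile Exception lastException;
        // Test hooks that hold the worker (W) inside its cleanup step.
        final CountDownLatch inCleanup = new CountDownLatch(1);
        final CountDownLatch putAttempted = new CountDownLatch(1);

        // Reached from the user's thread (U) via write(..); rejects rows after a failure.
        void put(String row) throws Exception {
            Exception e = lastException;
            if (e != null) {
                throw e;
            }
            queue.put(row);
        }

        // Hypothetical stand-in for a Thrift call that fails.
        private void sendBatch() throws IOException {
            throw new IOException("simulated Thrift socket failure");
        }

        @Override
        public void run() {
            try {
                sendBatch();
            } catch (IOException e) {
                lastException = e;   // 1. record the failure first ...
                closeConnection();   // 2. ... then clean up
            }
        }

        // Hypothetical cleanup; the latches keep W here until U has called put().
        private void closeConnection() {
            inCleanup.countDown();
            awaitQuietly(putAttempted);
        }
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** True if a put() issued while W is still cleaning up reports W's failure. */
    static boolean putDuringCleanupThrows() {
        RangeClient client = new RangeClient();
        client.start();
        awaitQuietly(client.inCleanup);
        boolean threw;
        try {
            client.put("row-1");
            threw = false;
        } catch (Exception e) {
            threw = true;
        } finally {
            client.putAttempted.countDown();   // release W from cleanup
        }
        try {
            client.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return threw;
    }

    public static void main(String[] args) {
        // prints "put during cleanup threw: true"
        System.out.println("put during cleanup threw: " + putDuringCleanupThrows());
    }
}
```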
[jira] [Updated] (CASSANDRA-2755) ColumnFamilyRecordWriter fails to throw a write exception encountered after the user begins to close the writer
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated CASSANDRA-2755:
--------------------------------------
Attachment: 2755-v2.txt
It looks to me like, as long as we check for the exception before calling join, there will be a window in which we can miss one.
v2 encapsulates RangeClient.close better to avoid this.
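One way to remove that window is to look at lastException only after join() returns. The sketch below illustrates that idea with a hypothetical, simplified RangeClient; it is not the contents of 2755-v2.txt. The latches recreate the schedule from the report (the worker records its exception only after the user has started closing), and close() still reports the failure.

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch of "join first, then check"; not the contents of 2755-v2.txt.
final class JoinThenCheckDemo {

    static final class RangeClient extends Thread {
        volatile Exception lastException;
        // Test hooks that reproduce the schedule from the bug report.
        final CountDownLatch caught = new CountDownLatch(1);
        final CountDownLatch closing = new CountDownLatch(1);

        // Hypothetical stand-in for a Thrift call that fails.
        private void sendBatch() throws IOException {
            throw new IOException("simulated Thrift socket failure");
        }

        @Override
        public void run() {
            try {
                sendBatch();
            } catch (IOException e) {
                caught.countDown();      // caught, not yet recorded
                awaitQuietly(closing);   // W is delayed until U has begun closing
                lastException = e;       // recorded after U has started close()
            }
        }

        // Fixed ordering: let the worker finish, and only then check for a failure.
        void close() throws Exception {
            closing.countDown();         // test hook standing in for "ask W to stop"
            join();                      // all of W's actions happen-before join() returns
            Exception e = lastException;
            if (e != null) {
                throw e;
            }
        }
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** True if close() reports the worker's failure to the caller. */
    static boolean closeReportsFailure() {
        RangeClient client = new RangeClient();
        client.start();
        awaitQuietly(client.caught);
        try {
            client.close();
            return false;
        } catch (Exception e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // prints "close reported failure: true"
        System.out.println("close reported failure: " + closeReportsFailure());
    }
}
```

Because the check runs after join(), there is no longer a moment at which close() has read lastException while the worker can still write it. The Java Language Specification (section 17.4.5) guarantees that all actions in a thread happen-before another thread returns from join() on it, so this read sees the worker's write; volatile still matters for put(), which reads the field while the worker is running.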
[jira] [Updated] (CASSANDRA-2755) ColumnFamilyRecordWriter fails to throw a write exception encountered after the user begins to close the writer
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated CASSANDRA-2755:
--------------------------------------
Priority: Minor (was: Major)
Affects Version/s: 0.7.0 (was: 0.8.0)
Fix Version/s: 0.7.7, 0.8.2
[jira] [Commented] (CASSANDRA-2755) ColumnFamilyRecordWriter fails to throw a write exception encountered after the user begins to close the writer
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048609#comment-13048609 ]
Jonathan Ellis commented on CASSANDRA-2755:
-------------------------------------------
bq. Isn't it possible the put(..) is being called while the RangeClient thread is inside close?
old close, new closeInternal?
Yes, but I don't see how that changes things. I.e., it's always possible that the last put() will happen before an exception is set; hence, the extra check on close.
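The point that the last put() can return before any exception exists can be sketched as follows. This assumes a hypothetical RangeClient that drains a BlockingQueue and stops on an illustrative END marker; it is not Cassandra's code. The single row passes put()'s check because the worker can only fail after taking it from the queue, so only the check that close() performs after join() surfaces the failure.

```java
import java.io.IOException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: the last put() succeeds and the failure happens afterwards.
final class LastPutDemo {

    // Hypothetical end-of-stream marker used to ask the worker to stop.
    static final String END = "<end>";

    static final class RangeClient extends Thread {
        final BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);
        volatile Exception lastException;

        // Can only report failures that happened before this call.
        void put(String row) throws Exception {
            Exception e = lastException;
            if (e != null) {
                throw e;
            }
            queue.put(row);
        }

        @Override
        public void run() {
            try {
                while (true) {
                    String row = queue.take();
                    if (END.equals(row)) {
                        return;
                    }
                    send(row);
                }
            } catch (IOException | InterruptedException e) {
                lastException = e;
            }
        }

        // Hypothetical send that always fails, after put() has already returned.
        private void send(String row) throws IOException {
            throw new IOException("simulated failure sending " + row);
        }

        // The extra check on close: join first, then look for an unreported failure.
        void close() throws Exception {
            queue.put(END);
            join();
            Exception e = lastException;
            if (e != null) {
                throw e;
            }
        }
    }

    /** Writes one row, then closes; returns which call surfaced the failure. */
    static String whoReportedFailure() {
        RangeClient client = new RangeClient();
        client.start();
        try {
            client.put("row-1");
        } catch (Exception e) {
            return "put";
        }
        try {
            client.close();
        } catch (Exception e) {
            return "close";
        }
        return "nobody";
    }

    public static void main(String[] args) {
        // prints "failure reported by: close"
        System.out.println("failure reported by: " + whoReportedFailure());
    }
}
```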
[jira] [Updated] (CASSANDRA-2755) ColumnFamilyRecordWriter fails to throw a write exception encountered after the user begins to close the writer
Posted by "Mck SembWever (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mck SembWever updated CASSANDRA-2755:
-------------------------------------
Attachment: CASSANDRA-2755.patch
In RangeClient I cannot see why close() needs to be called before lastException is assigned. The following patch should work: I have tested it against various jobs, but I have no reproducible test case with which to confirm this bug.
The patch also includes a slight cleanup of ColumnFamilyRecordWriter's close() methods, keeping the implementation out of the deprecated methods.