You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2011/09/12 17:00:09 UTC

[jira] [Created] (CASSANDRA-3178) Counter shard merging is not thread safe

Counter shard merging is not thread safe
----------------------------------------

                 Key: CASSANDRA-3178
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3178
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 0.8.5
            Reporter: Sylvain Lebresne
             Fix For: 0.8.6


The first part of the counter shard merging process is done during counter replication. This was done there because it requires that all replica are made aware of the merging (we could only rely on nodetool repair for that but that seems much too fragile, it's better as just a safety net). However this part isn't thread safe as multiple threads can do the merging for the same shard at the same time (which shouldn't really "corrupt" the counter value per se, but result in an incorrect context).

Synchronizing that part of the code would be very costly in term of performance, so instance I propose to move the part of the shard merging done during replication to compaction. It's a better place anyway. The only downside is that it means compaction will sometime send mutations to other node as a side effect, which doesn't feel very clean but is probably not a big deal either.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (CASSANDRA-3178) Counter shard merging is not thread safe

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne reassigned CASSANDRA-3178:
-------------------------------------------

    Assignee: Sylvain Lebresne

> Counter shard merging is not thread safe
> ----------------------------------------
>
>                 Key: CASSANDRA-3178
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3178
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.5
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>              Labels: counters
>             Fix For: 0.8.6
>
>         Attachments: 0001-Move-shard-merging-completely-to-compaction.patch, 0002-Simplify-improve-shard-merging-code.patch
>
>
> The first part of the counter shard merging process is done during counter replication. This was done there because it requires that all replica are made aware of the merging (we could only rely on nodetool repair for that but that seems much too fragile, it's better as just a safety net). However this part isn't thread safe as multiple threads can do the merging for the same shard at the same time (which shouldn't really "corrupt" the counter value per se, but result in an incorrect context).
> Synchronizing that part of the code would be very costly in term of performance, so instance I propose to move the part of the shard merging done during replication to compaction. It's a better place anyway. The only downside is that it means compaction will sometime send mutations to other node as a side effect, which doesn't feel very clean but is probably not a big deal either.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3178) Counter shard merging is not thread safe

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145576#comment-13145576 ] 

Hudson commented on CASSANDRA-3178:
-----------------------------------

Integrated in Cassandra-0.8 #394 (See [https://builds.apache.org/job/Cassandra-0.8/394/])
    patch by slebresne; reviewed by yukim for CASSANDRA-3178

slebresne : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1198725
Files : 
* /cassandra/branches/cassandra-0.8/CHANGES.txt
* /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/config/CFMetaData.java
* /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
* /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/CounterColumn.java
* /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/CounterMutation.java
* /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/Memtable.java
* /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/compaction/CompactionController.java
* /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/compaction/LazilyCompactedRow.java
* /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/compaction/PrecompactedRow.java
* /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/context/CounterContext.java
* /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/StorageProxy.java
* /cassandra/branches/cassandra-0.8/test/unit/org/apache/cassandra/db/CounterMutationTest.java
* /cassandra/branches/cassandra-0.8/test/unit/org/apache/cassandra/db/context/CounterContextTest.java

                
> Counter shard merging is not thread safe
> ----------------------------------------
>
>                 Key: CASSANDRA-3178
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3178
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.5
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>              Labels: counters
>             Fix For: 0.8.8, 1.0.3
>
>         Attachments: 0001-Move-shard-merging-completely-to-compaction-1.0.patch, 0001-Move-shard-merging-completely-to-compaction-v2.patch, 0001-Move-shard-merging-completely-to-compaction.patch, 0002-Simplify-improve-shard-merging-code-1.0.patch, 0002-Simplify-improve-shard-merging-code-v2.patch, 0002-Simplify-improve-shard-merging-code.patch
>
>
> The first part of the counter shard merging process is done during counter replication. This was done there because it requires that all replica are made aware of the merging (we could only rely on nodetool repair for that but that seems much too fragile, it's better as just a safety net). However this part isn't thread safe as multiple threads can do the merging for the same shard at the same time (which shouldn't really "corrupt" the counter value per se, but result in an incorrect context).
> Synchronizing that part of the code would be very costly in term of performance, so instance I propose to move the part of the shard merging done during replication to compaction. It's a better place anyway. The only downside is that it means compaction will sometime send mutations to other node as a side effect, which doesn't feel very clean but is probably not a big deal either.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3178) Counter shard merging is not thread safe

Posted by "Yuki Morishita (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13143848#comment-13143848 ] 

Yuki Morishita commented on CASSANDRA-3178:
-------------------------------------------

+!. 1.0 patch also works fine.
                
> Counter shard merging is not thread safe
> ----------------------------------------
>
>                 Key: CASSANDRA-3178
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3178
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.5
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>              Labels: counters
>             Fix For: 0.8.8
>
>         Attachments: 0001-Move-shard-merging-completely-to-compaction-1.0.patch, 0001-Move-shard-merging-completely-to-compaction-v2.patch, 0001-Move-shard-merging-completely-to-compaction.patch, 0002-Simplify-improve-shard-merging-code-1.0.patch, 0002-Simplify-improve-shard-merging-code-v2.patch, 0002-Simplify-improve-shard-merging-code.patch
>
>
> The first part of the counter shard merging process is done during counter replication. This was done there because it requires that all replica are made aware of the merging (we could only rely on nodetool repair for that but that seems much too fragile, it's better as just a safety net). However this part isn't thread safe as multiple threads can do the merging for the same shard at the same time (which shouldn't really "corrupt" the counter value per se, but result in an incorrect context).
> Synchronizing that part of the code would be very costly in term of performance, so instance I propose to move the part of the shard merging done during replication to compaction. It's a better place anyway. The only downside is that it means compaction will sometime send mutations to other node as a side effect, which doesn't feel very clean but is probably not a big deal either.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3178) Counter shard merging is not thread safe

Posted by "Sylvain Lebresne (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-3178:
----------------------------------------

    Attachment: 0002-Simplify-improve-shard-merging-code-1.0.patch
                0001-Move-shard-merging-completely-to-compaction-1.0.patch

Attaching patches rebased against 1.0
                
> Counter shard merging is not thread safe
> ----------------------------------------
>
>                 Key: CASSANDRA-3178
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3178
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.5
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>              Labels: counters
>             Fix For: 0.8.8
>
>         Attachments: 0001-Move-shard-merging-completely-to-compaction-1.0.patch, 0001-Move-shard-merging-completely-to-compaction-v2.patch, 0001-Move-shard-merging-completely-to-compaction.patch, 0002-Simplify-improve-shard-merging-code-1.0.patch, 0002-Simplify-improve-shard-merging-code-v2.patch, 0002-Simplify-improve-shard-merging-code.patch
>
>
> The first part of the counter shard merging process is done during counter replication. This was done there because it requires that all replica are made aware of the merging (we could only rely on nodetool repair for that but that seems much too fragile, it's better as just a safety net). However this part isn't thread safe as multiple threads can do the merging for the same shard at the same time (which shouldn't really "corrupt" the counter value per se, but result in an incorrect context).
> Synchronizing that part of the code would be very costly in term of performance, so instance I propose to move the part of the shard merging done during replication to compaction. It's a better place anyway. The only downside is that it means compaction will sometime send mutations to other node as a side effect, which doesn't feel very clean but is probably not a big deal either.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3178) Counter shard merging is not thread safe

Posted by "Yuki Morishita (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142240#comment-13142240 ] 

Yuki Morishita commented on CASSANDRA-3178:
-------------------------------------------

LGTM on 0.8 branch. I think it's safe to apply this on 1.0, but before that I want to make sure it works. Could you modify the patch so that I can test on 1.0?
                
> Counter shard merging is not thread safe
> ----------------------------------------
>
>                 Key: CASSANDRA-3178
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3178
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.5
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>              Labels: counters
>             Fix For: 0.8.8
>
>         Attachments: 0001-Move-shard-merging-completely-to-compaction-v2.patch, 0001-Move-shard-merging-completely-to-compaction.patch, 0002-Simplify-improve-shard-merging-code-v2.patch, 0002-Simplify-improve-shard-merging-code.patch
>
>
> The first part of the counter shard merging process is done during counter replication. This was done there because it requires that all replica are made aware of the merging (we could only rely on nodetool repair for that but that seems much too fragile, it's better as just a safety net). However this part isn't thread safe as multiple threads can do the merging for the same shard at the same time (which shouldn't really "corrupt" the counter value per se, but result in an incorrect context).
> Synchronizing that part of the code would be very costly in term of performance, so instance I propose to move the part of the shard merging done during replication to compaction. It's a better place anyway. The only downside is that it means compaction will sometime send mutations to other node as a side effect, which doesn't feel very clean but is probably not a big deal either.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3178) Counter shard merging is not thread safe

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-3178:
----------------------------------------

    Attachment: 0002-Simplify-improve-shard-merging-code.patch
                0001-Move-shard-merging-completely-to-compaction.patch

First patch move the code from counter replication to compaction (as described above). The second patch adapt the code to work correctly with the move to compaction but also simply and improve that code.

> Counter shard merging is not thread safe
> ----------------------------------------
>
>                 Key: CASSANDRA-3178
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3178
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.5
>            Reporter: Sylvain Lebresne
>              Labels: counters
>             Fix For: 0.8.6
>
>         Attachments: 0001-Move-shard-merging-completely-to-compaction.patch, 0002-Simplify-improve-shard-merging-code.patch
>
>
> The first part of the counter shard merging process is done during counter replication. This was done there because it requires that all replica are made aware of the merging (we could only rely on nodetool repair for that but that seems much too fragile, it's better as just a safety net). However this part isn't thread safe as multiple threads can do the merging for the same shard at the same time (which shouldn't really "corrupt" the counter value per se, but result in an incorrect context).
> Synchronizing that part of the code would be very costly in term of performance, so instance I propose to move the part of the shard merging done during replication to compaction. It's a better place anyway. The only downside is that it means compaction will sometime send mutations to other node as a side effect, which doesn't feel very clean but is probably not a big deal either.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3178) Counter shard merging is not thread safe

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103708#comment-13103708 ] 

Sylvain Lebresne commented on CASSANDRA-3178:
---------------------------------------------

Note that the removal of flush_after_mins in 1.0 is a problem for this patch. The reason is that we want to remove a shard corresponding to a NodeId for which we know no increment has been made after time t. For that removal to be safe, we must make sure that compaction includes everything that has been issued before time t. For that, current patch check that the compaction has started after time t + 2 * flush_after_mins. I'll update the patch to use the memtables creationTime instead.  

> Counter shard merging is not thread safe
> ----------------------------------------
>
>                 Key: CASSANDRA-3178
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3178
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.5
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>              Labels: counters
>             Fix For: 0.8.6
>
>         Attachments: 0001-Move-shard-merging-completely-to-compaction.patch, 0002-Simplify-improve-shard-merging-code.patch
>
>
> The first part of the counter shard merging process is done during counter replication. This was done there because it requires that all replica are made aware of the merging (we could only rely on nodetool repair for that but that seems much too fragile, it's better as just a safety net). However this part isn't thread safe as multiple threads can do the merging for the same shard at the same time (which shouldn't really "corrupt" the counter value per se, but result in an incorrect context).
> Synchronizing that part of the code would be very costly in term of performance, so instance I propose to move the part of the shard merging done during replication to compaction. It's a better place anyway. The only downside is that it means compaction will sometime send mutations to other node as a side effect, which doesn't feel very clean but is probably not a big deal either.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3178) Counter shard merging is not thread safe

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3178:
--------------------------------------

    Reviewer: yukim

Yuki, can you review?
                
> Counter shard merging is not thread safe
> ----------------------------------------
>
>                 Key: CASSANDRA-3178
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3178
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.5
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>              Labels: counters
>             Fix For: 0.8.8
>
>         Attachments: 0001-Move-shard-merging-completely-to-compaction-v2.patch, 0001-Move-shard-merging-completely-to-compaction.patch, 0002-Simplify-improve-shard-merging-code-v2.patch, 0002-Simplify-improve-shard-merging-code.patch
>
>
> The first part of the counter shard merging process is done during counter replication. This was done there because it requires that all replica are made aware of the merging (we could only rely on nodetool repair for that but that seems much too fragile, it's better as just a safety net). However this part isn't thread safe as multiple threads can do the merging for the same shard at the same time (which shouldn't really "corrupt" the counter value per se, but result in an incorrect context).
> Synchronizing that part of the code would be very costly in term of performance, so instance I propose to move the part of the shard merging done during replication to compaction. It's a better place anyway. The only downside is that it means compaction will sometime send mutations to other node as a side effect, which doesn't feel very clean but is probably not a big deal either.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3178) Counter shard merging is not thread safe

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-3178:
----------------------------------------

    Attachment: 0002-Simplify-improve-shard-merging-code-v2.patch
                0001-Move-shard-merging-completely-to-compaction-v2.patch

Attaching v2, rebased and that remove the use of flush_after_mins.

> Counter shard merging is not thread safe
> ----------------------------------------
>
>                 Key: CASSANDRA-3178
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3178
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.5
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>              Labels: counters
>             Fix For: 0.8.6
>
>         Attachments: 0001-Move-shard-merging-completely-to-compaction-v2.patch, 0001-Move-shard-merging-completely-to-compaction.patch, 0002-Simplify-improve-shard-merging-code-v2.patch, 0002-Simplify-improve-shard-merging-code.patch
>
>
> The first part of the counter shard merging process is done during counter replication. This was done there because it requires that all replica are made aware of the merging (we could only rely on nodetool repair for that but that seems much too fragile, it's better as just a safety net). However this part isn't thread safe as multiple threads can do the merging for the same shard at the same time (which shouldn't really "corrupt" the counter value per se, but result in an incorrect context).
> Synchronizing that part of the code would be very costly in term of performance, so instance I propose to move the part of the shard merging done during replication to compaction. It's a better place anyway. The only downside is that it means compaction will sometime send mutations to other node as a side effect, which doesn't feel very clean but is probably not a big deal either.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira