Posted to commits@cassandra.apache.org by "Brandon Williams (JIRA)" <ji...@apache.org> on 2009/12/30 03:45:29 UTC

[jira] Created: (CASSANDRA-658) Hinted Handoff CF contention

Hinted Handoff CF contention
----------------------------

                 Key: CASSANDRA-658
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-658
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 0.5
         Environment: debian lenny amd64 OpenJDK 64-Bit Server VM (build 1.6.0_0-b11, mixed mode)

            Reporter: Brandon Williams
             Fix For: 0.9


Hinted handoff causes a lot of contention on the HH CF, causing insert speed to massively drop.  Most of the row mutation stage threads end up blocking on each other at Memtable.resolve.  This is because HH sends the hint to the closest node, which will always be the node handling the write.

To reproduce: start a cluster with even InitialTokens, and begin a constant stream of writes to one node, with an even key distribution. (I used 4 nodes and stress.py in random mode.)  Take a node down, and the insert rate begins to drop, eventually settling at 100-300 inserts/s and staying there.  Bringing the down node back up will restore the original insert rate.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-658) Hinted Handoff CF contention

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797998#action_12797998 ] 

Hudson commented on CASSANDRA-658:
----------------------------------

Integrated in Cassandra #317 (See [http://hudson.zones.apache.org/hudson/job/Cassandra/317/])
    update release notes for config changes

Patch by eevans for 


> Hinted Handoff CF contention
> ----------------------------
>
>                 Key: CASSANDRA-658
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-658
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.5
>         Environment: debian lenny amd64 OpenJDK 64-Bit Server VM (build 1.6.0_0-b11, mixed mode)
>            Reporter: Brandon Williams
>            Assignee: Jonathan Ellis
>             Fix For: 0.9
>
>         Attachments: 0001-use-throughput-and-op-count-instead-of-size-and-column.txt, 0002-replace-sharded-row-locks-with-column-level-locking.txt, 0003-r-m-unused-code.txt
>
>
> Hinted handoff causes a lot of contention on the HH CF, causing insert speed to massively drop.  Most of the row mutation stage threads end up blocking on each other at Memtable.resolve.  This is because HH sends the hint to the closest node, which will always be the node handling the write.
> To reproduce: start a cluster with even InitialTokens, and begin a constant stream of writes to one node, with an even key distribution. (I used 4 nodes and stress.py in random mode.)  Take a node down, and the insert rate begins to drop, eventually settling at 100-300 inserts/s and staying there.  Bringing the down node back up will restore the original insert rate.



[jira] Commented: (CASSANDRA-658) Hinted Handoff CF contention

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795733#action_12795733 ] 

Jonathan Ellis commented on CASSANDRA-658:
------------------------------------------

new version 2 fixes the races and adds a similar fine-grained approach to SuperColumn (which is not really more expensive, since we're paying the price of using a Concurrent map implementation already).  This includes Stu's size-removal patch.  03 does more cleanup.



[jira] Updated: (CASSANDRA-658) Hinted Handoff CF contention

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-658:
-------------------------------

    Comment: was deleted

(was: Hmm, also, I noticed that these patches don't actually remove the 'sharded row locks'. They shouldn't be necessary anymore, correct?)



[jira] Updated: (CASSANDRA-658) Hinted Handoff CF contention

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-658:
-------------------------------------

    Attachment: 0002-replace-sharded-row-locks-with-column-level-locking.txt
                0001-use-throughput-and-op-count-instead-of-size-and-column.txt



[jira] Issue Comment Edited: (CASSANDRA-658) Hinted Handoff CF contention

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795726#action_12795726 ] 

Jonathan Ellis edited comment on CASSANDRA-658 at 1/1/10 2:23 AM:
------------------------------------------------------------------

this actually has a race for simple columns: if a value is present for a given column name, and two threads (A and B) run the putIfAbsent part at the same time, then after A enters the synchronized block and changes the value, a third thread C could attempt addColumn and sync on the value put by A, while B syncs on the old value.

This should be fixable by using replace(), not put().
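
The replace()-based fix suggested above can be sketched roughly as follows. This is not the code from the attached patches: it is a lock-free variant that assumes immutable column objects reconciled by timestamp, and the names ColumnReconcileSketch, Col, addColumn and get are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ColumnReconcileSketch {
    /** Hypothetical immutable column: a value plus the timestamp used to reconcile. */
    static final class Col {
        final String value;
        final long timestamp;

        Col(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final ConcurrentMap<String, Col> columns = new ConcurrentHashMap<>();

    /** Adds a column, keeping whichever version has the higher timestamp. */
    void addColumn(String name, Col incoming) {
        while (true) {
            Col current = columns.putIfAbsent(name, incoming);
            if (current == null) {
                return; // no previous value, so the incoming column is now stored
            }
            if (current.timestamp >= incoming.timestamp) {
                return; // the stored column is at least as new; nothing to do
            }
            // Conditional replace: succeeds only if the map still holds `current`.
            if (columns.replace(name, current, incoming)) {
                return;
            }
            // Another writer changed the value in between; re-read and retry.
        }
    }

    Col get(String name) {
        return columns.get(name);
    }
}
```

Because the three-argument replace() only succeeds when the map still holds the value the thread last read, two writers cannot both act on the same stale value; the loser re-reads and retries.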

      was (Author: jbellis):
    this actually has a race for simple columns: if a value is present for a given column name, and two threads (A and B) run the putIfAbsent part at the same time, then after A enters the synchronized block and changes the value, a third thread C could attempt addColumn and sync on the value put by A, while B syncs on the old value.
  


[jira] Updated: (CASSANDRA-658) Hinted Handoff CF contention

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-658:
-------------------------------------

    Attachment: 0003-r-m-unused-code.txt
                0002-replace-sharded-row-locks-with-column-level-locking.txt
                0001-use-throughput-and-op-count-instead-of-size-and-column.txt



[jira] Updated: (CASSANDRA-658) Hinted Handoff CF contention

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-658:
-------------------------------

    Attachment: 0003-remove-atomic-size-from-sc.txt

Removes the atomic size calculation from SuperColumn, which was not being updated for remove() anyway.



[jira] Updated: (CASSANDRA-658) Hinted Handoff CF contention

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-658:
-------------------------------------

    Attachment:     (was: 0003-remove-atomic-size-from-sc.txt)



[jira] Commented: (CASSANDRA-658) Hinted Handoff CF contention

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795653#action_12795653 ] 

Stu Hood commented on CASSANDRA-658:
------------------------------------

Brilliant... I really like this change.

I think the main reason that we had an Atomic variable for size() in CF and SC was to perform this calculation, so perhaps that code should be considered dead, and removed. I noticed a bunch of subtle bugs in it last time I was looking anyway. The size() method can remain, and calculate the size for each call?
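
As a rough illustration of the suggestion above (compute size() on each call instead of maintaining an atomic counter), here is a minimal sketch. The class SizeOnDemandSketch and its definition of size as name length plus value length are assumptions for illustration, not Cassandra's actual ColumnFamily or SuperColumn code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

class SizeOnDemandSketch {
    // Hypothetical column store: column name -> raw value bytes.
    private final ConcurrentSkipListMap<String, byte[]> columns = new ConcurrentSkipListMap<>();

    void addColumn(String name, byte[] value) {
        columns.put(name, value);
    }

    void remove(String name) {
        columns.remove(name);
    }

    /**
     * Walks the columns on every call. There is no counter to keep in sync,
     * so add and remove cannot drift out of step with the reported size.
     * The result is only weakly consistent under concurrent writes.
     */
    int size() {
        int total = 0;
        for (Map.Entry<String, byte[]> e : columns.entrySet()) {
            total += e.getKey().length() + e.getValue().length;
        }
        return total;
    }
}
```

The trade-off is that size() becomes linear in the number of columns, which is acceptable only if it is called rarely, for example at serialization time rather than on every insert.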



[jira] Commented: (CASSANDRA-658) Hinted Handoff CF contention

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797616#action_12797616 ] 

Hudson commented on CASSANDRA-658:
----------------------------------

Integrated in Cassandra #316 (See [http://hudson.zones.apache.org/hudson/job/Cassandra/316/])
    replace sharded row locks with column-level locking
patch by jbellis; tested by Brandon Williams for 
use throughput and op count instead of size and column count to determine when to flush, greatly reducing the amount of synchronization required to insert
patch by jbellis; tested by Brandon Williams for 
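
The idea in the second commit message (flush on throughput and op count rather than on exact size and column count) might look roughly like the sketch below. The class name, field names and threshold parameters are hypothetical and are not taken from the patch; the point is only that two atomic counters can be bumped without holding a row or column family lock.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

class FlushThresholdSketch {
    private final long throughputThresholdBytes; // hypothetical setting
    private final int opsThreshold;              // hypothetical setting
    private final AtomicLong currentThroughput = new AtomicLong(0);
    private final AtomicInteger currentOps = new AtomicInteger(0);

    FlushThresholdSketch(long throughputThresholdBytes, int opsThreshold) {
        this.throughputThresholdBytes = throughputThresholdBytes;
        this.opsThreshold = opsThreshold;
    }

    /** Records one write of the given serialized size; no lock is taken. */
    void recordWrite(int serializedSize) {
        currentThroughput.addAndGet(serializedSize);
        currentOps.incrementAndGet();
    }

    /** True once either the bytes written or the number of operations reaches its limit. */
    boolean isThresholdViolated() {
        return currentThroughput.get() >= throughputThresholdBytes
            || currentOps.get() >= opsThreshold;
    }
}
```

Throughput here counts bytes written, not bytes resident, so overwrites of the same column still advance the counter; that imprecision is the price of not synchronizing to compute an exact size.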




[jira] Commented: (CASSANDRA-658) Hinted Handoff CF contention

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795678#action_12795678 ] 

Jonathan Ellis commented on CASSANDRA-658:
------------------------------------------

We still want to know size for when we're serializing, so I've left that code alone for now.



[jira] Updated: (CASSANDRA-658) Hinted Handoff CF contention

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-658:
-------------------------------------

    Attachment:     (was: 0002-replace-sharded-row-locks-with-column-level-locking.txt)



[jira] Commented: (CASSANDRA-658) Hinted Handoff CF contention

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795726#action_12795726 ] 

Jonathan Ellis commented on CASSANDRA-658:
------------------------------------------

this actually has a race for simple columns: if a value is present for a given column name, and two threads (A and B) run the putIfAbsent part at the same time, then after A enters the synchronized block and changes the value, a third thread C could attempt addColumn and sync on the value put by A, while B syncs on the old value.



[jira] Commented: (CASSANDRA-658) Hinted Handoff CF contention

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797517#action_12797517 ] 

Brandon Williams commented on CASSANDRA-658:
--------------------------------------------

+1, I replicated the exact scenario originally outlined and insert speed does not dramatically drop.



[jira] Updated: (CASSANDRA-658) Hinted Handoff CF contention

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-658:
-------------------------------------

    Attachment:     (was: 0001-use-throughput-and-op-count-instead-of-size-and-column.txt)



[jira] Commented: (CASSANDRA-658) Hinted Handoff CF contention

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795654#action_12795654 ] 

Stu Hood commented on CASSANDRA-658:
------------------------------------

Hmm, also, I noticed that these patches don't actually remove the 'sharded row locks'. They shouldn't be necessary anymore, correct?
