Posted to commits@cassandra.apache.org by "Jun Rao (JIRA)" <ji...@apache.org> on 2010/01/06 18:27:54 UTC

[jira] Created: (CASSANDRA-675) TcpReader is slow because of using Exception handling in the normal path

TcpReader is slow because of using Exception handling in the normal path
------------------------------------------------------------------------

                 Key: CASSANDRA-675
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-675
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
    Affects Versions: 0.9
            Reporter: Jun Rao
            Assignee: Jun Rao
             Fix For: 0.9


TcpReader adds 1-2ms of overhead per message read, which makes quorum reads too slow.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-675) TcpReader is slow because of using Exception handling in the normal path

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797709#action_12797709 ] 

Brandon Williams commented on CASSANDRA-675:
--------------------------------------------

I received the following traceback on the bootstrapping node during bootstrap:

WARN - Problem reading from socket connected to : java.nio.channels.SocketChannel[closed]
java.lang.NullPointerException
        at org.apache.cassandra.net.TcpConnection$ReadWorkItem.run(TcpConnection.java:438)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:636)
INFO - Closing errored connection java.nio.channels.SocketChannel[closed]
WARN - Problem reading from socket connected to : java.nio.channels.SocketChannel[closed]
java.lang.NullPointerException
        at org.apache.cassandra.net.TcpConnection$ReadWorkItem.run(TcpConnection.java:438)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:636)
INFO - Closing errored connection java.nio.channels.SocketChannel[closed]

I also bootstrapped without this patch and this traceback did not occur.

I also saw a good improvement in CL.ONE writes with this patch.



[jira] Commented: (CASSANDRA-675) TcpReader is slow because of using Exception handling in the normal path

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798123#action_12798123 ] 

Jonathan Ellis commented on CASSANDRA-675:
------------------------------------------

I'd rather do a 0.6 release with this and other recent improvements than muddy the waters of what constitutes a "stable" branch.



[jira] Updated: (CASSANDRA-675) TcpReader is slow because of using Exception handling in the normal path

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao updated CASSANDRA-675:
------------------------------

    Attachment: issue675.patchv2

Could you try patch v2 (a 1-line change from v1)? 



[jira] Resolved: (CASSANDRA-675) TcpReader is slow because of using Exception handling in the normal path

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao resolved CASSANDRA-675.
-------------------------------

    Resolution: Fixed

committed. Thanks Brandon for testing it out.



[jira] Commented: (CASSANDRA-675) TcpReader is slow because of using Exception handling in the normal path

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799809#action_12799809 ] 

Jonathan Ellis commented on CASSANDRA-675:
------------------------------------------

I took another look, and although I really want to call this a bug fix and get it into 0.5.0, the changes to wakeup/interestOps make me nervous; we have seen JVM bugs in this area before.  I'm not comfortable slipping it into a stable branch in the final stages of RC.



[jira] Commented: (CASSANDRA-675) TcpReader is slow because of using Exception handling in the normal path

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797790#action_12797790 ] 

Brandon Williams commented on CASSANDRA-675:
--------------------------------------------

No exceptions this time.  Unfortunately I couldn't completely verify the bootstrap worked because I ran into CASSANDRA-682, but that is independent of this issue.

I performed some benchmarks at ConsistencyLevel.ONE, querying through a single node, on a 4 node cluster with RF=3.  Writes were about twice as fast, and reads were nearly 3.5 times as fast with this patch.  Nice work, Jun!

+1



[jira] Commented: (CASSANDRA-675) TcpReader is slow because of using Exception handling in the normal path

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799822#action_12799822 ] 

Jun Rao commented on CASSANDRA-675:
-----------------------------------

The exception change and the selector change are independent. Probably more than 70% of the benefit comes from the former. Would people be interested in backporting just the first change to 0.5?



[jira] Updated: (CASSANDRA-675) TcpReader is slow because of using Exception handling in the normal path

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao updated CASSANDRA-675:
------------------------------

    Attachment: issue675.patchv1

The attached patch fixes 2 problems.

1. TcpReader.read() has to go through a sequence of protocol steps (header, content, etc.) to read a full message. If the socket does not yet have enough bytes to produce a full message, a ReadNotCompleteException is thrown. TcpReader.read() handles this exception, and once the socket has new bytes to read, it resumes from the protocol step where it left off. This exception handling appears to take at least 1-2ms. The patch converts the exception into a normal return with a special value.

I still don't quite understand why exception handling in Java is so expensive, though. Note that just pre-allocating the ReadNotCompleteException (which I thought was where most of the overhead came from) doesn't help. Throwing the exception has to be avoided completely.
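
A minimal sketch of the idea in point 1, assuming a simple length-prefixed message format; it is not the code from the attached patch. The class name `ResumableReaderSketch`, the `INCOMPLETE` sentinel, and the 4-byte header are all hypothetical and chosen only for illustration. The reader keeps its progress in fields, so a partial read returns a sentinel value instead of throwing, and the next call resumes where the previous one stopped.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch: a resumable two-step read (4-byte length header, then body). */
final class ResumableReaderSketch {
    /** Sentinel meaning "not enough bytes yet"; compared by identity, never thrown. */
    static final byte[] INCOMPLETE = new byte[0];

    private final ByteBuffer header = ByteBuffer.allocate(4);
    private ByteBuffer body; // null until the header has been fully read

    /**
     * Consumes whatever bytes are available in {@code in}. Returns the message
     * body once it is complete, or INCOMPLETE so the caller can retry when more
     * bytes arrive. Progress lives in the fields above, so a later call resumes
     * mid-step. (A real reader would also validate the length field.)
     */
    byte[] read(ByteBuffer in) {
        if (body == null) {
            copy(in, header);
            if (header.hasRemaining()) {
                return INCOMPLETE;      // header still partial
            }
            header.flip();
            body = ByteBuffer.allocate(header.getInt());
        }
        copy(in, body);
        if (body.hasRemaining()) {
            return INCOMPLETE;          // body still partial
        }
        byte[] message = body.array();
        header.clear();                 // reset state for the next message
        body = null;
        return message;
    }

    /** Copies as many bytes as both buffers allow from src to dst. */
    private static void copy(ByteBuffer src, ByteBuffer dst) {
        int n = Math.min(src.remaining(), dst.remaining());
        for (int i = 0; i < n; i++) {
            dst.put(src.get());
        }
    }
}
```

A caller would test `result == ResumableReaderSketch.INCOMPLETE` and simply wait for the next readable event, so the common "not enough bytes yet" case never enters exception handling.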

2. Change Selector.select(1) to Selector.select() and wake up the selector every time the interest bits of a SelectionKey need to be changed. Without this change, it could take up to 1ms for the new interest bits to be registered with the selector.
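
A minimal sketch of the pattern in point 2, using only standard java.nio calls; it is not the code from the attached patch. The class `SelectorWakeupSketch`, the helper `setInterest`, and the use of a `Pipe` as a stand-in for a socket are assumptions made for illustration. The selector thread blocks in `select()` with no timeout, and whoever changes a key's interest set calls `wakeup()` so the change is picked up right away.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

/** Hypothetical sketch: blocking select() plus an explicit wakeup() on interest changes. */
final class SelectorWakeupSketch {

    /** Updates the key's interest set, then wakes the selector so it re-selects promptly. */
    static void setInterest(SelectionKey key, int ops) {
        key.interestOps(ops);
        key.selector().wakeup();
    }

    /** Returns the number of ready keys the selector thread saw after the interest change. */
    static int demo() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try {
                pipe.source().configureBlocking(false);
                SelectionKey key = pipe.source().register(selector, 0); // no interest yet
                pipe.sink().write(ByteBuffer.wrap(new byte[] {1}));     // data is waiting

                int[] ready = new int[1];
                Thread loop = new Thread(() -> {
                    try {
                        int n;
                        do {
                            n = selector.select(); // blocks; no 1ms polling timeout
                        } while (n == 0);          // a bare wakeup returns 0, so select again
                        ready[0] = n;
                    } catch (IOException e) {
                        // leave ready[0] at 0 so the caller can see the failure
                    }
                });
                loop.start();
                setInterest(key, SelectionKey.OP_READ);
                loop.join();
                return ready[0];
            } finally {
                pipe.sink().close();
                pipe.source().close();
            }
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("ready keys: " + demo());
    }
}
```

The trade-off is one extra `wakeup()` call per interest change in exchange for not polling every millisecond.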

Here are some performance results for reading a column with a 4k value (average response time in ms for local weak reads, the old quorum reads without this patch, and the new quorum reads with this patch). With the patch, quorum reads are roughly 3.5X-8X faster.

threads  local_weak_read (ms)  old_quorum_read (ms)  new_quorum_read (ms)
1        0.71974982            9.546683881           2.002089927
2        0.919307311           12.34252153           1.966206096
4        1.018249762           18.62889817           2.243343764
8        1.136501263           25.49487977           3.213168828
16       1.796865109           29.8252928            5.889078686
32       3.60204913            40.44861522           11.65799948
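
The per-row speedup is the old quorum read time divided by the new one. A small sketch that recomputes it from the figures in the table (the class and method names are hypothetical):

```java
/** Hypothetical helper: recomputes quorum-read speedups from the table's figures. */
final class QuorumSpeedupSketch {
    static final int[] THREADS = {1, 2, 4, 8, 16, 32};
    // Average response times in ms, copied from the table.
    static final double[] OLD_MS = {9.546683881, 12.34252153, 18.62889817,
                                    25.49487977, 29.8252928, 40.44861522};
    static final double[] NEW_MS = {2.002089927, 1.966206096, 2.243343764,
                                    3.213168828, 5.889078686, 11.65799948};

    /** Speedup for one table row: old quorum time divided by new quorum time. */
    static double speedup(int row) {
        return OLD_MS[row] / NEW_MS[row];
    }

    public static void main(String[] args) {
        for (int i = 0; i < THREADS.length; i++) {
            System.out.printf("%2d threads: %.2fx%n", THREADS[i], speedup(i));
        }
    }
}
```

On these figures the ratio is smallest at 32 threads (about 3.5x) and largest at 4 threads (about 8.3x).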

This patch also has a minor change to TCP streaming. Can someone verify TCP streaming still works with this patch?



[jira] Commented: (CASSANDRA-675) TcpReader is slow because of using Exception handling in the normal path

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799826#action_12799826 ] 

Jonathan Ellis commented on CASSANDRA-675:
------------------------------------------

I would +1 a patch w/ just the exception changes for 0.5.



[jira] Updated: (CASSANDRA-675) TcpReader is slow because of using Exception handling in the normal path

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao updated CASSANDRA-675:
------------------------------

    Attachment: issue675.patchv2-0.5

Attached a patch for the 0.5 branch that contains only the part that removes the exception. I can't seem to commit it on the 0.5 branch, though. Can someone else commit it?



[jira] Commented: (CASSANDRA-675) TcpReader is slow because of using Exception handling in the normal path

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797788#action_12797788 ] 

Jonathan Ellis commented on CASSANDRA-675:
------------------------------------------

minor point: we use ArrayUtils.EMPTY_BYTE_ARRAY instead of scattering static copies around.

feel free to slip that into the committed version if Brandon's tests check out.



[jira] Commented: (CASSANDRA-675) TcpReader is slow because of using Exception handling in the normal path

Posted by "Chris Goffinet (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798110#action_12798110 ] 

Chris Goffinet commented on CASSANDRA-675:
------------------------------------------

I was thinking about it this morning as well. I think it's a good idea to backport; the patch isn't very large.



[jira] Commented: (CASSANDRA-675) TcpReader is slow because of using Exception handling in the normal path

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798093#action_12798093 ] 

Stu Hood commented on CASSANDRA-675:
------------------------------------

Since this patch has a relatively low surface area, and such huge benefits for performance, perhaps we could backport it to the 0.5 branch as well, to encourage people not to jump ship for trunk quite yet?



[jira] Commented: (CASSANDRA-675) TcpReader is slow because of using Exception handling in the normal path

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800239#action_12800239 ] 

Jonathan Ellis commented on CASSANDRA-675:
------------------------------------------

committed.  (maybe you have 0.5 checked out w/ http instead of https?)



[jira] Commented: (CASSANDRA-675) TcpReader is slow because of using Exception handling in the normal path

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797996#action_12797996 ] 

Hudson commented on CASSANDRA-675:
----------------------------------

Integrated in Cassandra #317 (See [http://hudson.zones.apache.org/hudson/job/Cassandra/317/])
    TcpReader is slow because of using Exception handling in the normal path; patched by junrao; reviewed by Brandon Williams and jbellis for

