Posted to commits@cassandra.apache.org by "Ariel Weisberg (JIRA)" <ji...@apache.org> on 2015/03/26 20:52:53 UTC

[jira] [Comment Edited] (CASSANDRA-8670) Large columns + NIO memory pooling causes excessive direct memory usage

    [ https://issues.apache.org/jira/browse/CASSANDRA-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14382524#comment-14382524 ] 

Ariel Weisberg edited comment on CASSANDRA-8670 at 3/26/15 7:52 PM:
--------------------------------------------------------------------

NIODataInputStream
bq. readNext() should assert it is never shuffling more than 7 bytes; in fact ideally this would be done by readMinimum() to make it clearer
By assert, do you mean a Java assert (which only runs when assertions are enabled with -ea) or a precondition that always runs?
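
For reference, a minimal illustration of that distinction; the class, method, and argument names here are invented, not the code under review:
{code:java}
public final class AssertVsPrecondition
{
    static void checkShuffle(int bytesShuffled)
    {
        // A Java assert is skipped unless the JVM runs with -ea, so it costs nothing in
        // production but also catches nothing there:
        assert bytesShuffled <= 7 : bytesShuffled;

        // An explicit precondition always runs, regardless of JVM flags:
        if (bytesShuffled > 7)
            throw new IllegalStateException("readNext() shuffled " + bytesShuffled + " bytes");
    }
}
{code}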

bq. readNext() should IMO never shuffle unless it's at the end of its capacity; if it hasRemaining() and limit() != capacity() it should read on from its current limit (readMinimum can ensure there is room to fully meet its requirements)
I don't see when this optimization would help, and I could see it hurting: you could stream through the buffer without regularly returning to the beginning and end up issuing smaller-than-desired reads.

Users of BufferedInputStream get this behavior and I didn't want to change it. DataInput and company pull bytes out one at a time, even for multi-byte types.
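
For illustration, a rough sketch of the two refill strategies being debated here, assuming the buffer holds unread bytes between position() and limit(); this is not the actual readNext() code, just an approximation:
{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

public final class RefillStrategies
{
    // Current approach: always move the unread tail (at most 7 bytes when completing a
    // primitive) to the front, then refill the rest of the buffer with one large read.
    static int readByCompacting(ReadableByteChannel channel, ByteBuffer buf) throws IOException
    {
        buf.compact();                 // unread bytes -> start of buffer, position = remaining
        int read = channel.read(buf);  // one read into the full remaining capacity
        buf.flip();
        return read;
    }

    // Suggested alternative: keep appending at the current limit until capacity is reached.
    // The concern above is that the room between limit() and capacity() shrinks as the
    // cursor walks forward, so individual reads can end up smaller than desired.
    static int readAtLimit(ReadableByteChannel channel, ByteBuffer buf) throws IOException
    {
        int position = buf.position();
        buf.position(buf.limit()).limit(buf.capacity());
        int read = channel.read(buf);  // may be a small read if little room remains
        buf.limit(buf.position()).position(position);
        return read;
    }
}
{code}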

NIODataOutputStreamPlus
bq. available() should return the bytes in the buffer at least
I duplicated the JDK behavior for NIO. A DataInputStream over a socket returns 0, while over a file it returns the bytes remaining in the file. I think returning 0 makes sense for the API when you don't have a real answer.
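
For comparison, a minimal sketch of the behavior the review asks for; the class and the 'buffered' field are invented stand-ins, not the code under review:
{code:java}
import java.io.InputStream;
import java.nio.ByteBuffer;

public abstract class AvailableSketch extends InputStream
{
    protected ByteBuffer buffered;

    @Override
    public int available()
    {
        // Report at least what is already sitting in the internal buffer,
        // rather than always returning 0 the way a socket-backed stream does.
        return buffered.remaining();
    }
}
{code}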

bq. why the use of hollowBuffer? For clarity in case of restoring the cursor position during exceptions? Would be helpful to clarify with a comment. It seems like perhaps this should only be used for the first branch, though, since the second should have no risk of throwing an exception, so we can safely restore the position. It seems like it might be best to make hollowBuffer default to null, and instantiate it only if it is larger than our buffer size, otherwise first flushing our internal buffer if we haven't got enough room. This way we should rarely need the hollowBuffer.
The contract of the API requires that the incoming buffer not be modified. For thread safety reasons I don't modify the original buffer's position and then reset it in a finally block.

I am not sure what you mean by the hollow buffer being larger than our buffer. It's hollow, so it has no size of its own. We also use it to copy things into our buffer while preserving the original position.
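
For illustration, the general technique of copying out of a caller's buffer without touching its cursor can be sketched with ByteBuffer.duplicate(); this is only an approximation of what the hollow buffer is used for, not the actual implementation:
{code:java}
import java.nio.ByteBuffer;

public final class CopyWithoutMutating
{
    // duplicate() gives an independent position/limit over the same bytes, so the caller's
    // buffer is never mutated -- no need to bump its position and restore it in a finally block.
    public static void copyInto(ByteBuffer source, ByteBuffer destination)
    {
        ByteBuffer view = source.duplicate();   // shares content, has its own position/limit
        if (view.remaining() > destination.remaining())
            view.limit(view.position() + destination.remaining());
        destination.put(view);                  // advances only the view and the destination
    }
}
{code}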

The rest is reasonable.






> Large columns + NIO memory pooling causes excessive direct memory usage
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-8670
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8670
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>             Fix For: 3.0
>
>         Attachments: largecolumn_test.py
>
>
> If you provide a large byte array to NIO and ask it to populate the byte array from a socket, it will allocate a thread-local byte buffer the size of the requested read, no matter how large that is. Old IO wraps new IO for sockets (but not files), so old IO is affected as well.
> Even if you are using Buffered{Input | Output}Stream you can end up passing a large byte array to NIO: the byte-array read method passes the array to NIO directly if it is larger than the internal buffer.
> Passing large cells between nodes as part of intra-cluster messaging can cause the NIO pooled buffers to quickly reach a high watermark and stay there. This ends up costing 2x the largest cell size, because input and output run on different threads and each gets its own buffer. That is further multiplied by the number of nodes in the cluster minus one, since each node has a dedicated thread pair with separate thread locals.
> Anecdotally it appears that the cost is doubled beyond that although it isn't clear why. Possibly the control connections or possibly there is some way in which multiple 
> We need a workload in CI that tests the advertised limits of cells on a cluster. It would be reasonable to ratchet down the max direct memory for the test so it fails if a memory-pooling issue is introduced. I don't think we need to test pulling in a lot of them concurrently, but it should at least work serially.
> The obvious fix would be to read in smaller chunks when dealing with large values. I think "small" should still be relatively large (4 megabytes) so that code reading from disk can amortize the cost of a seek. It can be hard to tell what the underlying source being read from is in some of the contexts where we might choose to switch to reading in chunks.
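
A minimal sketch of the chunked-read approach described in the issue above; the class name and the 4 MB constant are illustrative, not the committed fix:
{code:java}
import java.io.DataInput;
import java.io.IOException;

public final class ChunkedReadSketch
{
    private static final int CHUNK_SIZE = 4 * 1024 * 1024; // still large enough to amortize a seek

    // Pulling a large value in bounded pieces keeps the temporary direct buffer NIO allocates
    // per read capped at the chunk size instead of the full value size.
    public static byte[] readLargeValue(DataInput in, int length) throws IOException
    {
        byte[] value = new byte[length];
        for (int offset = 0; offset < length; offset += CHUNK_SIZE)
        {
            int toRead = Math.min(CHUNK_SIZE, length - offset);
            in.readFully(value, offset, toRead);  // each call hands NIO at most CHUNK_SIZE bytes
        }
        return value;
    }
}
{code}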



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)