Posted to commits@cassandra.apache.org by "James Golick (JIRA)" <ji...@apache.org> on 2010/06/20 23:22:24 UTC

[jira] Created: (CASSANDRA-1214) Make standard IO the default

Make standard IO the default
----------------------------

                 Key: CASSANDRA-1214
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
             Project: Cassandra
          Issue Type: Bug
    Affects Versions: 0.7
            Reporter: James Golick


The way mmap()'d IO is handled in cassandra is dangerous. It allocates potentially massive buffers without any care for bounding the total size of the program's buffers. As the node's dataset grows, this *will* lead to swapping and instability.

This is a dangerous and wrong default for a couple of reasons.

1) People are likely to test Cassandra with the default settings. This issue is insidious because it only appears when you have sufficient data in a certain node, there is absolutely no way to control it, and it doesn't at all respect the memory limits that you give to the JVM.

That can all be ascertained by reading the code, and people should certainly do their homework, but nevertheless, Cassandra should ship with sane defaults that don't break down when you cross some magic unknown threshold.

2) It's deceptive. Unless you are extremely careful with capacity planning, you will get bitten by this. Most people won't really be able to use this in production, so why get them excited about performance that they can't actually have?
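
For readers comparing the two IO modes, here is a minimal Java sketch of the difference. It is not Cassandra's actual code; the class and method names are hypothetical. The mapped read goes through FileChannel.map(), whose backing memory lives outside the Java heap, which is why a heap limit such as -Xmx does not bound it. The standard read copies bytes through an ordinary read call.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class IoModes {
    // mmap'd read: maps the whole file; the mapping is off-heap,
    // so the JVM heap limit (-Xmx) does not constrain its size.
    static byte readMapped(Path file, int offset) throws IOException {
        try (FileChannel ch = FileChannel.open(file)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return buf.get(offset);
        }
    }

    // Standard read: seeks and reads one byte through the normal read path.
    static byte readStandard(Path file, int offset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);
            return raf.readByte();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("iomodes", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[] {10, 20, 30});
        System.out.println(readMapped(tmp, 1) + " " + readStandard(tmp, 2));
    }
}
```

Both calls return the same data; what differs is where the memory pressure lands as files grow.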

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (CASSANDRA-1214) Force linux to not swap the JVM

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reassigned CASSANDRA-1214:
-----------------------------------------

    Assignee: Jonathan Ellis

> Force linux to not swap the JVM
> -------------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: James Golick
>            Assignee: Jonathan Ellis
>             Fix For: 0.6.5
>
>         Attachments: 1214-v3.txt, 1214-v4.txt, mlockall-jna.patch.txt, Read Throughput with mmap.jpg, trunk-1214.txt
>
>


[jira] Commented: (CASSANDRA-1214) Make standard IO the default

Posted by "Tupshin Harper (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883726#action_12883726 ] 

Tupshin Harper commented on CASSANDRA-1214:
-------------------------------------------

I am strongly in favor of defaults that are as flexible and stable as possible. If it is hard for even a relatively small percentage of users to get stable performance with mmap, then I would agree that the default should be standard I/O. There should then be a Cassandra Tuning wiki page that includes an mmap discussion.

That said, I also agree that it is worth doing the native code work to get mmap more stable with larger datasets and/or smaller machines.

> Make standard IO the default
> ----------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7
>            Reporter: James Golick
>


[jira] Updated: (CASSANDRA-1214) Force linux to not swap the JVM

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1214:
--------------------------------------

          Summary: Force linux to not swap the JVM  (was: Make standard IO the default)
       Issue Type: Improvement  (was: Bug)
    Fix Version/s: 0.6.5
      Component/s: Core

> Force linux to not swap the JVM
> -------------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: James Golick
>             Fix For: 0.6.5
>
>         Attachments: mlockall-jna.patch.txt, Read Throughput with mmap.jpg, trunk-1214.txt
>
>


[jira] Commented: (CASSANDRA-1214) Force linux to not swap the JVM

Posted by "Peter Schuller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899326#action_12899326 ] 

Peter Schuller commented on CASSANDRA-1214:
-------------------------------------------

Well, posix_fadvise() is potentially a bit more problematic than mlockall(). It again takes flags, whose values I suppose may be as standardized in practice as those for mlockall() (though I have not yet checked). In addition, it takes an off_t, which, being an abstract type, raises potential portability concerns. A quick Google search suggests (http://markmail.org/message/qvf7hhq2mgmwwmw3) that JNA has some particular support for the off_t data type, though I did not find it just now in the API docs (I will have to check more carefully).

The other thing is that posix_fadvise() will need a file descriptor in integer form. java.io.FileDescriptor is decidedly abstract and does not expose this information (which is understandable). I am not aware, offhand, of a good way for us to obtain the relevant underlying file descriptor; anyone? Molesting FileDescriptor with reflection should technically do the trick with openjdk/sun derived VMs (at least based on current openjdk7 FileDescriptor.java), but.... yuck.
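
A rough sketch of the reflection route, to make the "yuck" concrete. It assumes the VM's java.io.FileDescriptor keeps the descriptor in a private int field named "fd", which is an implementation detail rather than an API. The class and method names here are hypothetical. Newer JVMs may refuse the access entirely, so the sketch returns an empty result instead of failing.

```java
import java.io.FileDescriptor;
import java.lang.reflect.Field;
import java.util.OptionalInt;

public class RawFd {
    // Assumption: FileDescriptor has a private int field called "fd"
    // (true of OpenJDK-derived VMs, but not guaranteed by any spec).
    static OptionalInt rawFd(FileDescriptor descriptor) {
        try {
            Field f = FileDescriptor.class.getDeclaredField("fd");
            f.setAccessible(true); // may be rejected by module access checks
            return OptionalInt.of(f.getInt(descriptor));
        } catch (ReflectiveOperationException | RuntimeException e) {
            // Field missing, access denied, or security restrictions:
            // report "unknown" so callers can skip the native call.
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        OptionalInt fd = rawFd(FileDescriptor.in);
        System.out.println(fd.isPresent() ? "stdin fd = " + fd.getAsInt()
                                          : "fd not accessible on this VM");
    }
}
```

Any caller would need to treat the empty case as "skip posix_fadvise()", which is part of why this approach feels fragile.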

If it weren't for the build problems implied by JNI I would strongly prefer it. Under the circumstances I'm not sure. One observation is that given the kind of ifs and buts one seems to have to resort to anyway, writing some simple semi-portable build rules in Ant, specifically targeting certain platforms and compilers, does not feel so bad. Even if one hard-codes each common platform to avoid solving the native build problem generally, that does not feel worse to me in practice than making the assumptions necessary with JNA and stuff like using reflection to access private fields...

As long as the native build remains optional and does not hinder anyone getting Cassandra to work with just Java, and as long as it is relatively easy for someone on an unsupported/problematic platform to simply build the JNI libraries themselves (doable by e.g. a simple Makefile with clear instructions for pointing to JDK headers etc), JNI feels pretty reasonable to me.

Thoughts? Am I painting a bleaker picture than reality with respect to using JNA?



> Force linux to not swap the JVM
> -------------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: James Golick
>             Fix For: 0.6.5
>
>         Attachments: mlockall-jna.patch.txt, Read Throughput with mmap.jpg, trunk-1214.txt
>
>


[jira] Commented: (CASSANDRA-1214) Make standard IO the default

Posted by "Peter Schuller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896617#action_12896617 ] 

Peter Schuller commented on CASSANDRA-1214:
-------------------------------------------

It all sounds reasonable.

So I take it the way forward would be to take your JNA version and combine with the configuration/policy parts of my patch (assuming people agree that those parts are a good idea) and go for that version for now and maybe move to JNI in the future if JNI becomes a dependency anyway for some other reason.

Any objections?

> Make standard IO the default
> ----------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: James Golick
>         Attachments: mlockall-jna.patch.txt, Read Throughput with mmap.jpg, trunk-1214.txt
>
>


[jira] Updated: (CASSANDRA-1214) Force linux to not swap the JVM

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1214:
--------------------------------------

    Attachment: 1214-v3.txt

Patch that uses JNA, with handling for various error conditions and more informative logging where possible.

As discussed above, we can't ship JNA with Cassandra, but we can pull it in with ivy at build time. So one of the conditions handled is simply "JNA doesn't exist at runtime." (But we don't need to resort to reflection to allow it to compile without JNA.) [A sufficiently recent version of JNA is not available in the main public maven repo, and that won't change in the near future, so we will host one on Riptano's repo. I will update this patch when that is ready.]
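
To illustrate the "library may be absent at runtime" condition in general terms, here is a small sketch. Note that it swaps in a different technique from the patch: it probes with Class.forName, whereas the patch is described as not needing reflection. The class and method names are hypothetical; com.sun.jna.Native is used only as the class name to probe for.

```java
public class OptionalNative {
    // Returns true if the named class can be loaded, false if it is
    // missing or fails to link. The class is not initialized.
    static boolean classAvailable(String className) {
        try {
            Class.forName(className, false, OptionalNative.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Hypothetical startup message; a real implementation would attempt
    // the native call here and log the outcome.
    static String describe(boolean jnaPresent) {
        return jnaPresent
            ? "JNA found; memory locking can be attempted"
            : "JNA not found; continuing without memory locking";
    }

    public static void main(String[] args) {
        System.out.println(describe(classAvailable("com.sun.jna.Native")));
    }
}
```

The point is only that a missing optional dependency degrades to a log message instead of a startup failure.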

> Force linux to not swap the JVM
> -------------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: James Golick
>             Fix For: 0.6.5
>
>         Attachments: 1214-v3.txt, mlockall-jna.patch.txt, Read Throughput with mmap.jpg, trunk-1214.txt
>
>


[jira] Commented: (CASSANDRA-1214) Make standard IO the default

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896444#action_12896444 ] 

Jonathan Ellis commented on CASSANDRA-1214:
-------------------------------------------

How does the JNA approach behave if there is no C library (Windows?) or mlockall doesn't exist (OS X?)

> Make standard IO the default
> ----------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: James Golick
>         Attachments: mlockall-jna.patch.txt, Read Throughput with mmap.jpg, trunk-1214.txt
>
>


[jira] Commented: (CASSANDRA-1214) Make standard IO the default

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881088#action_12881088 ] 

Todd Lipcon commented on CASSANDRA-1214:
----------------------------------------

Configuring /proc/sys/vm/swappiness down to 0-10 may also help.
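
As a hedged illustration, a node could check this setting at startup and warn if it is high. The sketch below only reads the value; the class name and the threshold of 10 (taken from the 0-10 range suggested above) are assumptions, and on non-Linux systems the file simply will not exist.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.OptionalInt;

public class Swappiness {
    // Parses the file contents, e.g. "60\n"; empty if not a number.
    static OptionalInt parse(String raw) {
        try {
            return OptionalInt.of(Integer.parseInt(raw.trim()));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    // Reads the current value; empty when the file is missing or unreadable.
    static OptionalInt read() {
        try {
            byte[] bytes = Files.readAllBytes(Paths.get("/proc/sys/vm/swappiness"));
            return parse(new String(bytes, StandardCharsets.US_ASCII));
        } catch (IOException | RuntimeException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        OptionalInt value = read();
        if (!value.isPresent()) {
            System.out.println("swappiness not available on this system");
        } else if (value.getAsInt() > 10) {
            System.out.println("swappiness is " + value.getAsInt() + "; consider lowering it");
        } else {
            System.out.println("swappiness is " + value.getAsInt());
        }
    }
}
```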

> Make standard IO the default
> ----------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7
>            Reporter: James Golick
>


[jira] Assigned: (CASSANDRA-1214) Make standard IO the default

Posted by "Jon Hermes (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jon Hermes reassigned CASSANDRA-1214:
-------------------------------------

    Assignee:     (was: Jon Hermes)

> Make standard IO the default
> ----------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7
>            Reporter: James Golick
>


[jira] Commented: (CASSANDRA-1214) Make standard IO the default

Posted by "Jeff Hodges (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880681#action_12880681 ] 

Jeff Hodges commented on CASSANDRA-1214:
----------------------------------------

This is one of the very first things we've had to do with every cluster we've built. The mmap implementation just does not work for anything I've seen in production beyond trivial datasets. This would be a wonderful, reality-driven change.

> Make standard IO the default
> ----------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7
>            Reporter: James Golick
>


[jira] Resolved: (CASSANDRA-1214) Force linux to not swap the JVM

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-1214.
---------------------------------------

      Reviewer: jhermes
    Resolution: Fixed

committed

> Force linux to not swap the JVM
> -------------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: James Golick
>            Assignee: Jonathan Ellis
>             Fix For: 0.6.5
>
>         Attachments: 1214-v3.txt, 1214-v4.txt, mlockall-jna.patch.txt, Read Throughput with mmap.jpg, trunk-1214.txt
>
>


[jira] Commented: (CASSANDRA-1214) Make standard IO the default

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888950#action_12888950 ] 

Jonathan Ellis commented on CASSANDRA-1214:
-------------------------------------------

According to http://andrigoss.blogspot.com/2008/02/jvm-performance-tuning.html, using huge pages automatically gives us the lock-jvm-heap-in-memory behavior we want, and may provide a substantial performance benefit as well.

See also: http://java.sun.com/javase/technologies/hotspot/largememory.jsp

> Make standard IO the default
> ----------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7
>            Reporter: James Golick
>         Attachments: Read Throughput with mmap.jpg
>
>


[jira] Assigned: (CASSANDRA-1214) Make standard IO the default

Posted by "Jon Hermes (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jon Hermes reassigned CASSANDRA-1214:
-------------------------------------

    Assignee: Jon Hermes

> Make standard IO the default
> ----------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7
>            Reporter: James Golick
>            Assignee: Jon Hermes
>


[jira] Updated: (CASSANDRA-1214) Force linux to not swap the JVM

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1214:
--------------------------------------

    Attachment: 1214-v3.txt

(correct patch attached)

> Force linux to not swap the JVM
> -------------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: James Golick
>             Fix For: 0.6.5
>
>         Attachments: 1214-v3.txt, mlockall-jna.patch.txt, Read Throughput with mmap.jpg, trunk-1214.txt
>
>


[jira] Commented: (CASSANDRA-1214) Make standard IO the default

Posted by "Nate McCall (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886037#action_12886037 ] 

Nate McCall commented on CASSANDRA-1214:
----------------------------------------

I have not hit this issue yet, but has anyone tried using the -XX:MaxDirectMemorySize option?

> Make standard IO the default
> ----------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7
>            Reporter: James Golick
>


[jira] Commented: (CASSANDRA-1214) Force linux to not swap the JVM

Posted by "Chris Goffinet (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906458#action_12906458 ] 

Chris Goffinet commented on CASSANDRA-1214:
-------------------------------------------

mmap + memlock gives us about a 13% improvement; on our test bed we were maxing out our 4 cores.

> Force linux to not swap the JVM
> -------------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: James Golick
>            Assignee: Jonathan Ellis
>             Fix For: 0.6.5, 0.7 beta 2
>
>         Attachments: 1214-v3.txt, 1214-v4.txt, mlockall-jna.patch.txt, Read Throughput with mmap.jpg, trunk-1214.txt
>
>


[jira] Updated: (CASSANDRA-1214) Force linux to not swap the JVM

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1214:
--------------------------------------

    Attachment: 1214-v4.txt

v4 includes the ivy changes to download jna at build time.

Again, the relevant text from http://www.apache.org/legal/3party.html is, "LGPL v2.1-licensed works must not be included in Apache products, although they may be listed as system requirements or distributed elsewhere as optional works."  We are not including JNA, nor are we even requiring it [although the policy explicitly states that requiring it would be fine].  The only restriction is on distributing the LGPL work itself. So while Hadoop is welcome to pile additional restrictions on themselves, this is fine for us, since (and perhaps this wasn't clear) dependencies we pull in with ivy are build-time only and are not distributed with our source or binary artifacts.

(FWIW it is also fine for an Apache-licensed Debian package to declare a dependency on an LGPL one.)

> Force linux to not swap the JVM
> -------------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: James Golick
>             Fix For: 0.6.5
>
>         Attachments: 1214-v3.txt, 1214-v4.txt, mlockall-jna.patch.txt, Read Throughput with mmap.jpg, trunk-1214.txt
>
>


[jira] Commented: (CASSANDRA-1214) Force linux to not swap the JVM

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899260#action_12899260 ] 

Jonathan Ellis commented on CASSANDRA-1214:
-------------------------------------------

Ugh, that's a pain.  (JFFI is also LGPL.)

It's not a deal breaker for us since we'd basically like to use it for optimizations... ASF says "LGPL v2.1-licensed works must not be included in Apache products, although they may be listed as system requirements or distributed elsewhere as optional works" so that would be workable, if sub-optimal.

Curious if Peter thinks we're going to have to go raw JNI for fadvise on compactions.  If we're going to have to bite that bullet anyway then JNA gets less interesting.

> Force linux to not swap the JVM
> -------------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: James Golick
>             Fix For: 0.6.5
>
>         Attachments: mlockall-jna.patch.txt, Read Throughput with mmap.jpg, trunk-1214.txt
>
>


[jira] Commented: (CASSANDRA-1214) Force linux to not swap the JVM

Posted by "Jon Hermes (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899986#action_12899986 ] 

Jon Hermes commented on CASSANDRA-1214:
---------------------------------------

+1. 

It's a best-effort patch dependent on the OS (which is all we can do, short of defaulting to mmap_index_only and taking a performance hit by default). Assuming the average use case, this is a much better default than before.

> Force linux to not swap the JVM
> -------------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: James Golick
>            Assignee: Jonathan Ellis
>             Fix For: 0.6.5
>
>         Attachments: 1214-v3.txt, 1214-v4.txt, mlockall-jna.patch.txt, Read Throughput with mmap.jpg, trunk-1214.txt
>
>


[jira] Commented: (CASSANDRA-1214) Make standard IO the default

Posted by "Folke Behrens (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896571#action_12896571 ] 

Folke Behrens commented on CASSANDRA-1214:
------------------------------------------

{quote}
How does the JNA approach behave if there is no C library (Windows?) or mlockall doesn't exist (OS X?)
{quote}
In the case of Mac OS X, an UnsatisfiedLinkError will be thrown. Windows? I don't know. Maybe a JNA-specific exception, maybe a ULE, too. OSes can be easily detected with Platform.isXXX() and dealt with accordingly.

{quote}
something as simple as "grab errno" became a holy mess of portability concerns.
{quote}
Yes, but errno is a particularly hard case. The "inventors" messed up big time with this. That's why the JNA developers provide two ways to check errno: you either mark your methods with "throws LastErrorException" or you ask Native.getLastError(). This works under Windows, too.

{quote}
The proposed JNA patch seems to suffer from exactly this problem as far as I can see, making assumptions about what the concrete values are of MCL_CURRENT and MCL_FUTURE.
{quote}
Theoretically, you're right. In practice, however, I can't find a single POSIX system that assigns different values to MCL_CURRENT or MCL_FUTURE, and I think it's highly unlikely that these will change in the future. If they do, Cassandra's code can be adjusted.

{quote}
As far as I can tell, once one has gotten over the initial one-time hurdle of using JNI and the associated building issues, you have a much more correct/standards-compliant access to the native platform than through JNA since you're in compile time with access to appropriate headers etc.
Please do correct me if I'm wrong, since the idea of avoiding compile time/build issues is certainly very attractive and the reason why I tried to find an acceptable solution with JNA in the past.
{quote}
You're absolutely right, and your JNI code is really superb. If Cassandra needs to bind a couple more native functions I'd say JNI is the way to go. But not just yet.
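
To illustrate the "detect the OS and deal with it accordingly" point above, here is a minimal sketch of a platform gate that decides whether mlockall should even be attempted. It uses the standard os.name system property as a stand-in for JNA's Platform.isXXX() helpers so that it runs without JNA on the classpath. The class and method names are hypothetical and are not taken from the attached patches; treating Linux as the only supported platform is an assumption made for the example.

```java
import java.util.Locale;

// Hypothetical helper, not part of any attached patch.
final class MlockallGate {
    private MlockallGate() {}

    // True when the reported OS name looks like Linux (e.g. "Linux").
    static boolean isLinux(String osName) {
        return osName != null
                && osName.toLowerCase(Locale.ROOT).startsWith("linux");
    }

    // Assumption for this sketch: only attempt mlockall on Linux and
    // silently skip it everywhere else instead of risking a link error.
    static boolean shouldAttemptMlockall(String osName) {
        return isLinux(osName);
    }

    // Convenience wrapper that inspects the JVM we are running in.
    static boolean shouldAttemptOnThisHost() {
        return shouldAttemptMlockall(System.getProperty("os.name"));
    }
}
```

A caller would check MlockallGate.shouldAttemptOnThisHost() before touching any native binding, and log and continue when it returns false.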


> Make standard IO the default
> ----------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: James Golick
>         Attachments: mlockall-jna.patch.txt, Read Throughput with mmap.jpg, trunk-1214.txt
>
>


[jira] Updated: (CASSANDRA-1214) Make standard IO the default

Posted by "Schubert Zhang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Schubert Zhang updated CASSANDRA-1214:
--------------------------------------

    Attachment: Read Throughput with mmap.jpg

Yes, I also found it is not good with mmap.

> Make standard IO the default
> ----------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7
>            Reporter: James Golick
>         Attachments: Read Throughput with mmap.jpg
>
>


[jira] Updated: (CASSANDRA-1214) Make standard IO the default

Posted by "Folke Behrens (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Folke Behrens updated CASSANDRA-1214:
-------------------------------------

    Attachment: mlockall-jna.patch.txt

Whoa ... have you looked at JNA?
# Apply attached patch.
# Put jna.jar (LGPL 2.1 / ~900k) in /lib/
# Start Cassandra with CAP_IPC_LOCK (or as "root").
# Linux: grep Unevictable /proc/meminfo

https://jna.dev.java.net/servlets/ProjectDocumentList?folderID=12329&expandFolder=12329&folderID=0
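
The last step above (grep Unevictable /proc/meminfo) can also be done from Java. The following is only a sketch: it assumes the usual Linux /proc/meminfo layout of "Name: value kB" lines, and the class and method names are made up for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of the attached patch.
final class MeminfoProbe {
    private static final String KEY = "Unevictable:";

    private MeminfoProbe() {}

    // Returns the Unevictable value in kB from meminfo-style text,
    // or -1 when the line is missing or cannot be parsed.
    static long unevictableKb(String meminfo) {
        for (String line : meminfo.split("\n")) {
            if (line.startsWith(KEY)) {
                String[] parts = line.substring(KEY.length()).trim().split("\\s+");
                try {
                    return Long.parseLong(parts[0]);
                } catch (NumberFormatException e) {
                    return -1;
                }
            }
        }
        return -1;
    }

    // Reads the live value; only meaningful on Linux.
    static long readUnevictableKb() throws IOException {
        byte[] raw = Files.readAllBytes(Paths.get("/proc/meminfo"));
        return unevictableKb(new String(raw, StandardCharsets.UTF_8));
    }
}
```

If the lock took effect, the reported value should be roughly the size of the locked JVM; a value near zero suggests mlockall was not applied.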




> Make standard IO the default
> ----------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: James Golick
>         Attachments: mlockall-jna.patch.txt, Read Throughput with mmap.jpg, trunk-1214.txt
>
>


[jira] Updated: (CASSANDRA-1214) Make standard IO the default

Posted by "Peter Schuller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Schuller updated CASSANDRA-1214:
--------------------------------------

    Attachment: trunk-1214.txt

This is the patch referred to by the previous comment. The 'submit patch' workflow never asked me to upload a file (or I missed it somehow).

> Make standard IO the default
> ----------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: James Golick
>         Attachments: Read Throughput with mmap.jpg, trunk-1214.txt
>
>


[jira] Commented: (CASSANDRA-1214) Make standard IO the default

Posted by "Peter Schuller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896473#action_12896473 ] 

Peter Schuller commented on CASSANDRA-1214:
-------------------------------------------

I'll admit I did not investigate JNA (or POSIX-JNA) for this particular case. Last time I did however, I found it lacking. Very trivial cases were okay, but even something as simple as "grab errno" became a holy mess of portability concerns.

I looked briefly at what posix-jna does, and I was unable to find any magic bullets in there and instead saw things like hard-coded constants that are non-portable and difficult to detect when they break due to changes to some particular platform.

The proposed JNA patch seems to suffer from exactly this problem as far as I can see, making assumptions about what the concrete values are of MCL_CURRENT and MCL_FUTURE.

As far as I can tell, once one has gotten over the initial one-time hurdle of using JNI and the associated building issues, you have a much more correct/standards-compliant access to the native platform than through JNA since you're in compile time with access to appropriate headers etc.

Please do correct me if I'm wrong, since the idea of avoiding compile time/build issues is certainly *very* attractive and the reason why I tried to find an acceptable solution with JNA in the past.



> Make standard IO the default
> ----------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: James Golick
>         Attachments: mlockall-jna.patch.txt, Read Throughput with mmap.jpg, trunk-1214.txt
>
>


[jira] Commented: (CASSANDRA-1214) Make standard IO the default

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881055#action_12881055 ] 

Jonathan Ellis commented on CASSANDRA-1214:
-------------------------------------------

It seems that what is happening is, 

- the JVM hasn't needed to run a major collection in a while, 
- so Linux says "I'll swap part of the JVM's heap so I can pull more of this hot sstable into ram,"
- then the JVM goes to GC and thrashes pulling its heap in from swap

The "right" solution is probably to use mlockall(MCL_CURRENT) on JVM start (with min heap = max heap so that gets pre-allocated).  Then perform the mmapping.

mmap'd io is enough faster that this is probably worth biting the native code bullet for.
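
The "min heap = max heap" precondition above can be checked at startup before any locking is attempted. The sketch below is an illustration only: it inspects the -Xms and -Xmx flags the JVM was launched with and reports whether they are equal. The class name is hypothetical, and it deliberately ignores other ways of sizing the heap (for example -XX:InitialHeapSize), which is a simplifying assumption.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Locale;

// Hypothetical startup check, not part of any attached patch.
final class HeapSizeCheck {
    private HeapSizeCheck() {}

    // Parses a JVM size such as "512k", "256m", "4G" or "1048576"
    // into bytes; returns -1 when the text is malformed.
    static long parseSize(String text) {
        if (text == null || text.isEmpty()) return -1;
        String s = text.toLowerCase(Locale.ROOT);
        long multiplier = 1;
        char last = s.charAt(s.length() - 1);
        if (last == 'k') multiplier = 1024L;
        else if (last == 'm') multiplier = 1024L * 1024;
        else if (last == 'g') multiplier = 1024L * 1024 * 1024;
        if (multiplier != 1) s = s.substring(0, s.length() - 1);
        try {
            return Long.parseLong(s) * multiplier;
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    // Value of the last argument starting with the given prefix
    // (for example "-Xms"), or -1 when no such argument exists.
    static long flagValue(List<String> jvmArgs, String prefix) {
        long value = -1;
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                value = parseSize(arg.substring(prefix.length()));
            }
        }
        return value;
    }

    // True when both flags are present, valid, and equal.
    static boolean heapIsFixed(List<String> jvmArgs) {
        long min = flagValue(jvmArgs, "-Xms");
        long max = flagValue(jvmArgs, "-Xmx");
        return min > 0 && min == max;
    }

    // Applies the check to the arguments of the running JVM.
    static boolean currentJvmHeapIsFixed() {
        return heapIsFixed(ManagementFactory.getRuntimeMXBean().getInputArguments());
    }
}
```

A node started with, say, -Xms4G -Xmx4G would pass this check; one started with only -Xmx4G would not, and could log a warning that the heap may grow after the lock is taken.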

> Make standard IO the default
> ----------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7
>            Reporter: James Golick
>


[jira] Commented: (CASSANDRA-1214) Force linux to not swap the JVM

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899728#action_12899728 ] 

Jonathan Ellis commented on CASSANDRA-1214:
-------------------------------------------

Works fine here.

> Force linux to not swap the JVM
> -------------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: James Golick
>             Fix For: 0.6.5
>
>         Attachments: 1214-v3.txt, mlockall-jna.patch.txt, Read Throughput with mmap.jpg, trunk-1214.txt
>
>


[jira] Commented: (CASSANDRA-1214) Force linux to not swap the JVM

Posted by "Folke Behrens (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899725#action_12899725 ] 

Folke Behrens commented on CASSANDRA-1214:
------------------------------------------

I meant that com.sun.jna.Native would be a hard runtime dependency of FBUtilities. (NoClassDefFoundError != ClassNotFoundException) Cassandra wouldn't start without JNA, or did I miss something?
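
To make the NoClassDefFoundError vs. ClassNotFoundException distinction concrete: a class that references com.sun.jna.Native directly fails with NoClassDefFoundError (an Error, under LinkageError) when the jar is absent, and a catch block for ClassNotFoundException does not catch it. One possible way to keep the dependency optional is to probe for the class reflectively first and catch both. This is a sketch only; the helper name is hypothetical and this is not necessarily what the attached patches do.

```java
// Hypothetical helper showing an optional-dependency probe.
final class OptionalDependency {
    private OptionalDependency() {}

    // Returns true when the named class can be loaded. The class is
    // not initialized, so no static initializers run during the probe.
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false, OptionalDependency.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: the class is not on the classpath.
            // LinkageError: covers NoClassDefFoundError and related
            // failures when the class is present but cannot be linked.
            return false;
        }
    }
}
```

Code that calls into JNA would then live in a separate class that is only touched after OptionalDependency.isClassAvailable("com.sun.jna.Native") returns true, so a missing jar degrades to a skipped optimization instead of a failed startup.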

> Force linux to not swap the JVM
> -------------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: James Golick
>             Fix For: 0.6.5
>
>         Attachments: 1214-v3.txt, mlockall-jna.patch.txt, Read Throughput with mmap.jpg, trunk-1214.txt
>
>



[jira] Commented: (CASSANDRA-1214) Force linux to not swap the JVM

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899256#action_12899256 ] 

Todd Lipcon commented on CASSANDRA-1214:
----------------------------------------

AFAIK JNA is LGPL and thus incompatible with the Apache 2 license. I've wanted to use it in other ASF projects, too, and it's a pain that there isn't an Apache-licensed alternative. If some of the Cassandra people are interested in a cleanroom implementation, I'd be interested in helping, though!

> Force linux to not swap the JVM
> -------------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: James Golick
>             Fix For: 0.6.5
>
>         Attachments: mlockall-jna.patch.txt, Read Throughput with mmap.jpg, trunk-1214.txt
>
>



[jira] Commented: (CASSANDRA-1214) Force linux to not swap the JVM

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899482#action_12899482 ] 

Todd Lipcon commented on CASSANDRA-1214:
----------------------------------------

I don't think it's kosher to pull in LGPL code as a build dependency with ivy either. In Hadoop we dynamically linked some JNI against LZO (LGPL licensed), but it was decided that even that was not allowed, so we had to move the entire LZO support out to GitHub.

Regarding the FD issue, although reflecting out the FD field isn't that portable, I've seen it done in an awful lot of places, so I don't think it's going to change any time soon. There's a patch in the works for Hadoop that adds some JNI calls for IO-related things, and we grab the fd field there. There's also an interface sun.misc.JavaIOFileDescriptorAccess which you can sneak out of sun.misc.SharedSecrets, if that makes you feel better than using reflection :)
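
For readers unfamiliar with the trick, "reflecting out the FD field" can be sketched roughly as below. This is an assumption-laden illustration, not code from Hadoop or from the patches on this issue: it relies on the private int field named "fd" that OpenJDK's java.io.FileDescriptor happens to have, which is an implementation detail. On newer JDKs with strong encapsulation the access is refused unless the JVM is started with a matching --add-opens option, so the sketch falls back to -1. The class name FdReflection is hypothetical.

```java
import java.io.FileDescriptor;
import java.lang.reflect.Field;

final class FdReflection {
    /**
     * Returns the native descriptor number held by the given FileDescriptor,
     * or -1 if the private field is missing or access to it is denied.
     */
    static int nativeFd(FileDescriptor descriptor) {
        try {
            // "fd" is a private implementation detail of OpenJDK, not a public API.
            Field field = FileDescriptor.class.getDeclaredField("fd");
            field.setAccessible(true);
            return field.getInt(descriptor);
        } catch (ReflectiveOperationException | RuntimeException e) {
            // Covers a missing field, denied access, and module encapsulation errors.
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println("stdin fd (or -1 if unavailable): " + nativeFd(FileDescriptor.in));
    }
}
```

Returning a sentinel instead of throwing keeps the caller's fallback path simple, which matters here because the whole point of the discussion is that the native optimisation should be optional.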

> Force linux to not swap the JVM
> -------------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: James Golick
>             Fix For: 0.6.5
>
>         Attachments: 1214-v3.txt, mlockall-jna.patch.txt, Read Throughput with mmap.jpg, trunk-1214.txt
>
>



[jira] Commented: (CASSANDRA-1214) Make standard IO the default

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896626#action_12896626 ] 

Jonathan Ellis commented on CASSANDRA-1214:
-------------------------------------------

Sounds good, with the caveat that it needs to catch the kind of error conditions I mentioned.

> Make standard IO the default
> ----------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: James Golick
>         Attachments: mlockall-jna.patch.txt, Read Throughput with mmap.jpg, trunk-1214.txt
>
>



[jira] Commented: (CASSANDRA-1214) Make standard IO the default

Posted by "James Golick (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881226#action_12881226 ] 

James Golick commented on CASSANDRA-1214:
-----------------------------------------

I have tried many levels of swappiness (including 0) without any change in behaviour. Additionally, I haven't seen much if any change in performance with standard IO.

Continuing to iterate on the mmap code might be a good idea, but it's the wrong default, especially now that we've agreed that it is currently broken. It's possible that it may be a sensible default in the future, but right now, it's not a good choice for production (in most cases).

> Make standard IO the default
> ----------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7
>            Reporter: James Golick
>



[jira] Commented: (CASSANDRA-1214) Force linux to not swap the JVM

Posted by "Folke Behrens (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899645#action_12899645 ] 

Folke Behrens commented on CASSANDRA-1214:
------------------------------------------

Wouldn't it then be better if tryMlockAll() loaded another class with Class.forName() and caught the ClassNotFoundException if the JNA jar is not on the classpath?

> Force linux to not swap the JVM
> -------------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: James Golick
>             Fix For: 0.6.5
>
>         Attachments: 1214-v3.txt, mlockall-jna.patch.txt, Read Throughput with mmap.jpg, trunk-1214.txt
>
>



[jira] Updated: (CASSANDRA-1214) Force linux to not swap the JVM

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1214:
--------------------------------------

    Fix Version/s: 0.7 beta 2

> Force linux to not swap the JVM
> -------------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: James Golick
>            Assignee: Jonathan Ellis
>             Fix For: 0.6.5, 0.7 beta 2
>
>         Attachments: 1214-v3.txt, 1214-v4.txt, mlockall-jna.patch.txt, Read Throughput with mmap.jpg, trunk-1214.txt
>
>



[jira] Updated: (CASSANDRA-1214) Force linux to not swap the JVM

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1214:
--------------------------------------

    Attachment:     (was: 1214-v3.txt)

> Force linux to not swap the JVM
> -------------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: James Golick
>             Fix For: 0.6.5
>
>         Attachments: mlockall-jna.patch.txt, Read Throughput with mmap.jpg, trunk-1214.txt
>
>



[jira] Commented: (CASSANDRA-1214) Force linux to not swap the JVM

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899675#action_12899675 ] 

Jonathan Ellis commented on CASSANDRA-1214:
-------------------------------------------

That way we have to have a catch for CNFE in each method we're exposing, instead of just once in CLibrary.  
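
The "catch once" structure might look something like the following sketch. It is only an approximation under stated assumptions: the class name NativeBridge is hypothetical, the availability probe uses a reflective lookup of com.sun.jna.Native so that the example compiles without JNA, and the actual mlockall binding is deliberately left out. The point is only that the guard lives in one static initializer and every exposed method consults the resulting flag.

```java
final class NativeBridge {
    /** Set exactly once, when this class is initialized. */
    private static final boolean AVAILABLE;

    static {
        boolean ok;
        try {
            // Probe without initializing the class, so no native library is loaded here.
            Class.forName("com.sun.jna.Native", false, NativeBridge.class.getClassLoader());
            ok = true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The optional jar is absent or unusable; callers fall back silently.
            ok = false;
        }
        AVAILABLE = ok;
    }

    static boolean isAvailable() {
        return AVAILABLE;
    }

    /** Exposed method: checks the flag instead of carrying its own try/catch. */
    static boolean tryMlockall() {
        if (!AVAILABLE) {
            return false;
        }
        // Placeholder: the real native call is omitted from this sketch,
        // so it reports "not locked" even when the library is present.
        return false;
    }

    public static void main(String[] args) {
        System.out.println("native support available: " + isAvailable());
        System.out.println("memory locked: " + tryMlockall());
    }
}
```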

> Force linux to not swap the JVM
> -------------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: James Golick
>             Fix For: 0.6.5
>
>         Attachments: 1214-v3.txt, mlockall-jna.patch.txt, Read Throughput with mmap.jpg, trunk-1214.txt
>
>



[jira] Commented: (CASSANDRA-1214) Force linux to not swap the JVM

Posted by "Folke Behrens (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899269#action_12899269 ] 

Folke Behrens commented on CASSANDRA-1214:
------------------------------------------

Note that this is a political matter, not a legal one. It's against ASF policy to distribute packages containing LGPL code. The licenses themselves are compatible.

> Force linux to not swap the JVM
> -------------------------------
>
>                 Key: CASSANDRA-1214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1214
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: James Golick
>             Fix For: 0.6.5
>
>         Attachments: mlockall-jna.patch.txt, Read Throughput with mmap.jpg, trunk-1214.txt
>
>
