Posted to commits@cassandra.apache.org by "Sandeep Tata (JIRA)" <ji...@apache.org> on 2009/03/16 20:41:50 UTC

[jira] Created: (CASSANDRA-7) Cassandra silently loses data when a single row gets large

Cassandra silently loses data when a single row gets large
----------------------------------------------------------

                 Key: CASSANDRA-7
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7
             Project: Cassandra
          Issue Type: Bug
         Environment: code in trunk, Red Hat 4.1.2-33,  Linux version 2.6.23.1-42.fc8, java version "1.7.0-nio2"
            Reporter: Sandeep Tata
            Priority: Critical


When you insert a large number of columns in a single row, Cassandra silently loses some of these inserts.
This does not happen until the cumulative size of the columns in a single row exceeds several megabytes.

Say each value is 1 MB in size:

insert("row", "col0", value, timestamp)
insert("row", "col1", value, timestamp)
insert("row", "col2", value, timestamp)
...
...
insert("row", "col100", value, timestamp)

Running: 
get_column("row", "col0")
get_column("row", "col1")
...
..
get_column("row", "col100")

The sequence of get_column calls will start failing at some point before col100. This was also a problem with the old code hosted on code.google.com.
I will attach a small program that will help you reproduce this. 

1. This only happens when the cumulative size of the row exceeds several megabytes.
2. In fact, the single row must be large enough to trigger a memtable flush to an SSTable for the error to occur.
3. No OutOfMemoryError is thrown, and there is nothing relevant in the logs.
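
For illustration only, the same write-then-read pattern as a compilable Java sketch. The KeyValueStore interface below is a hypothetical stand-in for whatever client wrapper you drive Cassandra with (it is not the real Thrift API); the small program mentioned above is the actual reproduction driver.

// "KeyValueStore" and its methods are hypothetical stand-ins, not the real client API.
interface KeyValueStore {
    void insert(String row, String column, byte[] value, long timestamp);
    byte[] getColumn(String row, String column);
}

class LargeRowRepro {
    static void run(KeyValueStore store) {
        byte[] value = new byte[1024 * 1024];                 // ~1 MB per column value

        // Write phase: ~100 MB into a single row.
        for (int i = 0; i <= 100; i++)
            store.insert("row", "col" + i, value, System.currentTimeMillis());

        // Read phase: every column should come back with the full 1 MB value.
        for (int i = 0; i <= 100; i++) {
            byte[] read = store.getColumn("row", "col" + i);
            if (read == null || read.length != value.length)
                System.out.println("missing or truncated: col" + i);
        }
    }
}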








[jira] Commented: (CASSANDRA-7) Cassandra silently loses data when a single row gets large

Posted by "Neophytos Demetriou (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682482#action_12682482 ] 

Neophytos Demetriou commented on CASSANDRA-7:
---------------------------------------------

(a) It happens when you insert a large number of columns in a single row.
(b) Cassandra silently loses some of these inserts (batch inserts are inserts too).
(c) This DOES happen when the threshold is violated (the cumulative size is only one of the ways the threshold can be violated).
(d) It also occurs while flushing the memtable to disk.

Yes, I can open a new ticket, but it seemed relevant to this issue.



[jira] Assigned: (CASSANDRA-7) Cassandra silently loses data when a single row gets large

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-7?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reassigned CASSANDRA-7:
--------------------------------------

    Assignee: Sandeep Tata



[jira] Resolved: (CASSANDRA-7) Cassandra silently loses data when a single row gets large

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-7?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandeep Tata resolved CASSANDRA-7.
----------------------------------

    Resolution: Fixed

Fixed by svn commit: r756155



[jira] Commented: (CASSANDRA-7) Cassandra silently loses data when a single row gets large

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682437#action_12682437 ] 

Sandeep Tata commented on CASSANDRA-7:
--------------------------------------

Another way to check that this is really a bug is to list the columns in the serialized SSTable; you will notice a large contiguous range of missing columns. The trunk does not have a "show SSTable" utility -- mine depends on a bunch of other code, but I'll try to put one up soon.

Opening the SSTable in a binary file viewer might be enough -- you'll see a large swath of zeroes in the middle where real data should be.
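
If it helps, a quick standalone scan like the sketch below will print the offsets of long zero runs in a data file. The default file name and the 64 KB run threshold are made up for illustration, and real data can legitimately contain zeros, so this only flags regions worth a closer look.

import java.io.IOException;
import java.io.RandomAccessFile;

// Rough check: report long runs of zero bytes in an SSTable data file.
public class ZeroRunScan {
    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "Table-Data.db"; // hypothetical file name
        RandomAccessFile in = new RandomAccessFile(path, "r");
        byte[] buf = new byte[64 * 1024];
        long pos = 0, runStart = -1, run = 0;
        int n;
        while ((n = in.read(buf)) > 0) {
            for (int i = 0; i < n; i++, pos++) {
                if (buf[i] == 0) {
                    if (run++ == 0) runStart = pos;          // start of a new zero run
                } else {
                    if (run >= 64 * 1024)                    // only report runs >= 64 KB
                        System.out.println("zero run at offset " + runStart + ", length " + run);
                    run = 0;
                }
            }
        }
        if (run >= 64 * 1024)                                // handle a run that ends at EOF
            System.out.println("zero run at offset " + runStart + ", length " + run);
        in.close();
    }
}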





[jira] Commented: (CASSANDRA-7) Cassandra silently loses data when a single row gets large

Posted by "Neophytos Demetriou (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682473#action_12682473 ] 

Neophytos Demetriou commented on CASSANDRA-7:
---------------------------------------------

The issue still persists. I remember sending a bug report about this against the old repository. IIRC, the issue was that the ColumnIndexer would raise a java.util.ConcurrentModificationException when it tried to read the sortedSet_ from EfficientBidiMap while flushing. No exception is raised with the new repository, but you still get zero-size Data files when you batch insert lots of simple and super columns in the same row without any throttling.

The basic test I am using for this is as follows: I have a collection of 100000 content items. For each content item I batch insert (in the same row) 5000-10000 supercolumns at a time, for a total of about a million distinct supercolumn names, before I get the first zero-sized data file. I have tried this with different values of MemtableSizeInMB, MemTableObjectCountInMillions (including a smaller threshold), and MemtableLifetimeInDays, and with your patch. The behavior is always the same (except when I throttle the inserts).

PS. The old repository had an issue with addColumn in ColumnFamily.java, but that was not it; see:
http://groups.google.com/group/cassandra-user/browse_thread/thread/329b7700ebda3072/2da827df755eb168



[jira] Commented: (CASSANDRA-7) Cassandra silently loses data when a single row gets large

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682479#action_12682479 ] 

Sandeep Tata commented on CASSANDRA-7:
--------------------------------------

Neo:

This issue only deals with the serialization bug while flushing the memtable to disk. It has nothing to do with batch_insert, supercolumn sizes, or thread-unsafe access of the sortedSet in EfficientBidiMap.

Can you open a new ticket and attach a driver program that might help us reproduce the problem you're describing?



[jira] Updated: (CASSANDRA-7) Cassandra silently loses data when a single row gets large

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-7?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandeep Tata updated CASSANDRA-7:
---------------------------------

    Attachment: dirty_bit_patch_v2.txt

Also patches the relevant comment to read:

/* Write at most "len" bytes *from* "b" starting at .... */

Thanks to jbellis for pointing this out!



[jira] Updated: (CASSANDRA-7) Cassandra silently loses data when a single row gets large

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-7?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandeep Tata updated CASSANDRA-7:
---------------------------------

    Attachment: BigReadWriteTest.java

This program simply writes a bunch of data first and then tries to read it all back.
If the write phase spans multiple SSTables, you will notice that the read phase fails with missing values.
The "--numColumns" argument needs to be large enough -- try something like 6000. The test may take a couple of minutes to run.







[jira] Updated: (CASSANDRA-7) Cassandra silently loses data when a single row gets large

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-7?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandeep Tata updated CASSANDRA-7:
---------------------------------

    Attachment: dirty_bit_patch.txt

This is a patch for the data-loss bug.

This seemingly simple fix caused a whole lot of headache :-)

The bug is in io.BufferedRandomAccess.writeAtMost.

If the internal loop takes more than 2 iterations to write out the data, everything except the first and last chunks fails to reach disk because the dirty bit is not set correctly.

BufferedRandomAccess.write(byte[] b, int off, int len) works correctly when the loop runs only 1 or 2 iterations, because the bit is set to true before the first iteration and again at the end. The fix simply sets the bit to true after the call to System.arraycopy.
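
For anyone reading along, here is a stripped-down, self-contained model of the failure mode. It is not the real io.BufferedRandomAccess code, just the same shape of bug: a buffer that is only written through when a dirty flag is set, with the buggy flag placement noted in the comments and the one-line fix in place.

import java.io.ByteArrayOutputStream;

// Simplified model only; names and the tiny buffer size are made up for the demo.
public class DirtyBitDemo {
    private final ByteArrayOutputStream file = new ByteArrayOutputStream(); // stand-in for the disk
    private final byte[] buffer = new byte[4];
    private int used = 0;
    private boolean dirty = false;

    void write(byte[] b, int off, int len) {
        // Buggy version: dirty was set to true only here, before the loop ...
        while (len > 0) {
            if (used == buffer.length)
                flushBuffer();                       // drops the buffer contents unless dirty
            int n = Math.min(len, buffer.length - used);
            System.arraycopy(b, off, buffer, used, n);
            dirty = true;                            // ... the fix: mark it dirty after every arraycopy
            used += n; off += n; len -= n;
        }
        // ... and once more here, after the loop, so only the first and last
        // buffer-fulls of a long write survived.
    }

    private void flushBuffer() {
        if (dirty)
            file.write(buffer, 0, used);             // only dirty buffers reach the "disk"
        used = 0;
        dirty = false;
    }

    void close() { flushBuffer(); }

    public static void main(String[] args) {
        DirtyBitDemo w = new DirtyBitDemo();
        byte[] data = "0123456789".getBytes();       // needs three buffer fills
        w.write(data, 0, data.length);
        w.close();
        System.out.println(w.file.size() + " of " + data.length + " bytes reached the 'disk'");
    }
}

With the flag set inside the loop, all 10 bytes make it out; with the original placement, the middle buffer-full never reaches the output at all. In the real code, where the file position keeps advancing, that lost chunk is what shows up as the swath of zeroes described earlier in the thread.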

