Posted to commits@cassandra.apache.org by "Sandeep Tata (JIRA)" <ji...@apache.org> on 2009/03/16 20:41:50 UTC

[jira] Created: (CASSANDRA-7) Cassandra silently loses data when a single row gets large

Cassandra silently loses data when a single row gets large
----------------------------------------------------------

                 Key: CASSANDRA-7
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7
             Project: Cassandra
          Issue Type: Bug
         Environment: code in trunk, Red Hat 4.1.2-33,  Linux version 2.6.23.1-42.fc8, java version "1.7.0-nio2"
            Reporter: Sandeep Tata
            Priority: Critical


When you insert a large number of columns in a single row, Cassandra silently loses some of these inserts.
This does not happen until the cumulative size of the columns in a single row exceeds several megabytes.

Say each value is 1 MB in size:

insert("row", "col0", value, timestamp)
insert("row", "col1", value, timestamp)
insert("row", "col2", value, timestamp)
...
...
insert("row", "col100", value, timestamp)

Running: 
get_column("row", "col0")
get_column("row", "col1")
...
..
get_column("row", "col100")

The sequence of get_column calls will start failing at some point before col100. This was also a problem with the old code hosted on code.google.com.
I will attach a small program that will help you reproduce this. 

1. This only happens when the cumulative size of the row exceeds several megabytes.
2. In fact, the single row must be large enough to trigger a memtable flush to an SSTable for the error to occur.
3. No OutOfMemoryError is thrown, and there is nothing relevant in the logs.
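
For illustration only, the same write-then-read pattern as a compilable Java sketch. The KeyValueStore interface below is a hypothetical stand-in for whatever client wrapper you drive Cassandra with (it is not the real Thrift API); the small program mentioned above is the actual reproduction driver.

// "KeyValueStore" and its methods are hypothetical stand-ins, not the real client API.
interface KeyValueStore {
    void insert(String row, String column, byte[] value, long timestamp);
    byte[] getColumn(String row, String column);
}

class LargeRowRepro {
    static void run(KeyValueStore store) {
        byte[] value = new byte[1024 * 1024];                 // ~1 MB per column value

        // Write phase: ~100 MB into a single row.
        for (int i = 0; i <= 100; i++)
            store.insert("row", "col" + i, value, System.currentTimeMillis());

        // Read phase: every column should come back with the full 1 MB value.
        for (int i = 0; i <= 100; i++) {
            byte[] read = store.getColumn("row", "col" + i);
            if (read == null || read.length != value.length)
                System.out.println("missing or truncated: col" + i);
        }
    }
}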








[jira] Commented: (CASSANDRA-7) Cassandra silently loses data when a single row gets large

Posted by "Neophytos Demetriou (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682482#action_12682482 ] 

Neophytos Demetriou commented on CASSANDRA-7:
---------------------------------------------

(a) It happens when you insert a large number of columns in a single row.
(b) Cassandra silently loses some of these inserts (batch inserts are inserts too).
(c) This DOES happen when the threshold is violated (the cumulative size is only one of the ways the threshold can be violated).
(d) It also occurs while flushing the memtable to disk.

Yes, I can open a new ticket, but it seemed relevant to this issue.



[jira] Assigned: (CASSANDRA-7) Cassandra silently loses data when a single row gets large

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-7?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reassigned CASSANDRA-7:
--------------------------------------

    Assignee: Sandeep Tata



[jira] Resolved: (CASSANDRA-7) Cassandra silently loses data when a single row gets large

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-7?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandeep Tata resolved CASSANDRA-7.
----------------------------------

    Resolution: Fixed

Fixed by svn commit: r756155



[jira] Commented: (CASSANDRA-7) Cassandra silently loses data when a single row gets large

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682437#action_12682437 ] 

Sandeep Tata commented on CASSANDRA-7:
--------------------------------------

Another way to check that this is really a bug is to list the columns in the serialized SSTable; you will notice a large contiguous range of missing columns. The trunk does not have a "show SSTable" utility -- mine depends on a bunch of other code, but I'll try to put one up soon.

Opening the SSTable in a binary file viewer might be enough -- you'll see a large swath of zeroes in the middle where real data should be.
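
If it helps, a quick standalone scan like the sketch below will print the offsets of long zero runs in a data file. The default file name and the 64 KB run threshold are made up for illustration, and real data can legitimately contain zeros, so this only flags regions worth a closer look.

import java.io.IOException;
import java.io.RandomAccessFile;

// Rough check: report long runs of zero bytes in an SSTable data file.
public class ZeroRunScan {
    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "Table-Data.db"; // hypothetical file name
        RandomAccessFile in = new RandomAccessFile(path, "r");
        byte[] buf = new byte[64 * 1024];
        long pos = 0, runStart = -1, run = 0;
        int n;
        while ((n = in.read(buf)) > 0) {
            for (int i = 0; i < n; i++, pos++) {
                if (buf[i] == 0) {
                    if (run++ == 0) runStart = pos;          // start of a new zero run
                } else {
                    if (run >= 64 * 1024)                    // only report runs >= 64 KB
                        System.out.println("zero run at offset " + runStart + ", length " + run);
                    run = 0;
                }
            }
        }
        if (run >= 64 * 1024)                                // handle a run that ends at EOF
            System.out.println("zero run at offset " + runStart + ", length " + run);
        in.close();
    }
}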





[jira] Commented: (CASSANDRA-7) Cassandra silently loses data when a single row gets large

Posted by "Neophytos Demetriou (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682473#action_12682473 ] 

Neophytos Demetriou commented on CASSANDRA-7:
---------------------------------------------

The issue still persists. I remember sending a bug report about this against the old repository. IIRC, the issue was that the ColumnIndexer would raise a java.util.ConcurrentModificationException when it tried to read the sortedSet_ from EfficientBidiMap while flushing. No exception is raised with the new repository, but you still get zero-size Data files when you batch insert lots of simple and super columns in the same row without any throttling.

The basic test I am using for this is as follows: I have a collection of 100000 content items. For each content item I batch insert (in the same row) 5000-10000 supercolumns at a time, for a total of about a million distinct supercolumn names, before I get the first zero-sized data file. I have tried this with different values of MemtableSizeInMB, MemTableObjectCountInMillions (including a smaller threshold), and MemtableLifetimeInDays, and with your patch. The behavior is always the same (except when I throttle the inserts).

PS. The old repository had an issue with addColumn in ColumnFamily.java, but that was not it; see:
http://groups.google.com/group/cassandra-user/browse_thread/thread/329b7700ebda3072/2da827df755eb168



[jira] Commented: (CASSANDRA-7) Cassandra silently loses data when a single row gets large

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682479#action_12682479 ] 

Sandeep Tata commented on CASSANDRA-7:
--------------------------------------

Neo:

This issue only deals with the serialization bug while flushing the memtable to disk. It has nothing to do with batch_insert, supercolumn sizes, or thread-unsafe access of the sortedSet in EfficientBidiMap.

Can you open a new ticket and attach a driver program that might help us reproduce the problem you're describing?



[jira] Updated: (CASSANDRA-7) Cassandra silently loses data when a single row gets large

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-7?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandeep Tata updated CASSANDRA-7:
---------------------------------

    Attachment: dirty_bit_patch_v2.txt

Also patches the relevant comment to read:

/* Write at most "len" bytes *from* "b" starting at .... */

Thanks to jbellis for pointing this out!



[jira] Updated: (CASSANDRA-7) Cassandra silently loses data when a single row gets large

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-7?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandeep Tata updated CASSANDRA-7:
---------------------------------

    Attachment: BigReadWriteTest.java

This program simply writes a bunch of data first and then tries to read it all back.
If the write phase spans multiple SSTables, you will notice that the read phase fails with missing values.
The "--numColumns" argument needs to be large enough -- try something like 6000. The test may take a couple of minutes to run.







[jira] Updated: (CASSANDRA-7) Cassandra silently loses data when a single row gets large

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-7?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandeep Tata updated CASSANDRA-7:
---------------------------------

    Attachment: dirty_bit_patch.txt

This is a patch for the data-loss bug.

This seemingly simple fix caused a whole lot of headache :-)

The bug is in io.BufferedRandomAccess.writeAtMost.

If the internal loop takes more than 2 iterations to write out the data, everything except the first and last chunks fails to reach disk because the dirty bit is not set correctly.

BufferedRandomAccess.write(byte[] b, int off, int len) works correctly when the loop runs only 1 or 2 iterations, because the bit is set to true before the first iteration and again at the end. The fix simply sets the bit to true after the call to System.arraycopy.
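
For anyone reading along, here is a stripped-down, self-contained model of the failure mode. It is not the real io.BufferedRandomAccess code, just the same shape of bug: a buffer that is only written through when a dirty flag is set, with the buggy flag placement noted in the comments and the one-line fix in place.

import java.io.ByteArrayOutputStream;

// Simplified model only; names and the tiny buffer size are made up for the demo.
public class DirtyBitDemo {
    private final ByteArrayOutputStream file = new ByteArrayOutputStream(); // stand-in for the disk
    private final byte[] buffer = new byte[4];
    private int used = 0;
    private boolean dirty = false;

    void write(byte[] b, int off, int len) {
        // Buggy version: dirty was set to true only here, before the loop ...
        while (len > 0) {
            if (used == buffer.length)
                flushBuffer();                       // drops the buffer contents unless dirty
            int n = Math.min(len, buffer.length - used);
            System.arraycopy(b, off, buffer, used, n);
            dirty = true;                            // ... the fix: mark it dirty after every arraycopy
            used += n; off += n; len -= n;
        }
        // ... and once more here, after the loop, so only the first and last
        // buffer-fulls of a long write survived.
    }

    private void flushBuffer() {
        if (dirty)
            file.write(buffer, 0, used);             // only dirty buffers reach the "disk"
        used = 0;
        dirty = false;
    }

    void close() { flushBuffer(); }

    public static void main(String[] args) {
        DirtyBitDemo w = new DirtyBitDemo();
        byte[] data = "0123456789".getBytes();       // needs three buffer fills
        w.write(data, 0, data.length);
        w.close();
        System.out.println(w.file.size() + " of " + data.length + " bytes reached the 'disk'");
    }
}

With the flag set inside the loop, all 10 bytes make it out; with the original placement, the middle buffer-full never reaches the output at all. In the real code, where the file position keeps advancing, that lost chunk is what shows up as the swath of zeroes described earlier in the thread.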

