Posted to commits@cassandra.apache.org by "Tey Kar Shiang (JIRA)" <ji...@apache.org> on 2011/03/29 09:17:05 UTC

[jira] [Created] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query
-------------------------------------------------------------------------------------------------------------------------------

                 Key: CASSANDRA-2401
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2401
             Project: Cassandra
          Issue Type: Bug
    Affects Versions: 0.7.4
         Environment: Hector 0.7.0-28, Cassandra 0.7.4, Windows 7, Eclipse
            Reporter: Tey Kar Shiang


In ColumnFamilyStore.java, near line 1680, the call "ColumnFamily data = getColumnFamily(new QueryFilter(dk, path, firstFilter))" returns null. This causes a null pointer exception in "satisfies(data, clause, primary)" that is not caught. The callback then times out and a Timeout exception is returned to Hector.

The data is empty: as I traced it, the column count is 0 in removeDeletedCF(), which returns null there. (I am new and still trying to understand the logic around this.) Instead of crashing on the null, could we skip that data?
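
The kind of guard asked about above can be sketched as follows. This is only a minimal illustration under stated assumptions: the class ScanNullGuard, its lookup() and scan() methods, and the map-based "table" are hypothetical stand-ins, not Cassandra's code and not the fix that was eventually committed. It shows a scan loop that skips an index entry whose row lookup returns null instead of passing the null on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the scan loop; not Cassandra's actual code.
final class ScanNullGuard {
    // Simulated base table lookup. A missing key plays the role of
    // getColumnFamily() returning null.
    static Map<String, String> lookup(Map<String, Map<String, String>> table, String key) {
        return table.get(key);
    }

    // Returns the keys whose row has column == value, skipping index
    // entries that point at a row which no longer exists.
    static List<String> scan(List<String> indexedKeys,
                             Map<String, Map<String, String>> table,
                             String column, String value) {
        List<String> matches = new ArrayList<String>();
        for (String key : indexedKeys) {
            Map<String, String> data = lookup(table, key);
            if (data == null) {
                continue; // stale index entry: skip it instead of dereferencing null
            }
            if (value.equals(data.get(column))) {
                matches.add(key);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> table = new HashMap<String, Map<String, String>>();
        Map<String, String> row = new HashMap<String, String>();
        row.put("fileType", "3");
        table.put("file-1", row);

        // "file-deleted" is still listed by the index but has no row in the table.
        List<String> indexedKeys = Arrays.asList("file-1", "file-deleted");
        System.out.println(scan(indexedKeys, table, "fileType", "3")); // prints [file-1]
    }
}
```

Skipping the entry only hides the symptom; it does not explain why the index still points at a row that is gone.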

About my test:
The test is a stress-test program that adds, modifies and deletes data in the keyspace. 30 threads simulate concurrent users performing these actions, and each periodically queries all rows. The Column Family has rows (one per file) and indexed columns (e.g. userID, fileType).

There was no issue on the first day of the test, which was then stopped for 3 days. When I restarted the test on the 4th day, 1 of the users failed to query the files (timeout exception received). Most of the users can still query without problems.


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Roland Gude (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021583#comment-13021583 ] 

Roland Gude commented on CASSANDRA-2401:
----------------------------------------

I just looked a little closer at your index expressions again.
If I understand them correctly, they are subject to https://issues.apache.org/jira/browse/CASSANDRA-2347
Although I don't really think it is the issue you are describing, it would be nice if you could apply the patch and see if the error still occurs.

You are creating the ByteBuffers for author_id and file_type in different ways. Is this a mistake?
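
To illustrate why the two construction styles matter, here is a small sketch. It assumes that an integer serializer such as Hector's IntegerSerializer produces a 4-byte big-endian encoding; the helper names fourByteInt() and singleByte() are hypothetical. The same number encoded both ways yields buffers of different lengths that do not compare equal.

```java
import java.nio.ByteBuffer;

// Hypothetical helpers; the 4-byte layout is an assumption about the serializer.
final class EncodingMismatch {
    // Encodes v as 4 big-endian bytes, ready for reading.
    static ByteBuffer fourByteInt(int v) {
        ByteBuffer b = ByteBuffer.allocate(4);
        b.putInt(v);
        b.flip();
        return b;
    }

    // Encodes v as one raw byte, like ByteBuffer.wrap(new byte[]{3}).
    static ByteBuffer singleByte(int v) {
        return ByteBuffer.wrap(new byte[] { (byte) v });
    }

    public static void main(String[] args) {
        System.out.println(fourByteInt(3).remaining());            // prints 4
        System.out.println(singleByte(3).remaining());             // prints 1
        System.out.println(fourByteInt(3).equals(singleByte(3)));  // prints false
    }
}
```

Whether this matters for the query depends on how the stored values were written. If writes and reads use the same encoding for a given column, the expressions still match.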


[jira] [Commented] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029394#comment-13029394 ] 

Jonathan Ellis commented on CASSANDRA-2401:
-------------------------------------------

> ignoreObsoleteMutation() now forgot to actually remove the obsolete mutation from cf

This isn't actually necessary, though: once a column is taken out of the list of mutated index columns, the obsolete column will only be applied to the "main" data row, and including obsolete columns there is harmless.

> not sure why mutatedIndexColumns need to be concurrent

Because we might remove from the collection while iterating over it, and a TreeSet will throw ConcurrentModificationException. But maybe Iterator.remove would work, now that you mention it?
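
The difference between the two removal styles can be sketched with plain java.util collections. The class and method names below are hypothetical and the column names are made up; this is not the patch itself. Removing through the set during a for-each loop trips the fail-fast check on the next step of the iteration, while removing through the iterator does not.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration; not the code from the attached patch.
final class RemoveWhileIterating {
    // Removes through the set while a for-each loop is iterating over it.
    // Returns true if the iteration failed with ConcurrentModificationException.
    static boolean removeViaSetThrows(SortedSet<String> columns) {
        try {
            for (String c : columns) {
                if (c.startsWith("obsolete")) {
                    columns.remove(c); // structural change behind the iterator's back
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Removes through the iterator, which keeps its own bookkeeping consistent.
    static void removeViaIterator(SortedSet<String> columns) {
        Iterator<String> it = columns.iterator();
        while (it.hasNext()) {
            if (it.next().startsWith("obsolete")) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        SortedSet<String> first = new TreeSet<String>(Arrays.asList("a", "obsolete-b", "z"));
        System.out.println(removeViaSetThrows(first)); // prints true

        SortedSet<String> second = new TreeSet<String>(Arrays.asList("a", "obsolete-b", "z"));
        removeViaIterator(second);
        System.out.println(second); // prints [a, z]
    }
}
```

Note that the exception in the first case only surfaces when the iterator is advanced again, so removing the last element visited would go unnoticed.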

> getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2401
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2401
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.0
>         Environment: Hector 0.7.0-28, Cassandra 0.7.4, Windows 7, Eclipse
>            Reporter: Tey Kar Shiang
>            Assignee: Jonathan Ellis
>             Fix For: 0.7.6
>
>         Attachments: 2401.txt
>
>

[jira] [Commented] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Tey Kar Shiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021394#comment-13021394 ] 

Tey Kar Shiang commented on CASSANDRA-2401:
-------------------------------------------

Hi Jon,

Allow me to add more information:

Each simulated user thread repeatedly does the following:

loop = 0;
while (running)
{
    if (loop % 5 == 0) { list all files in folder; }

    create around 4~10 files, but cap the total at around 2000 files;
    modify around 20 files;
    delete 1~4 files;

    loop++;
}

The "list all files in folder" step is the scan action. When the same test was restarted a few days later without resetting the data, this step returned "no file" for 1 or 2 users. We found out that this is due to the issue above.


[jira] [Updated] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2401:
--------------------------------------

    Attachment:     (was: 2401-v3.txt)

> getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2401
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2401
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.0
>         Environment: Hector 0.7.0-28, Cassandra 0.7.4, Windows 7, Eclipse
>            Reporter: Tey Kar Shiang
>            Assignee: Jonathan Ellis
>             Fix For: 0.7.6
>
>         Attachments: 2401-v2.txt, 2401.txt
>
>

[jira] [Commented] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Tey Kar Shiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012821#comment-13012821 ] 

Tey Kar Shiang commented on CASSANDRA-2401:
-------------------------------------------

Hi,

New finding here:
The 0-column data occurs because the row is never read from the file. As I step through the code, org.apache.cassandra.io.sstable.SSTableReader.java::getPosition(DecoratedKey decoratedKey, Operator op) returns position -1 at line 448 (bf.isPresent(decoratedKey.key) returns false), so the key is missing.

There seems to be a missing record that is still indexed, or the indexed column itself is not updated when the record is removed (?).

As for the data being returned with 0 columns, that is simply because a container is always created (final ColumnFamily returnCF = ColumnFamily.create(metadata)) and returned from getTopLevelColumns, even if no read took place.
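
The path traced above can be sketched roughly as follows. Everything in this example is a hypothetical stand-in (the class FilterLookupSketch, its toy two-hash filter, and the in-memory index); it is not Cassandra's SSTableReader. It only shows the shape of the behaviour: when the filter reports a key as absent, the position lookup returns -1, nothing is read, and the caller still receives a freshly created, empty container.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an sstable reader; not Cassandra's actual classes.
final class FilterLookupSketch {
    private static final int SIZE = 1024;
    private final BitSet filter = new BitSet(SIZE);                       // toy Bloom-style filter
    private final Map<String, Long> index = new HashMap<String, Long>();  // key -> file position

    // Two simple hash slots per key (illustrative only).
    private static int[] slots(String key) {
        int h = key.hashCode();
        return new int[] { Math.floorMod(h, SIZE), Math.floorMod(h * 31 + 17, SIZE) };
    }

    void add(String key, long position) {
        for (int s : slots(key)) {
            filter.set(s);
        }
        index.put(key, position);
    }

    // False means the key is definitely absent; true means it may be present.
    boolean isPresent(String key) {
        for (int s : slots(key)) {
            if (!filter.get(s)) {
                return false;
            }
        }
        return true;
    }

    // Returns -1 when the filter or the index says the key is not in this file.
    long getPosition(String key) {
        if (!isPresent(key)) {
            return -1;
        }
        Long position = index.get(key);
        return position == null ? -1 : position;
    }

    // A container is always created and returned, even when nothing is read.
    List<String> readColumns(String key) {
        List<String> columns = new ArrayList<String>();
        if (getPosition(key) >= 0) {
            columns.add("content"); // pretend one column was read from disk
        }
        return columns;
    }

    public static void main(String[] args) {
        FilterLookupSketch reader = new FilterLookupSketch();
        reader.add("file-1", 42L);
        System.out.println(reader.getPosition("file-1"));              // prints 42
        System.out.println(reader.getPosition("file-deleted"));        // prints -1
        System.out.println(reader.readColumns("file-deleted").size()); // prints 0
    }
}
```

A caller that converts a 0-column container to null, as removeDeletedCF() is described as doing, then needs a null check before using the result.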


[jira] [Commented] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029366#comment-13029366 ] 

Sylvain Lebresne commented on CASSANDRA-2401:
---------------------------------------------

Comments on the patch:
  * ignoreObsoleteMutation() now forgot to actually remove the obsolete mutation from cf.
  * not sure why mutatedIndexColumns need to be concurrent. There is no concurrency in ignoreObsoleteMutation, is there?
  * really minor: the change to the debug log "Scanning index row %s ..." seems misleading, since the first argument is not a row name.

Other than that, I do agree with you that there is quite probably a race between reads and concurrent writes. But I also agree that it doesn't seem to be the problem here.


[jira] [Updated] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2401:
--------------------------------------

    Attachment: 2401-v3.txt

> getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2401
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2401
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.0
>         Environment: Hector 0.7.0-28, Cassandra 0.7.4, Windows 7, Eclipse
>            Reporter: Tey Kar Shiang
>            Assignee: Jonathan Ellis
>             Fix For: 0.7.6
>
>         Attachments: 2401-v2.txt, 2401-v3.txt, 2401.txt
>
>

[jira] [Issue Comment Edited] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Tey Kar Shiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015932#comment-13015932 ] 

Tey Kar Shiang edited comment on CASSANDRA-2401 at 4/5/11 2:24 PM:
-------------------------------------------------------------------

Hi, yes, it seems so to me. Here, we create a table "FileMap", in which we store columns such as "content", "authorID", "Version", "Modified Time", "File Type", etc. Among them, the indexed ones are "authorID" (as UserIndex), "File Type", "Modified Time", and "CassType", where CassType generally means 'file type' in our case. It is not used, though.
<pre>
03/30/2011  09:34 AM        11,366,878 FileMap-f-53-Data.db
03/30/2011  09:34 AM            78,496 FileMap-f-53-Filter.db
03/30/2011  09:34 AM           735,930 FileMap-f-53-Index.db
03/30/2011  09:34 AM             4,264 FileMap-f-53-Statistics.db
03/30/2011  05:37 PM             4,055 FileMap-f-54-Data.db
03/30/2011  05:37 PM                40 FileMap-f-54-Filter.db
03/30/2011  05:37 PM               270 FileMap-f-54-Index.db
03/30/2011  05:37 PM             4,264 FileMap-f-54-Statistics.db
04/04/2011  04:07 PM            24,068 FileMap-f-55-Data.db
04/04/2011  04:07 PM               200 FileMap-f-55-Filter.db
04/04/2011  04:07 PM             1,746 FileMap-f-55-Index.db
04/04/2011  04:07 PM             4,264 FileMap-f-55-Statistics.db
04/04/2011  04:07 PM           961,808 FileMap.CassTypeIndex-f-53-Data.db
04/04/2011  04:07 PM             1,936 FileMap.CassTypeIndex-f-53-Filter.db
04/04/2011  04:07 PM                11 FileMap.CassTypeIndex-f-53-Index.db
04/04/2011  04:07 PM             4,264 FileMap.CassTypeIndex-f-53-Statistics.db
03/29/2011  02:52 PM           961,386 FileMap.FileTypeIndex-f-50-Data.db
03/29/2011  02:52 PM             1,936 FileMap.FileTypeIndex-f-50-Filter.db
03/29/2011  02:52 PM                11 FileMap.FileTypeIndex-f-50-Index.db
03/29/2011  02:52 PM             4,264 FileMap.FileTypeIndex-f-50-Statistics.db
03/30/2011  05:37 PM               404 FileMap.FileTypeIndex-f-51-Data.db
03/30/2011  05:37 PM                16 FileMap.FileTypeIndex-f-51-Filter.db
03/30/2011  05:37 PM                11 FileMap.FileTypeIndex-f-51-Index.db
03/30/2011  05:37 PM             4,264 FileMap.FileTypeIndex-f-51-Statistics.db
04/04/2011  04:07 PM             2,358 FileMap.FileTypeIndex-f-52-Data.db
04/04/2011  04:07 PM                16 FileMap.FileTypeIndex-f-52-Filter.db
04/04/2011  04:07 PM                11 FileMap.FileTypeIndex-f-52-Index.db
04/04/2011  04:07 PM             4,264 FileMap.FileTypeIndex-f-52-Statistics.db
03/29/2011  02:52 PM         3,298,947 FileMap.ModifiedIndex-f-50-Data.db
03/29/2011  02:52 PM            78,016 FileMap.ModifiedIndex-f-50-Filter.db
03/29/2011  02:52 PM           731,106 FileMap.ModifiedIndex-f-50-Index.db
03/29/2011  02:52 PM             4,264 FileMap.ModifiedIndex-f-50-Statistics.db
03/30/2011  05:37 PM             2,065 FileMap.ModifiedIndex-f-51-Data.db
03/30/2011  05:37 PM                64 FileMap.ModifiedIndex-f-51-Filter.db
03/30/2011  05:37 PM               450 FileMap.ModifiedIndex-f-51-Index.db
03/30/2011  05:37 PM             4,264 FileMap.ModifiedIndex-f-51-Statistics.db
04/04/2011  04:07 PM            13,835 FileMap.ModifiedIndex-f-52-Data.db
04/04/2011  04:07 PM               328 FileMap.ModifiedIndex-f-52-Filter.db
04/04/2011  04:07 PM             3,006 FileMap.ModifiedIndex-f-52-Index.db
04/04/2011  04:07 PM             4,264 FileMap.ModifiedIndex-f-52-Statistics.db
04/04/2011  04:07 PM           962,874 FileMap.UserIndex-f-53-Data.db
04/04/2011  04:07 PM             1,936 FileMap.UserIndex-f-53-Filter.db
04/04/2011  04:07 PM               420 FileMap.UserIndex-f-53-Index.db
04/04/2011  04:07 PM             4,264 FileMap.UserIndex-f-53-Statistics.db

In the search, we are using IndexClause as:
		// Column name 'a' holds the author ID; the value is encoded with Hector's IntegerSerializer.
		ByteBuffer field_author = ByteBuffer.wrap(new byte[]{'a'});
		ByteBuffer author_1 = IntegerSerializer.get().toByteBuffer(1);
		
		// Column name 't' holds the file type; the value is a single raw byte.
		ByteBuffer file_type = ByteBuffer.wrap(new byte[]{'t'});
		ByteBuffer filetype_3 = ByteBuffer.wrap(new byte[]{3}); //file type 3
		
		IndexClause indexClause = new IndexClause();
		indexClause.setCount(3000);
		ArrayList<IndexExpression> expressions = new ArrayList<IndexExpression>();
		expressions.add(new IndexExpression(field_author, IndexOperator.EQ, author_1)); //user ID = 1
		expressions.add(new IndexExpression(file_type, IndexOperator.EQ, filetype_3)); //file type = 3
		
		indexClause.setExpressions(expressions);
		indexClause.setStart_key(new byte[]{}); //empty start key: scan from the beginning

</pre>
The search scans all the index entries from "FileMap.UserIndex", among which there seems to be a key (index entry) that is not found in the table "FileMap". As far as I can tell, it breaks at data retrieval from "FileMap-f-53-Data", when the position for the key is not found / available in "FileMap-f-53-Data".


[jira] [Commented] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012475#comment-13012475 ] 

Jonathan Ellis commented on CASSANDRA-2401:
-------------------------------------------

Are you querying for zero columns?


[jira] [Commented] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030446#comment-13030446 ] 

Jonathan Ellis commented on CASSANDRA-2401:
-------------------------------------------

kill -9 w/o the bootstrap is not sufficient to cause the problem?

If you allow the bootstrap to finish does it work correctly if you kill -9 node A?

Bootstrap shouldn't cause anything to be written to node A (except the presence of a new node, to system table) so I'm inclined to think the kill -9 of A is the important part.

> getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2401
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2401
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.0
>         Environment: Hector 0.7.0-28, Cassandra 0.7.4, Windows 7, Eclipse
>            Reporter: Tey Kar Shiang
>            Assignee: Jonathan Ellis
>             Fix For: 0.7.6
>
>         Attachments: 2401-v2.txt, 2401-v3.txt, 2401.txt
>
>
> ColumnFamilyStore.java, line near 1680, "ColumnFamily data = getColumnFamily(new QueryFilter(dk, path, firstFilter))", the data is returned null, causing NULL exception in "satisfies(data, clause, primary)" which is not captured. The callback got timeout and return a Timeout exception to Hector.
> The data is empty, as I traced, I have the the columns Count as 0 in removeDeletedCF(), which return the null there. (I am new and trying to understand the logics around still). Instead of crash to NULL, could we bypass the data?
> About my test:
> A stress-test program to add, modify and delete data to keyspace. I have 30 threads simulate concurrent users to perform the actions above, and do a query to all rows periodically. I have Column Family with rows (as File) and columns as index (e.g. userID, fileType).
> No issue on the first day of test, and stopped for 3 days. I restart the test on 4th day, 1 of the users failed to query the files (timeout exception received). Most of the users are still okay with the query.


[jira] [Commented] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Tey Kar Shiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021391#comment-13021391 ] 

Tey Kar Shiang commented on CASSANDRA-2401:
-------------------------------------------

Hi Jon,

Sorry for the lack of updates over the past few days; we were busy with other tasks. We are planning to stress-test with 0.7.5 when it is out.

In the test, we have all operations, e.g. insert, replace, and delete. If I am not mistaken, we simulated 20 users running concurrently; however, they are unlikely to be able to delete a key belonging to a different user. I think there is no case where one user is modifying his record while another user is deleting it.

There are no expiring columns (TTL) in this test.

The same data on another PC gives the same exception, though we found the index position (the n variable) can be shifted by 1 or 2.

Thanks, Jon and your team, for the good work!


[jira] [Commented] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Tey Kar Shiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015932#comment-13015932 ] 

Tey Kar Shiang commented on CASSANDRA-2401:
-------------------------------------------

Hi, yes, it seems so to me. Here, we create a table "FileMap", in which we store columns such as "content", "authorID", "Version", "Modified Time", "File Type", etc. Among them, the indexed columns are "authorID" (as UserIndex), "File Type", "Modified Time", and "CassType", where CassType generally means 'file type' in our case. It is not used, though.

03/30/2011  09:34 AM        11,366,878 FileMap-f-53-Data.db
03/30/2011  09:34 AM            78,496 FileMap-f-53-Filter.db
03/30/2011  09:34 AM           735,930 FileMap-f-53-Index.db
03/30/2011  09:34 AM             4,264 FileMap-f-53-Statistics.db
03/30/2011  05:37 PM             4,055 FileMap-f-54-Data.db
03/30/2011  05:37 PM                40 FileMap-f-54-Filter.db
03/30/2011  05:37 PM               270 FileMap-f-54-Index.db
03/30/2011  05:37 PM             4,264 FileMap-f-54-Statistics.db
04/04/2011  04:07 PM            24,068 FileMap-f-55-Data.db
04/04/2011  04:07 PM               200 FileMap-f-55-Filter.db
04/04/2011  04:07 PM             1,746 FileMap-f-55-Index.db
04/04/2011  04:07 PM             4,264 FileMap-f-55-Statistics.db
04/04/2011  04:07 PM           961,808 FileMap.CassTypeIndex-f-53-Data.db
04/04/2011  04:07 PM             1,936 FileMap.CassTypeIndex-f-53-Filter.db
04/04/2011  04:07 PM                11 FileMap.CassTypeIndex-f-53-Index.db
04/04/2011  04:07 PM             4,264 FileMap.CassTypeIndex-f-53-Statistics.db
03/29/2011  02:52 PM           961,386 FileMap.FileTypeIndex-f-50-Data.db
03/29/2011  02:52 PM             1,936 FileMap.FileTypeIndex-f-50-Filter.db
03/29/2011  02:52 PM                11 FileMap.FileTypeIndex-f-50-Index.db
03/29/2011  02:52 PM             4,264 FileMap.FileTypeIndex-f-50-Statistics.db
03/30/2011  05:37 PM               404 FileMap.FileTypeIndex-f-51-Data.db
03/30/2011  05:37 PM                16 FileMap.FileTypeIndex-f-51-Filter.db
03/30/2011  05:37 PM                11 FileMap.FileTypeIndex-f-51-Index.db
03/30/2011  05:37 PM             4,264 FileMap.FileTypeIndex-f-51-Statistics.db
04/04/2011  04:07 PM             2,358 FileMap.FileTypeIndex-f-52-Data.db
04/04/2011  04:07 PM                16 FileMap.FileTypeIndex-f-52-Filter.db
04/04/2011  04:07 PM                11 FileMap.FileTypeIndex-f-52-Index.db
04/04/2011  04:07 PM             4,264 FileMap.FileTypeIndex-f-52-Statistics.db
03/29/2011  02:52 PM         3,298,947 FileMap.ModifiedIndex-f-50-Data.db
03/29/2011  02:52 PM            78,016 FileMap.ModifiedIndex-f-50-Filter.db
03/29/2011  02:52 PM           731,106 FileMap.ModifiedIndex-f-50-Index.db
03/29/2011  02:52 PM             4,264 FileMap.ModifiedIndex-f-50-Statistics.db
03/30/2011  05:37 PM             2,065 FileMap.ModifiedIndex-f-51-Data.db
03/30/2011  05:37 PM                64 FileMap.ModifiedIndex-f-51-Filter.db
03/30/2011  05:37 PM               450 FileMap.ModifiedIndex-f-51-Index.db
03/30/2011  05:37 PM             4,264 FileMap.ModifiedIndex-f-51-Statistics.db
04/04/2011  04:07 PM            13,835 FileMap.ModifiedIndex-f-52-Data.db
04/04/2011  04:07 PM               328 FileMap.ModifiedIndex-f-52-Filter.db
04/04/2011  04:07 PM             3,006 FileMap.ModifiedIndex-f-52-Index.db
04/04/2011  04:07 PM             4,264 FileMap.ModifiedIndex-f-52-Statistics.db
04/04/2011  04:07 PM           962,874 FileMap.UserIndex-f-53-Data.db
04/04/2011  04:07 PM             1,936 FileMap.UserIndex-f-53-Filter.db
04/04/2011  04:07 PM               420 FileMap.UserIndex-f-53-Index.db
04/04/2011  04:07 PM             4,264 FileMap.UserIndex-f-53-Statistics.db

In the search, we are using IndexClause as:
		ByteBuffer field_author = ByteBuffer.wrap(new byte[]{'a'});
		ByteBuffer author_1 = IntegerSerializer.get().toByteBuffer(1);
		
		ByteBuffer file_type = ByteBuffer.wrap(new byte[]{'t'});
		ByteBuffer filetype_3 = ByteBuffer.wrap(new byte[]{3}); //file type 3
		
		IndexClause indexClause = new IndexClause();
		indexClause.setCount(3000);
		ArrayList<IndexExpression> expressions = new ArrayList<IndexExpression>();
		expressions.add(new IndexExpression(field_author, IndexOperator.EQ, author_1)); //user ID = 1
		expressions.add(new IndexExpression(file_type, IndexOperator.EQ, filetype_3)); //file type = 3
		
		indexClause.setExpressions(expressions);
		indexClause.setStart_key(new byte[]{});


In the search, it scans all the index entries from "FileMap.UserIndex", which seems to contain a key (index entry) that is not found in the table "FileMap". As far as I can tell, it breaks at data retrieval from "FileMap-f-53-Data", when the position for that key is not found / not available in "FileMap-f-53-Data".


[jira] [Updated] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2401:
--------------------------------------

          Component/s: Core
    Affects Version/s:     (was: 0.7.4)
                       0.7.0
        Fix Version/s: 0.7.6
             Assignee: Jonathan Ellis

> getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2401
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2401
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.0
>         Environment: Hector 0.7.0-28, Cassandra 0.7.4, Windows 7, Eclipse
>            Reporter: Tey Kar Shiang
>            Assignee: Jonathan Ellis
>             Fix For: 0.7.6
>
>
> ColumnFamilyStore.java, line near 1680, "ColumnFamily data = getColumnFamily(new QueryFilter(dk, path, firstFilter))", the data is returned null, causing NULL exception in "satisfies(data, clause, primary)" which is not captured. The callback got timeout and return a Timeout exception to Hector.
> The data is empty, as I traced, I have the the columns Count as 0 in removeDeletedCF(), which return the null there. (I am new and trying to understand the logics around still). Instead of crash to NULL, could we bypass the data?
> About my test:
> A stress-test program to add, modify and delete data to keyspace. I have 30 threads simulate concurrent users to perform the actions above, and do a query to all rows periodically. I have Column Family with rows (as File) and columns as index (e.g. userID, fileType).
> No issue on the first day of test, and stopped for 3 days. I restart the test on 4th day, 1 of the users failed to query the files (timeout exception received). Most of the users are still okay with the query.


[jira] [Commented] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Jackson Chung (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029005#comment-13029005 ] 

Jackson Chung commented on CASSANDRA-2401:
------------------------------------------

I have existing data that was producing a similar NPE before the patch. After applying the patch, the following is observed:

{noformat}
DEBUG [ReadStage:82] 2011-05-04 21:23:27,114 ColumnFamilyStore.java (line 1514) fetched data row ColumnFamily(inode -deleted at 1304363600008- [70617468:false:49@1304363600219,])
DEBUG [ReadStage:82] 2011-05-04 21:23:27,114 ColumnFamilyStore.java (line 1532) row ColumnFamily(inode -deleted at 1304363600008- [70617468:false:49@1304363600219,]) satisfies all clauses
DEBUG [ReadStage:82] 2011-05-04 21:23:27,115 ColumnFamilyStore.java (line 1514) fetched data row ColumnFamily(inode [70617468:false:10@1304353355296,])
ERROR [ReadStage:82] 2011-05-04 21:23:27,115 AbstractCassandraDaemon.java (line 112) Fatal exception in thread Thread[ReadStage:82,5,main]
java.lang.AssertionError: No data found for NamesQueryFilter(columns=java.nio.HeapByteBuffer[pos=12 lim=16 cap=17]) in DecoratedKey(29842926756667498147838693957802723793, 3134346637326336393966396130336561376538623330316566383561616131):QueryPath(columnFamilyName='inode', superColumnName='null', columnName='null') (original filter NamesQueryFilter(columns=java.nio.HeapByteBuffer[pos=12 lim=16 cap=17])) from expression 73656e74696e656cEQ78
    at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1512)
    at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:42)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
{noformat}

Was the fix intended to avoid future problems, such that existing problems would need a workaround?

> getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2401
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2401
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.0
>         Environment: Hector 0.7.0-28, Cassandra 0.7.4, Windows 7, Eclipse
>            Reporter: Tey Kar Shiang
>            Assignee: Jonathan Ellis
>             Fix For: 0.7.6
>
>         Attachments: 2401.txt
>
>
> ColumnFamilyStore.java, line near 1680, "ColumnFamily data = getColumnFamily(new QueryFilter(dk, path, firstFilter))", the data is returned null, causing NULL exception in "satisfies(data, clause, primary)" which is not captured. The callback got timeout and return a Timeout exception to Hector.
> The data is empty, as I traced, I have the the columns Count as 0 in removeDeletedCF(), which return the null there. (I am new and trying to understand the logics around still). Instead of crash to NULL, could we bypass the data?
> About my test:
> A stress-test program to add, modify and delete data to keyspace. I have 30 threads simulate concurrent users to perform the actions above, and do a query to all rows periodically. I have Column Family with rows (as File) and columns as index (e.g. userID, fileType).
> No issue on the first day of test, and stopped for 3 days. I restart the test on 4th day, 1 of the users failed to query the files (timeout exception received). Most of the users are still okay with the query.


[jira] [Issue Comment Edited] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Tey Kar Shiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015816#comment-13015816 ] 

Tey Kar Shiang edited comment on CASSANDRA-2401 at 4/6/11 5:53 AM:
-------------------------------------------------------------------

Hi Roland,

Sure, we are trying to do that. In the meantime, I would like to update you on our findings:
We built a test case on the PC with the existing DB that reproduces the same issue without the Hector API. The test case works (it triggers the null exception) on the original PC.

java.lang.NullPointerException
	at org.apache.cassandra.db.ColumnFamilyStore.satisfies(ColumnFamilyStore.java:1787)
	at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1727)
	at TestScan.main(TestScan.java:74)

line 1787: IColumn column = data.getColumn(expression.column_name); where data is NULL


Copying the zipped 0.7.4 Cassandra data to another new PC gives the same issue, but the position of the missing key may differ slightly, e.g. on the original PC it is the 430th, on the new PC it is the 431st. Both keys appear to be the same, though (content in ByteBuffer).
(Edited: the new PC also shows the problem, which makes more sense.)

We will continue to check whether the "if (column.isMarkedForDelete())" check is not working on the PC where the null is encountered. We checked that both PCs have the same number of columns returned in the "scan" method at the line "ColumnFamily indexRow = indexCFS.getColumnFamily(indexFilter);", where "indexRow.getColumnCount()" gives 1996 on both, with some rows already deleted as tombstones.
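
The distinction drawn above, between the raw column count of an index row and the columns that are still live, can be sketched as follows. IndexColumn, liveKeys and sampleRow are hypothetical names made up for this illustration; they are not Cassandra classes, and the three-column row is invented data. The only point is that a count which includes tombstones can be identical on two machines while the live set is what the scan actually has to follow.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of an index row; the names here are illustrative only.
class TombstoneFilterSketch {

    static final class IndexColumn {
        final String rowKey;
        final boolean markedForDelete;

        IndexColumn(String rowKey, boolean markedForDelete) {
            this.rowKey = rowKey;
            this.markedForDelete = markedForDelete;
        }
    }

    // Returns only the row keys whose index column is not a tombstone.
    static List<String> liveKeys(List<IndexColumn> indexRow) {
        List<String> live = new ArrayList<String>();
        for (IndexColumn column : indexRow) {
            if (column.markedForDelete) {
                continue; // tombstone: counted in the row, but not followed
            }
            live.add(column.rowKey);
        }
        return live;
    }

    // Three index columns, one of which has been deleted.
    static List<IndexColumn> sampleRow() {
        return Arrays.asList(
                new IndexColumn("k1", false),
                new IndexColumn("k2", true),
                new IndexColumn("k3", false));
    }

    public static void main(String[] args) {
        List<IndexColumn> row = sampleRow();
        System.out.println("columns in row: " + row.size());
        System.out.println("live keys: " + liveKeys(row));
    }
}
```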


      was (Author: karshiang):
    Hi Roland,

Sure, we are trying to do that. In the meantime, I would like to update you on our findings:
We built a test case on the PC with the existing DB that reproduces the same issue without the Hector API. The test case works (it triggers the null exception) on the original PC. However, if we copy the zipped 0.7.4 Cassandra data to another new PC, running the same code does not hit the null.

java.lang.NullPointerException
	at org.apache.cassandra.db.ColumnFamilyStore.satisfies(ColumnFamilyStore.java:1787)
	at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1727)
	at TestScan.main(TestScan.java:74)

line 1787: IColumn column = data.getColumn(expression.column_name); where data is NULL


We will continue to check whether the "if (column.isMarkedForDelete())" check is not working on the PC where the null is encountered. We checked that both PCs have the same number of columns returned in the "scan" method at the line "ColumnFamily indexRow = indexCFS.getColumnFamily(indexFilter);", where "indexRow.getColumnCount()" gives 1996 on both, with some rows already deleted as tombstones.

  

[jira] [Commented] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Tey Kar Shiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015816#comment-13015816 ] 

Tey Kar Shiang commented on CASSANDRA-2401:
-------------------------------------------

Hi Roland,

Sure, we are trying to do that. In the meantime, I would like to update you on our findings:
We built a test case on the PC with the existing DB that reproduces the same issue without the Hector API. The test case works (it triggers the null exception) on the original PC. However, if we copy the zipped 0.7.4 Cassandra data to another new PC, running the same code does not hit the null.

java.lang.NullPointerException
	at org.apache.cassandra.db.ColumnFamilyStore.satisfies(ColumnFamilyStore.java:1787)
	at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1727)
	at TestScan.main(TestScan.java:74)

line 1787: IColumn column = data.getColumn(expression.column_name); where data is NULL


We will continue to check whether the "if (column.isMarkedForDelete())" check is not working on the PC where the null is encountered. We checked that both PCs have the same number of columns returned in the "scan" method at the line "ColumnFamily indexRow = indexCFS.getColumnFamily(indexFilter);", where "indexRow.getColumnCount()" gives 1996 on both, with some rows already deleted as tombstones.



[jira] [Commented] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030447#comment-13030447 ] 

Jonathan Ellis commented on CASSANDRA-2401:
-------------------------------------------

bq. Bootstrap shouldn't cause anything to be written to node A

Hmm, but it does cause A to flush. I wonder if that's the connection.


[jira] [Reopened] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Tyler Hobbs (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tyler Hobbs reopened CASSANDRA-2401:
------------------------------------


With these changes, using a count of 0 in the SlicePredicate produces the following AssertionError (and a TimedOutExc for the client):

{noformat}
ERROR 16:13:38,864 Fatal exception in thread Thread[ReadStage:16,5,main]
java.lang.AssertionError: No data found for SliceQueryFilter(start=java.nio.HeapByteBuffer[pos=10 lim=10 cap=30], finish=java.nio.HeapByteBuffer[pos=17 lim=17 cap=30], reversed=false, count=0] in DecoratedKey(81509516161424251288255223397843705139, 6b657931):QueryPath(columnFamilyName='cf', superColumnName='null', columnName='null') (original filter SliceQueryFilter(start=java.nio.HeapByteBuffer[pos=10 lim=10 cap=30], finish=java.nio.HeapByteBuffer[pos=17 lim=17 cap=30], reversed=false, count=0]) from expression 'cf.626972746864617465 EQ 1'
	at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1517)
	at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:42)
	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
{noformat}
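
The failure mode reported here can be sketched in isolation. The following is a minimal, self-contained illustration and not Cassandra code: `ScanSketch`, `slice`, `scanStrict` and `scanLenient` are hypothetical names, and rows are modelled as plain lists of column names. It assumes, as described in this ticket, that a slice which selects no columns is handed back as null rather than as an empty result, so a count of 0 always trips a "data must not be null" check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in types; none of these are Cassandra classes.
class ScanSketch {

    // Returns at most `count` columns of a row, or null when nothing is selected.
    // The null-for-empty convention is an assumption taken from the ticket text.
    static List<String> slice(List<String> columns, int count) {
        int n = Math.max(0, Math.min(count, columns.size()));
        List<String> out = new ArrayList<>(columns.subList(0, n));
        return out.isEmpty() ? null : out;
    }

    // Strict variant: a null slice is treated as a broken invariant.
    static Map<String, List<String>> scanStrict(Map<String, List<String>> rows, int count) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> row : rows.entrySet()) {
            List<String> data = slice(row.getValue(), count);
            if (data == null)
                throw new AssertionError("No data found for count=" + count + " in " + row.getKey());
            result.put(row.getKey(), data);
        }
        return result;
    }

    // Lenient variant: a null slice just means "nothing to return for this row".
    static Map<String, List<String>> scanLenient(Map<String, List<String>> rows, int count) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> row : rows.entrySet()) {
            List<String> data = slice(row.getValue(), count);
            if (data == null)
                continue; // skip the row instead of failing the whole scan
            result.put(row.getKey(), data);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> rows = new LinkedHashMap<>();
        rows.put("key1", Arrays.asList("birthdate", "state"));

        System.out.println(scanLenient(rows, 1));
        System.out.println(scanLenient(rows, 0));
        try {
            scanStrict(rows, 0);
        } catch (AssertionError e) {
            System.out.println("strict scan failed: " + e.getMessage());
        }
    }
}
```

Skipping the row is only one possible way to tolerate an empty slice; whether that is the right behaviour for a real index scan is a separate design question.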

[jira] [Updated] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2401:
--------------------------------------

    Comment: was deleted

(was: Oops, that's actually column + value, not CF.

v3 adds CF:

{noformat}
Scanning index 'CF1.LongIdxName.world2 EQ 15' starting with
{noformat})

[jira] [Issue Comment Edited] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030451#comment-13030451 ] 

Jonathan Ellis edited comment on CASSANDRA-2401 at 5/8/11 3:15 AM:
-------------------------------------------------------------------

Another thing to try: after kill -9 of A but before restarting it, remove the commitlog *header* files (just the header ones). This should force full CL replay on restart.
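
As a rough sketch of the "remove only the header files" step, the snippet below deletes files whose names end in `.header` from a given directory and leaves everything else alone. The `.header` suffix, the class name `CommitLogHeaderCleaner` and the directory argument are assumptions made for illustration and should be checked against the actual commitlog directory before anything is deleted. It is meant to be run only while the node is stopped.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Cassandra.
class CommitLogHeaderCleaner {

    // Lists regular files in commitLogDir whose names end in ".header"
    // (the suffix is an assumption about how header files are named).
    static List<Path> findHeaderFiles(Path commitLogDir) throws IOException {
        List<Path> headers = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(commitLogDir, "*.header")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p))
                    headers.add(p);
            }
        }
        return headers;
    }

    // Deletes only the header files and returns how many were removed.
    static int deleteHeaderFiles(Path commitLogDir) throws IOException {
        int deleted = 0;
        for (Path p : findHeaderFiles(commitLogDir)) {
            Files.delete(p);
            deleted++;
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: CommitLogHeaderCleaner <commitlog-directory>");
            return;
        }
        Path dir = Paths.get(args[0]);
        System.out.println("Removed " + deleteHeaderFiles(dir) + " header file(s) from " + dir);
    }
}
```

Calling `findHeaderFiles` on its own first gives a dry run that shows which files would be removed.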

      was (Author: jbellis):
    Another thing to try: after kill -9 of A, remove the commitlog *header* files (just the header ones). This should force full CL replay.
  
[jira] [Commented] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Tey Kar Shiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012542#comment-13012542 ] 

Tey Kar Shiang commented on CASSANDRA-2401:
-------------------------------------------

Hi, nope. 

It is a query for 4 columns. 

I checked, and only 1 row out of the 948 records returned has this problem (no columns found); I skipped the row with zero columns.

In my stress test, all rows have 4 columns: each row is a file, and the 4 columns (indexes) are things like its version, modified time, type, etc. I add all the columns when I add each file. The insertion should be working, since there was no such exception on day 1, and I started and stopped the stress test until each user had around 1500 files. The row with 0 columns was only found on the 4th day, after I continued the run.

I will keep studying Cassandra's logic, as I have little understanding of how data is loaded, stored and deleted. Any suggestion or guide on how I should go on with my study is greatly appreciated. Thank you!

Btw, for this test I have not yet moved to 2 or 3 nodes. It is only a single-node Cassandra running on my localhost.


[jira] [Commented] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030451#comment-13030451 ] 

Jonathan Ellis commented on CASSANDRA-2401:
-------------------------------------------

Another thing to try: after kill -9 of A, remove the commitlog *header* files (just the header ones). This should force full CL replay.

[jira] [Issue Comment Edited] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Tyler Hobbs (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033331#comment-13033331 ] 

Tyler Hobbs edited comment on CASSANDRA-2401 at 5/13/11 9:54 PM:
-----------------------------------------------------------------

With these changes, using a count of 0 in the SlicePredicate produces the following AssertionError (and a TimedOutExc for the client):

{noformat}
ERROR 16:13:38,864 Fatal exception in thread Thread[ReadStage:16,5,main]
java.lang.AssertionError: No data found for SliceQueryFilter(start=java.nio.HeapByteBuffer[pos=10 lim=10 cap=30], finish=java.nio.HeapByteBuffer[pos=17 lim=17 cap=30], reversed=false, count=0] in DecoratedKey(81509516161424251288255223397843705139, 6b657931):QueryPath(columnFamilyName='cf', superColumnName='null', columnName='null') (original filter SliceQueryFilter(start=java.nio.HeapByteBuffer[pos=10 lim=10 cap=30], finish=java.nio.HeapByteBuffer[pos=17 lim=17 cap=30], reversed=false, count=0]) from expression 'cf.626972746864617465 EQ 1'
	at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1517)
	at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:42)
	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
{noformat}

This was during a get_indexed_slices().

[jira] [Commented] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Roland Gude (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015193#comment-13015193 ] 

Roland Gude commented on CASSANDRA-2401:
----------------------------------------

Can you provide some unit tests that reproduce your error? I'd like to look into it, but I am not sure whether I understand the issue correctly.

[jira] [Commented] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032542#comment-13032542 ] 

Sylvain Lebresne commented on CASSANDRA-2401:
---------------------------------------------

From IRC:
{noformat}
pcmanus : jbellis: do you know what's up with #2401 ?
jbellis : jackson can't reproduce anymore either, but he wants to test more before calling it fixed
{noformat}
So I'm going to mark this resolved, as this fixed a legit bug and I don't want to push it to 0.7.7.
If there are still related problems, let's open another ticket.

[jira] [Commented] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029446#comment-13029446 ] 

Sylvain Lebresne commented on CASSANDRA-2401:
---------------------------------------------

+1 v2

[jira] [Resolved] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-2401.
---------------------------------------

    Resolution: Fixed

Created CASSANDRA-2653 to address this, since it will probably be in a different release than the original 2401 fix.

[jira] [Issue Comment Edited] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030447#comment-13030447 ] 

Jonathan Ellis edited comment on CASSANDRA-2401 at 5/8/11 2:42 AM:
-------------------------------------------------------------------

bq. Bootstrap shouldn't cause anything to be written to node A

Hmm, but it does cause A to flush. I wonder if that's the connection.

Can you try invoking nodetool flush against A, instead of doing a bootstrap?

      was (Author: jbellis):
    bq. Bootstrap shouldn't cause anything to be written to node A

Hmm, but it does cause A to flush. I wonder if that's the connection.
  
[jira] [Commented] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021382#comment-13021382 ] 

Jonathan Ellis commented on CASSANDRA-2401:
-------------------------------------------

So when you created the data,

- you did only inserts (no overwrites or deletes)
- you did not use any expiring columns (TTL)

Correct?


[jira] [Issue Comment Edited] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028857#comment-13028857 ] 

Jonathan Ellis edited comment on CASSANDRA-2401 at 5/4/11 5:50 PM:
-------------------------------------------------------------------

I found *a* bug that could cause this: Cassandra will re-create a deleted index entry if it gets a write with an obsolete timestamp, but the data row tombstone will correctly suppress an update there. (So when you do an index query for value=X, and the index says "row K has that value," then you get an error trying to read row K that doesn't exist.)

I don't think this is the bug Tey Kar is hitting, though, because unless I'm mistaken you won't get this NPE until after the data row tombstone is removed by compaction after gc_grace_seconds.  4 days isn't enough to see that unless you've tweaked gc_g_s.

Still, it's worth fixing.  Patch attached.  (Also adds an assert w/ more information if/when another way of triggering this is found.)
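
To make the sequence above concrete, here is a toy model of it. This is only a sketch: the class, its fields, and the skipObsoleteInIndex flag are hypothetical names for illustration, not the contents of the attached patch, and overwrites of live rows are not modeled.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of a data row store plus a secondary index.
// It is NOT Cassandra's code; it only illustrates the sequence described above.
class ObsoleteWriteDemo {
    final Map<String, Long> tombstones = new HashMap<>();   // row key -> deletion timestamp
    final Map<String, String> rows = new HashMap<>();        // row key -> indexed value
    final Map<String, Set<String>> index = new HashMap<>();  // indexed value -> row keys

    // Deletes the row, removes its index entry, and records a tombstone.
    void delete(String key, long timestamp) {
        String old = rows.remove(key);
        if (old != null) {
            index.get(old).remove(key);
        }
        tombstones.put(key, timestamp);
    }

    // Applies a write. With skipObsoleteInIndex == false the index is updated
    // even when the tombstone suppresses the write to the data row.
    void write(String key, String value, long timestamp, boolean skipObsoleteInIndex) {
        Long deletedAt = tombstones.get(key);
        boolean obsolete = deletedAt != null && timestamp <= deletedAt;
        if (!obsolete || !skipObsoleteInIndex) {
            index.computeIfAbsent(value, v -> new HashSet<>()).add(key);
        }
        if (!obsolete) {
            rows.put(key, value);
        }
    }

    // True if the index lists a row for this value that the data store does not have.
    boolean indexPointsAtMissingRow(String value) {
        Set<String> keys = index.get(value);
        if (keys == null) {
            return false;
        }
        for (String key : keys) {
            if (!rows.containsKey(key)) {
                return true;
            }
        }
        return false;
    }
}
```

Writing K at timestamp 1, deleting it at 5, and then replaying a write at timestamp 3 leaves a dangling index entry when the flag is false, and none when it is true.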

      was (Author: jbellis):
    I found *a* bug that could cause this: Cassandra will re-create a deleted index entry if it gets a write with an obsolete timestamp, but the data row tombstone will correctly suppress an update there.

I don't think this is the bug Tey Kar is hitting, though, because unless I'm mistaken you won't get this NPE until after the data row tombstone is removed by compaction after gc_grace_seconds.  4 days isn't enough to see that unless you've tweaked gc_g_s.

Still, it's worth fixing.  Patch attached.  (Also adds an assert w/ more information if/when another way of triggering this is found.)
  

[jira] [Commented] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029477#comment-13029477 ] 

Hudson commented on CASSANDRA-2401:
-----------------------------------

Integrated in Cassandra-0.7 #470 (See [https://builds.apache.org/hudson/job/Cassandra-0.7/470/])
    improve ignoring of obsolete mutations in index maintenance
patch by jbellis; reviewed by slebresne for CASSANDRA-2401



[jira] [Commented] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Roland Gude (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015876#comment-13015876 ] 

Roland Gude commented on CASSANDRA-2401:
----------------------------------------

Sounds as if the index is still pointing to deleted entries.


[jira] [Updated] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2401:
--------------------------------------

    Attachment: 2401.txt

I found *a* bug that could cause this: Cassandra will re-create a deleted index entry if it gets a write with an obsolete timestamp, but the data row tombstone will correctly suppress an update there.

I don't think this is the bug Tey Kar is hitting, though, because unless I'm mistaken you won't get this NPE until after the data row tombstone is removed by compaction after gc_grace_seconds.  4 days isn't enough to see that unless you've tweaked gc_g_s.

Still, it's worth fixing.  Patch attached.  (Also adds an assert w/ more information if/when another way of triggering this is found.)


[jira] [Commented] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Tey Kar Shiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015819#comment-13015819 ] 

Tey Kar Shiang commented on CASSANDRA-2401:
-------------------------------------------

Some information I missed in my update:
On the PC with the null exception, I added a continue to skip the row when "data" is null. With that, I get 1040 columns returned. On the 2nd (new) PC, which has neither the null exception nor the additional code to bypass null data, the query returns 1040 records as well. From here, we will study our DB further to find out where it went wrong or diverged.
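
The workaround described above can be sketched roughly as follows. This is a simplified stand-in, not the actual ColumnFamilyStore.scan() code: the class and method names are hypothetical, and a missing map entry plays the role of getColumnFamily() returning null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of an index scan; not Cassandra's actual code.
class NullSkippingScan {
    // indexedKeys: row keys the index claims match the query.
    // rowStore: row key -> row contents; an absent key stands in for a null result.
    static List<String> scan(List<String> indexedKeys, Map<String, String> rowStore) {
        List<String> rows = new ArrayList<>();
        for (String key : indexedKeys) {
            String data = rowStore.get(key);
            if (data == null) {
                // The index names a row that no longer exists: skip it instead of failing.
                continue;
            }
            rows.add(data);
        }
        return rows;
    }
}
```

Skipping the row only hides the symptom; the stale index entry is still there.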


[jira] [Commented] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029422#comment-13029422 ] 

Sylvain Lebresne commented on CASSANDRA-2401:
---------------------------------------------

bq. this isn't actually necessary, though, since if it's taken out of the list of mutated index columns the obsolete columns will only be applied to the "main" data row, and including obsolete columns there is harmless.

Very true.

bq. but maybe iterator.remove would work

I think it will
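
For reference, a minimal sketch of the Iterator.remove idea: dropping obsolete entries from the set of mutated index columns while iterating over it. The class, method, and parameter names are hypothetical, and the timestamp maps are a simplification of the real column objects.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; the names and the timestamp maps are illustrative only.
class MutatedIndexColumns {
    // Removes every column whose incoming timestamp is not newer than the stored one.
    static void dropObsolete(Set<String> mutatedIndexedColumns,
                             Map<String, Long> incomingTimestamps,
                             Map<String, Long> storedTimestamps) {
        Iterator<String> iter = mutatedIndexedColumns.iterator();
        while (iter.hasNext()) {
            String name = iter.next();
            Long incoming = incomingTimestamps.get(name);
            Long stored = storedTimestamps.get(name);
            if (incoming != null && stored != null && incoming <= stored) {
                // Iterator.remove is the safe way to delete during iteration;
                // calling Set.remove here could throw ConcurrentModificationException.
                iter.remove();
            }
        }
    }
}
```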


[jira] [Updated] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2401:
--------------------------------------

    Attachment: 2401-v2.txt

v2 attached w/ iterator/Set change.

bq. change to debug log "Scanning index row %s ..." seems misleading since the first argument is not a row name

It actually is the same CF+row as before; I just encapsulated it in getExpressionString so I can reuse the method in case of an assertion failure later. Tweaked the format a bit in v2; here's an example debug output:

{noformat}
Scanning index 'world2 EQ 15' starting with
{noformat}


[jira] [Commented] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012547#comment-13012547 ] 

Jonathan Ellis commented on CASSANDRA-2401:
-------------------------------------------

Is there any data from earlier than 0.7.4?


[jira] [Commented] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021387#comment-13021387 ] 

Jonathan Ellis commented on CASSANDRA-2401:
-------------------------------------------

The more I think about it, the more I think that there is a rare race condition here -- we do a kind of row lock during updates of indexed data, but we do not lock during reads. So it's possible for an index read to say "row X has this value" and then have that value deleted (by another client's request) before we can read row X.

BUT that does not look like what you are seeing because if I understand correctly you are seeing that the index has permanently missed a delete operation.
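
The interleaving in the first paragraph can be written out step by step. This is a deterministic sketch with hypothetical names, not Cassandra's read path; it replays the three steps in order rather than using real threads.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model: an index query is two separate reads with no lock between them.
class IndexReadRace {
    final Map<String, String> rows = new HashMap<>();   // row key -> row contents
    final Set<String> indexEntries = new HashSet<>();   // row keys the index lists

    // Step 1 of an index query: snapshot the row keys the index currently lists.
    List<String> readIndex() {
        return new ArrayList<>(indexEntries);
    }

    // Another client's delete: removes the row and its index entry together.
    void delete(String key) {
        rows.remove(key);
        indexEntries.remove(key);
    }

    // Step 2 of an index query: fetch the row; null if it was deleted in between.
    String readRow(String key) {
        return rows.get(key);
    }
}
```

If delete("X") runs between readIndex() and readRow("X"), the query holds a key for a row that is already gone, even though the index itself is consistent afterwards.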


[jira] [Commented] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Tey Kar Shiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021955#comment-13021955 ] 

Tey Kar Shiang commented on CASSANDRA-2401:
-------------------------------------------

Regarding https://issues.apache.org/jira/browse/CASSANDRA-2347: I suspect we encountered that in another case, where validation fails at times.

I applied the change. As expected, the error is still there: either data is missing or there is an extra indexed key, and a null pointer exception is then thrown.


[jira] [Issue Comment Edited] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Tey Kar Shiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021941#comment-13021941 ] 

Tey Kar Shiang edited comment on CASSANDRA-2401 at 4/20/11 3:53 AM:
--------------------------------------------------------------------

Hi Roland,

The 'mistake' is intentional; it comes from reusing some Hector API code.

Hector has an Integer Serializer, which generates a 4-byte array from a given integer. The file_type is a 1-byte array. The aim is to reproduce the exact affected client call in a test case that runs Cassandra alone.
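
To show the size difference in isolation, here is a small sketch using only the JDK. It assumes the serializer writes the integer as 4 big-endian bytes (the byte order is an assumption; the comment above only states the length), and the class and method names are made up for the example.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical illustration using only the JDK; it does not call Hector.
class IntEncodingDemo {
    // 4-byte encoding of an int in ByteBuffer's default (big-endian) order.
    static byte[] fourBytes(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // 1-byte encoding, as used for file_type in the test described above.
    static byte[] oneByte(int value) {
        return new byte[] { (byte) value };
    }

    // The two encodings of the same number are different byte arrays.
    static boolean sameBytes(int value) {
        return Arrays.equals(fourBytes(value), oneByte(value));
    }
}
```

Under that assumption the value 1 becomes {0, 0, 0, 1} in one encoding and {1} in the other, so a byte-wise comparison treats them as different values.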

      was (Author: karshiang):
    Hi Roland,

The 'mistake' is sort of intended as according to Hector API, the way we use it.

Hector has a Integer Serializer, which will generate 4-byte[] from given integer. The file_type is a 1-byte array, as when i try to duplicated it without Hector. I just tried duplicate the exact effected client code into a test case solely running cassandra. 
  

[jira] [Commented] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Tey Kar Shiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012804#comment-13012804 ] 

Tey Kar Shiang commented on CASSANDRA-2401:
-------------------------------------------

Hi, 
This is a clean 0.7.4 setup that started with no data. The keyspace schema is created dynamically at run time when the required keyspace does not exist.


[jira] [Commented] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Tey Kar Shiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021941#comment-13021941 ] 

Tey Kar Shiang commented on CASSANDRA-2401:
-------------------------------------------

Hi Roland,

The 'mistake' is more or less intentional; it follows from the way we use the Hector API.

Hector has an IntegerSerializer, which generates a 4-byte array from a given integer. The file_type value is a 1-byte array, as it was when I tried to duplicate the call without Hector. I simply copied the exact affected client code into a test case that runs against Cassandra alone.


[jira] [Issue Comment Edited] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Tey Kar Shiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015932#comment-13015932 ] 

Tey Kar Shiang edited comment on CASSANDRA-2401 at 4/6/11 10:15 AM:
--------------------------------------------------------------------

Hi, yes, it seems so to me. Here we create a table "FileMap", in which we store columns such as "content", "authorID", "Version", "Modified Time", "File Type", etc. Among them, the columns with sorted indices are "authorID" (as UserIndex), "File Type", "Modified Time", and "CassType", where CassType generally means 'file type' in our case. It is not used, though.

{{03/30/2011  09:34 AM        11,366,878 FileMap-f-53-Data.db}}
{{03/30/2011  09:34 AM            78,496 FileMap-f-53-Filter.db}}
{{03/30/2011  09:34 AM           735,930 FileMap-f-53-Index.db}}
{{03/30/2011  09:34 AM             4,264 FileMap-f-53-Statistics.db}}
{{03/30/2011  05:37 PM             4,055 FileMap-f-54-Data.db}}
{{03/30/2011  05:37 PM                40 FileMap-f-54-Filter.db}}
{{03/30/2011  05:37 PM               270 FileMap-f-54-Index.db}}
{{03/30/2011  05:37 PM             4,264 FileMap-f-54-Statistics.db}}
{{04/04/2011  04:07 PM            24,068 FileMap-f-55-Data.db}}
{{04/04/2011  04:07 PM               200 FileMap-f-55-Filter.db}}
{{04/04/2011  04:07 PM             1,746 FileMap-f-55-Index.db}}
{{04/04/2011  04:07 PM             4,264 FileMap-f-55-Statistics.db}}
{{04/04/2011  04:07 PM           961,808 FileMap.CassTypeIndex-f-53-Data.db}}
{{04/04/2011  04:07 PM             1,936 FileMap.CassTypeIndex-f-53-Filter.db}}
{{04/04/2011  04:07 PM                11 FileMap.CassTypeIndex-f-53-Index.db}}
{{04/04/2011  04:07 PM             4,264 FileMap.CassTypeIndex-f-53-Statistics.db}}
{{03/29/2011  02:52 PM           961,386 FileMap.FileTypeIndex-f-50-Data.db}}
{{03/29/2011  02:52 PM             1,936 FileMap.FileTypeIndex-f-50-Filter.db}}
{{03/29/2011  02:52 PM                11 FileMap.FileTypeIndex-f-50-Index.db}}
{{03/29/2011  02:52 PM             4,264 FileMap.FileTypeIndex-f-50-Statistics.db}}
{{03/30/2011  05:37 PM               404 FileMap.FileTypeIndex-f-51-Data.db}}
{{03/30/2011  05:37 PM                16 FileMap.FileTypeIndex-f-51-Filter.db}}
{{03/30/2011  05:37 PM                11 FileMap.FileTypeIndex-f-51-Index.db}}
{{03/30/2011  05:37 PM             4,264 FileMap.FileTypeIndex-f-51-Statistics.db}}
{{04/04/2011  04:07 PM             2,358 FileMap.FileTypeIndex-f-52-Data.db}}
{{04/04/2011  04:07 PM                16 FileMap.FileTypeIndex-f-52-Filter.db}}
{{04/04/2011  04:07 PM                11 FileMap.FileTypeIndex-f-52-Index.db}}
{{04/04/2011  04:07 PM             4,264 FileMap.FileTypeIndex-f-52-Statistics.db}}
{{03/29/2011  02:52 PM         3,298,947 FileMap.ModifiedIndex-f-50-Data.db}}
{{03/29/2011  02:52 PM            78,016 FileMap.ModifiedIndex-f-50-Filter.db}}
{{03/29/2011  02:52 PM           731,106 FileMap.ModifiedIndex-f-50-Index.db}}
{{03/29/2011  02:52 PM             4,264 FileMap.ModifiedIndex-f-50-Statistics.db}}
{{03/30/2011  05:37 PM             2,065 FileMap.ModifiedIndex-f-51-Data.db}}
{{03/30/2011  05:37 PM                64 FileMap.ModifiedIndex-f-51-Filter.db}}
{{03/30/2011  05:37 PM               450 FileMap.ModifiedIndex-f-51-Index.db}}
{{03/30/2011  05:37 PM             4,264 FileMap.ModifiedIndex-f-51-Statistics.db}}
{{04/04/2011  04:07 PM            13,835 FileMap.ModifiedIndex-f-52-Data.db}}
{{04/04/2011  04:07 PM               328 FileMap.ModifiedIndex-f-52-Filter.db}}
{{04/04/2011  04:07 PM             3,006 FileMap.ModifiedIndex-f-52-Index.db}}
{{04/04/2011  04:07 PM             4,264 FileMap.ModifiedIndex-f-52-Statistics.db}}
{{04/04/2011  04:07 PM           962,874 FileMap.UserIndex-f-53-Data.db}}
{{04/04/2011  04:07 PM             1,936 FileMap.UserIndex-f-53-Filter.db}}
{{04/04/2011  04:07 PM               420 FileMap.UserIndex-f-53-Index.db}}
{{04/04/2011  04:07 PM             4,264 FileMap.UserIndex-f-53-Statistics.db}}

In the search, we use an IndexClause built as follows:
		// Column name 'a' (author ID), compared against the 4-byte serialized integer 1
		ByteBuffer field_author = ByteBuffer.wrap(new byte[]{'a'});
		ByteBuffer author_1 = IntegerSerializer.get().toByteBuffer(1);
		
		// Column name 't' (file type), compared against a single byte
		ByteBuffer file_type = ByteBuffer.wrap(new byte[]{'t'});
		ByteBuffer filetype_3 = ByteBuffer.wrap(new byte[]{3}); //file type 3
		
		IndexClause indexClause = new IndexClause();
		indexClause.setCount(3000);
		ArrayList<IndexExpression> expressions = new ArrayList<IndexExpression>();
		expressions.add(new IndexExpression(field_author, IndexOperator.EQ, author_1)); //user ID = 1
		expressions.add(new IndexExpression(file_type, IndexOperator.EQ, filetype_3)); //file type = 3
		
		indexClause.setExpressions(expressions);
		indexClause.setStart_key(new byte[]{});

During the search, it scans the index entries from "FileMap.UserIndex", which appears to contain a key (index entry) that is not found in the table "FileMap". As far as I can tell, it breaks during data retrieval from "FileMap-f-53-Data", when the position for that key is not found or not available in "FileMap-f-53-Data".
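
The situation described above, an index entry whose key has no live row in the base table, can be sketched with plain collections. This is only an illustration of skipping a missing row instead of dereferencing it. The class, the method, and the map-based "table" are hypothetical stand-ins, not Cassandra's ColumnFamilyStore code and not the patch attached to this ticket.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class StaleIndexScan {
    // Hypothetical model: baseTable maps row key -> (column name -> value);
    // indexKeys are the row keys returned by a secondary-index lookup.
    static List<String> scan(List<String> indexKeys,
                             Map<String, Map<String, String>> baseTable,
                             String column, String expected) {
        List<String> matches = new ArrayList<String>();
        for (String key : indexKeys) {
            Map<String, String> row = baseTable.get(key);
            if (row == null) {
                // Stale index entry: the row is gone, so skip it rather than
                // passing null on to the predicate check.
                continue;
            }
            if (expected.equals(row.get(column))) {
                matches.add(key);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> table = new HashMap<String, Map<String, String>>();
        Map<String, String> row = new HashMap<String, String>();
        row.put("fileType", "3");
        table.put("file-1", row);

        List<String> indexKeys = new ArrayList<String>();
        indexKeys.add("file-1");
        indexKeys.add("file-2"); // present in the index, missing from the table

        System.out.println(scan(indexKeys, table, "fileType", "3"));
    }
}
```

Without the null check, the lookup for "file-2" would hand a null row to the comparison and throw, which is the shape of the failure reported in this issue.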


[jira] [Issue Comment Edited] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021382#comment-13021382 ] 

Jonathan Ellis edited comment on CASSANDRA-2401 at 4/19/11 2:49 AM:
--------------------------------------------------------------------

So when you created the data, you did not use any expiring columns (TTL), correct?

      was (Author: jbellis):
    So when you created the data,

- you did only inserts (no overwrites or deletes)
- you did not use any expiring columns (TTL)

Correct?
  

[jira] [Issue Comment Edited] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Tey Kar Shiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015932#comment-13015932 ] 

Tey Kar Shiang edited comment on CASSANDRA-2401 at 4/6/11 10:13 AM:
--------------------------------------------------------------------

Hi, yes, it seems so to me. Here we create a table "FileMap", in which we store columns such as "content", "authorID", "Version", "Modified Time", "File Type", etc. Among them, the columns with sorted indices are "authorID" (as UserIndex), "File Type", "Modified Time", and "CassType", where CassType generally means 'file type' in our case. It is not used, though.
{{
03/30/2011  09:34 AM        11,366,878 FileMap-f-53-Data.db
03/30/2011  09:34 AM            78,496 FileMap-f-53-Filter.db
03/30/2011  09:34 AM           735,930 FileMap-f-53-Index.db
03/30/2011  09:34 AM             4,264 FileMap-f-53-Statistics.db
03/30/2011  05:37 PM             4,055 FileMap-f-54-Data.db
03/30/2011  05:37 PM                40 FileMap-f-54-Filter.db
03/30/2011  05:37 PM               270 FileMap-f-54-Index.db
03/30/2011  05:37 PM             4,264 FileMap-f-54-Statistics.db
04/04/2011  04:07 PM            24,068 FileMap-f-55-Data.db
04/04/2011  04:07 PM               200 FileMap-f-55-Filter.db
04/04/2011  04:07 PM             1,746 FileMap-f-55-Index.db
04/04/2011  04:07 PM             4,264 FileMap-f-55-Statistics.db
04/04/2011  04:07 PM           961,808 FileMap.CassTypeIndex-f-53-Data.db
04/04/2011  04:07 PM             1,936 FileMap.CassTypeIndex-f-53-Filter.db
04/04/2011  04:07 PM                11 FileMap.CassTypeIndex-f-53-Index.db
04/04/2011  04:07 PM             4,264 FileMap.CassTypeIndex-f-53-Statistics.db
03/29/2011  02:52 PM           961,386 FileMap.FileTypeIndex-f-50-Data.db
03/29/2011  02:52 PM             1,936 FileMap.FileTypeIndex-f-50-Filter.db
03/29/2011  02:52 PM                11 FileMap.FileTypeIndex-f-50-Index.db
03/29/2011  02:52 PM             4,264 FileMap.FileTypeIndex-f-50-Statistics.db
03/30/2011  05:37 PM               404 FileMap.FileTypeIndex-f-51-Data.db
03/30/2011  05:37 PM                16 FileMap.FileTypeIndex-f-51-Filter.db
03/30/2011  05:37 PM                11 FileMap.FileTypeIndex-f-51-Index.db
03/30/2011  05:37 PM             4,264 FileMap.FileTypeIndex-f-51-Statistics.db
04/04/2011  04:07 PM             2,358 FileMap.FileTypeIndex-f-52-Data.db
04/04/2011  04:07 PM                16 FileMap.FileTypeIndex-f-52-Filter.db
04/04/2011  04:07 PM                11 FileMap.FileTypeIndex-f-52-Index.db
04/04/2011  04:07 PM             4,264 FileMap.FileTypeIndex-f-52-Statistics.db
03/29/2011  02:52 PM         3,298,947 FileMap.ModifiedIndex-f-50-Data.db
03/29/2011  02:52 PM            78,016 FileMap.ModifiedIndex-f-50-Filter.db
03/29/2011  02:52 PM           731,106 FileMap.ModifiedIndex-f-50-Index.db
03/29/2011  02:52 PM             4,264 FileMap.ModifiedIndex-f-50-Statistics.db
03/30/2011  05:37 PM             2,065 FileMap.ModifiedIndex-f-51-Data.db
03/30/2011  05:37 PM                64 FileMap.ModifiedIndex-f-51-Filter.db
03/30/2011  05:37 PM               450 FileMap.ModifiedIndex-f-51-Index.db
03/30/2011  05:37 PM             4,264 FileMap.ModifiedIndex-f-51-Statistics.db
04/04/2011  04:07 PM            13,835 FileMap.ModifiedIndex-f-52-Data.db
04/04/2011  04:07 PM               328 FileMap.ModifiedIndex-f-52-Filter.db
04/04/2011  04:07 PM             3,006 FileMap.ModifiedIndex-f-52-Index.db
04/04/2011  04:07 PM             4,264 FileMap.ModifiedIndex-f-52-Statistics.db
04/04/2011  04:07 PM           962,874 FileMap.UserIndex-f-53-Data.db
04/04/2011  04:07 PM             1,936 FileMap.UserIndex-f-53-Filter.db
04/04/2011  04:07 PM               420 FileMap.UserIndex-f-53-Index.db
04/04/2011  04:07 PM             4,264 FileMap.UserIndex-f-53-Statistics.db

In the search, we use an IndexClause built as follows:
		// Column name 'a' (author ID), compared against the 4-byte serialized integer 1
		ByteBuffer field_author = ByteBuffer.wrap(new byte[]{'a'});
		ByteBuffer author_1 = IntegerSerializer.get().toByteBuffer(1);
		
		// Column name 't' (file type), compared against a single byte
		ByteBuffer file_type = ByteBuffer.wrap(new byte[]{'t'});
		ByteBuffer filetype_3 = ByteBuffer.wrap(new byte[]{3}); //file type 3
		
		IndexClause indexClause = new IndexClause();
		indexClause.setCount(3000);
		ArrayList<IndexExpression> expressions = new ArrayList<IndexExpression>();
		expressions.add(new IndexExpression(field_author, IndexOperator.EQ, author_1)); //user ID = 1
		expressions.add(new IndexExpression(file_type, IndexOperator.EQ, filetype_3)); //file type = 3
		
		indexClause.setExpressions(expressions);
		indexClause.setStart_key(new byte[]{});

}}
During the search, it scans the index entries from "FileMap.UserIndex", which appears to contain a key (index entry) that is not found in the table "FileMap". As far as I can tell, it breaks during data retrieval from "FileMap-f-53-Data", when the position for that key is not found or not available in "FileMap-f-53-Data".


[jira] [Commented] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029021#comment-13029021 ] 

Jonathan Ellis commented on CASSANDRA-2401:
-------------------------------------------

bq. was the fix intend to avoid future problem

Yes. As discussed above, once you corrupt your index this way, the corruption is recorded permanently, and you need to drop the index and recreate it.

> getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2401
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2401
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.0
>         Environment: Hector 0.7.0-28, Cassandra 0.7.4, Windows 7, Eclipse
>            Reporter: Tey Kar Shiang
>            Assignee: Jonathan Ellis
>             Fix For: 0.7.6
>
>         Attachments: 2401.txt


[jira] [Updated] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2401:
--------------------------------------

    Attachment: 2401-v3.txt

Oops, that's actually column + value, not CF.

v3 adds CF:

{noformat}
Scanning index 'CF1.LongIdxName.world2 EQ 15' starting with
{noformat}

> getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2401
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2401
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.0
>         Environment: Hector 0.7.0-28, Cassandra 0.7.4, Windows 7, Eclipse
>            Reporter: Tey Kar Shiang
>            Assignee: Jonathan Ellis
>             Fix For: 0.7.6
>
>         Attachments: 2401-v2.txt, 2401-v3.txt, 2401.txt


[jira] [Commented] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029453#comment-13029453 ] 

Jonathan Ellis commented on CASSANDRA-2401:
-------------------------------------------

Oops, that's actually column + value, not CF.

For the record, v3 adds CF:
{noformat}
Scanning index 'CF1.world2 EQ 15' starting with
{noformat}

Will commit based on v2 +1.

> getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2401
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2401
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.0
>         Environment: Hector 0.7.0-28, Cassandra 0.7.4, Windows 7, Eclipse
>            Reporter: Tey Kar Shiang
>            Assignee: Jonathan Ellis
>             Fix For: 0.7.6
>
>         Attachments: 2401-v2.txt, 2401-v3.txt, 2401.txt


[jira] [Issue Comment Edited] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Tey Kar Shiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012821#comment-13012821 ] 

Tey Kar Shiang edited comment on CASSANDRA-2401 at 3/30/11 2:19 AM:
--------------------------------------------------------------------

Hi,

New finding here:
The data has 0 columns because it is never read from the file. Stepping through the code, org.apache.cassandra.io.sstable.SSTableReader.java::getPosition(DecoratedKey decoratedKey, Operator op) returns a position of -1 at line 448 (bf.isPresent(decoratedKey.key) returns false): the key is missing.

It seems that either a record that is still indexed is missing, or the indexed column itself is not updated when the record is removed (?).

The data comes back with 0 columns simply because a container is always created (final ColumnFamily returnCF = ColumnFamily.create(metadata)) and returned from getTopLevelColumns, even if no read takes place.

In this case, the null exception is thrown without being caught, which causes the Timeout exception seen in Hector.
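
To make the "bypass the data" idea from the report concrete, here is a minimal sketch of a scan that skips index entries whose row lookup comes back null. It is not Cassandra code and not the attached patch: StaleIndexScan, getRow and scan are hypothetical names, and plain maps stand in for the index and the column family.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an indexed store; none of these names are Cassandra's.
final class StaleIndexScan {

    // Looks up a row by key. A missing or empty row is reported as null,
    // mirroring the behaviour described in the report.
    static Map<String, String> getRow(Map<String, Map<String, String>> rows, String key) {
        Map<String, String> row = rows.get(key);
        return (row == null || row.isEmpty()) ? null : row;
    }

    // Walks the keys listed by the index and keeps those whose row matches.
    static List<String> scan(List<String> indexedKeys,
                             Map<String, Map<String, String>> rows,
                             String column, String expected) {
        List<String> matches = new ArrayList<>();
        for (String key : indexedKeys) {
            Map<String, String> data = getRow(rows, key);
            if (data == null) {
                // Stale index entry: the index lists the key but the row is gone.
                // Skip it instead of dereferencing null.
                continue;
            }
            if (expected.equals(data.get(column))) {
                matches.add(key);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> rows = new HashMap<>();
        Map<String, String> file1 = new HashMap<>();
        file1.put("userID", "15");
        rows.put("file1", file1);

        // "file2" is still listed by the index but has no row behind it.
        List<String> index = Arrays.asList("file1", "file2");
        System.out.println(scan(index, rows, "userID", "15")); // prints [file1]
    }
}
```

Skipping only hides the stale entry from the query. It does not repair the index, which per the discussion on this issue still has to be dropped and recreated.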

  
> getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2401
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2401
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7.4
>         Environment: Hector 0.7.0-28, Cassandra 0.7.4, Windows 7, Eclipse
>            Reporter: Tey Kar Shiang


[jira] [Commented] (CASSANDRA-2401) getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query

Posted by "Jackson Chung (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030425#comment-13030425 ] 

Jackson Chung commented on CASSANDRA-2401:
------------------------------------------

Here is one way that I can reproduce the issue with data being null 100% of the time:

You need 2 nodes; one is going to autobootstrap to the other. This also assumes a completely clean start (wipe /var/lib/cassandra/ or wherever the data is stored).

I am also using the Brisk beta to test.

to start:
node-A:
1) get brisk
2) start brisk  with -t (jobtracker)
3) run a simple hive query : 
 3a) bin/brisk hive 
 3b) create table foo (bar INT);
 3c) select count(*) from foo;
 3d) exit;
4) everything should be fine so far; leave the brisk node running

node-B:
1) get brisk
2) modify the resources/cassandra/conf/cassandra.yaml:
 2a) to enable autobootstrap. 
 2b) point seeds to node-A

3) put a sleep or breakpoint in the o.a.c.service.StorageService.joinTokenRing method, right after "Map<InetAddress, Double> loadinfo = StorageLoadBalancer.instance.getLoadInfo();" (personal preference: log a line, then add a Thread.sleep(a_long_time))
4) start brisk with -t on node-B
5) wait for the log line "Joining: getting bootstrap token"; execution should now reach your breakpoint (or sleep)
6) crash the jvm (personal preference: kill -9 <pid>)

back to node-A
1) exit the jvm (BriskDaemon) "normally" (kill <pid>)
2) start the brisk node again (with -t):

log from node-A: 
{noformat}
 INFO 23:25:00,213 Logging initialized
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/riptano/work/brisk/resources/cassandra/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/riptano/work/brisk/resources/hadoop/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
 INFO 23:25:00,235 Heap size: 510263296/511311872
 INFO 23:25:00,237 JNA not found. Native methods will be disabled.
 INFO 23:25:00,263 Loading settings from file:/home/riptano/work/brisk/resources/cassandra/conf/cassandra.yaml
 INFO 23:25:00,470 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
 INFO 23:25:00,496 Detected Hadoop trackers are enabled, setting my DC to Brisk
 INFO 23:25:00,696 Global memtable threshold is enabled at 162MB
 INFO 23:25:00,846 Opening /var/lib/cassandra/data/system/IndexInfo-f-1
 INFO 23:25:00,912 Opening /var/lib/cassandra/data/system/Schema-f-2
 INFO 23:25:00,926 Opening /var/lib/cassandra/data/system/Schema-f-1
 INFO 23:25:00,951 Opening /var/lib/cassandra/data/system/Migrations-f-2
 INFO 23:25:00,954 Opening /var/lib/cassandra/data/system/Migrations-f-1
 INFO 23:25:00,970 Opening /var/lib/cassandra/data/system/LocationInfo-f-2
 INFO 23:25:00,989 Opening /var/lib/cassandra/data/system/LocationInfo-f-1
 INFO 23:25:01,089 Loading schema version c4fd2440-7900-11e0-0000-ba846f9adcf7
 INFO 23:25:01,499 Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1304810701499.log
 INFO 23:25:01,530 Replaying /var/lib/cassandra/commitlog/CommitLog-1304810455288.log
 INFO 23:25:01,675 Finished reading /var/lib/cassandra/commitlog/CommitLog-1304810455288.log
 INFO 23:25:01,730 Enqueuing flush of Memtable-MetaStore@102170028(869/1086 serialized/live bytes, 3 ops)
 INFO 23:25:01,735 Writing Memtable-MetaStore@102170028(869/1086 serialized/live bytes, 3 ops)
 INFO 23:25:01,743 Enqueuing flush of Memtable-sblocks@1075051425(3044096/3805120 serialized/live bytes, 17 ops)
 INFO 23:25:01,747 Enqueuing flush of Memtable-inode.path@780298059(2848/3560 serialized/live bytes, 59 ops)
 INFO 23:25:01,748 Enqueuing flush of Memtable-inode.sentinel@1934329031(2848/3560 serialized/live bytes, 59 ops)
 INFO 23:25:01,748 Enqueuing flush of Memtable-inode@1660575731(6393/7991 serialized/live bytes, 134 ops)
 INFO 23:25:01,821 Completed flushing /var/lib/cassandra/data/HiveMetaStore/MetaStore-f-1-Data.db (989 bytes)
 INFO 23:25:01,832 Writing Memtable-sblocks@1075051425(3044096/3805120 serialized/live bytes, 17 ops)
 INFO 23:25:01,927 Completed flushing /var/lib/cassandra/data/cfs/sblocks-f-1-Data.db (3045448 bytes)
 INFO 23:25:01,928 Writing Memtable-inode.path@780298059(2848/3560 serialized/live bytes, 59 ops)
 INFO 23:25:01,968 Completed flushing /var/lib/cassandra/data/cfs/inode.path-f-1-Data.db (5346 bytes)
 INFO 23:25:01,969 Writing Memtable-inode.sentinel@1934329031(2848/3560 serialized/live bytes, 59 ops)
 INFO 23:25:02,035 Completed flushing /var/lib/cassandra/data/cfs/inode.sentinel-f-1-Data.db (1735 bytes)
 INFO 23:25:02,036 Writing Memtable-inode@1660575731(6393/7991 serialized/live bytes, 134 ops)
 INFO 23:25:02,085 Completed flushing /var/lib/cassandra/data/cfs/inode-f-1-Data.db (8582 bytes)
 INFO 23:25:02,087 Log replay complete
 INFO 23:25:02,092 Cassandra version: 0.8.0-beta2-SNAPSHOT
 INFO 23:25:02,092 Thrift API version: 19.10.0
 INFO 23:25:02,092 Loading persisted ring state
 INFO 23:25:02,092 load token size: 0
 INFO 23:25:02,093 Starting up server gossip
 INFO 23:25:02,104 Enqueuing flush of Memtable-LocationInfo@22262475(29/36 serialized/live bytes, 1 ops)
 INFO 23:25:02,105 Writing Memtable-LocationInfo@22262475(29/36 serialized/live bytes, 1 ops)
 INFO 23:25:02,127 Completed flushing /var/lib/cassandra/data/system/LocationInfo-f-3-Data.db (80 bytes)
 INFO 23:25:02,149 Starting Messaging Service on port 7000
 INFO 23:25:02,172 Using saved token 152036150612811635197207268153837644139
 INFO 23:25:02,173 Enqueuing flush of Memtable-LocationInfo@1977026981(53/66 serialized/live bytes, 2 ops)
 INFO 23:25:02,174 Writing Memtable-LocationInfo@1977026981(53/66 serialized/live bytes, 2 ops)
 INFO 23:25:02,190 Completed flushing /var/lib/cassandra/data/system/LocationInfo-f-4-Data.db (163 bytes)
 INFO 23:25:02,193 Compacting Major: [SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-f-2-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-f-1-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-f-3-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/LocationInfo-f-4-Data.db')]
 INFO 23:25:02,196 Will not load MX4J, mx4j-tools.jar is not in the classpath
 INFO 23:25:02,196 Starting up Hadoop trackers
 INFO 23:25:02,197 Waiting for gossip to start
 INFO 23:25:02,225 Major@1830423861(system, LocationInfo, 438/741) now compacting at 16777 bytes/ms.
 INFO 23:25:02,257 Compacted to /var/lib/cassandra/data/system/LocationInfo-tmp-f-5-Data.db.  741 to 447 (~60% of original) bytes for 3 keys.  Time: 64ms.
 INFO 23:25:07,272 Chose seed 10.179.96.212 as jobtracker
 WARN 23:25:09,331 Metrics system not started: Cannot locate configuration: tried hadoop-metrics2-jobtracker.properties, hadoop-metrics2.properties
 INFO 23:25:09,994 Chose seed 10.179.96.212 as jobtracker
 INFO 23:25:10,139 Updating the current master key for generating delegation tokens
 INFO 23:25:10,143 Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s)
 INFO 23:25:10,143 Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)
 INFO 23:25:10,144 Updating the current master key for generating delegation tokens
 INFO 23:25:10,145 Refreshing hosts (include/exclude) list
 INFO 23:25:10,223 Starting jobtracker with owner as riptano
 INFO 23:25:10,245 Starting SocketReader
 INFO 23:25:10,374 Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
 INFO 23:25:10,623 Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
 INFO 23:25:10,673 Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50030
 INFO 23:25:10,673 listener.getLocalPort() returned 50030 webServer.getConnectors()[0].getLocalPort() returned 50030
 INFO 23:25:10,674 Jetty bound to port 50030
 INFO 23:25:10,674 jetty-6.1.21
 INFO 23:25:11,140 Started SelectChannelConnector@0.0.0.0:50030
 INFO 23:25:11,147 JobTracker up at: 8012
 INFO 23:25:11,147 JobTracker webserver: 50030
 WARN 23:25:11,276 Incorrect permissions on cassandra://localhost:9160/tmp/hadoop-riptano/mapred/system. Setting it to rwx------
ERROR 23:25:11,321 Fatal exception in thread Thread[ReadStage:4,5,main]
java.lang.AssertionError: No data found for NamesQueryFilter(columns=java.nio.HeapByteBuffer[pos=12 lim=16 cap=17]) in DecoratedKey(55249227080490826413412398468829851220, 3165333533353736613164333836353061346636333465656437326131353939):QueryPath(columnFamilyName='inode', superColumnName='null', columnName='null') (original filter NamesQueryFilter(columns=java.nio.HeapByteBuffer[pos=12 lim=16 cap=17])) from expression 'inode.73656e74696e656c EQ 78'
        at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1513)
        at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:46)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
 INFO 23:25:20,059 Deleted /var/lib/cassandra/data/system/LocationInfo-f-3
 INFO 23:25:20,060 Deleted /var/lib/cassandra/data/system/LocationInfo-f-4
 INFO 23:25:20,576 Deleted /var/lib/cassandra/data/system/LocationInfo-f-1
 INFO 23:25:20,577 Deleted /var/lib/cassandra/data/system/LocationInfo-f-2
 INFO 23:25:21,297 problem cleaning system directory: cassandra://localhost:9160/tmp/hadoop-riptano/mapred/system
java.io.IOException: TimedOutException()
        at org.apache.cassandra.hadoop.fs.CassandraFileSystemThriftStore.listDeepSubPaths(CassandraFileSystemThriftStore.java:523)
        at org.apache.cassandra.hadoop.fs.CassandraFileSystemThriftStore.listSubPaths(CassandraFileSystemThriftStore.java:529)
        at org.apache.cassandra.hadoop.fs.CassandraFileSystem.listStatus(CassandraFileSystem.java:171)
        at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2374)
        at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2174)
        at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:303)
        at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:294)
        at org.apache.cassandra.hadoop.trackers.TrackerInitializer$1.run(TrackerInitializer.java:93)
        at java.lang.Thread.run(Thread.java:662)
Caused by: TimedOutException()
        at org.apache.cassandra.thrift.CassandraServer.get_indexed_slices(CassandraServer.java:673)
        at org.apache.cassandra.hadoop.fs.CassandraFileSystemThriftStore.listDeepSubPaths(CassandraFileSystemThriftStore.java:506)
        ... 8 more
 WARN 23:25:31,300 Incorrect permissions on cassandra://localhost:9160/tmp/hadoop-riptano/mapred/system. Setting it to rwx------
ERROR 23:25:31,315 Fatal exception in thread Thread[ReadStage:6,5,main]
java.lang.AssertionError: No data found for NamesQueryFilter(columns=java.nio.HeapByteBuffer[pos=12 lim=16 cap=17]) in DecoratedKey(55249227080490826413412398468829851220, 3165333533353736613164333836353061346636333465656437326131353939):QueryPath(columnFamilyName='inode', superColumnName='null', columnName='null') (original filter NamesQueryFilter(columns=java.nio.HeapByteBuffer[pos=12 lim=16 cap=17])) from expression 'inode.73656e74696e656c EQ 78'
        at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1513)
        at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:46)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
 INFO 23:25:41,303 problem cleaning system directory: cassandra://localhost:9160/tmp/hadoop-riptano/mapred/system
java.io.IOException: TimedOutException()
        at org.apache.cassandra.hadoop.fs.CassandraFileSystemThriftStore.listDeepSubPaths(CassandraFileSystemThriftStore.java:523)
        at org.apache.cassandra.hadoop.fs.CassandraFileSystemThriftStore.listSubPaths(CassandraFileSystemThriftStore.java:529)
        at org.apache.cassandra.hadoop.fs.CassandraFileSystem.listStatus(CassandraFileSystem.java:171)
        at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2374)
        at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:2174)
        at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:303)
        at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:294)
        at org.apache.cassandra.hadoop.trackers.TrackerInitializer$1.run(TrackerInitializer.java:93)
        at java.lang.Thread.run(Thread.java:662)
Caused by: TimedOutException()
        at org.apache.cassandra.thrift.CassandraServer.get_indexed_slices(CassandraServer.java:673)
        at org.apache.cassandra.hadoop.fs.CassandraFileSystemThriftStore.listDeepSubPaths(CassandraFileSystemThriftStore.java:506)
        ... 8 more
 WARN 23:25:51,308 Incorrect permissions on cassandra://localhost:9160/tmp/hadoop-riptano/mapred/system. Setting it to rwx------
ERROR 23:25:51,321 Fatal exception in thread Thread[ReadStage:8,5,main]
java.lang.AssertionError: No data found for NamesQueryFilter(columns=java.nio.HeapByteBuffer[pos=12 lim=16 cap=17]) in DecoratedKey(55249227080490826413412398468829851220, 3165333533353736613164333836353061346636333465656437326131353939):QueryPath(columnFamilyName='inode', superColumnName='null', columnName='null') (original filter NamesQueryFilter(columns=java.nio.HeapByteBuffer[pos=12 lim=16 cap=17])) from expression 'inode.73656e74696e656c EQ 78'
        at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1513)
        at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:46)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

{noformat}
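
The log shows the same sequence each time: an AssertionError ends the ReadStage task (23:25:11), and about ten seconds later the caller reports TimedOutException (23:25:21). Below is a minimal sketch of why a failure on a worker thread can reach the client only as a timeout, assuming the reply is delivered through a callback that the failed task never gets to invoke. LostCallbackDemo and query are hypothetical names, and a CompletableFuture stands in for the callback; this is not Cassandra's messaging code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo; the names are illustrative only.
final class LostCallbackDemo {

    // Submits a read task and waits for its reply.
    // Returns the reply, or "timeout" if no reply arrives in time.
    static String query(boolean workerFails, long timeoutMillis) {
        ExecutorService readStage = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "ReadStage-demo");
            t.setDaemon(true);
            // Discard the worker's error here; a real server would log it,
            // but nothing reports it back to the waiting caller.
            t.setUncaughtExceptionHandler((thread, err) -> { });
            return t;
        });
        CompletableFuture<String> reply = new CompletableFuture<>();
        try {
            readStage.execute(() -> {
                if (workerFails) {
                    // Thrown before the reply is sent, so the caller is never notified.
                    throw new AssertionError("No data found");
                }
                reply.complete("rows");
            });
            return reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The only symptom the caller sees.
            return "timeout";
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            readStage.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(query(false, 1000)); // prints rows
        System.out.println(query(true, 200));   // prints timeout
    }
}
```

In this sketch the client cannot tell a slow replica from one whose read task died, which matches what the log shows: the real cause appears only in the server log.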



> getColumnFamily() return null, which is not checked in ColumnFamilyStore.java scan() method, causing Timeout Exception in query
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2401
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2401
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.0
>         Environment: Hector 0.7.0-28, Cassandra 0.7.4, Windows 7, Eclipse
>            Reporter: Tey Kar Shiang
>            Assignee: Jonathan Ellis
>             Fix For: 0.7.6
>
>         Attachments: 2401-v2.txt, 2401-v3.txt, 2401.txt
