Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2013/11/20 00:57:21 UTC

[jira] [Resolved] (CASSANDRA-5501) Missing data on SELECT on secondary index

     [ https://issues.apache.org/jira/browse/CASSANDRA-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-5501.
---------------------------------------

    Resolution: Cannot Reproduce

> Missing data on SELECT on secondary index 
> ------------------------------------------
>
>                 Key: CASSANDRA-5501
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5501
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.2.4
>         Environment: linux ubuntu 12.04
>            Reporter: Marco Matarazzo
>         Attachments: query_log.txt, tracing.txt
>
>
> We have a 3-node cluster and a keyspace with RF = 3.
> From cassandra-cli everything looks fine (we never actually use it; I launched it only to check this particular case).
> [default@goh_master] get agents where station_id = ascii(1110129);
> -------------------
> RowKey: 6c8efeb6-7209-11e2-890a-aacc00000216
> => (column=, value=, timestamp=1364580868176000)
> => (column=character_points, value=, timestamp=1361030686890000)
> => (column=component_id, value=0, timestamp=1364580868176000)
> => (column=corporation_id, value=3efc729e-7209-11e2-890a-aacc00000216, timestamp=1361030686890000)
> => (column=entity_id, value=0, timestamp=1364580868176000)
> => (column=manufacturing, value=, timestamp=1361030686890000)
> => (column=model, value=500005, timestamp=1361030686890000)
> => (column=name, value=Jenny Olifield, timestamp=1361030686890000)
> => (column=name_check, value=jenny_olifield, timestamp=1361030686890000)
> => (column=station_id, value=1110129, timestamp=1364580868176000)
> => (column=stats_intellect, value=8, timestamp=1361030686890000)
> => (column=stats_reflexes, value=8, timestamp=1361030686890000)
> => (column=stats_stamina, value=7, timestamp=1361030686890000)
> => (column=stats_technology, value=7, timestamp=1361030686890000)
> => (column=trading, value=, timestamp=1361030686890000)
> -------------------
> RowKey: dc413373-6b06-11e2-8943-aacc00000216
> => (column=, value=, timestamp=1366568185220000)
> => (column=character_points, value=100, timestamp=1364580381651000)
> => (column=component_id, value=, timestamp=1364580381651000)
> => (column=corporation_id, value=574934cc-6b06-11e2-a512-aacc00000200, timestamp=1364580381651000)
> => (column=entity_id, value=0, timestamp=1364580381651000)
> => (column=manufacturing, value=, timestamp=1364580381651000)
> => (column=model, value=500018, timestamp=1364580381651000)
> => (column=name, value=Darren Matar, timestamp=1364580381651000)
> => (column=name_check, value=darren_matar, timestamp=1364580381651000)
> => (column=station_id, value=1110129, timestamp=1364580381651000)
> => (column=stats_intellect, value=10, timestamp=1364580381651000)
> => (column=stats_reflexes, value=10, timestamp=1364580381651000)
> => (column=stats_stamina, value=10, timestamp=1364580381651000)
> => (column=stats_technology, value=10, timestamp=1364580381651000)
> => (column=trading, value=1, timestamp=1366568185220000)
> -------------------
> RowKey: 0e7074ac-64bd-11e2-8c38-aacc00000201
> => (column=, value=, timestamp=1364828039093000)
> => (column=character_points, value=, timestamp=1361030686760000)
> => (column=component_id, value=0, timestamp=1364828039093000)
> => (column=corporation_id, value=e398294e-64bc-11e2-8c38-aacc00000201, timestamp=1361030686760000)
> => (column=entity_id, value=0, timestamp=1364828039093000)
> => (column=manufacturing, value=1, timestamp=1362517535613000)
> => (column=model, value=500008, timestamp=1361030686760000)
> => (column=name, value=Tom Bishop, timestamp=1361030686760000)
> => (column=name_check, value=tom_bishop, timestamp=1361030686760000)
> => (column=station_id, value=1110129, timestamp=1364828039093000)
> => (column=stats_intellect, value=9, timestamp=1361030686760000)
> => (column=stats_reflexes, value=7, timestamp=1361030686760000)
> => (column=stats_stamina, value=5, timestamp=1361030686760000)
> => (column=stats_technology, value=9, timestamp=1361030686760000)
> => (column=trading, value=, timestamp=1361030686760000)
> -------------------
> RowKey: 1b462f09-65f3-4148-a1a6-536b52b3bcfa
> => (column=, value=, timestamp=1366568185096000)
> => (column=character_points, value=100, timestamp=1364580381537000)
> => (column=component_id, value=, timestamp=1364580381537000)
> => (column=corporation_id, value=1d2a8803-d139-4b50-85eb-92cb1082de2e, timestamp=1364580381537000)
> => (column=entity_id, value=0, timestamp=1364580381537000)
> => (column=manufacturing, value=, timestamp=1364580381537000)
> => (column=model, value=500003, timestamp=1364580381537000)
> => (column=name, value=Andrea Len, timestamp=1364580381537000)
> => (column=name_check, value=andrea_len, timestamp=1364580381537000)
> => (column=station_id, value=1110129, timestamp=1364580381537000)
> => (column=stats_intellect, value=10, timestamp=1364580381537000)
> => (column=stats_reflexes, value=10, timestamp=1364580381537000)
> => (column=stats_stamina, value=10, timestamp=1364580381537000)
> => (column=stats_technology, value=10, timestamp=1364580381537000)
> => (column=trading, value=1, timestamp=1366568185096000)
> 4 Rows Returned.
> From CQLSH, however, the result is different: 2 rows are missing.
> cqlsh:goh_master> select agent_id,name,station_id from agents where station_id='1110129';
>  agent_id                             | name           | station_id
> --------------------------------------+----------------+------------
>  6c8efeb6-7209-11e2-890a-aacc00000216 | Jenny Olifield |    1110129
>  0e7074ac-64bd-11e2-8c38-aacc00000201 |     Tom Bishop |    1110129
> cqlsh:goh_master> select agent_id, name, station_id from agents where agent_id = '1b462f09-65f3-4148-a1a6-536b52b3bcfa';
>  agent_id                             | name       | station_id
> --------------------------------------+------------+------------
>  1b462f09-65f3-4148-a1a6-536b52b3bcfa | Andrea Len |    1110129
> Updating one column makes that single row reappear in the index, but only for that row and only in that column's index.
> cqlsh:goh_master> update agents set station_id = '1110129' where agent_id = '1b462f09-65f3-4148-a1a6-536b52b3bcfa';
> cqlsh:goh_master> select agent_id,name,station_id from agents where station_id='1110129';
>  agent_id                             | name           | station_id
> --------------------------------------+----------------+------------
>  6c8efeb6-7209-11e2-890a-aacc00000216 | Jenny Olifield |    1110129
>  0e7074ac-64bd-11e2-8c38-aacc00000201 |     Tom Bishop |    1110129
>  1b462f09-65f3-4148-a1a6-536b52b3bcfa |     Andrea Len |    1110129
> Updating one column does not make the row reappear in all indexes (as one might expect), only in the index on the updated column.
> cqlsh:goh_master> select * from agents where name = 'Andrea Len';
> cqlsh:goh_master> 
> Running nodetool rebuild_index on all three nodes does NOT fix the problem, and neither does nodetool repair.
> We also used COPY TO to dump the entire row to check for hidden spaces or anything like that, but we can't see anything:
> ....
> dc413373-6b06-11e2-8943-aacc00000216,100,,574934cc-6b06-11e2-a512-aacc00000200,0,,500018,Darren Matar,darren_matar,1110129,10,10,10,10,1
> 1b462f09-65f3-4148-a1a6-536b52b3bcfa,100,,1d2a8803-d139-4b50-85eb-92cb1082de2e,0,,500003,Andrea Len,andrea_len,1110129,10,10,10,10,1
> ....
> The logs show no errors. I lowered the log level to DEBUG, and the only thing I see when I launch the query is:
> DEBUG [Thrift:16] 2013-04-22 10:47:27,727 CassandraServer.java (line 1703) execute_cql3_query
> DEBUG [Thrift:16] 2013-04-22 10:47:27,737 Tracing.java (line 167) request complete
> Attached is a log of the query at TRACE level (other rows were added in the meantime, but the result doesn't change).
> I don't know how Cassandra works internally, so this may be entirely unrelated, but the two rows that are not coming out have more recent timestamps, as can be seen from cassandra-cli. This CF has two sstables on disk, ...-435 and ...-436, and with the log level lowered to TRACE, it seems to me that -436 gets skipped due to the bloom filter. I am really guessing here, I admit.
> The very same skipping shows up in the logs when I try to rebuild indexes, btw.
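
The discrepancy described above (rows that match the indexed value but are not returned by the index query) can be checked mechanically by comparing row keys. The following is only a minimal sketch in Python. It assumes the rows have already been fetched by some other means and are held as plain dicts. The function name missing_from_index is hypothetical and is not part of Cassandra or of any driver; the sample data is copied from the output in the report.

```python
def missing_from_index(all_rows, index_rows,
                       key="agent_id", column="station_id", value="1110129"):
    """Return the sorted keys of rows whose `column` equals `value`
    in the full data set but which the index query did not return."""
    expected = {row[key] for row in all_rows if row.get(column) == value}
    returned = {row[key] for row in index_rows}
    return sorted(expected - returned)


# Rows as shown by cassandra-cli in the report (only the relevant columns).
all_rows = [
    {"agent_id": "6c8efeb6-7209-11e2-890a-aacc00000216", "name": "Jenny Olifield", "station_id": "1110129"},
    {"agent_id": "dc413373-6b06-11e2-8943-aacc00000216", "name": "Darren Matar", "station_id": "1110129"},
    {"agent_id": "0e7074ac-64bd-11e2-8c38-aacc00000201", "name": "Tom Bishop", "station_id": "1110129"},
    {"agent_id": "1b462f09-65f3-4148-a1a6-536b52b3bcfa", "name": "Andrea Len", "station_id": "1110129"},
]

# Rows returned by the first cqlsh SELECT on station_id in the report.
index_rows = [
    {"agent_id": "6c8efeb6-7209-11e2-890a-aacc00000216", "name": "Jenny Olifield", "station_id": "1110129"},
    {"agent_id": "0e7074ac-64bd-11e2-8c38-aacc00000201", "name": "Tom Bishop", "station_id": "1110129"},
]

missing = missing_from_index(all_rows, index_rows)
for agent_id in missing:
    print(agent_id)
```

With the sample data this lists the keys of the Andrea Len and Darren Matar rows, the two rows the report says are missing from the cqlsh result.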



--
This message was sent by Atlassian JIRA
(v6.1#6144)