Posted to user@cassandra.apache.org by Thomas Richter <th...@tricnet.de> on 2011/10/13 16:38:11 UTC

Re: [Solved] column index offset miscalculation

Thanks for the hint.

Ticket created: https://issues.apache.org/jira/browse/CASSANDRA-3358

Best,

Thomas

On 10/13/2011 03:27 PM, Sylvain Lebresne wrote:
> JIRA is not read-only, you should be able to create a ticket at
> https://issues.apache.org/jira/browse/CASSANDRA, though
> that probably requires that you create an account.
> 
> --
> Sylvain
> 
> On Thu, Oct 13, 2011 at 3:20 PM, Thomas Richter <th...@tricnet.de> wrote:
>> Hi Aaron,
>>
>> The fix does the trick. I wonder why nobody else ran into this before...
>> I checked org/apache/cassandra/db/ColumnIndexer.java in 0.7.9, 0.8.7 and
>> 1.0.0-rc2 and all seem to be affected.
>>
>> Looks like the public Jira is read-only, so I'm not sure how to continue.
>>
>> Best,
>>
>> Thomas
>>
>> On 10/13/2011 10:52 AM, Thomas Richter wrote:
>>> Hi Aaron,
>>>
>>> I guess I found it :-).
>>>
>>> I added logging for the IndexInfo used by
>>> SSTableNamesIterator.readIndexedColumns and got negative index positions
>>> for the missing columns. This is the reason the columns are not
>>> loaded from the sstable.
>>>
>>> So I had a look at ColumnIndexer.serializeInternal and there it is:
>>>
>>> int endPosition = 0, startPosition = -1;
>>>
>>> Should be:
>>>
>>> long endPosition = 0, startPosition = -1;
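>>>
>>> For illustration, a minimal standalone sketch (hypothetical numbers, not
>>> the actual ColumnIndexer code) of how the int accumulator silently wraps
>>> negative once a row grows past Integer.MAX_VALUE (~2GB):
>>>
>>> public class IndexOffsetOverflow {
>>>     public static void main(String[] args) {
>>>         int endPosition = 0;           // the buggy declaration
>>>         long columnSize = 1024;        // assumed serialized size per column
>>>         long columns = 3000000L;       // enough columns for a >2GB row
>>>         for (long i = 0; i < columns; i++) {
>>>             endPosition += columnSize; // int arithmetic wraps around
>>>         }
>>>         // Prints a negative offset, matching the negative index positions
>>>         // observed in SSTableNamesIterator.readIndexedColumns.
>>>         System.out.println(endPosition);
>>>     }
>>> }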
>>>
>>> I'm currently running a compaction with a fixed version to verify.
>>>
>>> Best,
>>>
>>> Thomas
>>>
>>> On 10/12/2011 11:54 PM, aaron morton wrote:
>>>> Sounds a lot like the column is deleted.
>>>>
>>>> IIRC this is where the columns from various SSTables are reduced
>>>> https://github.com/apache/cassandra/blob/cassandra-0.8/src/java/org/apache/cassandra/db/filter/QueryFilter.java#L117
>>>>
>>>> The call to ColumnFamily.addColumn() is where the column instance may be merged with other instances.
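>>>>
>>>> As a rough sketch of those merge semantics (a hypothetical simplification,
>>>> not the real addColumn() code): the instance with the higher timestamp
>>>> wins, so a newer tombstone hides an older live value even though the value
>>>> is still physically present in an sstable.
>>>>
>>>> // Hypothetical simplification of column reconciliation, for intuition only.
>>>> final class Cell {
>>>>     final long timestamp;
>>>>     final boolean tombstone;
>>>>     Cell(long timestamp, boolean tombstone) {
>>>>         this.timestamp = timestamp;
>>>>         this.tombstone = tombstone;
>>>>     }
>>>>     // The instance with the higher timestamp survives the merge.
>>>>     static Cell reconcile(Cell a, Cell b) {
>>>>         return a.timestamp >= b.timestamp ? a : b;
>>>>     }
>>>> }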
>>>>
>>>> A
>>>>
>>>> -----------------
>>>> Aaron Morton
>>>> Freelance Cassandra Developer
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
>>>>
>>>> On 13/10/2011, at 5:33 AM, Thomas Richter wrote:
>>>>
>>>>> Hi Aaron,
>>>>>
>>>>> I cannot read the column with a slice query.
>>>>> The slice query only returns data up to a certain column; after that I
>>>>> only get empty results.
>>>>>
>>>>> I added log output to QueryFilter.isRelevant to see if the filter is
>>>>> dropping the column(s), but the columns don't even show up there.
>>>>>
>>>>> The next thing I will check is the diff between the columns contained in
>>>>> the JSON export and the columns fetched with the slice query; maybe this
>>>>> gives more of a clue...
>>>>>
>>>>> Any other ideas where to place more debugging output to see what's
>>>>> happening?
>>>>>
>>>>> Best,
>>>>>
>>>>> Thomas
>>>>>
>>>>> On 10/11/2011 12:46 PM, aaron morton wrote:
>>>>>> kewl,
>>>>>>
>>>>>>> * Row is not deleted (other columns can be read, row survives compaction
>>>>>>> with GCGraceSeconds=0)
>>>>>>
>>>>>> IIRC row tombstones can hang around for a while (until gc grace has passed), and they only have an effect on columns that have a lower timestamp. So it's possible to read columns from a row with a tombstone.
>>>>>>
>>>>>> Can you read the column using a slice range rather than specifying its name?
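>>>>>>
>>>>>> For reference, a minimal raw-Thrift sketch of such a slice read against
>>>>>> 0.7 (the keyspace, column family, and row key below are placeholders):
>>>>>>
>>>>>> import java.nio.ByteBuffer;
>>>>>> import java.util.List;
>>>>>> import org.apache.cassandra.thrift.*;
>>>>>> import org.apache.thrift.protocol.TBinaryProtocol;
>>>>>> import org.apache.thrift.transport.TFramedTransport;
>>>>>> import org.apache.thrift.transport.TSocket;
>>>>>>
>>>>>> public class SliceReadSketch {
>>>>>>     public static void main(String[] args) throws Exception {
>>>>>>         TFramedTransport transport =
>>>>>>                 new TFramedTransport(new TSocket("localhost", 9160));
>>>>>>         Cassandra.Client client =
>>>>>>                 new Cassandra.Client(new TBinaryProtocol(transport));
>>>>>>         transport.open();
>>>>>>         client.set_keyspace("Keyspace");
>>>>>>         // Empty start/finish means "the whole row"; count caps the page.
>>>>>>         SliceRange range = new SliceRange(
>>>>>>                 ByteBuffer.allocate(0), ByteBuffer.allocate(0), false, 1000);
>>>>>>         SlicePredicate predicate = new SlicePredicate().setSlice_range(range);
>>>>>>         ColumnParent parent = new ColumnParent("ColumnFamily");
>>>>>>         ByteBuffer key = ByteBuffer.wrap("rowkey".getBytes("UTF-8"));
>>>>>>         List<ColumnOrSuperColumn> columns =
>>>>>>                 client.get_slice(key, parent, predicate, ConsistencyLevel.ONE);
>>>>>>         System.out.println("got " + columns.size() + " columns");
>>>>>>         transport.close();
>>>>>>     }
>>>>>> }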
>>>>>>
>>>>>> Aaron
>>>>>>
>>>>>> -----------------
>>>>>> Aaron Morton
>>>>>> Freelance Cassandra Developer
>>>>>> @aaronmorton
>>>>>> http://www.thelastpickle.com
>>>>>>
>>>>>> On 11/10/2011, at 11:15 PM, Thomas Richter wrote:
>>>>>>
>>>>>>> Hi Aaron,
>>>>>>>
>>>>>>> I invalidated the caches, but nothing changed. I didn't get the mentioned
>>>>>>> log line either, but as I read the code, SliceByNamesReadCommand uses
>>>>>>> NamesQueryFilter, not SliceQueryFilter.
>>>>>>>
>>>>>>> Next, there is only one SSTable.
>>>>>>>
>>>>>>> I can rule out that the row is deleted because I deleted all other rows
>>>>>>> in that CF to reduce data size and speed up testing. I set
>>>>>>> GCGraceSeconds to zero and ran a compaction. All other rows are gone,
>>>>>>> but I can still access at least one column from the remaining row.
>>>>>>> So as far as I understand it, there should not be a tombstone at the row level.
>>>>>>>
>>>>>>> To make it a list:
>>>>>>>
>>>>>>> * One SSTable, one row
>>>>>>> * Row is not deleted (other columns can be read, row survives compaction
>>>>>>> with GCGraceSeconds=0)
>>>>>>> * Most columns can be read by get['row']['col'] from cassandra-cli
>>>>>>> * Some columns cannot be read by get['row']['col'] from cassandra-cli
>>>>>>> but can be found in the output of sstable2json
>>>>>>> * Unreadable data survives compaction with GCGraceSeconds=0 (checked
>>>>>>> with sstable2json; rough commands are sketched below)
>>>>>>> * Invalidating the caches does not help
>>>>>>> * Nothing in the logs
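>>>>>>>
>>>>>>> For reference, roughly the shell commands behind those checks (paths and
>>>>>>> names are placeholders; GCGraceSeconds was set to 0 beforehand via
>>>>>>> "update column family" in cassandra-cli):
>>>>>>>
>>>>>>> bin/nodetool -h localhost compact Keyspace ColumnFamily
>>>>>>> bin/sstable2json /var/lib/cassandra/data/Keyspace/ColumnFamily-f-1-Data.db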
>>>>>>>
>>>>>>> Does that point in any direction where I should look next?
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Thomas
>>>>>>>
>>>>>>> On 10/11/2011 10:30 AM, aaron morton wrote:
>>>>>>>> Nothing jumps out. The obvious answer is that the column has been deleted. Did you check all the SSTables ?
>>>>>>>>
>>>>>>>> It looks like the query was answered from the row cache; otherwise you would see this as well…
>>>>>>>>
>>>>>>>> DEBUG [ReadStage:34] 2011-10-11 21:11:11,484 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 1318294191654059:false:354@1318294191654861
>>>>>>>>
>>>>>>>> Which would mean a version of the column was found.
>>>>>>>>
>>>>>>>> If you invalidate the cache with nodetool, run the query, and the log message appears, that will mean the column was read from (all of the) sstables. If you do not get a column returned, I would say there is a tombstone in place, either a row-level or a column-level one.
>>>>>>>>
>>>>>>>> Hope that helps.
>>>>>>>>
>>>>>>>> -----------------
>>>>>>>> Aaron Morton
>>>>>>>> Freelance Cassandra Developer
>>>>>>>> @aaronmorton
>>>>>>>> http://www.thelastpickle.com
>>>>>>>>
>>>>>>>> On 11/10/2011, at 10:35 AM, Thomas Richter wrote:
>>>>>>>>
>>>>>>>>> Hi Aaron,
>>>>>>>>>
>>>>>>>>> Normally we use Hector to access Cassandra, but for debugging I switched
>>>>>>>>> to cassandra-cli.
>>>>>>>>>
>>>>>>>>> The column cannot be read by a simple
>>>>>>>>> get CFName['rowkey']['colname'];
>>>>>>>>>
>>>>>>>>> The response is "Value was not found".
>>>>>>>>> If I query another column, everything is just fine.
>>>>>>>>>
>>>>>>>>> Server log for the unsuccessful read (keyspace and CF names replaced):
>>>>>>>>>
>>>>>>>>> DEBUG [pool-1-thread-1] 2011-10-10 23:15:29,739 CassandraServer.java
>>>>>>>>> (line 280) get
>>>>>>>>>
>>>>>>>>> DEBUG [pool-1-thread-1] 2011-10-10 23:15:29,744 StorageProxy.java (line
>>>>>>>>> 320) Command/ConsistencyLevel is
>>>>>>>>> SliceByNamesReadCommand(table='Keyspace',
>>>>>>>>> key=61636162626139322d396638312d343562382d396637352d393162303337383030393762,
>>>>>>>>> columnParent='QueryPath(columnFamilyName='ColumnFamily',
>>>>>>>>> superColumnName='null', columnName='null')',
>>>>>>>>> columns=[574c303030375030,])/ONE
>>>>>>>>>
>>>>>>>>> DEBUG [pool-1-thread-1] 2011-10-10 23:15:29,750 ReadCallback.java (line
>>>>>>>>> 86) Blockfor/repair is 1/true; setting up requests to localhost/127.0.0.1
>>>>>>>>>
>>>>>>>>> DEBUG [pool-1-thread-1] 2011-10-10 23:15:29,750 StorageProxy.java (line
>>>>>>>>> 343) reading data locally
>>>>>>>>>
>>>>>>>>> DEBUG [ReadStage:33] 2011-10-10 23:15:29,751 StorageProxy.java (line
>>>>>>>>> 448) LocalReadRunnable reading SliceByNamesReadCommand(table='Keyspace',
>>>>>>>>> key=61636162626139322d396638312d343562382d396637352d393162303337383030393762,
>>>>>>>>> columnParent='QueryPath(columnFamilyName='ColumnFamily',
>>>>>>>>> superColumnName='null', columnName='null')', columns=[574c303030375030,])
>>>>>>>>>
>>>>>>>>> DEBUG [pool-1-thread-1] 2011-10-10 23:15:29,818 StorageProxy.java (line
>>>>>>>>> 393) Read: 67 ms.
>>>>>>>>>
>>>>>>>>> Log looks fine to me, but no result is returned.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>>
>>>>>>>>> Thomas
>>>>>>>>>
>>>>>>>>> On 10/10/2011 10:00 PM, aaron morton wrote:
>>>>>>>>>> How are they unreadable? You need to go into some detail about what is going wrong.
>>>>>>>>>>
>>>>>>>>>> What sort of read?
>>>>>>>>>> What client?
>>>>>>>>>> What is in the logging on the client and server side?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Try turning the logging up to DEBUG on the server to watch what happens.
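>>>>>>>>>>
>>>>>>>>>> e.g. in conf/log4j-server.properties (a sketch, assuming the stock
>>>>>>>>>> 0.7 layout with the stdout and R appenders):
>>>>>>>>>>
>>>>>>>>>> log4j.rootLogger=DEBUG,stdout,R
>>>>>>>>>> # or, more narrowly, just the read path:
>>>>>>>>>> log4j.logger.org.apache.cassandra.db=DEBUG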
>>>>>>>>>>
>>>>>>>>>> Cheers
>>>>>>>>>>
>>>>>>>>>> -----------------
>>>>>>>>>> Aaron Morton
>>>>>>>>>> Freelance Cassandra Developer
>>>>>>>>>> @aaronmorton
>>>>>>>>>> http://www.thelastpickle.com
>>>>>>>>>>
>>>>>>>>>> On 10/10/2011, at 9:23 PM, Thomas Richter wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> No errors in the server logs. The columns are unreadable on all nodes at
>>>>>>>>>>> any consistency level (ONE, QUORUM, ALL). We started with 0.7.3 and
>>>>>>>>>>> upgraded to 0.7.6-2 two months ago.
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>>
>>>>>>>>>>> Thomas
>>>>>>>>>>>
>>>>>>>>>>> On 10/10/2011 10:03 AM, aaron morton wrote:
>>>>>>>>>>>> What error are you seeing in the server logs? Are the columns unreadable at all consistency levels, i.e. are the columns unreadable on all nodes?
>>>>>>>>>>>>
>>>>>>>>>>>> What is the upgrade history of the cluster? What version did it start at?
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -----------------
>>>>>>>>>>>> Aaron Morton
>>>>>>>>>>>> Freelance Cassandra Developer
>>>>>>>>>>>> @aaronmorton
>>>>>>>>>>>> http://www.thelastpickle.com
>>>>>>>>>>>>
>>>>>>>>>>>> On 10/10/2011, at 7:42 AM, Thomas Richter wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is some further information. Compaction did not help, but the data is
>>>>>>>>>>>>> still there when I dump the row with sstable2json.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thomas
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 10/08/2011 11:30 PM, Thomas Richter wrote:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> we are running a 3 node cassandra (0.7.6-2) cluster and some of our
>>>>>>>>>>>>>> column families contain quite large rows (400k+ columns, 4-6GB row size).
>>>>>>>>>>>>>> Replication factor is 3 for all keyspaces. The cluster has been running fine
>>>>>>>>>>>>>> for several months now, and we have never experienced any serious trouble.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Some days ago we noticed that some previously written columns could not
>>>>>>>>>>>>>> be read. This does not always happen, and only a few dozen columns out of
>>>>>>>>>>>>>> 400k are affected.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> After ruling out application logic as a cause, I dumped the row in
>>>>>>>>>>>>>> question with sstable2json, and the columns are there (and are not marked
>>>>>>>>>>>>>> for deletion).
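>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Roughly (a sketch; the data file name is an example, and -k limits the
>>>>>>>>>>>>>> dump to the given row key):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> bin/sstable2json /var/lib/cassandra/data/Keyspace/ColumnFamily-f-1-Data.db -k <rowkey>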
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The next step was setting up a fresh single-node cluster and copying the
>>>>>>>>>>>>>> column family data to that node. The columns could not be read there either.
>>>>>>>>>>>>>> Right now I'm running a nodetool compact for the CF to see if the data can
>>>>>>>>>>>>>> be read afterwards.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is there any explanation for such behavior? Are there any suggestions
>>>>>>>>>>>>>> for further investigation?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> TIA,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thomas