Posted to user@cassandra.apache.org by Juho Mäkinen <ju...@gmail.com> on 2010/08/30 15:05:56 UTC

get_slice sometimes returns previous result on php

I've run into a strange bug where get_slice returns the result from the
previous query. My application iterates over a set of columns inside a
supercolumn, and sometimes (quite rarely, but often enough that it shows
up) the results get "shifted" so that the application receives the
result of the previous query. The application uses the same Cassandra
Thrift connection throughout (it doesn't close it in between), and
everything happens inside the same PHP process.

Here's a cleaned-up example from the logs where this happens:

14:40 suomirock php-fi: [MISC] WARNING /blog.php: Cassandra stored
blog content for blog id 47528165 differs from database content.
14:40 suomirock php-fi: [MISC] WARNING /blog.php: from cassandra: AAAAAAAA
14:40 suomirock php-fi: [MISC] WARNING /blog.php: from database : BBBBBBBBB

14:40 suomirock php-fi: [MISC] WARNING /blog.php: Cassandra stored
blog content for blog id 47523032 differs from database content.
14:40 suomirock php-fi: [MISC] WARNING /blog.php: from cassandra: BBBBBBBBB
14:40 suomirock php-fi: [MISC] WARNING /blog.php: from database : CCCCCCCCCC

The data model is that I have a super column family which stores blog
entries. Each user has a single row. Inside this row there is one super
column per blog entry: the super column name is the blog id number, and
one of the columns inside it (content) holds the blog content.

The data in Cassandra is correct and matches what's in our old storage
tier (PostgreSQL), so I'm able to compare the data returned from
Cassandra with the data returned from the old database.
Here's part of the output from cassandra-cli where I queried the row
for this user. As you can see, each "blog id" matches a super_column
name in Cassandra.

=> (super_column=47540671, (column=content, value=AAAAAAAA,
timestamp=1282940401925456) )
=> (super_column=47528165, (column=content, value=BBBBBBBBB,
timestamp=1282940401925456) )
=> (super_column=47523032, (column=content, value=CCCCCCCCCC,
timestamp=1282940401925456) )
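To make the symptom concrete, here is a small Python sketch (not Cassandra, Thrift or PHP code) that models this user's row as a nested dict, using the values from the cli output above, and runs the same kind of consistency check that produced the log warnings. The dict layout, the compare() and shifted_reads() helpers, the query order, and the assumption that every reply arrives exactly one call late are all hypothetical. Under that assumption the check reproduces both logged mismatches.

```python
from typing import List, Optional

# Hypothetical in-memory model of one user's row in the super column family:
# super column name (blog id) -> {column name: value}, values from the cli output.
ROW = {
    "47540671": {"content": "AAAAAAAA"},
    "47528165": {"content": "BBBBBBBBB"},
    "47523032": {"content": "CCCCCCCCCC"},
}
# The old PostgreSQL tier holds the same content, so model it as a plain copy.
DATABASE = {blog_id: cols["content"] for blog_id, cols in ROW.items()}


def compare(blog_id: str, from_cassandra: str) -> Optional[str]:
    """Return a warning similar to the logged ones if the two stores disagree."""
    from_database = DATABASE[blog_id]
    if from_cassandra == from_database:
        return None
    return (f"blog id {blog_id}: from cassandra: {from_cassandra}, "
            f"from database: {from_database}")


def shifted_reads() -> List[str]:
    """Assume every get_slice reply arrives one call late, i.e. each blog id
    receives the content meant for the previously queried super column."""
    blog_ids = list(ROW)  # dicts keep insertion order (Python 3.7+)
    replies = [ROW[b]["content"] for b in blog_ids]
    warnings = []
    for blog_id, late_reply in zip(blog_ids[1:], replies[:-1]):
        warning = compare(blog_id, late_reply)
        if warning is not None:
            warnings.append(warning)
    return warnings


for line in shifted_reads():
    print(line)
# blog id 47528165: from cassandra: AAAAAAAA, from database: BBBBBBBBB
# blog id 47523032: from cassandra: BBBBBBBBB, from database: CCCCCCCCCC
```

A consistent off-by-one pattern like this points at the transport (replies paired with the wrong requests) rather than at the stored data, which matches what the cli output shows.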

I'm in the middle of writing a bunch of debugging code to get better
data on what's really going on, but I'd be very happy if anyone has a
clue or helpful ideas on how to debug this.

 - Juho Mäkinen

Re: get_slice sometimes returns previous result on php

Posted by Juho Mäkinen <ju...@gmail.com>.
I've tracked this bug down to my own PHP client wrapper, so it's not
in Cassandra (great to know :).
The bug was that when get_slice threw a TException, the client code
didn't close the socket but kept using it.
As a result, the timed-out response was eventually delivered into the
socket, and the next get_slice operation read this delayed data instead
of the expected data.
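The failure mode can be reproduced without Cassandra at all. The sketch below is a hypothetical Python model, not the actual PHP wrapper or the Thrift API: a local socket pair stands in for the Thrift connection, requests and replies are fixed 6-byte frames, and the "server" answers the first request only after the client's read has timed out. If the client keeps the socket after the timeout, the second call reads the late reply to the first; if it closes the socket and reconnects, the second call gets its own reply.

```python
import socket
from typing import List, Tuple

FRAME = 6  # hypothetical fixed-size messages (b"req--1", b"resp-1", ...) keep framing trivial


def recv_exact(sock: socket.socket, n: int = FRAME) -> bytes:
    """Read exactly n bytes; raises socket.timeout if the peer is too slow."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def connect() -> Tuple[socket.socket, socket.socket]:
    """Stand-in for opening a Thrift connection: a local (client, server) socket pair."""
    client, server = socket.socketpair()
    client.settimeout(0.05)  # client-side read timeout, in seconds
    return client, server


def session(close_on_timeout: bool) -> bytes:
    """Send two requests; the server answers request 1 only after the client timed out.

    Returns the bytes the client receives as the reply to request 2.
    """
    client, server = connect()
    stale: List[socket.socket] = []  # server ends of abandoned connections
    client.sendall(b"req--1")
    try:
        recv_exact(client)  # nothing has been written yet, so this times out
    except socket.timeout:
        if close_on_timeout:
            # Fix: drop the connection, because its stream may still carry a late reply.
            client.close()
            stale.append(server)
            client, server = connect()
        # Without the fix (the bug), the same stream is simply reused.
    # The slow server finally answers request 1 on the connection it arrived on.
    slow = stale[0] if stale else server
    try:
        recv_exact(slow)
        slow.sendall(b"resp-1")
    except OSError:
        pass  # the client end is already closed, so the late reply is never read
    client.sendall(b"req--2")
    recv_exact(server)
    server.sendall(b"resp-2")
    reply = recv_exact(client)
    for s in [client, server, *stale]:
        s.close()
    return reply


print(session(close_on_timeout=False))  # b'resp-1': stale reply, the reported bug
print(session(close_on_timeout=True))   # b'resp-2': correct reply after reconnecting
```

Once a call fails mid-flight, the client can no longer be sure where the next reply in the stream begins, so discarding the connection is the simplest safe option.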

I published my client libraries earlier in another message, and I've
also uploaded them to GitHub:
http://github.com/dynamoid/cassandra-utilities

Thanks Benjamin for your input on the subject :)

 - Juho Mäkinen

On Mon, Aug 30, 2010 at 11:15 PM, Juho Mäkinen <ju...@gmail.com> wrote:
> I'm not using connection pooling where the same TCP socket is shared
> between different PHP requests. I open a new Thrift connection with a
> new socket to the node, use it throughout the request, and close it
> afterwards. The get_slice requests all happen within the same request,
> so something odd happens in between.
>
> Tomorrow I'm going to implement a history buffer which logs all
> Cassandra operations within the PHP request and dumps them in case I
> detect this anomaly again. Hopefully that sheds some light on the
> problem.
>
>  - Juho Mäkinen
>
> On Mon, Aug 30, 2010 at 10:50 PM, Benjamin Black <b...@b3k.us> wrote:
>> On Mon, Aug 30, 2010 at 6:05 AM, Juho Mäkinen <ju...@gmail.com> wrote:
>>> The application is using the
>>> same cassandra thrift connection (it doesn't close it in between) and
>>> everything is happening inside same php process.
>>>
>>
>> This is why you are seeing this problem (and is specific to connection
>> reuse in certain languages, not a general problem with connection
>> reuse).
>>
>>
>> b
>>
>
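The history buffer mentioned in the quoted message could be sketched as a bounded ring of recent operations that is only written out when an anomaly is detected. This is a hypothetical Python illustration; the OperationHistory class, its method names, the (operation, target, detail) entry shape and the default size of 50 are assumptions, not part of the author's published libraries.

```python
import logging
from collections import deque
from typing import Deque, List, Tuple

Entry = Tuple[str, str, str]  # (operation, target, detail)


class OperationHistory:
    """Hypothetical per-request history of Cassandra calls.

    Keeps only the most recent `maxlen` entries, so recording is cheap on
    every call and the buffer is dumped only when a mismatch is detected.
    """

    def __init__(self, maxlen: int = 50) -> None:
        # A deque with maxlen silently drops the oldest entry when full.
        self._ops: Deque[Entry] = deque(maxlen=maxlen)

    def record(self, op: str, target: str, detail: str) -> None:
        self._ops.append((op, target, detail))

    def entries(self) -> List[Entry]:
        return list(self._ops)

    def dump(self, log: logging.Logger) -> None:
        # Oldest first, so the log reads in call order.
        for i, (op, target, detail) in enumerate(self._ops):
            log.warning("history[%d] %s %s %s", i, op, target, detail)


# Usage: record every call, dump only when the content check fails.
history = OperationHistory(maxlen=2)
history.record("get_slice", "47540671", "ok")
history.record("get_slice", "47528165", "TException")
history.record("get_slice", "47523032", "content mismatch")
history.dump(logging.getLogger("blog"))  # logs only the last two calls
```

Keeping the buffer small and per-request means normal traffic costs only an append, while a detected mismatch comes with the exact sequence of calls (including any exception) that preceded it.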
