Posted to user@cassandra.apache.org by Kyle Gibson <ky...@frozenonline.com> on 2011/08/18 16:49:18 UTC

Occasionally getting old data back with ConsistencyLevel.ALL

I am running Cassandra 0.7.8 with pycassa 1.1.0.

Nodes=7, RF=3

This problem started a few months ago and only occurs sporadically.

I receive notifications from PayPal's IPN. The IPN data is saved into
a column family. I add another column, "processed", which is set to
0.

Every 5 minutes, a cron script runs and pulls down IPN events that
haven't been processed. It does some work, and then writes back
processed to 1.

Usually this worked, but at some point (I don't recall exactly when)
it occasionally started having a problem: "processed" wasn't being set
to 1. So, IPN events would be processed twice.

I tried a few things to fix this. Repair, compact, restarting the
cluster, upgrading. I even did a complete rebuild of the cluster,
wiping the data directory and starting fresh on 0.7.8.

I then ditched the "processed" column and decided to use two column
families. IPNs are put into column family A; after being processed,
each one is inserted into column family B and deleted from A.

The problem still persisted. At this point I was using CL.QUORUM. So,
I started using CL.ALL.

And the problem still persists. Having IPN events processed twice
causes a fair number of problems, so this is something I really need
to get resolved.

Thanks,

Kyle

Re: Occasionally getting old data back with ConsistencyLevel.ALL

Posted by Kyle Gibson <ky...@frozenonline.com>.
Thanks for the reply, Peter. You may have discovered the problem; I'll
explain below.

On Sun, Aug 28, 2011 at 8:51 AM, Peter Schuller
<pe...@infidyne.com> wrote:
>> Understood. In the code example I provided, I am writing the same
>> value, but I am doing so in quick succession, so perhaps a few second
>> sleep might be helpful. It is worth noting also that the code I
>> provided is only the second step 2 in the process. There is a php
>> script that receives the post request from Paypal which inserts the
>> IPN data into the IPN column family. Before it does this, it sets the
>> "processed" column to "no"
>
> Is it at all possible that this step happens twice? I have no idea
> what PayPal does or documents, but in general with an HTTP based
> callback you (in PayPal's position) would either have to accept that
> human intervention is necessary on any transaction where the callback
> fails, or else implement some kind of re-try and keep submitting to
> the customer until the callback is successful. Keep in mind that the
> other end (meaning you in this case) can perceive that it received a
> successful HTTP request and sent back a response, even though PayPal
> may perceive an error on their end.

> If you haven't, I'd definitely recommend checking logs at this step,
> or adding logging if required, to make sure that the callback is not
> happening twice.

You are correct here. If PayPal fails to get a positive response from
my callback, it will retry the IPN event until it gets a successful
response. When this happens, a new column, "retry_count", appears, set
to the number of attempts made. Given that this column has always
shown as 0 and the IPN event log on paypal.com also shows no retries
attempted, I believe I am correct in assuming that this isn't the case.

> How much traffic do you have to this cluster? Is it feasible to run
> Cassandra with full debug enabled (spamming lots of text in your
> logs)? That might be one way to ascertain, once you have one of these
> cases happening, whether Cassandra is mentioning any activity
> pertaining to the row that might explain this, such as it being
> re-written by a client.

Not really a lot of traffic, and if my fix below doesn't work I will
definitely give this a shot.

> Another suggestion: Is it possible you do not have clocks synchronized
> among your clients? Suppose that PayPal *is* submitting twice
> sometimes, and e.g. one of your PHP front-ends (or whoever is talking
> to Cassandra to insert the data) has clock drift. This would render
> the insert from your code snippet obsolete, if there is already a
> value inserted with a timestamp in the future.

This appears to be the case. The clock on the server hosting the PHP
front-end was 80 seconds ahead. The server that handles IPN processing
was synced via NTP to ntp.ubuntu.com. So if a processing event occurred
within 80 seconds of the IPN event being inserted, the timestamp on the
updated 'processed' column would be earlier than that of the original
insert, and the update would be treated as obsolete. That's why it
always worked on the second attempt: by then, enough time had passed
for the update's timestamp to exceed the original insert's despite the
drift.
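
The timestamp arithmetic described above can be sketched in a few
lines of Python. This is only a toy model of last-write-wins
reconciliation, not Cassandra or pycassa code; the reconcile and
simulate helpers, the SKEW_US constant, and the sample delays are
assumptions made up for illustration.

```python
# Toy model of per-column last-write-wins: the write with the highest
# timestamp is what a read returns, no matter when it actually arrived.
# Nothing here talks to Cassandra; all names are hypothetical.

SKEW_US = 80 * 1000000  # front-end clock assumed 80 s ahead, in microseconds


def reconcile(writes):
    """Return the value of the (timestamp, value) pair with the highest timestamp."""
    return max(writes, key=lambda w: w[0])[1]


def simulate(seconds_until_processing):
    """Return the 'processed' value a reader would see after both writes."""
    true_insert_us = 0  # real time at which the front-end stores the IPN
    # The front-end stamps its write using its own (fast) clock.
    front_end_write = (true_insert_us + SKEW_US, "no")
    # The NTP-synced processor stamps its write with the real time.
    processor_write = (true_insert_us + seconds_until_processing * 1000000, "yes")
    return reconcile([front_end_write, processor_write])


print(simulate(30))   # processed 30 s after the insert: the update loses
print(simulate(330))  # cron run 5 minutes later: the update wins
```

In this model the first call yields "no" and the second "yes", which
matches the observed pattern of the second attempt always working.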

For some reason the PHP front-end server was lacking any time
synchronization. I have corrected this; it now syncs to ntp.ubuntu.com
just like all the others. I will post back on this topic if it appears
to have solved the problem.

>
> --
> / Peter Schuller (@scode on twitter)
>

Re: Occasionally getting old data back with ConsistencyLevel.ALL

Posted by Peter Schuller <pe...@infidyne.com>.
> Understood. In the code example I provided, I am writing the same
> value, but I am doing so in quick succession, so perhaps a few second
> sleep might be helpful. It is worth noting also that the code I
> provided is only the second step 2 in the process. There is a php
> script that receives the post request from Paypal which inserts the
> IPN data into the IPN column family. Before it does this, it sets the
> "processed" column to "no"

Is it at all possible that this step happens twice? I have no idea
what PayPal does or documents, but in general with an HTTP based
callback you (in PayPal's position) would either have to accept that
human intervention is necessary on any transaction where the callback
fails, or else implement some kind of re-try and keep submitting to
the customer until the callback is successful. Keep in mind that the
other end (meaning you in this case) can perceive that it received a
successful HTTP request and sent back a response, even though PayPal
may perceive an error on their end.

If you haven't, I'd definitely recommend checking logs at this step,
or adding logging if required, to make sure that the callback is not
happening twice.

However, your code snippet looks good to me, and the fact that you're
triggering the log entry suggests to me that the problem isn't
duplicate submission, since the time window is presumably very small
in between your put and your get (but see further below about clocks).

How much traffic do you have to this cluster? Is it feasible to run
Cassandra with full debug enabled (spamming lots of text in your
logs)? That might be one way to ascertain, once you have one of these
cases happening, whether Cassandra is mentioning any activity
pertaining to the row that might explain this, such as it being
re-written by a client.

Another suggestion: Is it possible you do not have clocks synchronized
among your clients? Suppose that PayPal *is* submitting twice
sometimes, and e.g. one of your PHP front-ends (or whoever is talking
to Cassandra to insert the data) has clock drift. This would render
the insert from your code snippet obsolete, if there is already a
value inserted with a timestamp in the future.

-- 
/ Peter Schuller (@scode on twitter)

Re: Occasionally getting old data back with ConsistencyLevel.ALL

Posted by Kyle Gibson <ky...@frozenonline.com>.
Update:

I scaled my cluster down from 7 nodes to 3 nodes, and kept RF=3. I did
a complete cluster rebuild, so everything was fresh. I kept my reads
and writes at CL.ALL. For a while it seemed like I had succeeded in
eliminating the problem. Unfortunately, about an hour ago a duplicate
came through, and the same IPN was processed twice.

Does anyone have any more suggestions as to what is going on here?

On Mon, Aug 22, 2011 at 1:59 PM, Kyle Gibson
<ky...@frozenonline.com> wrote:
> Thanks for the reply.
>
> On Mon, Aug 22, 2011 at 1:11 PM, Dominic Williams
> <dw...@fightmymonster.com> wrote:
>> Hi there, here's my tuppence!
>> 1. Something to look at first:
>>
>> If you write two different values to the same column quickly in succession,
>> if both writes go out with the same timestamp, then it is indeterminate
>> which one wins i.e. write order doesn't necessarily matter.
>
> Understood. In the code example I provided, I am writing the same
> value, but I am doing so in quick succession, so perhaps a few second
> sleep might be helpful. It is worth noting also that the code I
> provided is only the second step 2 in the process. There is a php
> script that receives the post request from Paypal which inserts the
> IPN data into the IPN column family. Before it does this, it sets the
> "processed" column to "no"
>
>> 2. Don't trust PayPal (anyone using PayPal should really read this)
>> We are / were relying on IPNs to manage our website's recurring
>> subscriptions list. We experienced this weird thing where the
>> recurring_payment_profile_created IPN was missing, and got thought maybe
>> Cassandra was losing it because PayPal is a financial system and it couldn't
>> possibly fail to generate an IPN, right!!?
>> Anyway, it turns out that after exhaustive discussions with PayPal
>> engineers, and having proved this from the PayPal logs, that sometimes IPNs
>> fail to get generated. Yup. Read that again!!!! Sometimes the fail to get
>> generated and in fact this is happening to us quite regularly now.
>> They justify this (while acknowledging this issue should be in their
>> documentation) by saying that because HTTP delivery is unreliable (hmmm
>> isn't this what the retry queue is for..) we shouldn't be relying entirely
>> on IPNs and should regularly download the logs and run them through scripts
>> to catch problems (this is idiotic, since the angry customer will get on our
>> case immediately when they pay and membership doesn't start)
>> Not sure whether PayPal or database failing is best option. Look forward to
>> hearing resolution.
>
> I have experienced a failing to receive an IPN event before. In this
> case the IPN even is never saved to the IPN column family, and the
> cron script doesn't process it once, or twice, for that matter. Odd
> thing about the failed IPN event is that it didn't even show up in the
> IPN history, so i couldn't "replay" the event.
>
> I am fairly positive that the problem is either with my environment or
> cassandra and not paypal in this case. I am hoping it is my
> environment because i suspect that will be easier to fix.
>
> Oddly enough, the second time the IPN is processed, the column write
> succeeds. This always happens 5 minutes after the first one is
> processed.
>
> I neglected to mention an important part of the process: after the IPN
> event is processed (e.g. a new payment), an email is sent out to
> myself and the sender. This is how I know for sure the event is being
> processed twice, because not only do I receive two emails (spaced 5
> minutes apart) but does the individual who paid. This is often
> embarrassing to explain and somewhat difficult, customers get confused
> as to which account they are supposed to use, etc.
>
> Thanks
>
>> Best,
>> Dominic

Re: Occasionally getting old data back with ConsistencyLevel.ALL

Posted by Kyle Gibson <ky...@frozenonline.com>.
Thanks for the reply.

On Mon, Aug 22, 2011 at 1:11 PM, Dominic Williams
<dw...@fightmymonster.com> wrote:
> Hi there, here's my tuppence!
> 1. Something to look at first:
>
> If you write two different values to the same column quickly in succession,
> if both writes go out with the same timestamp, then it is indeterminate
> which one wins i.e. write order doesn't necessarily matter.

Understood. In the code example I provided, I am writing the same
value, but I am doing so in quick succession, so perhaps a sleep of a
few seconds might be helpful. It is also worth noting that the code I
provided is only step 2 of the process. There is a PHP script that
receives the POST request from PayPal and inserts the IPN data into
the IPN column family. Before it does this, it sets the "processed"
column to "no".

> 2. Don't trust PayPal (anyone using PayPal should really read this)
> We are / were relying on IPNs to manage our website's recurring
> subscriptions list. We experienced this weird thing where the
> recurring_payment_profile_created IPN was missing, and got thought maybe
> Cassandra was losing it because PayPal is a financial system and it couldn't
> possibly fail to generate an IPN, right!!?
> Anyway, it turns out that after exhaustive discussions with PayPal
> engineers, and having proved this from the PayPal logs, that sometimes IPNs
> fail to get generated. Yup. Read that again!!!! Sometimes the fail to get
> generated and in fact this is happening to us quite regularly now.
> They justify this (while acknowledging this issue should be in their
> documentation) by saying that because HTTP delivery is unreliable (hmmm
> isn't this what the retry queue is for..) we shouldn't be relying entirely
> on IPNs and should regularly download the logs and run them through scripts
> to catch problems (this is idiotic, since the angry customer will get on our
> case immediately when they pay and membership doesn't start)
> Not sure whether PayPal or database failing is best option. Look forward to
> hearing resolution.

I have experienced a failure to receive an IPN event before. In that
case the IPN event is never saved to the IPN column family, so the
cron script doesn't process it once, let alone twice. The odd thing
about the failed IPN event is that it didn't even show up in the
IPN history, so I couldn't "replay" the event.

I am fairly positive that the problem is either with my environment or
with Cassandra, and not with PayPal, in this case. I am hoping it is
my environment because I suspect that will be easier to fix.

Oddly enough, the second time the IPN is processed, the column write
succeeds. This always happens 5 minutes after the first one is
processed.

I neglected to mention an important part of the process: after the IPN
event is processed (e.g. a new payment), an email is sent out to
myself and the sender. This is how I know for sure the event is being
processed twice: not only do I receive two emails (spaced 5 minutes
apart), but so does the individual who paid. This is often
embarrassing and somewhat difficult to explain; customers get confused
as to which account they are supposed to use, etc.

Thanks

> Best,
> Dominic

Re: Occasionally getting old data back with ConsistencyLevel.ALL

Posted by Dominic Williams <dw...@fightmymonster.com>.
Hi there, here's my tuppence!

1. Something to look at first:

If you write two different values to the same column in quick
succession and both writes go out with the same timestamp, it is
indeterminate which one wins, i.e. write order doesn't necessarily
matter.
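
One way to rule out identical timestamps from a single writer is to
hand out strictly increasing microsecond values. The following is a
minimal sketch, assuming a single-process client; MonotonicMicros is a
hypothetical helper, not part of pycassa, and it does nothing about
clock differences between machines.

```python
import threading
import time


class MonotonicMicros(object):
    """Hand out strictly increasing microsecond timestamps within one process."""

    def __init__(self, clock=time.time):
        self._clock = clock          # injectable so the behaviour can be tested
        self._last = 0
        self._lock = threading.Lock()

    def next(self):
        with self._lock:
            now = int(self._clock() * 1000000)
            # If the clock has not advanced (or stepped back), bump by 1 us.
            self._last = max(now, self._last + 1)
            return self._last


stamps = MonotonicMicros()
first = stamps.next()
second = stamps.next()
print(second > first)
```

The resulting values could be passed as explicit write timestamps if
the client library in use accepts them.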

2. Don't trust PayPal (anyone using PayPal should really read this)

We are / were relying on IPNs to manage our website's recurring
subscriptions list. We experienced this weird thing where the
recurring_payment_profile_created IPN was missing, and thought maybe
Cassandra was losing it because PayPal is a financial system and it couldn't
possibly fail to generate an IPN, right!!?

Anyway, it turns out, after exhaustive discussions with PayPal
engineers and having proved this from the PayPal logs, that sometimes IPNs
fail to get generated. Yup. Read that again!!!! Sometimes they fail to get
generated, and in fact this is happening to us quite regularly now.

They justify this (while acknowledging this issue should be in their
documentation) by saying that because HTTP delivery is unreliable (hmmm
isn't this what the retry queue is for..) we shouldn't be relying entirely
on IPNs and should regularly download the logs and run them through scripts
to catch problems (this is idiotic, since the angry customer will get on our
case immediately when they pay and membership doesn't start)

Not sure whether PayPal failing or the database failing is the better
option. Look forward to hearing the resolution.

Best,
Dominic

On 22 August 2011 17:49, Kyle Gibson <ky...@frozenonline.com> wrote:

> I made some changes to my code base that uses cassandra. I went back
> to using the "processed" column, but instead of using "0" or "1" I
> decided to use "no" and "yes"
>
> You can view the code here: http://pastebin.com/gRBC16e7
>
> As you can see from the code, I perform an insert, get, check the
> result, if it didn't work, I try to insert again, and check the get.
> Each time I do a print out to see what the result is. Each operation
> is a CL.ALL.
>
> A few successful IPNs did come through before this one was generated:
>
> IPN-5943a4adc8eab68cdbc9d9eff7fa7dc669fa0bce
> OrderedDict([..., (u'processed', u'no'), ...])
> Failed to set processed to yes
> IPN-5943a4adc8eab68cdbc9d9eff7fa7dc669fa0bce insert 1314012603578714
> Failed to set processed to yes
> IPN-5943a4adc8eab68cdbc9d9eff7fa7dc669fa0bce insert2 1314012603586201
>
> As expected, this IPN was processed twice.
>
> On Sat, Aug 20, 2011 at 5:37 PM, Peter Schuller
> <pe...@infidyne.com> wrote:
> >> Do you mean the cassandra log, or just logging in the script itself?
> >
> > The script itself. I.e, some "independent" verification that the line
> > of code after the insert is in fact running, just in case there's some
> > kind of silent failure.
> >
> > Sounds like you've tried to address it though with the E-Mail:s.
> >
> > I suppose it boils down to: Either there is something wrong in your
> > environment/code, or Cassandra does have a bug. If the latter, it
> > would probably be helpful if you could try to reproduce it in your
> > environment in a way which can be shared - such as a script that does
> > writes and reads back to confirm the write made it. Or maybe just
> > adding more explicit logging to your script (even if it causes some
> > log flooding) to "prove" that a write truly happened.
> >
> > --
> > / Peter Schuller (@scode on twitter)
> >
>

Re: Occasionally getting old data back with ConsistencyLevel.ALL

Posted by Kyle Gibson <ky...@frozenonline.com>.
I made some changes to my code base that uses Cassandra. I went back
to using the "processed" column, but instead of using "0" or "1" I
decided to use "no" and "yes".

You can view the code here: http://pastebin.com/gRBC16e7

As you can see from the code, I perform an insert, then a get, and
check the result. If it didn't work, I try the insert again and check
with another get. Each time I print out the result. Each operation
is at CL.ALL.

A few successful IPNs did come through before this one was generated:

IPN-5943a4adc8eab68cdbc9d9eff7fa7dc669fa0bce
OrderedDict([..., (u'processed', u'no'), ...])
Failed to set processed to yes
IPN-5943a4adc8eab68cdbc9d9eff7fa7dc669fa0bce insert 1314012603578714
Failed to set processed to yes
IPN-5943a4adc8eab68cdbc9d9eff7fa7dc669fa0bce insert2 1314012603586201

As expected, this IPN was processed twice.
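
For reference, the two numbers in the output above can be decoded as
follows. This sketch assumes they are write timestamps in microseconds
since the Unix epoch, which their magnitude suggests; micros_to_utc is
a hypothetical helper written for this illustration.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def micros_to_utc(micros):
    """Convert microseconds since the Unix epoch to a naive UTC datetime."""
    return EPOCH + timedelta(microseconds=micros)


first = 1314012603578714   # value printed after "insert"
second = 1314012603586201  # value printed after "insert2"

print(micros_to_utc(first))
print(micros_to_utc(second))
print(second - first)  # gap between the two attempts, in microseconds
```

Under that assumption both writes were stamped at about 11:30:03 UTC on
2011-08-22, roughly 7.5 milliseconds apart. Any existing column stamped
later than these values would still take precedence over both writes.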

On Sat, Aug 20, 2011 at 5:37 PM, Peter Schuller
<pe...@infidyne.com> wrote:
>> Do you mean the cassandra log, or just logging in the script itself?
>
> The script itself. I.e, some "independent" verification that the line
> of code after the insert is in fact running, just in case there's some
> kind of silent failure.
>
> Sounds like you've tried to address it though with the E-Mail:s.
>
> I suppose it boils down to: Either there is something wrong in your
> environment/code, or Cassandra does have a bug. If the latter, it
> would probably be helpful if you could try to reproduce it in your
> environment in a way which can be shared - such as a script that does
> writes and reads back to confirm the write made it. Or maybe just
> adding more explicit logging to your script (even if it causes some
> log flooding) to "prove" that a write truly happened.
>
> --
> / Peter Schuller (@scode on twitter)
>

Re: Occasionally getting old data back with ConsistencyLevel.ALL

Posted by Peter Schuller <pe...@infidyne.com>.
> Do you mean the cassandra log, or just logging in the script itself?

The script itself. I.e., some "independent" verification that the line
of code after the insert is in fact running, just in case there's some
kind of silent failure.

Sounds like you've tried to address it though with the e-mails.

I suppose it boils down to: Either there is something wrong in your
environment/code, or Cassandra does have a bug. If the latter, it
would probably be helpful if you could try to reproduce it in your
environment in a way which can be shared - such as a script that does
writes and reads back to confirm the write made it. Or maybe just
adding more explicit logging to your script (even if it causes some
log flooding) to "prove" that a write truly happened.
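
A write-then-read-back check of that kind might look roughly like the
sketch below. It runs against an in-memory stand-in rather than a real
cluster: InMemoryStore and write_and_verify are hypothetical names, and
the stand-in simply keeps the cell with the newest timestamp. Logging
the timestamp that comes back, not just the value, is the part most
likely to be useful if the client can return it.

```python
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("ipn-check")


class InMemoryStore(object):
    """Stand-in for a column family; keeps (timestamp, value) per key and column."""

    def __init__(self):
        self._cells = {}

    def insert(self, key, column, value, timestamp):
        current = self._cells.get((key, column))
        # Keep the new cell only if its timestamp is newer than what is stored.
        if current is None or timestamp > current[0]:
            self._cells[(key, column)] = (timestamp, value)

    def get(self, key, column):
        return self._cells[(key, column)]


def write_and_verify(store, key, column, value, now_us):
    """Write a value, read it back, and log both sides of the comparison."""
    store.insert(key, column, value, now_us)
    stored_ts, stored_value = store.get(key, column)
    ok = stored_value == value
    log.info("key=%s column=%s wrote=%r@%d read=%r@%d ok=%s",
             key, column, value, now_us, stored_value, stored_ts, ok)
    return ok


store = InMemoryStore()
now = int(time.time() * 1000000)
# Simulate an earlier write from a machine whose clock runs 80 s fast.
store.insert("IPN-example", "processed", "no", now + 80 * 1000000)
print(write_and_verify(store, "IPN-example", "processed", "yes", now))
```

Here the check reports a failed write, and the log line shows that the
value read back carries a timestamp later than the one just written.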

-- 
/ Peter Schuller (@scode on twitter)

Re: Occasionally getting old data back with ConsistencyLevel.ALL

Posted by Kyle Gibson <ky...@frozenonline.com>.
The cron script doesn't do much. It pulls new IPNs (usually only 1 in
a given 5 minute period), inserts a row, and then sends an email.

As for failure handling in the script itself, I rely on Python
exception handling, and whenever an exception occurs I do get an email
with the exception details. No such exception is raised when this
duplication occurs.

As for being SURE that the processed flag is set, no, I suppose I am
not 100% absolutely positive. I did try running the following: insert,
sleep for a second, get, check to see if it was set, if not, send me
an email and try setting it again. Sometimes this would work, and then
sometimes even the second insert would fail. Again, with no
exceptions.

Do you mean the Cassandra log, or just logging in the script itself?

Thanks

On Fri, Aug 19, 2011 at 2:30 PM, Peter Schuller
<pe...@infidyne.com> wrote:
>> Is it possible for instance that sometimes your cron job takes longer
>> than five minutes?
>
> Or just a lack of failure handling in the cron job for that matter.
> Are you *SURE* the "processed" flag truly got set? Do you have a
> log statement (written *AFTER* successful write to Cassandra) that
> indicates the items you specifically say are processed twice, were in
> fact written twice to Cassandra?
>
> --
> / Peter Schuller (@scode on twitter)
>

Re: Occasionally getting old data back with ConsistencyLevel.ALL

Posted by Peter Schuller <pe...@infidyne.com>.
> Is it possible for instance that sometimes your cron job takes longer
> than five minutes?

Or just a lack of failure handling in the cron job for that matter.
Are you *SURE* the "processed" flag truly got set? Do you have a
log statement (written *AFTER* successful write to Cassandra) that
indicates the items you specifically say are processed twice, were in
fact written twice to Cassandra?

-- 
/ Peter Schuller (@scode on twitter)

Re: Occasionally getting old data back with ConsistencyLevel.ALL

Posted by Jonathan Ellis <jb...@gmail.com>.
There are a lot of people on 0.7 for whom CL is working as advertised.
Not saying it's impossible that there's a bug, but the odds are
against it.

Is it possible for instance that sometimes your cron job takes longer
than five minutes?
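
If overlapping runs are a concern, one common guard is a non-blocking
file lock taken at the start of the job. The sketch below assumes a
POSIX system (it uses fcntl.flock); try_lock and the lock file name are
hypothetical and not taken from the script under discussion.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Return an open file holding an exclusive lock, or None if already locked."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes this fail immediately instead of waiting for the lock.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except (IOError, OSError):
        handle.close()
        return None
    return handle


lock_path = os.path.join(tempfile.gettempdir(), "ipn-cron-example.lock")
lock = try_lock(lock_path)
if lock is None:
    print("previous run still active, exiting")
else:
    print("lock acquired, processing IPNs")
    # ... do the work here; the lock is released when the file is closed ...
    lock.close()
```

A run that finds the lock held simply exits and leaves the pending
items for the next scheduled run.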

On Thu, Aug 18, 2011 at 9:49 AM, Kyle Gibson
<ky...@frozenonline.com> wrote:
> I am running cassandra 0.7.8. pycassa 1.1.0
>
> Nodes=7, RF=3
>
> This problem started a few months ago and only occurs sporadically.
>
> I receive notifications from paypal's IPN. The IPN data is saved into
> a column family. I add another column for "processed" which is set to
> 0.
>
> Every 5 minutes, a cron script runs and pulls down IPN events that
> haven't been processed. It does some work, and then writes back
> processed to 1.
>
> Usually this worked, but then (when exactly I don't recall),
> occasionally, it started having a problem. Processed wasn't been set
> to 1. So, IPN events would be processed twice.
>
> I tried a few things to fix this. Repair, compact, restarting the
> cluster, upgrading. I even did a complete rebuild of the cluster,
> wiping the data directory and starting fresh on 0.7.8.
>
> I then ditched the "processed" column and decide to use two column
> families. IPNs are put into column family A, and after being
> processed, it is inserted into column family B and deleted from A.
>
> The problem still persisted. At this point I was using CL.QUORUM. So,
> I started using CL.ALL.
>
> And the problem still persists. Having IPN events processed twice
> causes a fair bit of problems, so this is something I really need to
> get resolved.
>
> Thanks,
>
> Kyle
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com