Posted to user@cassandra.apache.org by Roland Gude <ro...@yoochoose.com> on 2011/02/10 11:58:14 UTC

Data ends up in wrong Columnfamily

Hi,

I am experiencing a strange issue. I have two applications writing to Cassandra (to different column families in the same keyspace). The applications run on different machines and know nothing about each other.
They both produce data and write it to Cassandra with batch mutations using Hector.
So far so good, but it regularly happens that data from one application ends up in column families reserved for the other application as well as in the intended column family.

Machine A writes to column family CF_A
Machine B writes to column families CF_B to CF_N

Regularly, data that was written (according to my application logs) from Machine A to CF_A ends up in CF_A and in one of the other column families.

Any ideas why this could be happening?

I am using Cassandra 0.7.0 and Hector 0.7.0-23.

Greetings,
Roland

--
YOOCHOOSE GmbH

Roland Gude
Software Engineer

Im Mediapark 8, 50670 Köln

+49 221 4544151 (Tel)
+49 221 4544159 (Fax)
+49 171 7894057 (Mobile)


Email: roland.gude@yoochoose.com
WWW: www.yoochoose.com

YOOCHOOSE GmbH
Managing Directors: Dr. Uwe Alkemper, Michael Friedmann
Commercial register: Amtsgericht Köln, HRB 65275
VAT ID: DE 264 773 520
Registered office: Köln


Re: Data ends up in wrong Columnfamily

Posted by Roland Gude <ro...@yoochoose.com>.
Yes, this could very well be the issue.
As far as I can see, it is already fixed for 0.7.1. Hopefully that release will pass a vote soon.

Thanks,

Roland

-----Original Message-----
From: scode@scode.org [mailto:scode@scode.org] On Behalf Of Peter Schuller
Sent: Friday, 11 February 2011 09:11
To: user@cassandra.apache.org
Subject: Re: Data ends up in wrong Columnfamily

> So far so good, but it regularly happens, that data from one application
> ends up in columnfamilies reserved for the other application as well as the
> intended columnfamily.

Maybe https://issues.apache.org/jira/browse/CASSANDRA-1992

-- 
/ Peter Schuller


Re: Data ends up in wrong Columnfamily

Posted by Peter Schuller <pe...@infidyne.com>.
> So far so good, but it regularly happens, that data from one application
> ends up in columnfamilies reserved for the other application as well as the
> intended columnfamily.

Maybe https://issues.apache.org/jira/browse/CASSANDRA-1992

-- 
/ Peter Schuller

Re: Data ends up in wrong Columnfamily

Posted by Roland Gude <ro...@yoochoose.com>.
Hi,

Machine A has absolutely no knowledge of the other application, not even the column family name.
I dug into this further:

Since the data I find in the wrong place has a timestamp in its row key, it was quite easy to find out that the data is relatively old. Unfortunately, it is from a time for which I do not have server-side batch mutation logs.
I think this might be related to the “deleted columns reappear” thread, as I saw the following happen:


  1. I truncated the columnfamily that contained wrong data using the Cassandra-cli.
  2. I regenerated the correct data for that columnfamily.
  3. I ran repair on a node in the cluster.
  -> The data reappeared.

I tried this multiple times, and even tried to truncate the column family using clustertool on the slight chance that it does something different from the CLI when truncating. So far I have not been successful in removing the data from the cluster.
Another strange thing about the issue is that repair seems to blow up the data indefinitely.
The column family that contains the wrong data holds around 200 KB of correct data before I repair. The complete cluster contains around 6 GB of data (3 nodes, 3 GB each, replication factor 2). After a repair on one node, that node contains about 14 GB of data. If I then trigger a repair on the second node, it grows to around 24 GB of data before it fails with an OOM error.
Reaching 24 GB seems impossible to me given the amount of data I have written to the cluster. I can only imagine that it is data that was once deleted but keeps reappearing and, while doing so, reappears in the wrong place.
Note that the column family that contains the wrong data did not even exist when the data was first written. (It was created with the CLI only a couple of days ago, while the oldest row I could find that was not supposed to exist is from January 7th.)
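
The age check described above can be sketched roughly as follows. This is only an illustration: the row key layout ("<id>:<epoch seconds>"), the function names, and the column family creation date are assumptions, since the thread does not give the real key format or dates.

```python
from datetime import datetime, timezone


def parse_key_timestamp(row_key):
    # Hypothetical key layout "<id>:<epoch seconds>"; the real layout
    # used by the application is not stated in the thread.
    epoch_seconds = int(row_key.rsplit(":", 1)[1])
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


def rows_older_than_cf(row_keys, cf_created_at):
    """Return the row keys whose embedded timestamp predates the
    creation of the column family they were found in."""
    return [k for k in row_keys if parse_key_timestamp(k) < cf_created_at]


def make_key(item_id, when):
    # Helper for the example only: builds a key in the assumed layout.
    return "%s:%d" % (item_id, int(when.timestamp()))


# Assumed dates for illustration: one row from January 7th, one recent row,
# and a column family created a few days before this mail.
old_key = make_key("item1", datetime(2011, 1, 7, tzinfo=timezone.utc))
new_key = make_key("item2", datetime(2011, 2, 10, tzinfo=timezone.utc))
cf_created_at = datetime(2011, 2, 8, tzinfo=timezone.utc)

suspicious = rows_older_than_cf([old_key, new_key], cf_created_at)
```

Any key returned by such a check cannot have been written to the column family by a normal client write, because the column family did not exist at that time.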

We did fail to run repair regularly on that cluster in the meantime.

If I find a BatchMutation log that indicates an incorrect write received by the server, I will post it.
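
For the client side, a minimal sketch of what logging the outgoing batch mutations could look like. It is written in plain Python and does not use the Hector API; the nested-dict shape of the batch (row key -> column family -> list of columns), the function name audit_batch, and the logger name are assumptions made for illustration.

```python
import logging

log = logging.getLogger("mutation-audit")


def audit_batch(batch, allowed_cfs):
    """Log every (row key, column family) pair in a batch and return the
    pairs that target a column family outside allowed_cfs.

    `batch` is assumed to be a nested dict:
    row key -> column family name -> list of columns.
    """
    unexpected = []
    for row_key, by_cf in batch.items():
        for cf, columns in by_cf.items():
            log.info("mutation key=%r cf=%s columns=%d", row_key, cf, len(columns))
            if cf not in allowed_cfs:
                unexpected.append((row_key, cf))
    return unexpected


# Example: Machine A is only supposed to write to CF_A.
batch = {
    "row1": {"CF_A": ["col1", "col2"]},
    "row2": {"CF_B": ["col3"]},
}
violations = audit_batch(batch, allowed_cfs={"CF_A"})
```

If such a check never reports a violation on Machine A while rows still show up in the other column families, that would point away from the application code and towards the server side.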

Greetings,

Roland

From: Aaron Morton [mailto:aaron@thelastpickle.com]
Sent: Thursday, 10 February 2011 21:37
To: user@cassandra.apache.org
Subject: Re: Data ends up in wrong Columnfamily

I have not heard of that before; chances are it's a problem in your code. Does machine A even know the other CF name? Can you log the batch mutations you are sending? When the data appears in the other CF, is it complete?

There is also a Hector list, perhaps they can help.

Aaron



Re: Data ends up in wrong Columnfamily

Posted by Aaron Morton <aa...@thelastpickle.com>.
I have not heard of that before; chances are it's a problem in your code. Does machine A even know the other CF name? Can you log the batch mutations you are sending? When the data appears in the other CF, is it complete?

There is also a Hector list, perhaps they can help.

Aaron
