Posted to user@cassandra.apache.org by Maifi Khan <ma...@gmail.com> on 2010/08/06 21:51:02 UTC

one question about cassandra write

Hi
I have a question about the internals of Cassandra writes.
Say, I already have the following in the database -
(row_x,col_y,val1)

Now if I try to insert
(row_x,col_y,val100), what will happen?
Will it overwrite the old data?
I mean, will it overwrite the data physically or will it keep both the
old version and the new version of the data?
If the latter is the case, can I retrieve the old version of the data?

One example where I may need this is as follows -
say I want to log the stock price of a particular company over time.
Say company name is the row key and stock price is the column. Then
the stock price column needs to be written repeatedly.
HTH

thanks
Maifi

Re: one question about cassandra write

Posted by Peter Schuller <pe...@infidyne.com>.
> a) It is a major compaction [1]
> b) The old version was deleted/overwritten more than GCGraceSeconds ago [2]

c) and the memtable containing the delete/overwrite has been flushed.

(I suppose that's kinda obvious in retrospect, but it took me a little
bit to realize this was why a 'nodetool compact' was not clearing up
diskspace after bulk deletion even with GCGraceSeconds set to 0.)

-- 
/ Peter Schuller

Re: one question about cassandra write

Posted by Rob Coli <rc...@digg.com>.
On 8/6/10 2:13 PM, Benjamin Black wrote:
> Assuming the old version is already on disk in an SSTable, the new
> version will not overwrite it, and both versions will be in the
> system.  A compaction will remove the old version, however.

To be clear, a compaction will only remove the old version if:

a) It is a major compaction [1]
b) The old version was deleted/overwritten more than GCGraceSeconds ago [2]

=Rob

[1] https://issues.apache.org/jira/browse/CASSANDRA-1074
[2] http://wiki.apache.org/cassandra/DistributedDeletes

Re: one question about cassandra write

Posted by Benjamin Black <b...@b3k.us>.
On Fri, Aug 6, 2010 at 12:51 PM, Maifi Khan <ma...@gmail.com> wrote:
> Hi
> I have a question about the internal of cassandra write.
> Say, I already have the following in the database -
> (row_x,col_y,val1)
>
> Now if I try to insert
> (row_x,col_y,val100), what will happen?
> Will it overwrite the old data?
> I mean, will it overwrite the data physically or will it keep both the
> old version and the new version of the data?

Assuming the old version is already on disk in an SSTable, the new
version will not overwrite it, and both versions will be in the
system.  A compaction will remove the old version, however.
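
To make that concrete, here is a toy Python model of the write path
described above. It is only a sketch under simplifying assumptions, not
Cassandra's actual code or API: the ToyStore class and its method names
are hypothetical, SSTables are modelled as plain dicts, and tombstones
and GCGraceSeconds are left out entirely.

```python
# Toy model only: ToyStore and its methods are hypothetical illustrations,
# not Cassandra's implementation or client API.
class ToyStore:
    def __init__(self):
        self.memtable = {}   # (row, col) -> (timestamp, value), in memory
        self.sstables = []   # flushed tables, oldest first; never modified in place

    def write(self, row, col, value, timestamp):
        key = (row, col)
        current = self.memtable.get(key)
        # Inside the memtable the newer timestamp simply replaces the older one.
        if current is None or timestamp >= current[0]:
            self.memtable[key] = (timestamp, value)

    def flush(self):
        # Flushing writes the memtable out as a new table; older tables are untouched.
        if self.memtable:
            self.sstables.append(dict(self.memtable))
            self.memtable = {}

    def read(self, row, col):
        # A read merges every source and returns the value with the newest timestamp.
        key = (row, col)
        candidates = [table[key] for table in self.sstables if key in table]
        if key in self.memtable:
            candidates.append(self.memtable[key])
        if not candidates:
            return None
        return max(candidates, key=lambda cell: cell[0])[1]

    def versions_on_disk(self, row, col):
        # Every value still physically stored in a flushed table, oldest first.
        key = (row, col)
        return [table[key][1] for table in self.sstables if key in table]

    def compact(self):
        # Merge all flushed tables into one, keeping only the newest cell per column.
        merged = {}
        for table in self.sstables:
            for key, cell in table.items():
                if key not in merged or cell[0] >= merged[key][0]:
                    merged[key] = cell
        self.sstables = [merged] if merged else []


store = ToyStore()
store.write("row_x", "col_y", "val1", timestamp=1)
store.flush()                      # val1 is now in the first flushed table
store.write("row_x", "col_y", "val100", timestamp=2)
store.flush()                      # val100 lands in a second table

before = store.versions_on_disk("row_x", "col_y")   # both versions are stored
visible = store.read("row_x", "col_y")              # reads only see the newest
store.compact()
after = store.versions_on_disk("row_x", "col_y")    # old version is gone
```

Note that even while both versions are on disk, read() never returns the
older one, which matches the answer below: the old value is not
retrievable through normal reads.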

> If the latter is the case, can I retrieve the old version of the data?
>

No.  And no, there is no plan to add that functionality.  If it is
needed it is simple to emulate in a variety of ways with the current
feature set.

This is recommended reading:
http://maxgrinev.com/2010/07/12/update-idempotency-why-it-is-important-in-cassandra-applications-2/


b

RE: one question about cassandra write

Posted by Jeremiah Jordan <JE...@morningstar.com>.
If you want to be able to get the data over time, you need to store it
in multiple columns.  You can use TimeUUID columns if you need to be
able to get ranges of times through queries.
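
As a rough illustration of that layout, the sketch below models one row
as a Python dict whose column names are time-based (version 1) UUIDs, so
every price is a new column rather than an overwrite. This is an
in-memory stand-in, not a Cassandra client call: record_price,
prices_between and the "ACME" row are hypothetical names, and a real
TimeUUID comparator would do the ordering that sorted() does here.

```python
import uuid

def record_price(row, price):
    """Store the price under a new time-based UUID column name (no overwrite)."""
    name = uuid.uuid1()          # version 1 UUIDs embed a timestamp
    row[name] = price
    return name

def prices_between(row, start=None, finish=None):
    """Return (column_name, price) pairs ordered by the UUID's embedded timestamp.

    start and finish are optional UUID column names bounding the range (inclusive).
    """
    selected = [
        (name, price)
        for name, price in row.items()
        if (start is None or name.time >= start.time)
        and (finish is None or name.time <= finish.time)
    ]
    return sorted(selected, key=lambda item: item[0].time)

# Hypothetical row for a company with the row key "ACME".
acme = {}
first = record_price(acme, 10.0)
second = record_price(acme, 10.5)
third = record_price(acme, 11.2)

history = prices_between(acme)                  # every recorded price
recent = prices_between(acme, start=second)     # prices from `second` onwards
```

With this layout the full price history stays readable, and a range of
times becomes a slice over column names instead of a lookup of old
versions of a single column.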
