Posted to user@cassandra.apache.org by java8964 java8964 <ja...@hotmail.com> on 2013/09/17 03:51:52 UTC

questions related to the SSTable file

Hi, I have some questions related to the SSTable in the Cassandra, as I am doing a project to use it and hope someone in this list can share some thoughts.
My understanding is that SSTables are per column family, but each column family can have multiple SSTable files. At runtime, one row COULD be split across more than one SSTable file. Even though this is not good for performance, it does happen, and Cassandra will try to merge a row's data back into one SSTable file during compaction.
The question is: when one row is split across multiple SSTable files, what is the boundary? Or let me ask it this way: if one row exists in 2 SSTable files, and I run the sstable2json tool on both SSTable files individually:
1) I would expect the same row key to show up in both sstable2json outputs, since this one row exists in both SSTable files, right?
2) If so, what is the boundary? Will Cassandra guarantee the column level as the boundary? What I mean is: for one column's data, is it guaranteed to be either entirely in the first file or entirely in the second? There is no chance that Cassandra will cut one column's data into 2 parts, with one part stored in the first SSTable file and the other part stored in the second SSTable file. Is my understanding correct?
3) If we are talking only about the SSTable files in snapshots and incremental backups, excluding the runtime SSTable files, does anything change? For snapshot or incremental backup SSTable files, can one row's data still exist in more than one SSTable file? And does the boundary change in this case?
4) If I want to use incremental backup SSTable files as the way to catch data being changed, is it a good way to do what I am trying to achieve? In this case, what happens in the following example:
For column family A:
At Time 0, one row key (key1) has some data. It is stored and backed up in SSTable file 1.
At Time 1, if any column for key1 has any change (a new column inserted, a column updated/deleted, or even the whole row deleted), I would expect this whole row to exist in any incremental backup SSTable files taken after Time 1, right?
What happens if the above row just happens to be stored in more than one SSTable file?
At Time 0, one row key (key1) has some data, and it happens to be stored in SSTable file 1 and file 2, and both are backed up.
At Time 1, one column is added to row key1, and the change in fact happens in SSTable file 2 only in this case. If we do an incremental backup after that, which SSTable files should I expect in this backup? Both SSTable files, or just SSTable file 2?
I was thinking that incremental backup SSTable files would be a good candidate for catching data being changed, but the fact that one row's data can exist in multiple SSTable files makes things complex. Does anyone have experience using SSTable files this way? What are the lessons?
Thanks
Yong
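The per-column reconciliation this thread eventually converges on (the same row key can appear in several SSTable files; the latest timestamp wins for each column) can be sketched in a few lines. The row shape below is a simplified assumption modeled loosely on sstable2json output, not the exact on-disk format:

```python
# Sketch: reconciling one row that appears in two sstable2json outputs.
# Each file contributes a list of [column, value, timestamp] entries for
# the row; this shape is an assumption for illustration only.

def merge_rows(*sstable_rows):
    """Last-write-wins merge of the same row from several SSTable dumps."""
    merged = {}  # column name -> (value, timestamp)
    for columns in sstable_rows:
        for name, value, ts in columns:
            if name not in merged or ts > merged[name][1]:
                merged[name] = (value, ts)
    return merged

# Row "key1" as it might appear split across two SSTable files:
file1 = [["maker", "honda", 100]]
file2 = [["maker", "honda", 100], ["color", "blue", 200]]

print(merge_rows(file1, file2))
# {'maker': ('honda', 100), 'color': ('blue', 200)}
```

Whichever consumer reads the backups (an ETL job, a MapReduce reducer) ends up doing some form of this per-column merge.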

Re: questions related to the SSTable file

Posted by Shahab Yunus <sh...@gmail.com>.
Thanks Robert for the answer. It makes sense. If that happens then it means
that your design or use case needs some rework ;)

Regards,
Shahab


On Tue, Sep 17, 2013 at 2:37 PM, java8964 java8964 <ja...@hotmail.com>wrote:

> Another question: the SSTable files generated in the incremental backup
> are not really ONLY the incremental delta, right? They will include more
> than the delta in the SSTable files.
>
> I will use the example to show my question:
>
> first, we have this data in the SSTable file 1:
>
> rowkey(1), columns (maker=honda).
>
> later, if we add one column in the same key:
>
> rowkey(1), columns (maker=honda, color=blue)
>
> The data above is flushed to another SSTable file 2. In this case, it
> will be part of the incremental backup at this time. But in fact, it will
> contain both the old data (maker=honda) plus the new change (color=blue).
>
> So in fact, incremental backup of Cassandra is just hard link all the new
> SSTable files being generated during the incremental backup period. It
> could contain any data, not just the data being update/insert/delete in
> this period, correct?
>
> Thanks
>
> Yong
>
> > From: Dean.Hiller@nrel.gov
> > To: user@cassandra.apache.org
> > Date: Tue, 17 Sep 2013 08:11:36 -0600
> > Subject: Re: questions related to the SSTable file
> >
> > Netflix created file streaming in astyanax into cassandra specifically
> because writing too big a column cell is a bad thing. The limit is really
> dependent on use case….do you have servers writing 1000's of 200Meg files
> at the same time….if so, astyanax streaming may be a better way to go there
> where it divides up the file amongst cells and rows.
> >
> > I know the limit of a row size is really your hard disk space and the
> column count if I remember goes into billions though realistically, I think
> beyond 10 million might slow down a bit….all I know is we tested up to 10
> million columns with no issues in our use-case.
> >
> > So you mean at this time, I could get 2 SSTable files, both contain
> column "Blue" for the same row key, right?
> >
> > Yes
> >
> > In this case, I should be fine as value of the "Blue" column contain the
> timestamp to help me to find out which is the last change, right?
> >
> > Yes
> >
> > In MR world, each file COULD be processed by different Mapper, but will
> be sent to the same reducer as both data will be shared same key.
> >
> > If that is the way you are writing it, then yes
> >
> > Dean
> >
> > From: Shahab Yunus <shahab.yunus@gmail.com<mailto:shahab.yunus@gmail.com
> >>
> > Reply-To: "user@cassandra.apache.org<ma...@cassandra.apache.org>"
> <us...@cassandra.apache.org>>
> > Date: Tuesday, September 17, 2013 7:54 AM
> > To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <
> user@cassandra.apache.org<ma...@cassandra.apache.org>>
> > Subject: Re: questions related to the SSTable file
> >
> > I understand if the following changes apply to the same row key as in the above
> example, an additional SSTable file could be generated. That is clear for me.
>

Re: questions related to the SSTable file

Posted by Robert Coli <rc...@eventbrite.com>.
On Tue, Sep 17, 2013 at 6:51 PM, java8964 java8964 <ja...@hotmail.com>wrote:

> I thought I was clearer, but your clarification confused me again.
>


> But there is no way we can be sure that these SSTable files will ONLY
> contain modified data. So the statement being quoted above is not exactly
> right. I agree that all the modified data in that period will be in the
> incremental sstable files, but a lot of other unmodified data will be in
> them too.
>

The incremental backup directory only includes SSTables recently flushed
from memtables. It does not include SSTables created as a result of
compaction.

Memtables, by definition, only contain modified or new data. Yes, there is
one new copy per replica and the ones processed after the first might
appear "unmodified", which may be what you are talking about?

=Rob
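Rob's point (only freshly flushed memtables reach the backups directory, and a memtable holds only rows touched since the last flush) can be mimicked with a toy model; every name here is illustrative, not Cassandra's API:

```python
# Toy model of flush behavior: only rows written since the last flush live
# in the memtable, so only those rows appear in the newly flushed SSTable
# (and hence in the incremental backup).

class Node:
    def __init__(self):
        self.memtable = {}   # row_key -> {column: value}
        self.sstables = []   # flushed (immutable) tables
        self.backups = []    # hard-linked copies of flushed sstables

    def write(self, row_key, column, value):
        self.memtable.setdefault(row_key, {})[column] = value

    def flush(self):
        sstable = self.memtable
        self.memtable = {}
        self.sstables.append(sstable)
        self.backups.append(sstable)  # incremental backup links each flush
        return sstable

node = Node()
node.write("Lavender", "hex", "#E6E6FA")
node.write("Blue", "hex", "#0000FF")
node.flush()

node.write("Blue", "hex2", "#2c86ff")   # only Blue is touched after flush 1
second = node.flush()

print(second)   # {'Blue': {'hex2': '#2c86ff'}} -- Lavender is absent
```

The untouched Lavender row never re-enters the memtable, so it cannot show up in the second backed-up SSTable.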

Re: questions related to the SSTable file

Posted by Takenori Sato <ts...@cloudian.com>.
Yong,

It seems there is still a misunderstanding.

> But there is no way we can be sure that these SSTable files will ONLY
contain modified data. So the statement being quoted above is not exactly
right. I agree that all the modified data in that period will be in the
incremental sstable files, but a lot of other unmodified data will be in
them too.

A memtable (which becomes a new sstable when flushed) contains only modified
data, as I explained with the example.

> If we have 2 rows data with different row key in the same memtable, and
if only 2nd row being modified. When the memtable is flushed to SSTable
file, it will contain both rows, and both will be in the incremental backup
files. So for first row, nothing change, but it will be in the incremental
backup.

Unless the first row is modified, it does not exist in the memtable at all.

> If I have one row with one column, now a new column is added, and whole
row in one memtable being flushed to SSTable file, as also in this
incremental backup. For first column, nothing change, but it will still be
in incremental backup file.

For example, if it worked the way you understand it, then Color-2 would also
contain the Lavender row, and the Blue row would include the existing column,
hex, like the following. But it does not.

- Color-1-Data.db: [{Lavender: {hex: #E6E6FA}}, {Blue: {hex: #0000FF}}]
- Color-2-Data.db: [{Green: {hex: #008000}}, {Blue: {hex2: #2c86ff}}]

--> your understanding
- Color-2-Data.db: [{Lavender: {hex: #E6E6FA}}, {Green: {hex: #008000}},
{Blue: {hex: #0000FF}, {hex2: #2c86ff}}]
* The row Lavender and the column hex of row Blue have no changes


> The point I tried to make is this is important if I design an ETL to
consume the incremental backup SSTable files. As above example, I have to
realize that in the incremental backup sstable files, they could or most
likely contain old data which was previous being processed already. That
will require additional logic and responsibility in the ETL to handle it,
or any outsider SSTable consumer to pay attention to it.

I suggest trying org.apache.cassandra.tools.SSTableExport; then you will
see what's going on under the hood.

- Takenori
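Hand-running the Color-* example from this thread through a toy compaction shows the same thing SSTableExport would reveal. Two simplifications are assumed here: an empty column map stands in for a row tombstone, and file order stands in for timestamp order:

```python
# Hand-compaction of the Color-* example from this thread. Assumptions for
# illustration (not Cassandra's real on-disk format): {} marks a row
# tombstone, and later files win over earlier ones instead of timestamps.

sstables = [
    {"Lavender": {"hex": "#E6E6FA"}, "Blue": {"hex": "#0000FF"}},            # Color-1
    {"Green": {"hex": "#008000"}, "Blue": {"hex2": "#2c86ff"}},              # Color-2
    {"Aqua": {"hex": "#00FFFF"}, "Green": {"hex2": "#32CD32"}, "Blue": {}},  # Color-3
    {"Magenta": {"hex": "#FF00FF"}, "Gold": {"hex": "#FFD700"}},             # Color-4
]

def compact(tables):
    merged = {}
    for table in tables:          # oldest first; later writes overwrite
        for row_key, columns in table.items():
            if columns == {}:     # row tombstone: discard everything seen so far
                merged[row_key] = {}
            else:
                merged.setdefault(row_key, {}).update(columns)
    # drop fully deleted rows once the tombstone can be purged
    return {k: v for k, v in merged.items() if v}

print(compact(sstables))
```

After compaction the Blue row is gone entirely, and Green carries both of its columns in one place, which matches the behavior described in the thread.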

On Wed, Sep 18, 2013 at 10:51 AM, java8964 java8964 <ja...@hotmail.com>wrote:

> Quote:
>
> "
> To be clear, "incremental backup" feature backs up the data being modified
> in that period, because it writes only those files to the incremental
> backup dir as hard links, between full snapshots.
> "
>
> I thought I was clearer, but your clarification confused me again.
> My understanding so far from all the answer I got so far, I believe, the
> more accurate statement of "incremental backup" should be "incremental
> backup" feature backs up the SSTable files being generated in that period.
>
> But there is no way we can be sure that these SSTable files will ONLY
> contain modified data. So the statement being quoted above is not exactly
> right. I agree that all the modified data in that period will be in the
> incremental sstable files, but a lot of other unmodified data will be in
> them too.
>
> If we have 2 rows data with different row key in the same memtable, and if
> only 2nd row being modified. When the memtable is flushed to SSTable file,
> it will contain both rows, and both will be in the incremental backup
> files. So for first row, nothing change, but it will be in the incremental
> backup.
>
> If I have one row with one column, now a new column is added, and whole
> row in one memtable being flushed to SSTable file, as also in this
> incremental backup. For first column, nothing change, but it will still be
> in incremental backup file.
>
> The point I tried to make is this is important if I design an ETL to
> consume the incremental backup SSTable files. As above example, I have to
> realize that in the incremental backup sstable files, they could or most
> likely contain old data which was previous being processed already. That
> will require additional logic and responsibility in the ETL to handle it,
> or any outsider SSTable consumer to pay attention to it.
>
> Yong
>
> ------------------------------
> Date: Tue, 17 Sep 2013 18:01:45 -0700
>
> Subject: Re: questions related to the SSTable file
> From: rcoli@eventbrite.com
> To: user@cassandra.apache.org
>
>
> On Tue, Sep 17, 2013 at 5:46 PM, Takenori Sato <ts...@cloudian.com> wrote:
>
> > So in fact, incremental backup of Cassandra is just hard link all the
> new SSTable files being generated during the incremental backup period. It
> could contain any data, not just the data being update/insert/delete in
> this period, correct?
>
> Correct.
>
> But over time, some old enough SSTable files are usually shared across
> multiple snapshots.
>
>
> To be clear, "incremental backup" feature backs up the data being modified
> in that period, because it writes only those files to the incremental
> backup dir as hard links, between full snapshots.
>
> http://www.datastax.com/docs/1.0/operations/backup_restore
> "
> When incremental backups are enabled (disabled by default), Cassandra
> hard-links each flushed SSTable to a backups directory under the keyspace
> data directory. This allows you to store backups offsite without
> transferring entire snapshots. Also, incremental backups combine with
> snapshots to provide a dependable, up-to-date backup mechanism.
> "
>
> What Takenori is referring to is that a full snapshot is in some ways an
> "incremental backup" because it shares hard linked SSTables with other
> snapshots.
>
> =Rob
>
>

RE: questions related to the SSTable file

Posted by java8964 java8964 <ja...@hotmail.com>.
Quote: 
"
To be clear, "incremental backup" feature backs up the data being modified in that period, because it writes only those files to the incremental backup dir as hard links, between full snapshots."
I thought I was being clear, but your clarification confused me again. My understanding, from all the answers I have gotten so far, is that a more accurate statement would be: the "incremental backup" feature backs up the SSTable files generated in that period.
But there is no way we can be sure that these SSTable files will ONLY contain modified data. So the statement being quoted above is not exactly right. I agree that all the modified data in that period will be in the incremental sstable files, but a lot of other unmodified data will be in them too.
If we have 2 rows with different row keys in the same memtable, and only the 2nd row is modified, then when the memtable is flushed to an SSTable file, it will contain both rows, and both will be in the incremental backup files. So for the first row, nothing changed, but it will be in the incremental backup.
If I have one row with one column and a new column is added, the whole row in the memtable is flushed to an SSTable file, which is also in this incremental backup. For the first column, nothing changed, but it will still be in the incremental backup file.
The point I am trying to make is that this matters if I design an ETL to consume the incremental backup SSTable files. As in the example above, I have to realize that the incremental backup sstable files could, and most likely will, contain old data that was previously processed already. That requires additional logic and responsibility in the ETL to handle it, and any outside SSTable consumer needs to pay attention to it.
Yong
Date: Tue, 17 Sep 2013 18:01:45 -0700
Subject: Re: questions related to the SSTable file
From: rcoli@eventbrite.com
To: user@cassandra.apache.org

On Tue, Sep 17, 2013 at 5:46 PM, Takenori Sato <ts...@cloudian.com> wrote:

> So in fact, incremental backup of Cassandra is just hard link all the new SSTable files being generated during the incremental backup period. It could contain any data, not just the data being update/insert/delete in this period, correct?


Correct.
But over time, some old enough SSTable files are usually shared across multiple snapshots. 

To be clear, "incremental backup" feature backs up the data being modified in that period, because it writes only those files to the incremental backup dir as hard links, between full snapshots.

http://www.datastax.com/docs/1.0/operations/backup_restore
"When incremental backups are enabled (disabled by default), Cassandra hard-links each flushed SSTable to a backups directory under the keyspace data directory. This allows you to store backups offsite without transferring entire snapshots. Also, incremental backups combine with snapshots to provide a dependable, up-to-date backup mechanism.
"

What Takenori is referring to is that a full snapshot is in some ways an "incremental backup" because it shares hard linked SSTables with other snapshots.

=Rob

Re: questions related to the SSTable file

Posted by "Takenori Sato(Cloudian)" <ts...@cloudian.com>.
Thanks, Rob, for clarifying!

- Takenori

(2013/09/18 10:01), Robert Coli wrote:
> On Tue, Sep 17, 2013 at 5:46 PM, Takenori Sato <tsato@cloudian.com 
> <ma...@cloudian.com>> wrote:
>
>     > So in fact, incremental backup of Cassandra is just hard link
>     all the new SSTable files being generated during the incremental
>     backup period. It could contain any data, not just the data being
>     update/insert/delete in this period, correct?
>
>     Correct.
>
>     But over time, some old enough SSTable files are usually shared
>     across multiple snapshots.
>
>
> To be clear, "incremental backup" feature backs up the data being 
> modified in that period, because it writes only those files to the 
> incremental backup dir as hard links, between full snapshots.
>
> http://www.datastax.com/docs/1.0/operations/backup_restore
> "
> When incremental backups are enabled (disabled by default), Cassandra 
> hard-links each flushed SSTable to a backups directory under the 
> keyspace data directory. This allows you to store backups offsite 
> without transferring entire snapshots. Also, incremental backups 
> combine with snapshots to provide a dependable, up-to-date backup 
> mechanism.
> "
>
> What Takenori is referring to is that a full snapshot is in some ways 
> an "incremental backup" because it shares hard linked SSTables with 
> other snapshots.
>
> =Rob


Re: questions related to the SSTable file

Posted by Robert Coli <rc...@eventbrite.com>.
On Tue, Sep 17, 2013 at 5:46 PM, Takenori Sato <ts...@cloudian.com> wrote:

> > So in fact, incremental backup of Cassandra is just hard link all the
> new SSTable files being generated during the incremental backup period. It
> could contain any data, not just the data being update/insert/delete in
> this period, correct?
>
> Correct.
>
> But over time, some old enough SSTable files are usually shared across
> multiple snapshots.
>

To be clear, "incremental backup" feature backs up the data being modified
in that period, because it writes only those files to the incremental
backup dir as hard links, between full snapshots.

http://www.datastax.com/docs/1.0/operations/backup_restore
"
When incremental backups are enabled (disabled by default), Cassandra
hard-links each flushed SSTable to a backups directory under the keyspace
data directory. This allows you to store backups offsite without
transferring entire snapshots. Also, incremental backups combine with
snapshots to provide a dependable, up-to-date backup mechanism.
"

What Takenori is referring to is that a full snapshot is in some ways an
"incremental backup" because it shares hard linked SSTables with other
snapshots.

=Rob
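The hard-link mechanism the DataStax doc describes is cheap because the backup entry and the live SSTable are the same inode, so no bytes are copied. A quick way to see that (paths illustrative):

```python
# Demonstrating why hard-linked backups cost no extra data: the "backup"
# entry is the same inode as the live SSTable file.
import os
import tempfile

tmp = tempfile.mkdtemp()
data = os.path.join(tmp, "Color-1-Data.db")
backup_dir = os.path.join(tmp, "backups")
os.mkdir(backup_dir)

with open(data, "w") as f:
    f.write("sstable bytes")

link = os.path.join(backup_dir, "Color-1-Data.db")
os.link(data, link)   # what incremental backup does for each flushed SSTable

same = os.stat(data).st_ino == os.stat(link).st_ino
print(same)   # True: one file on disk, two directory entries
```

This is also why old SSTables can be "shared" across multiple snapshots: each snapshot just holds another link to the same underlying file.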

Re: questions related to the SSTable file

Posted by Takenori Sato <ts...@cloudian.com>.
> So in fact, incremental backup of Cassandra is just hard link all the new
SSTable files being generated during the incremental backup period. It
could contain any data, not just the data being update/insert/delete in
this period, correct?

Correct.

But over time, some old enough SSTable files are usually shared across
multiple snapshots.


On Wed, Sep 18, 2013 at 3:37 AM, java8964 java8964 <ja...@hotmail.com>wrote:

> Another question related to the SSTable files generated in the incremental
> backup is not really ONLY incremental delta, right? It will include more
> than delta in the SSTable files.
>
> I will use the example to show my question:
>
> first, we have this data in the SSTable file 1:
>
> rowkey(1), columns (maker=honda).
>
> later, if we add one column in the same key:
>
> rowkey(1), columns (maker=honda, color=blue)
>
> The data above being flushed to another SSTable file 2. In this case, it
> will be part of the incremental backup at this time. But in fact, it will
> contain both the old data (maker=honda) plus the new change (color=blue).
>
> So in fact, incremental backup of Cassandra is just hard link all the new
> SSTable files being generated during the incremental backup period. It
> could contain any data, not just the data being update/insert/delete in
> this period, correct?
>
> Thanks
>
> Yong
>
> > From: Dean.Hiller@nrel.gov
> > To: user@cassandra.apache.org
> > Date: Tue, 17 Sep 2013 08:11:36 -0600
>
> > Subject: Re: questions related to the SSTable file
> >
> > Netflix created file streaming in astyanax into cassandra specifically
> because writing too big a column cell is a bad thing. The limit is really
> dependent on use case….do you have servers writing 1000's of 200Meg files
> at the same time….if so, astyanax streaming may be a better way to go there
> where it divides up the file amongst cells and rows.
> >
> > I know the limit of a row size is really your hard disk space and the
> column count if I remember goes into billions though realistically, I think
> beyond 10 million might slow down a bit….all I know is we tested up to 10
> million columns with no issues in our use-case.
> >
> > So you mean at this time, I could get 2 SSTable files, both contain
> column "Blue" for the same row key, right?
> >
> > Yes
> >
> > In this case, I should be fine as value of the "Blue" column contain the
> timestamp to help me to find out which is the last change, right?
> >
> > Yes
> >
> > In MR world, each file COULD be processed by different Mapper, but will
> be sent to the same reducer as both data will be shared same key.
> >
> > If that is the way you are writing it, then yes
> >
> > Dean
> >
> > From: Shahab Yunus <shahab.yunus@gmail.com<mailto:shahab.yunus@gmail.com
> >>
> > Reply-To: "user@cassandra.apache.org<ma...@cassandra.apache.org>"
> <us...@cassandra.apache.org>>
> > Date: Tuesday, September 17, 2013 7:54 AM
> > To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <
> user@cassandra.apache.org<ma...@cassandra.apache.org>>
> > Subject: Re: questions related to the SSTable file
> >
> > I understand if the following changes apply to the same row key as in the above
> example, an additional SSTable file could be generated. That is clear for me.
>

RE: questions related to the SSTable file

Posted by java8964 java8964 <ja...@hotmail.com>.
Another question: the SSTable files generated in the incremental backup are not really ONLY the incremental delta, right? They will include more than the delta in the SSTable files.
I will use the example to show my question:
first, we have this data in the SSTable file 1:
rowkey(1), columns (maker=honda).
later, if we add one column in the same key:
rowkey(1), columns (maker=honda, color=blue)
The data above is flushed to another SSTable file 2. In this case, it will be part of the incremental backup at this time. But in fact, it will contain both the old data (maker=honda) plus the new change (color=blue).
So in fact, Cassandra's incremental backup just hard-links all the new SSTable files generated during the incremental backup period. It could contain any data, not just the data updated/inserted/deleted in this period, correct?
Thanks
Yong

> From: Dean.Hiller@nrel.gov
> To: user@cassandra.apache.org
> Date: Tue, 17 Sep 2013 08:11:36 -0600
> Subject: Re: questions related to the SSTable file
> 
> Netflix created file streaming in astyanax into cassandra specifically because writing too big a column cell is a bad thing.  The limit is really dependent on use case….do you have servers writing 1000's of 200Meg files at the same time….if so, astyanax streaming may be a better way to go there where it divides up the file amongst cells and rows.
> 
> I know the limit of a row size is really your hard disk space and the column count if I remember goes into billions though realistically, I think beyond 10 million might slow down a bit….all I know is we tested up to 10 million columns with no issues in our use-case.
> 
> So you mean at this time, I could get 2 SSTable files, both contain column "Blue" for the same row key, right?
> 
> Yes
> 
> In this case, I should be fine as value of the "Blue" column contain the timestamp to help me to find out which is the last change, right?
> 
> Yes
> 
> In MR world, each file COULD be processed by different Mapper, but will be sent to the same reducer as both data will be shared same key.
> 
> If that is the way you are writing it, then yes
> 
> Dean
> 
> From: Shahab Yunus <sh...@gmail.com>>
> Reply-To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
> Date: Tuesday, September 17, 2013 7:54 AM
> To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
> Subject: Re: questions related to the SSTable file
> 
> I understand if the following changes apply to the same row key as in the above example, an additional SSTable file could be generated. That is clear for me.

Re: questions related to the SSTable file

Posted by Robert Coli <rc...@eventbrite.com>.
On Tue, Sep 17, 2013 at 6:54 AM, Shahab Yunus <sh...@gmail.com>wrote:

> java8964, basically are you asking that what will happen if we put large
> amount of data in one column of one row at once? How will this blob of data
> representing one column and one row i.e. cell will be split into multiple
> SSTable? Or in such particular cases it will always be one extra large
> SSTable? I am also interesting in knowing the answer.
>

A memtable is flushed to a single SSTable at whatever size it is as a
memtable. You cannot have a memtable larger than (a portion of) your JVM
heap.

=Rob

Re: questions related to the SSTable file

Posted by "Hiller, Dean" <De...@nrel.gov>.
Netflix created file streaming in Astyanax for Cassandra specifically because writing too big a column cell is a bad thing.  The limit really depends on the use case….do you have servers writing 1000's of 200Meg files at the same time….if so, Astyanax streaming may be the better way to go there, as it divides the file up amongst cells and rows.

I know the limit on row size is really your hard disk space, and the column count, if I remember, goes into the billions, though realistically I think beyond 10 million it might slow down a bit….all I know is we tested up to 10 million columns with no issues in our use-case.

So you mean at this time, I could get 2 SSTable files, both contain column "Blue" for the same row key, right?

Yes

In this case, I should be fine as value of the "Blue" column contain the timestamp to help me to find out which is the last change, right?

Yes

In MR world, each file COULD be processed by different Mapper, but will be sent to the same reducer as both data will be shared same key.

If that is the way you are writing it, then yes

Dean

From: Shahab Yunus <sh...@gmail.com>>
Reply-To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
Date: Tuesday, September 17, 2013 7:54 AM
To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
Subject: Re: questions related to the SSTable file

I understand if the following changes apply to the same row key as in the above example, an additional SSTable file could be generated. That is clear for me.
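The MapReduce flow Dean confirms above (each SSTable dump read by its own mapper, rows for the same key meeting in one reducer, latest timestamp winning per column) might be sketched like this; all names are illustrative:

```python
# Sketch of the MR flow discussed in this thread: mappers emit
# (row_key, (column, value, timestamp)); the shuffle groups by row key;
# the reducer keeps the latest timestamp per column.
from collections import defaultdict

def map_sstable(rows):
    for row_key, columns in rows:
        for name, value, ts in columns:
            yield row_key, (name, value, ts)

def shuffle(mapped):
    groups = defaultdict(list)
    for key, record in mapped:
        groups[key].append(record)
    return groups

def reduce_row(records):
    latest = {}
    for name, value, ts in records:
        if name not in latest or ts > latest[name][1]:
            latest[name] = (value, ts)
    return latest

# Two SSTable dumps, both containing the "Blue" row:
file1 = [("Blue", [("hex", "#0000FF", 1)])]
file2 = [("Blue", [("hex", "#1E90FF", 2), ("hex2", "#2c86ff", 2)])]

grouped = shuffle(list(map_sstable(file1)) + list(map_sstable(file2)))
print({k: reduce_row(v) for k, v in grouped.items()})
```

Grouping by row key before reducing is what guarantees that the two copies of the Blue row are reconciled in one place, exactly as described above.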

Re: questions related to the SSTable file

Posted by Shahab Yunus <sh...@gmail.com>.
java8964, basically are you asking what will happen if we put a large
amount of data into one column of one row at once? Will this blob of data
representing one column of one row (i.e. one cell) be split into multiple
SSTables? Or in such particular cases will it always be one extra-large
SSTable? I am also interested in knowing the answer.

Regards,
Shahab


On Tue, Sep 17, 2013 at 9:50 AM, java8964 java8964 <ja...@hotmail.com>wrote:

> Thanks Dean for clarification.
>
> But if I put hundreds of megabytes of data for one row through one put, do
> you mean Cassandra will put all of it into one SSTable, even though the data
> is very big, right? Let's assume in this case the memtable in memory
> reaches its limit because of this change.
> What I want to know is whether there is a possibility that 2 SSTables are
> generated in the above case, and if so, what the boundary is.
>
> I understand if the following changes apply to the same row key as in the above
> example, an additional SSTable file could be generated. That is clear for me.
>
> Yong
>
> > From: Dean.Hiller@nrel.gov
> > To: user@cassandra.apache.org
> > Date: Tue, 17 Sep 2013 07:39:48 -0600
> > Subject: Re: questions related to the SSTable file
> >
> > You have to first understand the rules of
> >
> > 1. Sstables are immutable so Color-1-Data.db will not be modified and
> only deleted once compacted
> > 2. Memtables are flushed when reaching a limit so if Blue:{hex} is
> modified, it is done in the in-memory memtable that is eventually flushed
> > 3. Once flushed, it is an SSTable on disk and you have two values for
> "hex" both with two timestamps so we know which one is the current value
> >
> > When it finally compacts, the old value can go away.
> >
> > Dean
> >
> > From: java8964 java8964 <java8964@hotmail.com<mailto:
> java8964@hotmail.com>>
> > Reply-To: "user@cassandra.apache.org<ma...@cassandra.apache.org>"
> <us...@cassandra.apache.org>>
> > Date: Tuesday, September 17, 2013 7:32 AM
> > To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <
> user@cassandra.apache.org<ma...@cassandra.apache.org>>
> > Subject: RE: questions related to the SSTable file
> >
> > Hi, Takenori:
> >
> > Thanks for your quick reply. Your explanation makes it clear to me what
> compaction means, and I can also understand now that the same row key can exist
> in multiple SSTable files.
> >
> > But beyond that, I want to know what happens if one row's data is too large
> to fit in one SSTable file. In your example, the same row exists in multiple
> SSTable files because it keeps changing and being flushed to disk at
> runtime. That's fine; in this case, no single file of the 4 contains the
> whole data of that row, but each one does contain the full picture of an
> individual unit (I don't know what I should call this unit, but it will be
> larger than one column, right?). In your example, there is no way, at any
> time, that we could have SSTable files like the
> following, right:
> >
> > - Color-1-Data.db: [{Lavender: {hex: #E6E6FA}}, {Blue: {hex: #0000}}]
> > - Color-1-Data_1.db: [{Blue: {hex:FF}}]
> > - Color-2-Data.db: [{Green: {hex: #008000}}, {Blue: {hex2: #2c86ff}}]
> > - Color-3-Data.db: [{Aqua: {hex: #00FFFF}}, {Green: {hex2: #32CD32}},
> {Blue: {}}]
> > - Color-4-Data.db: [{Magenta: {hex: #FF00FF}}, {Gold: {hex: #FFD700}}]
> >
> > I don't see any reason Cassandra would ever do that, but I just want to
> confirm, as your 'no' answer to my question 2 is confusing.
> >
> > Another question from my original email: even though I may have gotten the
> answer already from your example, I just want to confirm it.
> > Just use your example, let's say after the first 2 steps:
> >
> > - Color-1-Data.db: [{Lavender: {hex: #E6E6FA}}, {Blue: {hex: #0000FF}}]
> > - Color-2-Data.db: [{Green: {hex: #008000}}, {Blue: {hex2: #2c86ff}}]
> > There is an incremental backup. After that, the following changes
> come in:
> >
> > - Add a column of (key, column, column_value = Green, hex2, #32CD32)
> > - Add a row of (key, column, column_value = Aqua, hex, #00FFFF)
> > - Delete a row of (key = Blue)
> > ---- memtable is flushed => Color-3-Data.db ----
> > Another incremental backup right now.
> >
> > Now in this case, my assumption is that only Color-3-Data.db will be in this
> backup, right? Even though Color-1-Data.db and Color-2-Data.db contain
> data for the same row key as Color-3-Data.db, from an incremental backup
> point of view, only Color-3-Data.db will be stored.
> >
> > The reason I asked those questions is that I am thinking of using MapReduce
> jobs to parse the incremental backup files and rebuild the snapshot on the
> Hadoop side. Of course, the column families I am doing this for are pure fact
> data, so there is no delete/update in Cassandra for this kind of data, just
> appending. But it is still important for me to understand the SSTable
> file's content.
> >
> > Thanks
> >
> > Yong
> >
> >
> > ________________________________
> > Date: Tue, 17 Sep 2013 11:12:01 +0900
> > From: tsato@cloudian.com<ma...@cloudian.com>
> > To: user@cassandra.apache.org<ma...@cassandra.apache.org>
> > Subject: Re: questions related to the SSTable file
> >
> > Hi,
> >
> > > 1) I will expect same row key could show up in both sstable2json
> output, as this one row exists in both SSTable files, right?
> >
> > Yes.
> >
> > > 2) If so, what is the boundary? Will Cassandra guarantee the column
> level as the boundary? What I mean is that one column's data will
> be guaranteed to be either in the first file or the 2nd file, right? There is
> no chance that Cassandra will cut the data of one column into 2 parts, with
> one part stored in the first SSTable file and the other part stored in the
> second SSTable file. Is my understanding correct?
> >
> > No.
> >
> > > 3) If what we are talking about are only the SSTable files in a
> snapshot or incremental backup, excluding the runtime SSTable
> files, does anything change? For snapshot or incremental backup SSTable
> files, can one row's data still exist in more than one SSTable
> file? And does the boundary change in this case?
> > > 4) If I want to use incremental backup SSTable files as the way to
> catch data being changed, is it a good way to do what I try to achieve? In
> this case, what happens in the following example:
> >
> > I don't fully understand, but a snapshot will do. It will create hard
> links to all the SSTable files present at the time of the snapshot.
> >
> >
> > Let me explain how SSTables and compaction work.
> >
> > Suppose we have 4 files being compacted (the last one has just been
> flushed, which triggered the compaction). Note that file names are
> simplified.
> >
> > - Color-1-Data.db: [{Lavender: {hex: #E6E6FA}}, {Blue: {hex: #0000FF}}]
> > - Color-2-Data.db: [{Green: {hex: #008000}}, {Blue: {hex2: #2c86ff}}]
> > - Color-3-Data.db: [{Aqua: {hex: #00FFFF}}, {Green: {hex2: #32CD32}},
> {Blue: {}}]
> > - Color-4-Data.db: [{Magenta: {hex: #FF00FF}}, {Gold: {hex: #FFD700}}]
> >
> > They are created by the following operations.
> >
> > - Add a row of (key, column, column_value = Blue, hex, #0000FF)
> > - Add a row of (key, column, column_value = Lavender, hex, #E6E6FA)
> > ---- memtable is flushed => Color-1-Data.db ----
> > - Add a row of (key, column, column_value = Green, hex, #008000)
> > - Add a column of (key, column, column_value = Blue, hex2, #2c86ff)
> > ---- memtable is flushed => Color-2-Data.db ----
> > - Add a column of (key, column, column_value = Green, hex2, #32CD32)
> > - Add a row of (key, column, column_value = Aqua, hex, #00FFFF)
> > - Delete a row of (key = Blue)
> > ---- memtable is flushed => Color-3-Data.db ----
> > - Add a row of (key, column, column_value = Magenta, hex, #FF00FF)
> > - Add a row of (key, column, column_value = Gold, hex, #FFD700)
> > ---- memtable is flushed => Color-4-Data.db ----
> >
> > Then, a compaction will merge all those fragments together into the
> latest ones as follows.
> >
> > - Color-5-Data.db: [{Lavender: {hex: #E6E6FA}}, {Aqua: {hex: #00FFFF}},
> {Green: {hex: #008000, hex2: #32CD32}}, {Magenta: {hex: #FF00FF}}, {Gold:
> {hex: #FFD700}}]
> > * assuming RandomPartitioner is used
> >
> > Hope this helps.
> >
> > - Takenori
> >
> > (2013/09/17 10:51), java8964 java8964 wrote:
> > Hi, I have some questions related to the SSTable in the Cassandra, as I
> am doing a project to use it and hope someone in this list can share some
> thoughts.
> >
> > My understanding is that SSTables are per column family, but each column
> family can have multiple SSTable files. During runtime, one row COULD be
> split across more than one SSTable file; even though this is not good for
> performance, it does happen, and Cassandra will try to merge and store
> one row's data into one SSTable file during compaction.
> >
> > The question is: when one row is split across multiple SSTable files, what
> is the boundary? Let me ask this way: if one row exists in 2 SSTable files,
> and I run the sstable2json tool on both SSTable files individually:
> >
> > 1) I expect the same row key could show up in both sstable2json outputs,
> as this one row exists in both SSTable files, right?
> > 2) If so, what is the boundary? Will Cassandra guarantee the column
> level as the boundary? What I mean is that one column's data will
> be guaranteed to be either in the first file or the 2nd file, right? There is
> no chance that Cassandra will cut the data of one column into 2 parts, with
> one part stored in the first SSTable file and the other part stored in the
> second SSTable file. Is my understanding correct?
> > 3) If what we are talking about are only the SSTable files in a snapshot
> or incremental backup, excluding the runtime SSTable files, does
> anything change? For snapshot or incremental backup SSTable files,
> can one row's data still exist in more than one SSTable file? And does the
> boundary change in this case?
> > 4) If I want to use incremental backup SSTable files as the way to catch
> data being changed, is it a good way to do what I try to achieve? In this
> case, what happens in the following example:
> >
> > For column family A:
> > at Time 0, one row key (key1) has some data. It is stored and backed
> up in SSTable file 1.
> > at Time 1, if any column for key1 changes (a new column is inserted, a
> column is updated/deleted, or even the whole row is deleted), I expect
> this whole row to exist in any incremental backup SSTable files after
> time 1, right?
> >
> > What happens if the above row just happens to be stored in more than one
> SSTable file?
> > at Time 0, one row key (key1) has some data, which happens to be stored in
> SSTable file1 and file2, and is backed up.
> > at Time 1, one column is added to row key1, and the change in fact
> lands in SSTable file2 only; if we do an incremental
> backup after that, which SSTable files should I expect in this backup? Both
> SSTable files? Or just SSTable file 2?
> >
> > I was thinking incremental backup SSTable files are a good candidate for
> catching changed data, but the fact that one row's data can exist in multiple
> SSTable files makes things complex. Has anyone used
> SSTable files in this way? What are the lessons?
> >
> > Thanks
> >
> > Yong
> >
>

RE: questions related to the SSTable file

Posted by java8964 java8964 <ja...@hotmail.com>.
Thanks, Dean, for the clarification.
But if I put hundreds of megabytes of data for one row through one put, do you mean Cassandra will put all of it into one SSTable, even though the data is very big? Let's assume in this case the memtable in memory reaches its limit because of this change. What I want to know is whether there is any possibility of 2 SSTables being generated in the above case, and if so, what the boundary is.
I understand that if further changes apply to the same row key as in the above example, additional SSTable files can be generated. That is clear to me.
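To make the question concrete, here is a toy model of the write path (purely illustrative Python; the class, the flush threshold, and the data layout are invented for this sketch and are not Cassandra's actual code). It shows the behavior in question: one put lands in one memtable, and a flush writes the whole memtable, large row included, out as a single SSTable, so a row is never cut mid-column across two files by a flush.

```python
# Simplified model of the write path (illustrative only): all mutations
# for a column family go into one in-memory memtable; when its size
# crosses the flush threshold, the whole memtable -- including any single
# large row -- is written out as ONE new SSTable.

FLUSH_THRESHOLD = 100  # "bytes", tiny for demonstration

class ColumnFamily:
    def __init__(self):
        self.memtable = {}     # row_key -> {column: value}
        self.size = 0
        self.sstables = []     # each flush appends exactly one "file"

    def put(self, row_key, columns):
        row = self.memtable.setdefault(row_key, {})
        row.update(columns)
        self.size += sum(len(k) + len(v) for k, v in columns.items())
        if self.size >= FLUSH_THRESHOLD:
            self.flush()

    def flush(self):
        if self.memtable:
            self.sstables.append(self.memtable)  # one SSTable per flush
            self.memtable, self.size = {}, 0

cf = ColumnFamily()
# One big put: far more "bytes" for a single row than the flush threshold.
cf.put("key1", {f"col{i}": "x" * 50 for i in range(10)})
# The oversized row still lands in a single SSTable, not two.
print(len(cf.sstables))             # 1
print(len(cf.sstables[0]["key1"]))  # 10 columns, all in the same file
```

Under this model, a second SSTable containing the same row key only appears when a later mutation arrives after the flush, which matches the thread's discussion.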
Yong

> From: Dean.Hiller@nrel.gov
> To: user@cassandra.apache.org
> Date: Tue, 17 Sep 2013 07:39:48 -0600
> Subject: Re: questions related to the SSTable file
> 
> You have to first understand the rules:
> 
>  1.  SSTables are immutable, so Color-1-Data.db will not be modified, and is only deleted once compacted
>  2.  Memtables are flushed when reaching a limit, so if Blue:{hex} is modified, the modification happens in the in-memory memtable that is eventually flushed
>  3.  Once flushed, it is an SSTable on disk, and you have two values for "hex", each with its own timestamp, so we know which one is the current value
> 
> When it finally compacts, the old value can go away.
> 
> Dean
> 
> From: java8964 java8964 <ja...@hotmail.com>>
> Reply-To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
> Date: Tuesday, September 17, 2013 7:32 AM
> To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
> Subject: RE: questions related to the SSTable file
> 
> Hi, Takenori:
> 
> Thanks for your quick reply. Your explanation makes it clear to me what compaction means, and I also understand now that the same row key can exist in multiple SSTable files.
> 
> But beyond that, I want to know what happens if one row's data is too large to fit in one SSTable file. In your example, the same row exists in multiple SSTable files because it keeps changing and being flushed to disk at runtime. That's fine; in that case, no single file of the 4 contains the whole data of that row, but each one does contain the full picture of an individual unit (I don't know what to call this unit, but it will be larger than one column, right?). Using your example, there is no way that at any time we could have SSTable files like the following, right:
> 
> - Color-1-Data.db: [{Lavender: {hex: #E6E6FA}}, {Blue: {hex: #0000}}]
> - Color-1-Data_1.db:  [{Blue: {hex:FF}}]
> - Color-2-Data.db: [{Green: {hex: #008000}}, {Blue: {hex2: #2c86ff}}]
> - Color-3-Data.db: [{Aqua: {hex: #00FFFF}}, {Green: {hex2: #32CD32}}, {Blue: {}}]
> - Color-4-Data.db: [{Magenta: {hex: #FF00FF}}, {Gold: {hex: #FFD700}}]
> 
> I don't see any reason Cassandra would ever do that, but I just want to confirm, as your 'no' answer to my 2nd question is confusing.
> 
> Another question from my original email; I may already have the answer from your example, but I just want to confirm it.
> Using your example, let's say after the first 2 steps:
> 
> - Color-1-Data.db: [{Lavender: {hex: #E6E6FA}}, {Blue: {hex: #0000FF}}]
> - Color-2-Data.db: [{Green: {hex: #008000}}, {Blue: {hex2: #2c86ff}}]
> There is an incremental backup. After that, the following changes come in:
> 
> - Add a column of (key, column, column_value = Green, hex2, #32CD32)
> - Add a row of (key, column, column_value = Aqua, hex, #00FFFF)
> - Delete a row of (key = Blue)
> ---- memtable is flushed => Color-3-Data.db ----
> Another incremental backup right now.
> 
> Now in this case, my assumption is that only Color-3-Data.db will be in this backup, right? Even though Color-1-Data.db and Color-2-Data.db contain data for the same row key as Color-3-Data.db, from an incremental backup point of view, only Color-3-Data.db will be stored.
> 
> The reason I asked those questions is that I am thinking of using MapReduce jobs to parse the incremental backup files, and rebuild the snapshot on the Hadoop side. Of course, the column families I am working with hold pure fact data, so there are no deletes/updates in Cassandra for this kind of data, just appends. But it is still important for me to understand the SSTable file's content.
> 
> Thanks
> 
> Yong
> 
> 
> ________________________________
> Date: Tue, 17 Sep 2013 11:12:01 +0900
> From: tsato@cloudian.com<ma...@cloudian.com>
> To: user@cassandra.apache.org<ma...@cassandra.apache.org>
> Subject: Re: questions related to the SSTable file
> 
> Hi,
> 
> > 1) I will expect same row key could show up in both sstable2json output, as this one row exists in both SSTable files, right?
> 
> Yes.
> 
> > 2) If so, what is the boundary? Will Cassandra guarantee the column level as the boundary? What I mean is that one column's data will be guaranteed to be either in the first file or the 2nd file, right? There is no chance that Cassandra will cut the data of one column into 2 parts, with one part stored in the first SSTable file and the other part stored in the second SSTable file. Is my understanding correct?
> 
> No.
> 
> > 3) If what we are talking about are only the SSTable files in a snapshot or incremental backup, excluding the runtime SSTable files, does anything change? For snapshot or incremental backup SSTable files, can one row's data still exist in more than one SSTable file? And does the boundary change in this case?
> > 4) If I want to use incremental backup SSTable files as the way to catch data being changed, is it a good way to do what I try to achieve? In this case, what happens in the following example:
> 
> I don't fully understand, but a snapshot will do. It will create hard links to all the SSTable files present at the time of the snapshot.
> 
> 
> Let me explain how SSTables and compaction work.
> 
> Suppose we have 4 files being compacted (the last one has just been flushed, which triggered the compaction). Note that file names are simplified.
> 
> - Color-1-Data.db: [{Lavender: {hex: #E6E6FA}}, {Blue: {hex: #0000FF}}]
> - Color-2-Data.db: [{Green: {hex: #008000}}, {Blue: {hex2: #2c86ff}}]
> - Color-3-Data.db: [{Aqua: {hex: #00FFFF}}, {Green: {hex2: #32CD32}}, {Blue: {}}]
> - Color-4-Data.db: [{Magenta: {hex: #FF00FF}}, {Gold: {hex: #FFD700}}]
> 
> They are created by the following operations.
> 
> - Add a row of (key, column, column_value = Blue, hex, #0000FF)
> - Add a row of (key, column, column_value = Lavender, hex, #E6E6FA)
> ---- memtable is flushed => Color-1-Data.db ----
> - Add a row of (key, column, column_value = Green, hex, #008000)
> - Add a column of (key, column, column_value = Blue, hex2, #2c86ff)
> ---- memtable is flushed => Color-2-Data.db ----
> - Add a column of (key, column, column_value = Green, hex2, #32CD32)
> - Add a row of (key, column, column_value = Aqua, hex, #00FFFF)
> - Delete a row of (key = Blue)
> ---- memtable is flushed => Color-3-Data.db ----
> - Add a row of (key, column, column_value = Magenta, hex, #FF00FF)
> - Add a row of (key, column, column_value = Gold, hex, #FFD700)
> ---- memtable is flushed => Color-4-Data.db ----
> 
> Then, a compaction will merge all those fragments together into the latest ones as follows.
> 
> - Color-5-Data.db: [{Lavender: {hex: #E6E6FA}}, {Aqua: {hex: #00FFFF}}, {Green: {hex: #008000, hex2: #32CD32}}, {Magenta: {hex: #FF00FF}}, {Gold: {hex: #FFD700}}]
> * assuming RandomPartitioner is used
> 
> Hope this helps.
> 
> - Takenori
> 
> (2013/09/17 10:51), java8964 java8964 wrote:
> Hi, I have some questions related to the SSTable in the Cassandra, as I am doing a project to use it and hope someone in this list can share some thoughts.
> 
> My understanding is that SSTables are per column family, but each column family can have multiple SSTable files. During runtime, one row COULD be split across more than one SSTable file; even though this is not good for performance, it does happen, and Cassandra will try to merge and store one row's data into one SSTable file during compaction.
> 
> The question is: when one row is split across multiple SSTable files, what is the boundary? Let me ask this way: if one row exists in 2 SSTable files, and I run the sstable2json tool on both SSTable files individually:
> 
> 1) I expect the same row key could show up in both sstable2json outputs, as this one row exists in both SSTable files, right?
> 2) If so, what is the boundary? Will Cassandra guarantee the column level as the boundary? What I mean is that one column's data will be guaranteed to be either in the first file or the 2nd file, right? There is no chance that Cassandra will cut the data of one column into 2 parts, with one part stored in the first SSTable file and the other part stored in the second SSTable file. Is my understanding correct?
> 3) If what we are talking about are only the SSTable files in a snapshot or incremental backup, excluding the runtime SSTable files, does anything change? For snapshot or incremental backup SSTable files, can one row's data still exist in more than one SSTable file? And does the boundary change in this case?
> 4) If I want to use incremental backup SSTable files as the way to catch data being changed, is it a good way to do what I try to achieve? In this case, what happens in the following example:
> 
> For column family A:
> at Time 0, one row key (key1) has some data. It is stored and backed up in SSTable file 1.
> at Time 1, if any column for key1 changes (a new column is inserted, a column is updated/deleted, or even the whole row is deleted), I expect this whole row to exist in any incremental backup SSTable files after time 1, right?
> 
> What happens if the above row just happens to be stored in more than one SSTable file?
> at Time 0, one row key (key1) has some data, which happens to be stored in SSTable file1 and file2, and is backed up.
> at Time 1, one column is added to row key1, and the change in fact lands in SSTable file2 only; if we do an incremental backup after that, which SSTable files should I expect in this backup? Both SSTable files? Or just SSTable file 2?
> 
> I was thinking incremental backup SSTable files are a good candidate for catching changed data, but the fact that one row's data can exist in multiple SSTable files makes things complex. Has anyone used SSTable files in this way? What are the lessons?
> 
> Thanks
> 
> Yong
> 

Re: questions related to the SSTable file

Posted by "Hiller, Dean" <De...@nrel.gov>.
You have to first understand the rules:

 1.  SSTables are immutable, so Color-1-Data.db will not be modified, and is only deleted once compacted
 2.  Memtables are flushed when reaching a limit, so if Blue:{hex} is modified, the modification happens in the in-memory memtable that is eventually flushed
 3.  Once flushed, it is an SSTable on disk, and you have two values for "hex", each with its own timestamp, so we know which one is the current value

When it finally compacts, the old value can go away.
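The three rules above can be sketched in a few lines (illustrative Python only; the timestamps and fragment layout are made up for this example, and real Cassandra reconciliation is far more involved). Each SSTable fragment carries a (value, timestamp) pair per column, and a row deletion is a tombstone with its own timestamp; merging keeps the newest value per column and drops anything the tombstone shadows:

```python
# Sketch of last-write-wins reconciliation across SSTable fragments.
# A row-level tombstone is recorded as a deletion timestamp; columns
# written at or before that timestamp are dropped.

ROW_TOMBSTONE = "__deleted_at__"

def merge_row(fragments):
    """fragments: list of {column: (value, ts)} dicts, one per SSTable."""
    deleted_at = -1
    merged = {}
    for frag in fragments:
        if ROW_TOMBSTONE in frag:
            deleted_at = max(deleted_at, frag[ROW_TOMBSTONE])
            continue
        for col, (val, ts) in frag.items():
            if col not in merged or ts > merged[col][1]:
                merged[col] = (val, ts)   # last write wins
    # Drop columns shadowed by the row tombstone.
    return {c: v for c, (v, ts) in merged.items() if ts > deleted_at}

# "Blue" as written across Color-1/2/3-Data.db (timestamps made up):
blue_fragments = [
    {"hex":  ("#0000FF", 1)},          # Color-1-Data.db
    {"hex2": ("#2c86ff", 2)},          # Color-2-Data.db
    {ROW_TOMBSTONE: 3},                # Color-3-Data.db: row deleted
]
print(merge_row(blue_fragments))       # {} -- the delete shadows both writes

# "Green" across Color-2/3-Data.db survives and merges:
green_fragments = [{"hex": ("#008000", 2)}, {"hex2": ("#32CD32", 3)}]
print(merge_row(green_fragments))      # {'hex': '#008000', 'hex2': '#32CD32'}
```

This matches the Color-5-Data.db result in the thread: Blue disappears, Green ends up with both columns, and the whole column value is always taken from exactly one fragment, never stitched from two.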

Dean

From: java8964 java8964 <ja...@hotmail.com>>
Reply-To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
Date: Tuesday, September 17, 2013 7:32 AM
To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
Subject: RE: questions related to the SSTable file

Hi, Takenori:

Thanks for your quick reply. Your explanation makes it clear to me what compaction means, and I also understand now that the same row key can exist in multiple SSTable files.

But beyond that, I want to know what happens if one row's data is too large to fit in one SSTable file. In your example, the same row exists in multiple SSTable files because it keeps changing and being flushed to disk at runtime. That's fine; in that case, no single file of the 4 contains the whole data of that row, but each one does contain the full picture of an individual unit (I don't know what to call this unit, but it will be larger than one column, right?). Using your example, there is no way that at any time we could have SSTable files like the following, right:

- Color-1-Data.db: [{Lavender: {hex: #E6E6FA}}, {Blue: {hex: #0000}}]
- Color-1-Data_1.db:  [{Blue: {hex:FF}}]
- Color-2-Data.db: [{Green: {hex: #008000}}, {Blue: {hex2: #2c86ff}}]
- Color-3-Data.db: [{Aqua: {hex: #00FFFF}}, {Green: {hex2: #32CD32}}, {Blue: {}}]
- Color-4-Data.db: [{Magenta: {hex: #FF00FF}}, {Gold: {hex: #FFD700}}]

I don't see any reason Cassandra would ever do that, but I just want to confirm, as your 'no' answer to my 2nd question is confusing.

Another question from my original email; I may already have the answer from your example, but I just want to confirm it.
Using your example, let's say after the first 2 steps:

- Color-1-Data.db: [{Lavender: {hex: #E6E6FA}}, {Blue: {hex: #0000FF}}]
- Color-2-Data.db: [{Green: {hex: #008000}}, {Blue: {hex2: #2c86ff}}]
There is an incremental backup. After that, the following changes come in:

- Add a column of (key, column, column_value = Green, hex2, #32CD32)
- Add a row of (key, column, column_value = Aqua, hex, #00FFFF)
- Delete a row of (key = Blue)
---- memtable is flushed => Color-3-Data.db ----
Another incremental backup right now.

Now in this case, my assumption is that only Color-3-Data.db will be in this backup, right? Even though Color-1-Data.db and Color-2-Data.db contain data for the same row key as Color-3-Data.db, from an incremental backup point of view, only Color-3-Data.db will be stored.

The reason I asked those questions is that I am thinking of using MapReduce jobs to parse the incremental backup files, and rebuild the snapshot on the Hadoop side. Of course, the column families I am working with hold pure fact data, so there are no deletes/updates in Cassandra for this kind of data, just appends. But it is still important for me to understand the SSTable file's content.

Thanks

Yong


________________________________
Date: Tue, 17 Sep 2013 11:12:01 +0900
From: tsato@cloudian.com<ma...@cloudian.com>
To: user@cassandra.apache.org<ma...@cassandra.apache.org>
Subject: Re: questions related to the SSTable file

Hi,

> 1) I will expect same row key could show up in both sstable2json output, as this one row exists in both SSTable files, right?

Yes.

> 2) If so, what is the boundary? Will Cassandra guarantee the column level as the boundary? What I mean is that one column's data will be guaranteed to be either in the first file or the 2nd file, right? There is no chance that Cassandra will cut the data of one column into 2 parts, with one part stored in the first SSTable file and the other part stored in the second SSTable file. Is my understanding correct?

No.

> 3) If what we are talking about are only the SSTable files in a snapshot or incremental backup, excluding the runtime SSTable files, does anything change? For snapshot or incremental backup SSTable files, can one row's data still exist in more than one SSTable file? And does the boundary change in this case?
> 4) If I want to use incremental backup SSTable files as the way to catch data being changed, is it a good way to do what I try to achieve? In this case, what happens in the following example:

I don't fully understand, but a snapshot will do. It will create hard links to all the SSTable files present at the time of the snapshot.


Let me explain how SSTables and compaction work.

Suppose we have 4 files being compacted (the last one has just been flushed, which triggered the compaction). Note that file names are simplified.

- Color-1-Data.db: [{Lavender: {hex: #E6E6FA}}, {Blue: {hex: #0000FF}}]
- Color-2-Data.db: [{Green: {hex: #008000}}, {Blue: {hex2: #2c86ff}}]
- Color-3-Data.db: [{Aqua: {hex: #00FFFF}}, {Green: {hex2: #32CD32}}, {Blue: {}}]
- Color-4-Data.db: [{Magenta: {hex: #FF00FF}}, {Gold: {hex: #FFD700}}]

They are created by the following operations.

- Add a row of (key, column, column_value = Blue, hex, #0000FF)
- Add a row of (key, column, column_value = Lavender, hex, #E6E6FA)
---- memtable is flushed => Color-1-Data.db ----
- Add a row of (key, column, column_value = Green, hex, #008000)
- Add a column of (key, column, column_value = Blue, hex2, #2c86ff)
---- memtable is flushed => Color-2-Data.db ----
- Add a column of (key, column, column_value = Green, hex2, #32CD32)
- Add a row of (key, column, column_value = Aqua, hex, #00FFFF)
- Delete a row of (key = Blue)
---- memtable is flushed => Color-3-Data.db ----
- Add a row of (key, column, column_value = Magenta, hex, #FF00FF)
- Add a row of (key, column, column_value = Gold, hex, #FFD700)
---- memtable is flushed => Color-4-Data.db ----

Then, a compaction will merge all those fragments together into the latest ones as follows.

- Color-5-Data.db: [{Lavender: {hex: #E6E6FA}}, {Aqua: {hex: #00FFFF}}, {Green: {hex: #008000, hex2: #32CD32}}, {Magenta: {hex: #FF00FF}}, {Gold: {hex: #FFD700}}]
* assuming RandomPartitioner is used

Hope this helps.

- Takenori

(2013/09/17 10:51), java8964 java8964 wrote:
Hi, I have some questions related to the SSTable in the Cassandra, as I am doing a project to use it and hope someone in this list can share some thoughts.

My understanding is that SSTables are per column family, but each column family can have multiple SSTable files. During runtime, one row COULD be split across more than one SSTable file; even though this is not good for performance, it does happen, and Cassandra will try to merge and store one row's data into one SSTable file during compaction.

The question is: when one row is split across multiple SSTable files, what is the boundary? Let me ask this way: if one row exists in 2 SSTable files, and I run the sstable2json tool on both SSTable files individually:

1) I expect the same row key could show up in both sstable2json outputs, as this one row exists in both SSTable files, right?
2) If so, what is the boundary? Will Cassandra guarantee the column level as the boundary? What I mean is that one column's data will be guaranteed to be either in the first file or the 2nd file, right? There is no chance that Cassandra will cut the data of one column into 2 parts, with one part stored in the first SSTable file and the other part stored in the second SSTable file. Is my understanding correct?
3) If what we are talking about are only the SSTable files in a snapshot or incremental backup, excluding the runtime SSTable files, does anything change? For snapshot or incremental backup SSTable files, can one row's data still exist in more than one SSTable file? And does the boundary change in this case?
4) If I want to use incremental backup SSTable files as the way to catch data being changed, is it a good way to do what I try to achieve? In this case, what happens in the following example:

For column family A:
at Time 0, one row key (key1) has some data. It is stored and backed up in SSTable file 1.
at Time 1, if any column for key1 changes (a new column is inserted, a column is updated/deleted, or even the whole row is deleted), I expect this whole row to exist in any incremental backup SSTable files after time 1, right?

What happens if the above row just happens to be stored in more than one SSTable file?
at Time 0, one row key (key1) has some data, which happens to be stored in SSTable file1 and file2, and is backed up.
at Time 1, one column is added to row key1, and the change in fact lands in SSTable file2 only; if we do an incremental backup after that, which SSTable files should I expect in this backup? Both SSTable files? Or just SSTable file 2?

I was thinking incremental backup SSTable files are a good candidate for catching changed data, but the fact that one row's data can exist in multiple SSTable files makes things complex. Has anyone used SSTable files in this way? What are the lessons?

Thanks

Yong


RE: questions related to the SSTable file

Posted by java8964 java8964 <ja...@hotmail.com>.
Hi, Takenori:
Thanks for your quick reply. Your explanation makes it clear to me what compaction means, and I also understand now that the same row key can exist in multiple SSTable files.
But beyond that, I want to know what happens if one row's data is too large to fit in one SSTable file. In your example, the same row exists in multiple SSTable files because it keeps changing and being flushed to disk at runtime. That's fine; in that case, no single file of the 4 contains the whole data of that row, but each one does contain the full picture of an individual unit (I don't know what to call this unit, but it will be larger than one column, right?). Using your example, there is no way that at any time we could have SSTable files like the following, right:
- Color-1-Data.db: [{Lavender: {hex: #E6E6FA}}, {Blue: {hex: #0000}}]
- Color-1-Data_1.db:  [{Blue: {hex:FF}}]
- Color-2-Data.db: [{Green: {hex: #008000}}, {Blue: {hex2: #2c86ff}}]
- Color-3-Data.db: [{Aqua: {hex: #00FFFF}}, {Green: {hex2: #32CD32}}, {Blue: {}}]
- Color-4-Data.db: [{Magenta: {hex: #FF00FF}}, {Gold: {hex: #FFD700}}]
I don't see any reason Cassandra would ever do that, but I just want to confirm, as your 'no' answer to my 2nd question is confusing.
Another question from my original email; I may already have the answer from your example, but I just want to confirm it.
Using your example, let's say after the first 2 steps:
- Color-1-Data.db: [{Lavender: {hex: #E6E6FA}}, {Blue: {hex: #0000FF}}]
- Color-2-Data.db: [{Green: {hex: #008000}}, {Blue: {hex2: #2c86ff}}]
There is an incremental backup. After that, the following changes come in:
- Add a column of (key, column, column_value = Green, hex2, #32CD32)
- Add a row of (key, column, column_value = Aqua, hex, #00FFFF)
- Delete a row of (key = Blue)
---- memtable is flushed => Color-3-Data.db ----
Another incremental backup right now.
Now in this case, my assumption is that only Color-3-Data.db will be in this backup, right? Even though Color-1-Data.db and Color-2-Data.db contain data for the same row key as Color-3-Data.db, from an incremental backup point of view, only Color-3-Data.db will be stored.
The reason I asked those questions is that I am thinking of using MapReduce jobs to parse the incremental backup files, and rebuild the snapshot on the Hadoop side. Of course, the column families I am working with hold pure fact data, so there are no deletes/updates in Cassandra for this kind of data, just appends. But it is still important for me to understand the SSTable file's content.
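A sketch of that rebuild idea (illustrative Python only; the fragment shape, per-column timestamps, and "deleted_at" field below are hypothetical stand-ins, not real sstable2json output). Because each incremental backup contributes only the SSTables flushed since the previous backup, replaying all backups in order and reducing per row key reconstructs the current snapshot:

```python
# Rebuild a snapshot by folding SSTable fragments from successive
# incremental backups: per column, the newest timestamp wins, and a
# row tombstone drops everything written at or before it.

def rebuild(backups):
    """backups: list (oldest first) of SSTable dumps, each a dict
    row_key -> {"columns": {col: (value, ts)}, "deleted_at": ts or None}."""
    state = {}
    for sstable in backups:
        for key, row in sstable.items():
            cur = state.setdefault(key, {"columns": {}, "deleted_at": -1})
            if row.get("deleted_at") is not None:
                cur["deleted_at"] = max(cur["deleted_at"], row["deleted_at"])
            for col, (val, ts) in row.get("columns", {}).items():
                old = cur["columns"].get(col)
                if old is None or ts > old[1]:
                    cur["columns"][col] = (val, ts)
    # Apply row tombstones, then strip the bookkeeping.
    return {
        key: {c: v for c, (v, ts) in cur["columns"].items()
              if ts > cur["deleted_at"]}
        for key, cur in state.items()
        if any(ts > cur["deleted_at"] for _, ts in cur["columns"].values())
    }

# First incremental backup: Color-1 and Color-2; second: Color-3 only.
backup1 = [
    {"Lavender": {"columns": {"hex": ("#E6E6FA", 1)}},
     "Blue":     {"columns": {"hex": ("#0000FF", 1)}}},
    {"Green":    {"columns": {"hex": ("#008000", 2)}},
     "Blue":     {"columns": {"hex2": ("#2c86ff", 2)}}},
]
backup2 = [
    {"Aqua":  {"columns": {"hex": ("#00FFFF", 3)}},
     "Green": {"columns": {"hex2": ("#32CD32", 3)}},
     "Blue":  {"columns": {}, "deleted_at": 3}},
]
snapshot = rebuild(backup1 + backup2)
print(sorted(snapshot))  # ['Aqua', 'Green', 'Lavender'] -- Blue is gone
```

The key design point is that the reduce step needs timestamps and tombstones from the dump, not just column values; without them, the Blue deletion in the second backup could not cancel the Blue columns from the first.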
Thanks
Yong

Date: Tue, 17 Sep 2013 11:12:01 +0900
From: tsato@cloudian.com
To: user@cassandra.apache.org
Subject: Re: questions related to the SSTable file


  
    
  
  
    Hi,

    

    > 1) I will expect same row key could show up in both
    sstable2json output, as this one row exists in both SSTable files,
    right?

    

    Yes.

    

    > 2) If so, what is the boundary? Will Cassandra guarantee the
    column level as the boundary? What I mean is that for one column's
    data, it will be guaranteed to be either in the first file, or 2nd
    file, right? There is no chance that Cassandra will cut the data of
    one column into 2 part, and one part stored in first SSTable file,
    and the other part stored in second SSTable file. Is my
    understanding correct?

    

    No.

    

    > 3) If what we are talking about are only the SSTable files in
    snapshot, incremental backup SSTable files, exclude the runtime
    SSTable files, will anything change? For snapshot or incremental
    backup SSTable files, first can one row data still may exist in more
    than one SSTable file? And any boundary change in this case?
    > 4) If I want to use incremental backup SSTable files as
      the way to catch data being changed, is it a good way to do what I
      try to archive? In this case, what happen in the following
      example:

      

      I don't fully understand, but snapshot will do. It will create
      hard links to all the SSTable files present at the time of the
      snapshot.

      Let me explain how SSTables and compaction work.

      Suppose we have 4 files being compacted (the last one has just
      been flushed, which triggered the compaction). Note that file
      names are simplified.

      - Color-1-Data.db: [{Lavender: {hex: #E6E6FA}}, {Blue: {hex: #0000FF}}]
      - Color-2-Data.db: [{Green: {hex: #008000}}, {Blue: {hex2: #2c86ff}}]
      - Color-3-Data.db: [{Aqua: {hex: #00FFFF}}, {Green: {hex2: #32CD32}}, {Blue: {}}]
      - Color-4-Data.db: [{Magenta: {hex: #FF00FF}}, {Gold: {hex: #FFD700}}]

      They were created by the following operations.

      - Add a row of (key, column, column_value = Blue, hex, #0000FF)
      - Add a row of (key, column, column_value = Lavender, hex, #E6E6FA)
      ---- memtable is flushed => Color-1-Data.db ----
      - Add a row of (key, column, column_value = Green, hex, #008000)
      - Add a column of (key, column, column_value = Blue, hex2, #2c86ff)
      ---- memtable is flushed => Color-2-Data.db ----
      - Add a column of (key, column, column_value = Green, hex2, #32CD32)
      - Add a row of (key, column, column_value = Aqua, hex, #00FFFF)
      - Delete a row of (key = Blue)
      ---- memtable is flushed => Color-3-Data.db ----
      - Add a row of (key, column, column_value = Magenta, hex, #FF00FF)
      - Add a row of (key, column, column_value = Gold, hex, #FFD700)
      ---- memtable is flushed => Color-4-Data.db ----

      Then a compaction merges all those fragments into the latest
      versions, as follows.

      - Color-5-Data.db: [{Lavender: {hex: #E6E6FA}}, {Aqua: {hex: #00FFFF}},
        {Green: {hex: #008000, hex2: #32CD32}}, {Magenta: {hex: #FF00FF}},
        {Gold: {hex: #FFD700}}]
      * assuming RandomPartitioner is used

      Hope this helps.

      - Takenori

    
    

    (2013/09/17 10:51), java8964 java8964 wrote:

RE: questions related to the SSTable file

Posted by java8964 java8964 <ja...@hotmail.com>.
Hi, Dean:
Can you explain a little more about what you mean?
Let me change the example a little:
Color-1-Data.db: [{Lavender: {hex: #E6E6FA}}, {Blue: {hex: #0000FF}}]
Now we add a new Green column and update the Blue column, but the data is flushed to another SSTable file:
Color-2-Data.db: [{Green: {hex: #008000}}, {Blue: {hex: #2c86ff}}]
So you mean that at this point I could get 2 SSTable files that both contain column "Blue" for the same row key, right? In that case I should be fine, since the value of the "Blue" column carries a timestamp that lets me find the latest change, right? In the MR world, each file COULD be processed by a different Mapper, but both will be sent to the same Reducer, since both records share the same key.
Yong
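
The reconciliation Yong describes can be sketched in a few lines of Python. This is a toy model, not Cassandra's real cell format: the row key, column layout, and timestamps below are made up for illustration, and the point is only that the reader (or a reduce step keyed on the row key) keeps the cell with the highest write timestamp.

```python
# Two SSTable fragments hold different versions of column "Blue" for the
# same row key; last-write-wins resolution picks the newest timestamp.
# (Hypothetical data; timestamps are invented for the example.)
file1 = {"row_key": "colors", "Blue": {"value": "#0000FF", "ts": 100}}
file2 = {"row_key": "colors", "Blue": {"value": "#2c86ff", "ts": 200}}

def latest(cells):
    # Last-write-wins: keep the cell with the greatest write timestamp.
    return max(cells, key=lambda c: c["ts"])

winner = latest([file1["Blue"], file2["Blue"]])
print(winner["value"])  # the later update, "#2c86ff"
```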

> From: Dean.Hiller@nrel.gov
> To: user@cassandra.apache.org
> Date: Tue, 17 Sep 2013 06:32:03 -0600
> Subject: Re: questions related to the SSTable file
> 
> You may want to be careful: until compaction, column 1 could be stored in both files when it has been changed. Cassandra returns the latest version of column 1, even though two SSTables contain it. (At least that is the way I understand it.)
> 
> Later,
> Dean

Re: questions related to the SSTable file

Posted by "Hiller, Dean" <De...@nrel.gov>.
You may want to be careful: until compaction, column 1 could be stored in both files when it has been changed. Cassandra returns the latest version of column 1, even though two SSTables contain it. (At least that is the way I understand it.)

Later,
Dean

From: "Takenori Sato (Cloudian)" <ts...@cloudian.com>
Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Date: Monday, September 16, 2013 8:12 PM
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Subject: Re: questions related to the SSTable file



Re: questions related to the SSTable file

Posted by "Takenori Sato(Cloudian)" <ts...@cloudian.com>.
Hi,

 > 1) I will expect same row key could show up in both sstable2json 
output, as this one row exists in both SSTable files, right?

Yes.

 > 2) If so, what is the boundary? Will Cassandra guarantee the column 
level as the boundary? What I mean is that for one column's data, it 
will be guaranteed to be either in the first file, or 2nd file, right? 
There is no chance that Cassandra will cut the data of one column into 2 
part, and one part stored in first SSTable file, and the other part 
stored in second SSTable file. Is my understanding correct?

No, it won't. A column's value is always written whole to a single 
SSTable file, so the column is the finest-grained boundary (though 
different versions of the same column may appear in different files).

 > 3) If what we are talking about are only the SSTable files in 
snapshot, incremental backup SSTable files, exclude the runtime SSTable 
files, will anything change? For snapshot or incremental backup SSTable 
files, first can one row data still may exist in more than one SSTable 
file? And any boundary change in this case?
 > 4) If I want to use incremental backup SSTable files as the way to 
catch data being changed, is it a good way to do what I try to archive? 
In this case, what happen in the following example:

I don't fully understand, but snapshot will do. It will create hard 
links to all the SSTable files present at the time of the snapshot.
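
Hard links make the snapshot cheap: a hard link is just a second directory entry for the same inode, so no data is copied, and the snapshot keeps the file alive even after compaction deletes the original name. A minimal Python sketch of that semantics (file names are invented for the example):

```python
import os
import tempfile

# Pretend SSTable file in a scratch directory.
d = tempfile.mkdtemp()
data = os.path.join(d, "Color-1-Data.db")
with open(data, "w") as f:
    f.write("immutable sstable contents")

# A snapshot hard-links the file: same inode, no data copied.
snap = os.path.join(d, "snapshot-Color-1-Data.db")
os.link(data, snap)
assert os.stat(data).st_ino == os.stat(snap).st_ino

# Even if compaction later removes the live file, the snapshot's
# link keeps the underlying data reachable.
os.remove(data)
print(open(snap).read())  # prints "immutable sstable contents"
```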


Let me explain how SSTable and compaction works.

Suppose we have 4 files being compacted (the last one has just been 
flushed, which triggered the compaction). Note that file names are 
simplified.

- Color-1-Data.db: [{Lavender: {hex: #E6E6FA}}, {Blue: {hex: #0000FF}}]
- Color-2-Data.db: [{Green: {hex: #008000}}, {Blue: {hex2: #2c86ff}}]
- Color-3-Data.db: [{Aqua: {hex: #00FFFF}}, {Green: {hex2: #32CD32}}, 
{Blue: {}}]
- Color-4-Data.db: [{Magenta: {hex: #FF00FF}}, {Gold: {hex: #FFD700}}]

They are created by the following operations.

- Add a row of (key, column, column_value = Blue, hex, #0000FF)
- Add a row of (key, column, column_value = Lavender, hex, #E6E6FA)
---- memtable is flushed => Color-1-Data.db ----
- Add a row of (key, column, column_value = Green, hex, #008000)
- Add a column of (key, column, column_value = Blue, hex2, #2c86ff)
---- memtable is flushed => Color-2-Data.db ----
- Add a column of (key, column, column_value = Green, hex2, #32CD32)
- Add a row of (key, column, column_value = Aqua, hex, #00FFFF)
- Delete a row of (key = Blue)
---- memtable is flushed => Color-3-Data.db ----
- Add a row of (key, column, column_value = Magenta, hex, #FF00FF)
- Add a row of (key, column, column_value = Gold, hex, #FFD700)
---- memtable is flushed => Color-4-Data.db ----

Then a compaction merges all those fragments into the latest 
versions, as follows.

- Color-5-Data.db: [{Lavender: {hex: #E6E6FA}}, {Aqua: {hex: #00FFFF}}, 
{Green: {hex: #008000, hex2: #32CD32}}, {Magenta: {hex: #FF00FF}}, 
{Gold: {hex: #FFD700}}]
* assuming RandomPartitioner is used
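
The merge rule in the example above (newer writes win, a row tombstone shadows everything older) can be sketched in a few lines of Python. This is a toy model of the idea, not Cassandra's actual code or on-disk format; timestamps are elided by simply processing the fragments oldest to newest.

```python
# Each fragment maps row_key -> {column: value}; a row mapped to None is
# a row tombstone (the "Delete a row of (key = Blue)" operation).
sstables = [
    {"Lavender": {"hex": "#E6E6FA"}, "Blue": {"hex": "#0000FF"}},              # Color-1
    {"Green": {"hex": "#008000"}, "Blue": {"hex2": "#2c86ff"}},                # Color-2
    {"Aqua": {"hex": "#00FFFF"}, "Green": {"hex2": "#32CD32"}, "Blue": None},  # Color-3
    {"Magenta": {"hex": "#FF00FF"}, "Gold": {"hex": "#FFD700"}},               # Color-4
]

def compact(fragments):
    merged = {}
    for table in fragments:                      # fragments ordered oldest -> newest
        for row_key, columns in table.items():
            if columns is None:                  # row tombstone: drop older columns
                merged[row_key] = {}
            else:                                # newer columns overwrite older ones
                merged.setdefault(row_key, {}).update(columns)
    # Rows left empty by a tombstone are dropped from the merged output.
    return {k: v for k, v in merged.items() if v}

print(compact(sstables))  # Blue is gone; Green has both hex and hex2
```

Running it reproduces the Color-5-Data.db contents: Blue disappears, and Green's two column versions from different files are merged into one row.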

Hope this helps.

- Takenori

(2013/09/17 10:51), java8964 java8964 wrote: