Posted to dev@cassandra.apache.org by Antonis Papaioannou <pa...@ics.forth.gr> on 2016/06/15 13:53:17 UTC

SSTable index format

Hi,

I'm interested in the SSTable index file format and particularly in 
Cassandra 2.2 which uses the SSTable version "ma".
Apart from keys and their corresponding offsets in the data file what 
else is included in each index entry?

I'm trying to trace code when an SSTable is flushed (especially in class 
BigTableWriter.java).
I see that each RowIndexEntry may contain a ColumnIndex, which in turn 
has a list of IndexHelper.IndexInfo entries.
So I would expect the index format to be something like this:
<key><fields_list><offset_in_datafile>

On the other hand it seems that the ColumnIndex does not contain all the 
columns of the data row.

Let me give you an example.
Assume the following schema of a column family:
ycsb.usertable ( y_id varchar primary key, field0 varchar, field1 varchar, 
field2 varchar);

In this case, if I execute the queries below:
INSERT INTO ycsb.usertable (y_id, field0, field1, field2) VALUES ('k1', 
'f1a', 'f1b', 'f1c');
INSERT INTO ycsb.usertable (y_id, field0) VALUES ('k2', 'f2a');

and then flush the table, I would expect the index to have the following 
info:
k1, [field0, field1, field2], <offset>
k2, [field0], <offset>

Is this correct?
Is there a documentation page with the file format of the index file?

Re: SSTable index format

Posted by Kaide Mu <ka...@gmail.com>.

The C* 2.2 SSTable format is "la"; "ma" was introduced in 3.0, along with big
changes to the storage engine.

Assuming you are asking about 2.2, and that you are aware an SSTable is
composed of several components: the index file, Index.db, just maps row keys
to their positions in Data.db. To see how Index.db is structured, check the
source code of RowIndexEntry, especially how it is serialized. You may also
want to look at ColumnIndex.Builder.build and
IndexHelper.IndexInfo.Serializer.serialize. For the complete flushing
process I recommend reading the source code of BigTableWriter carefully,
as you are already doing; for Index.db, look at its IndexWriter
section.
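
To make the "row key to position" mapping concrete, here is a minimal sketch of a simplified index entry. The layout (2-byte key length, key bytes, 8-byte position in Data.db, 4-byte size of an optional column index) is an assumption based on a reading of the 2.x serializers and should be checked against RowIndexEntry's serializer in the source. The class IndexEntrySketch and its methods are hypothetical and are not Cassandra code; keys are treated as UTF-8 strings for simplicity.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class IndexEntrySketch {
    /** Hypothetical holder for the fields of one simplified index entry. */
    static final class Entry {
        final String key;
        final long dataPosition;
        final int columnIndexSize;

        Entry(String key, long dataPosition, int columnIndexSize) {
            this.key = key;
            this.dataPosition = dataPosition;
            this.columnIndexSize = columnIndexSize;
        }
    }

    /** Encodes one entry with no column index (assumed layout, see lead-in). */
    static byte[] encode(String key, long dataPosition) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(2 + k.length + 8 + 4);
        buf.putShort((short) k.length); // assumed: 2-byte key length
        buf.put(k);                     // raw key bytes
        buf.putLong(dataPosition);      // offset of the row in Data.db
        buf.putInt(0);                  // assumed: column index size, 0 = none
        return buf.array();
    }

    /** Decodes the entry at the buffer's current position and advances past it. */
    static Entry decode(ByteBuffer buf) {
        int keyLength = buf.getShort() & 0xFFFF;
        byte[] k = new byte[keyLength];
        buf.get(k);
        long dataPosition = buf.getLong();
        int columnIndexSize = buf.getInt();
        // Skip the column index payload, if any, without interpreting it.
        buf.position(buf.position() + columnIndexSize);
        return new Entry(new String(k, StandardCharsets.UTF_8), dataPosition, columnIndexSize);
    }

    public static void main(String[] args) {
        ByteBuffer file = ByteBuffer.allocate(64);
        file.put(encode("k1", 0L));
        file.put(encode("k2", 57L)); // 57 is an arbitrary illustrative offset
        file.flip();
        while (file.hasRemaining()) {
            Entry e = decode(file);
            System.out.println(e.key + " -> position " + e.dataPosition
                    + ", column index bytes: " + e.columnIndexSize);
        }
    }
}
```

Note that nothing in this sketch lists column names such as field0 or field1: under the assumed layout, an entry for a small row carries only the key and its offset.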

> On the other hand it seems that the ColumnIndex does not contain all the
> columns of the data row.

Maybe someone can confirm this, but I guess your assumption is correct. The
idea is that the core abstraction we work with in 2.2 is the cell, rather
than the row, which was introduced in 3.0.
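
One way to see why a ColumnIndex would not list every column: as far as I understand the 2.x code, IndexInfo entries summarize blocks of cells (first name, last name, offset, width), with a block closed roughly every column_index_size_in_kb of serialized data, and the column index is kept only when a row spans more than one block. The sketch below illustrates that idea under those assumptions. ColumnIndexSketch, BlockInfo and build are hypothetical names, and the cell sizes are made up.

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnIndexSketch {
    /** Hypothetical summary of one block of consecutive cells. */
    static final class BlockInfo {
        final String firstName;
        final String lastName;
        final long offset; // start of the block, relative to the row start
        final long width;  // serialized size of the block in bytes

        BlockInfo(String firstName, String lastName, long offset, long width) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.offset = offset;
            this.width = width;
        }
    }

    /**
     * Groups cells into blocks of at least blockSize bytes and returns one
     * summary per block. Returns an empty list when the row fits in a single
     * block (assumption: no column index is kept in that case).
     */
    static List<BlockInfo> build(String[] cellNames, int[] cellSizes, int blockSize) {
        List<BlockInfo> blocks = new ArrayList<>();
        long blockStart = 0;
        long position = 0;
        String first = null;
        for (int i = 0; i < cellNames.length; i++) {
            if (first == null) {
                first = cellNames[i];
            }
            position += cellSizes[i];
            boolean lastCell = (i == cellNames.length - 1);
            if (position - blockStart >= blockSize || lastCell) {
                blocks.add(new BlockInfo(first, cellNames[i], blockStart, position - blockStart));
                blockStart = position;
                first = null;
            }
        }
        return blocks.size() > 1 ? blocks : new ArrayList<>();
    }

    public static void main(String[] args) {
        int blockSize = 64 * 1024; // mirrors a 64 KB column_index_size_in_kb setting

        // A small row like 'k1' from the question: three tiny cells.
        List<BlockInfo> small = build(
                new String[] {"field0", "field1", "field2"},
                new int[] {20, 20, 20}, blockSize);
        System.out.println("small row blocks: " + small.size());

        // A wide row with four large cells (sizes are illustrative).
        List<BlockInfo> wide = build(
                new String[] {"a", "b", "c", "d"},
                new int[] {40000, 40000, 40000, 40000}, blockSize);
        for (BlockInfo b : wide) {
            System.out.println(b.firstName + ".." + b.lastName
                    + " offset=" + b.offset + " width=" + b.width);
        }
    }
}
```

If this reading is right, the rows 'k1' and 'k2' from the question are far too small to get any IndexInfo entries, so their index entries would hold only the key and the offset.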
