Posted to user@cassandra.apache.org by Maciej Miklas <ma...@gmail.com> on 2012/08/16 09:34:52 UTC

SSTable Index and Metadata - are they cached in RAM?

Hi all,

The bloom filter for row keys is always in RAM. What about the SSTable index
and metadata?

Are they cached by Cassandra, or does it rely on memory-mapped files?


Thanks,
Maciej

Re: SSTable Index and Metadata - are they cached in RAM?

Posted by aaron morton <aa...@thelastpickle.com>.
> 2) Read from disk all row keys, in order to find one (binary search)
No.
At startup Cassandra samples the -index.db component, keeping one entry in memory for every index_interval keys. A lookup starts from the nearest sampled entry, so at worst index_interval keys must be read from disk.
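
As an illustration only (this is not Cassandra's code), the sampling idea can be sketched in Python. The sorted list `index_entries` stands in for the on-disk index file, and the names `build_sample`, `lookup` and the interval of 4 are assumptions made for this example. Real SSTables order keys by the partitioner's token; plain string order is used here for simplicity.

```python
from bisect import bisect_right


def build_sample(index_entries, interval):
    """Keep every `interval`-th key in memory, with its position in the index."""
    return [
        (key, position)
        for position, (key, _data_offset) in enumerate(index_entries)
        if position % interval == 0
    ]


def lookup(index_entries, sample, key, interval):
    """Return (data_offset, entries_read); data_offset is None if the key is absent."""
    sample_keys = [sampled_key for sampled_key, _position in sample]
    # Binary search the in-memory sample for the last sampled key <= key.
    i = bisect_right(sample_keys, key) - 1
    if i < 0:
        return None, 0  # key sorts before the first key in this index
    start = sample[i][1]
    entries_read = 0
    # Scan forward "on disk": never more than `interval` entries.
    for candidate, data_offset in index_entries[start:start + interval]:
        entries_read += 1
        if candidate == key:
            return data_offset, entries_read
        if candidate > key:
            break
    return None, entries_read


# Hypothetical index: ten keys, each mapped to a made-up data file offset.
index_entries = [(f"key{n:03d}", n * 100) for n in range(10)]
sample = build_sample(index_entries, 4)
```

With this toy data, `lookup(index_entries, sample, "key007", 4)` has to read 4 index entries before it finds the key, which is the worst case for an interval of 4, while `key004` is itself a sampled key and is found after reading 1 entry.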

> As I understand, in the worst case, we can have three disk seeks (2, 4, 6) per SSTable in order to check whether it contains the given column, is that correct?
It depends on the size of the row. For a small row (less than column_index_size_in_kb), getting a specific column takes:
* 1 seek in index.db
* 1 seek in data.db 
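
A toy model of those two seeks, for illustration only. The file layout, the `CountingFile` wrapper and the `index_positions` dict are all invented for this sketch and do not reflect the real SSTable format; the dict stands in for whatever in-memory structure tells the reader where to look in the index.

```python
import io
import struct


class CountingFile:
    """Wraps an in-memory file and counts how many seeks are issued."""

    def __init__(self, payload):
        self._file = io.BytesIO(payload)
        self.seeks = 0

    def read_at(self, offset, length):
        self._file.seek(offset)
        self.seeks += 1
        return self._file.read(length)


# Two small rows stored back to back in a made-up "data file".
rows = {b"row1": b"colA=1;colB=2", b"row2": b"colA=9"}

data_payload = b""
index_payload = b""
index_positions = {}  # assumed in-memory pointer into the index file
for key in sorted(rows):
    index_positions[key] = len(index_payload)
    # Made-up index entry: data offset and row length as two 32-bit integers.
    index_payload += struct.pack(">II", len(data_payload), len(rows[key]))
    data_payload += rows[key]

index_db = CountingFile(index_payload)
data_db = CountingFile(data_payload)


def read_small_row(key):
    """Read a whole small row: one seek in the index, one in the data file."""
    entry = index_db.read_at(index_positions[key], 8)
    offset, length = struct.unpack(">II", entry)
    return data_db.read_at(offset, length)
```

Because the row is small, the whole row is read in one go and the column is picked out in memory, so no further seeks are needed in this model.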

> I would expect that the sorted row keys (from point 2) already contain a bloom filter for their columns. But the bloom filter is stored together with the column index, is that correct?
Yes
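
For readers less familiar with the term: a bloom filter can say "definitely not present" or "possibly present", never "definitely present". The sketch below is a generic Python illustration of that behaviour, not Cassandra's implementation; the class name, bit count, number of hashes and the use of SHA-256 are all assumptions chosen for brevity.

```python
import hashlib


class BloomFilter:
    """Minimal bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive `num_hashes` bit positions by salting the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for position in self._positions(item):
            self.bits[position // 8] |= 1 << (position % 8)

    def might_contain(self, item):
        # True only if every bit for this item is set.
        return all(
            self.bits[position // 8] & (1 << (position % 8))
            for position in self._positions(item)
        )


column_filter = BloomFilter()
column_filter.add("column_a")
```

A `False` from `might_contain` means the column name was never added, so the read can be skipped; a `True` still has to be confirmed by actually reading the column.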

Hope that helps. 


-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 17/08/2012, at 7:31 PM, Maciej Miklas <ma...@gmail.com> wrote:

> Great articles, I did not find those before!
> 
> SSTable Index - yes, I mean the column index.
> 
> I would like to understand how many disk seeks might be required to find a column in a single SSTable.
> 
> I am assuming a positive bloom filter on the row key. Now Cassandra needs to find out whether the given SSTable contains the column name, and this might require a few disk seeks:
> 1) Check key cache, if found go to 5)
> 2) Read from disk all row keys, in order to find one (binary search)
> 3) Found row key contains disk offset to its column index
> 4) Read from disk column index for our row key. Index also contains bloom filter on column names
> 5) Use bloom filter on column name to find out whether this SSTable might contain our column
> 6) Read column to finally make sure that it exists
> 
> As I understand, in the worst case, we can have three disk seeks (2, 4, 6) per SSTable in order to check whether it contains the given column, is that correct?
> 
> I would expect that the sorted row keys (from point 2) already contain a bloom filter for their columns. But the bloom filter is stored together with the column index, is that correct?
> 
> 
> Cheers,
> Maciej
> 


Re: SSTable Index and Metadata - are they cached in RAM?

Posted by Maciej Miklas <ma...@gmail.com>.
Great articles, I did not find those before!

SSTable Index - yes, I mean the column index.

I would like to understand how many disk seeks might be required to find
a column in a single SSTable.

I am assuming a positive bloom filter on the row key. Now Cassandra needs to
find out whether the given SSTable contains the column name, and this might
require a few disk seeks:
1) Check key cache, if found go to 5)
2) Read from disk all row keys, in order to find one (binary search)
3) Found row key contains disk offset to its column index
4) Read from disk column index for our row key. Index also contains bloom
filter on column names
5) Use bloom filter on column name to find out whether this SSTable might
contain our column
6) Read column to finally make sure that it exists

As I understand, in the worst case, we can have three disk seeks (2, 4, 6)
per SSTable in order to check whether it contains the given column, is that
correct?

I would expect that the sorted row keys (from point 2) already contain a bloom
filter for their columns. But the bloom filter is stored together with the
column index, is that correct?


Cheers,
Maciej

On Fri, Aug 17, 2012 at 12:06 AM, aaron morton <aa...@thelastpickle.com> wrote:

>> What about SSTable index,
>
> Not sure what you are referring to there. Each row in an SSTable has a
> bloom filter and may have an index of columns. This is not cached.
>
> See http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/ or
> http://www.slideshare.net/aaronmorton/cassandra-sf-2012-technical-deep-dive-query-performance
>
>> and Metadata?
>
> This is the metadata we hold in memory for every open SSTable
>
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/SSTableMetadata.java
>
> Cheers
>
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>

Re: SSTable Index and Metadata - are they cached in RAM?

Posted by aaron morton <aa...@thelastpickle.com>.
> What about SSTable index, 
Not sure what you are referring to there. Each row in an SSTable has a bloom filter and may have an index of columns. This is not cached.

See http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/ or http://www.slideshare.net/aaronmorton/cassandra-sf-2012-technical-deep-dive-query-performance

>  and Metadata?

This is the metadata we hold in memory for every open SSTable
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/SSTableMetadata.java

Cheers
  

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 16/08/2012, at 7:34 PM, Maciej Miklas <ma...@gmail.com> wrote:

> Hi all,
> 
> The bloom filter for row keys is always in RAM. What about the SSTable index and metadata?
> 
> Are they cached by Cassandra, or does it rely on memory-mapped files?
> 
> 
> Thanks,
> Maciej