
Posted to user@cassandra.apache.org by Nikolay Mihaylov <nm...@nmmm.nu> on 2013/08/07 09:57:09 UTC

cassandra disk access

Hi

I am researching various on-disk hash tables and B-trees.

While doing so, I came up with some assumptions about Cassandra SSTables
that I would like to verify here.

1. A Cassandra SSTable is created with sequential disk I/O, i.e. the disk
head writes it from beginning to end. Assuming the disk is not fragmented,
the SSTable is placed on consecutive disk sectors.

2. When Cassandra looks up a key in an SSTable (assuming the bloom filter and
other "stuff" did not help, and also assuming the key is located in this
single SSTable), Cassandra DOES NOT USE sequential I/O. It will probably read
a hash-table slot or similar structure, then do another disk seek to get the
value (and probably the key). Yet another seek will probably be needed, and
if there is a key collision, additional seeks will be needed.

3. Once the data (e.g. the row) is located, the entire row is read
sequentially. (Once again, I assume there is a single, well-compacted
SSTable.) Also, if the disk is not fragmented, the data will be placed on
consecutive disk sectors.
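
To make points 1 and 3 concrete, here is a minimal Python sketch. It is an
illustration only: the record layout (two 4-byte lengths followed by the key
and value bytes), the file name, and the helpers write_sorted_table and
read_row are all hypothetical, and none of this is Cassandra's actual SSTable
format or API. It shows a table written in a single front-to-back pass and a
row fetched with one seek followed by one contiguous read.

```python
import os
import struct
import tempfile


def write_sorted_table(path, rows):
    """Write rows in key order in a single front-to-back pass.

    Hypothetical record layout: 4-byte key length, 4-byte value length,
    key bytes, value bytes. Returns {key: (offset, size)} per record.
    """
    offsets = {}
    with open(path, "wb") as f:
        for key, value in sorted(rows.items()):
            record = struct.pack(">II", len(key), len(value)) + key + value
            offsets[key] = (f.tell(), len(record))
            f.write(record)  # append only; the writer never seeks backwards
    return offsets


def read_row(path, offset, size):
    """One seek to the start of the row, then one contiguous read."""
    with open(path, "rb") as f:
        f.seek(offset)
        record = f.read(size)
    key_len, value_len = struct.unpack(">II", record[:8])
    key = record[8:8 + key_len]
    value = record[8 + key_len:8 + key_len + value_len]
    return key, value


rows = {b"carol": b"3", b"alice": b"1", b"bob": b"2"}
path = os.path.join(tempfile.mkdtemp(), "demo-table.db")
offsets = write_sorted_table(path, rows)
print(read_row(path, *offsets[b"bob"]))
```

Whether the bytes actually land on consecutive sectors is up to the
filesystem; the sketch only shows that the application itself issues purely
sequential writes and a single seek per row read.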

Am I wrong?

Nick.

Re: cassandra disk access

Posted by Aaron Morton <aa...@thelastpickle.com>.

Here is some background on the read and write paths. Some of the finer details are a little out of date, but they are mostly correct for 1.2:

http://www.slideshare.net/aaronmorton/cassandra-community-webinar-introduction-to-apache-cassandra-12-20353118/40
http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/
http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/

Cheers

-----------------
Aaron Morton
Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

Re: cassandra disk access

Posted by Michał Michalski <mi...@opera.com>.

I'm not sure how accurate this write-up is (it's from 2011, and one of its
sources is from 2010), but I'm pretty sure it's more or less OK:

http://blog.csdn.net/firecoder/article/details/7019435

M.

Re: cassandra disk access

Posted by Nikolay Mihaylov <nm...@nmmm.nu>.

Thanks.

> It will use the Index Sample (RAM) first, then the "full" Index (disk),
> and finally it will read the data from the SSTable (disk). There's no such
> thing as a "collision" in this case.

So it still has 2 seeks :)

Where can I see the internal structure of the SSTable? I tried to find it
documented but was unable to find anything.




Re: cassandra disk access

Posted by Michał Michalski <mi...@opera.com>.

> 2. When Cassandra looks up a key in an SSTable (assuming the bloom filter and
> other "stuff" did not help, and also assuming the key is located in this
> single SSTable), Cassandra DOES NOT USE sequential I/O. It will probably read
> a hash-table slot or similar structure, then do another disk seek to get the
> value (and probably the key). Yet another seek will probably be needed, and
> if there is a key collision, additional seeks will be needed.

It will use the Index Sample (RAM) first, then the "full" Index (disk),
and finally it will read the data from the SSTable (disk). There's no
such thing as a "collision" in this case.
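
The lookup path described above can be sketched roughly as follows. This is a
simplified simulation, not Cassandra code: the sampling interval, the
in-memory lists standing in for the on-disk index and data file, and the
lookup helper are all assumptions made for illustration, and keys are simply
compared as plain strings. It counts one disk seek for the "full" index and
one for the data file, and because the index is searched in sorted order
rather than hashed, nothing like a collision chain appears.

```python
import bisect

SAMPLE_EVERY = 4  # hypothetical sampling interval

# Simulated on-disk "full" index: sorted (key, data_offset) entries.
full_index = [(f"key{i:03d}", i * 100) for i in range(20)]
# Simulated data file: data_offset -> row.
data_file = {off: f"row-for-{key}" for key, off in full_index}

# In-memory index sample: every SAMPLE_EVERY-th key and its position
# in the full index.
sample_pos = list(range(0, len(full_index), SAMPLE_EVERY))
sample_keys = [full_index[i][0] for i in sample_pos]


def lookup(key):
    """Return (row, disk_seeks); row is None if the key is absent."""
    seeks = 0
    # Step 1 (RAM): binary search for the last sampled key <= key.
    i = bisect.bisect_right(sample_keys, key) - 1
    if i < 0:
        return None, seeks  # key sorts before everything in this table
    start = sample_pos[i]
    # Step 2 (disk): one seek into the full index, then a short forward scan.
    seeks += 1
    for k, offset in full_index[start:start + SAMPLE_EVERY]:
        if k == key:
            # Step 3 (disk): one seek into the data file, then read the row.
            seeks += 1
            return data_file[offset], seeks
        if k > key:
            break  # passed the spot where the key would be; it is absent
    return None, seeks


print(lookup("key010"))
print(lookup("key010x"))
```

In this sketch a hit costs two simulated seeks (index, then data), and a miss
that gets past the sample costs one.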

> 3. Once the data (e.g. the row) is located, the entire row is read
> sequentially. (Once again, I assume there is a single, well-compacted
> SSTable.) Also, if the disk is not fragmented, the data will be placed on
> consecutive disk sectors.

Yes, this is how I understand it too.

M.