Posted to user@cassandra.apache.org by Kanwar Sangha <ka...@mavenir.com> on 2013/02/21 06:52:09 UTC

Read IO

Hi - Can someone explain the worst-case number of I/Os for a read? Assume no key cache, no row cache, and an index sampling rate of, say, 512.


1)      The bloom filter is checked to see whether the key exists (in RAM)

2)      The index file sample (in RAM) is checked to find the approximate location in the index file on disk

3)      1 I/O to read the actual index file on disk (DISK)

4)      1 I/O to get the data from that location in the SSTable (DISK)

Is this correct?
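Steps 1 to 3 of this read path can be modelled in a few lines of Python. This is a toy sketch, not Cassandra's code: a sorted in-memory list stands in for the on-disk key index file, salted SHA-256 stands in for Cassandra's own hash functions, and `ToyBloomFilter`, `build_index_sample` and `lookup` are hypothetical names. It shows why the sampling rate bounds the index scan: the binary search over the sample starts the scan at most `sampling_rate` entries before the key.

```python
import bisect
import hashlib
from typing import List, Optional, Tuple


class ToyBloomFilter:
    """Minimal Bloom filter: k salted hashes over an m-slot bit array."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key: str):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False: key is definitely absent. True: key may be present (false positives possible).
        return all(self.bits[pos] for pos in self._positions(key))


def build_index_sample(index_keys: List[str], sampling_rate: int) -> List[Tuple[str, int]]:
    """Keep every sampling_rate-th key with its position in the (sorted) key index."""
    return [(index_keys[i], i) for i in range(0, len(index_keys), sampling_rate)]


def lookup(bloom: ToyBloomFilter, sample: List[Tuple[str, int]], index_keys: List[str],
           key: str, sampling_rate: int) -> Optional[Tuple[int, int]]:
    """Return (position in key index, entries scanned), or None if the key is absent."""
    if not sample or not bloom.might_contain(key):      # in RAM: no disk access
        return None
    sampled = [k for k, _ in sample]
    i = max(bisect.bisect_right(sampled, key) - 1, 0)   # in RAM: last sampled key <= key
    start = sample[i][1]
    end = min(start + sampling_rate, len(index_keys))
    # In a real node this scan is the disk read of the key index file.
    for scanned, pos in enumerate(range(start, end), start=1):
        if index_keys[pos] == key:
            return pos, scanned
    return None


keys = [f"key{n:05d}" for n in range(5000)]  # already sorted
bloom = ToyBloomFilter(num_bits=65536, num_hashes=3)
for k in keys:
    bloom.add(k)
sample = build_index_sample(keys, sampling_rate=512)
print(lookup(bloom, sample, keys, "key04000", 512))  # (4000, 417)
```

A Bloom filter never gives false negatives, so a "no" skips the SSTable entirely; a "maybe" still costs the index read.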



RE: Read IO

Posted by Kanwar Sangha <ka...@mavenir.com>.
OK. Is Cassandra's default block size 256 KB? Now say the data in my column is 4 MB, and the disk gives me 4 KB random reads at 100 IOPS. So I can read at most 400 KB in one seek? Does that mean I would need multiple seeks to get the complete data?
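The arithmetic behind this question, as a short Python sketch using the numbers given: 4 KB random reads at 100 IOPS come to 400 KB per second, so the figure is a throughput rather than a per-seek limit. How many seeks a 4 MB column really takes depends on whether its bytes are contiguous on disk; the sequential bandwidth and positioning time below are assumed figures for illustration, not something stated in the thread.

```python
KB = 1024
MB = 1024 * KB

read_size = 4 * KB    # random read size, from the question
iops = 100            # random reads per second, from the question
column_size = 4 * MB  # column value size, from the question

# Random-read throughput is read size x IOPS: a rate, not a per-seek limit.
throughput = read_size * iops
print(throughput // KB, "KB/s")              # 400 KB/s

# Pessimistic case: every 4 KB block of the column costs its own random read.
blocks = column_size // read_size
print(blocks, "reads,", blocks / iops, "s")  # 1024 reads, 10.24 s

# Contiguous case: one positioning delay, then a sequential transfer.
seq_bandwidth = 100 * MB  # ASSUMED sequential throughput; not stated in the thread
seek_time = 1 / iops      # ASSUMED: roughly one random I/O worth of positioning time
print(round(seek_time + column_size / seq_bandwidth, 3), "s")  # 0.05 s
```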


-----Original Message-----
From: scode@scode.org [mailto:scode@scode.org] On Behalf Of Peter Schuller
Sent: 21 February 2013 00:05
To: user@cassandra.apache.org
Subject: Re: Read IO


Re: Read IO

Posted by Peter Schuller <pe...@infidyne.com>.
> Is this correct ?

Yes, at least under optimal conditions and assuming a reasonably sized
row. Things like read-ahead (at the kernel level) will play into it;
and if your read (even if assumed to be small) straddles two pages, you
might or might not incur another read depending on your kernel settings
(typically trading page cache pollution against the number of I/Os).
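A minimal Python sketch of the page-straddling point, assuming 4 KB pages: the same 200-byte read touches one or two pages depending only on its offset. Whether the second page actually costs another I/O depends on kernel settings such as read-ahead, which this sketch does not model.

```python
PAGE_SIZE = 4096  # ASSUMED page size; check your kernel and filesystem


def pages_touched(offset: int, length: int, page_size: int = PAGE_SIZE) -> int:
    """Number of distinct pages covered by a read of `length` bytes at `offset`."""
    if length <= 0:
        return 0
    first_page = offset // page_size
    last_page = (offset + length - 1) // page_size
    return last_page - first_page + 1


print(pages_touched(0, 200))     # 1: small read inside one page
print(pages_touched(4000, 200))  # 2: same size, but straddles a page boundary
```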

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: Read IO

Posted by aaron morton <aa...@thelastpickle.com>.
AFAIK this is still roughly correct: http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/

It includes information on the page size read from disk. 

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 22/02/2013, at 5:45 AM, Jouni Hartikainen <jo...@reaktor.fi> wrote:



Re: Read IO

Posted by Jouni Hartikainen <jo...@reaktor.fi>.
Hi,

On Feb 21, 2013, at 7:52 , Kanwar Sangha <ka...@mavenir.com> wrote:
> Hi – Can someone explain the worst case IOPS for a read ? No key cache, No row cache, sampling rate say 512.
>  
> 1)      Bloom filter will be checked to see existence of key (In RAM)
> 2)      Index file sample (IN RAM) will be checked to find approx. location in index file on disk
> 3)      1 IOPS to read the actual index file on disk (DISK)
> 4)      1 IOPS to get the data from the location in the sstable (DISK)
>  
> Is this correct ?

Since you were asking about the worst case, I would add one more step: a seek inside the SSTable from the start of the row to the queried columns, using the column index.

However, this applies only if you are querying a subset of the columns in the row (not all of them) and the total row size exceeds column_index_size_in_kb (default 64 KB).
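To make the column-index step concrete, here is a rough Python model with hypothetical names (not Cassandra's classes). It assumes the row's columns are sorted by name and that each index entry records a block's first and last column name, its offset from the row start, and its width. A reader holding the index can binary-search it and seek straight to one block instead of reading the whole row; a row smaller than the block size yields a single block, so there is nothing to skip.

```python
import bisect
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class IndexBlock:
    first_name: str  # first column name covered by this block
    last_name: str   # last column name covered by this block
    offset: int      # byte offset of the block from the start of the row
    width: int       # size of the block in bytes


def build_column_index(columns: List[Tuple[str, int]],
                       block_size: int = 64 * 1024) -> List[IndexBlock]:
    """Split (name, size_in_bytes) columns, sorted by name, into ~block_size blocks."""
    blocks: List[IndexBlock] = []
    block_start = offset = 0
    first = last = None
    for name, size in columns:
        if first is None:
            first = name
        last = name
        offset += size
        if offset - block_start >= block_size:  # block is full: record it
            blocks.append(IndexBlock(first, last, block_start, offset - block_start))
            block_start, first = offset, None
    if first is not None:  # trailing, partially filled block
        blocks.append(IndexBlock(first, last, block_start, offset - block_start))
    return blocks


def find_block(blocks: List[IndexBlock], column_name: str) -> IndexBlock:
    """Binary-search for the first block whose last column name is >= column_name."""
    i = bisect.bisect_left([b.last_name for b in blocks], column_name)
    return blocks[min(i, len(blocks) - 1)]


# A 1000 KB row made of 1000 one-KB columns named c0000 .. c0999.
row = [(f"c{n:04d}", 1024) for n in range(1000)]
index = build_column_index(row)
print(len(index))                         # 16 blocks
print(find_block(index, "c0500").offset)  # 458752, i.e. 7 * 64 KB
```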

So, as far as I understand, the worst-case steps (without any caches) are:

1. Check the SSTable bloom filters (in memory)
2. Use the index samples to find the approximately correct place in the key index file (in memory)
3. Read the key index file until the correct key is found (1st disk seek & read)
4. Seek to the start of the row in the SSTable file and read the row headers, possibly including the column index (2nd seek & read)
5. Use the column index to seek to the correct place inside the SSTable file and actually read the columns (3rd seek & read)

If the row is very wide and you are asking for a scattered set of columns, step 5 might even be needed multiple times. Also, if your row is spread over many SSTables, each of them needs to be accessed (at least steps 1-4) to get the complete results for the query.
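The worst-case steps in this reply can be turned into a rough seek counter. This Python sketch assumes no key cache, row cache, or OS page cache, and that every SSTable passed to it really holds part of the row (SSTables rejected by their bloom filter cost no seeks, and false positives are not modelled). `RowFragment` and `worst_case_seeks` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowFragment:
    size_kb: int        # size of the row's data in one SSTable
    column_ranges: int  # disjoint column ranges the query needs from it


def worst_case_seeks(fragments: List[RowFragment], all_columns: bool,
                     column_index_size_in_kb: int = 64) -> int:
    """Uncached disk seeks for one row read, following the steps listed above."""
    seeks = 0
    for frag in fragments:
        seeks += 1  # read the key index file to find the row's position
        seeks += 1  # seek to the row start and read the row headers
        if not all_columns and frag.size_kb > column_index_size_in_kb:
            # Wide row, subset of columns: one more seek per disjoint range via the column index.
            seeks += frag.column_ranges
    return seeks


print(worst_case_seeks([RowFragment(2, 1)], all_columns=False))        # 2
print(worst_case_seeks([RowFragment(512, 3)], all_columns=False))      # 5
print(worst_case_seeks([RowFragment(512, 1)] * 3, all_columns=False))  # 9
```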

With all this in mind, if your node serves any reasonable volume of reads, I'd say that in practice the key index files will be page cached by the OS very quickly, so a normal read ends up being either one seek (for small rows without a column index) or two (for wider rows). Of course, as Peter already pointed out, the more columns you ask for, the more the disk needs to read. For a contiguous set of columns, however, the read should be linear.
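Jouni's rule of thumb for a warm node, as a small Python sketch: once the key index files are in the OS page cache, each SSTable holding the row costs one seek, or two when a wide row needs the column index. The 100 IOPS disk figure is borrowed from the question earlier in the thread; the function names are hypothetical, and the model ignores read-ahead and data pages that happen to be cached.

```python
def seeks_per_read(sstables: int, uses_column_index: bool) -> int:
    """Seeks for one row read once key index files are in the OS page cache."""
    # Index lookups are served from RAM; each SSTable holding the row costs one seek
    # to the row, plus one more when a wide row needs the column index.
    return sstables * (2 if uses_column_index else 1)


def max_reads_per_second(disk_iops: int, seeks: int) -> float:
    """Upper bound on row reads per second if every seek is a random I/O."""
    return disk_iops / seeks


print(max_reads_per_second(100, seeks_per_read(1, uses_column_index=False)))  # 100.0
print(max_reads_per_second(100, seeks_per_read(1, uses_column_index=True)))   # 50.0
print(max_reads_per_second(100, seeks_per_read(3, uses_column_index=True)))   # about 16.7
```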

-Jouni