Posted to user@cassandra.apache.org by Daniel Doubleday <da...@gmx.net> on 2010/08/27 14:57:52 UTC

Read before Write

Hi people

I was wondering whether anyone has already benchmarked a situation like this:

I have:

day of year (row key) -> SomeId (column key) -> byte[0]

I need to make sure that SomeId is written, but in around 80% of cases it will already be present (so I would essentially be replacing it with itself). The replication factor (RF) will be 2.

So should I just write every time (given that Cassandra is so fast on writes), or should I read first and write only if the column is not present?

Cheers,
Daniel

Re: Read before Write

Posted by Edward Capriolo <ed...@gmail.com>.

On Fri, Aug 27, 2010 at 1:26 PM, Ran Tavory <ra...@gmail.com> wrote:
> I haven't benchmarked this, so it's purely theoretical.
> If there's no caching, then I'm pretty sure just writing would yield better
> performance.
> If you do cache rows/keys, it really depends on your hit ratio. Naturally, if
> you have a small data set, a high cache hit ratio, and use row caching, I'm
> pretty sure it's better to read first.
> Although writes are an order of magnitude faster than reads, if you have a
> high write rate then Cassandra might throttle you at different bottlenecks,
> depending on your hardware and data. For example, disk is often a
> bottleneck (and you can tweak storage-conf to improve that), sometimes memory
> is under pressure, and I have also seen CPU pressure, although it's less
> common.
> You also need to keep in mind that even if you write the same value but with
> a newer timestamp, Cassandra will still have to run compactions, and that's
> where disk/memory usually becomes the bottleneck.
> Bottom line: if you can cache (have enough memory) and there's a good hit
> ratio, cache entire rows and read first. If not, always just write and make
> sure compactions aren't killing you; if they are, tweak storage-conf to do
> fewer compactions.

Read before write is usually a bad idea in Cassandra.

We have a multi-node cluster with ~100 GB per node. We have a
fairly substantial 800,000-item row cache, which sees about a 70% hit
rate. Our application measures writes at QUORUM at about 1 ms and reads
at ONE at 7-10 ms; reads seemed to be about 3-6 ms when the data was
around 70 GB per node.

Given that a write takes 1 ms and a read takes 7 ms, and that reads
are more intensive, I would almost never advocate reading before
writing.
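
As a rough illustration of the arithmetic above, here is a minimal Python
sketch. It assumes the 1 ms write and 7 ms read figures from this message and
the 80% "already present" rate from the original question; the function names
are made up for the example, and the model looks at latency only, not at
compaction or disk load.

```python
# Back-of-the-envelope latency model. The inputs are the figures quoted in
# the thread; the function names are hypothetical.
WRITE_MS = 1.0    # measured write latency at QUORUM
READ_MS = 7.0     # measured read latency at ONE (low end of 7-10 ms)
P_PRESENT = 0.8   # share of ids that are already present


def always_write_ms(write_ms: float) -> float:
    """Every request costs exactly one write."""
    return write_ms


def read_then_write_ms(read_ms: float, write_ms: float, p_present: float) -> float:
    """Every request costs one read; only absent ids also cost a write."""
    return read_ms + (1.0 - p_present) * write_ms


blind = always_write_ms(WRITE_MS)
checked = read_then_write_ms(READ_MS, WRITE_MS, P_PRESENT)
print(f"always write:    {blind:.1f} ms per request")
print(f"read then write: {checked:.1f} ms per request")
```

Under these assumptions the read-first path pays for a read on every request
and still has to write for the 20% of ids that are missing.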

Edward

Re: Read before Write

Posted by Ran Tavory <ra...@gmail.com>.

I haven't benchmarked this, so it's purely theoretical.
If there's no caching, then I'm pretty sure just writing would yield better
performance.
If you do cache rows/keys, it really depends on your hit ratio. Naturally, if
you have a small data set, a high cache hit ratio, and use row caching, I'm
pretty sure it's better to read first.
Although writes are an order of magnitude faster than reads, if you have a
high write rate then Cassandra might throttle you at different bottlenecks,
depending on your hardware and data. For example, disk is often a
bottleneck (and you can tweak storage-conf to improve that), sometimes memory
is under pressure, and I have also seen CPU pressure, although it's less
common.
You also need to keep in mind that even if you write the same value but with
a newer timestamp, Cassandra will still have to run compactions, and that's
where disk/memory usually becomes the bottleneck.

Bottom line: if you can cache (have enough memory) and there's a good hit
ratio, cache entire rows and read first. If not, always just write and make
sure compactions aren't killing you; if they are, tweak storage-conf to do
fewer compactions.
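
The dependence on hit ratio can be sketched as a break-even check. This is a
latency-only model that ignores compaction and throughput limits; the cache-hit
and cache-miss latencies below are hypothetical values chosen only to show both
outcomes, and the function names are made up for the example.

```python
def expected_read_ms(hit_ratio: float, hit_ms: float, miss_ms: float) -> float:
    """Average read latency for a given row-cache hit ratio."""
    return hit_ratio * hit_ms + (1.0 - hit_ratio) * miss_ms


def read_first_wins(read_ms: float, write_ms: float, p_present: float) -> bool:
    """Compare the two strategies per request.

    Read-first costs read_ms + (1 - p_present) * write_ms, writing blindly
    costs write_ms, so read-first is cheaper only when
    read_ms < p_present * write_ms.
    """
    return read_ms < p_present * write_ms


WRITE_MS = 1.0    # hypothetical write latency
P_PRESENT = 0.8   # share of ids already present, from the original question

for hit_ratio in (0.70, 0.99):
    # hit_ms and miss_ms are assumptions, not measurements.
    read_ms = expected_read_ms(hit_ratio, hit_ms=0.2, miss_ms=7.0)
    if read_first_wins(read_ms, WRITE_MS, P_PRESENT):
        verdict = "read first"
    else:
        verdict = "just write"
    print(f"hit ratio {hit_ratio:.0%}: avg read {read_ms:.2f} ms -> {verdict}")
```

With these made-up latencies, a 70% hit ratio still favors writing blindly,
while a 99% hit ratio tips the balance toward reading first.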

On Fri, Aug 27, 2010 at 5:44 PM, Chen Xinli <ch...@gmail.com> wrote:

> I think just writing all the time is much better, as most of the
> replacements will happen in the memtable.
>
> Also, you should set a large memtable size compared with the average row
> size.

Re: Read before Write

Posted by Chen Xinli <ch...@gmail.com>.

I think just writing all the time is much better, as most of the
replacements will happen in the memtable.

Also, you should set a large memtable size compared with the average row
size.
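
A toy model of that effect, as a minimal sketch: it is not Cassandra's actual
memtable, only a dictionary keyed by (row key, column name) that keeps the
newest version of each column. The class name, row key, and ids are all
hypothetical.

```python
class ToyMemtable:
    """Toy in-memory write buffer: one entry per (row key, column name)."""

    def __init__(self):
        self.cells = {}
        self.writes_seen = 0

    def write(self, row_key, column, value, timestamp):
        self.writes_seen += 1
        current = self.cells.get((row_key, column))
        # Keep only the newest version of a column; the older one is
        # replaced in memory and never reaches disk.
        if current is None or timestamp >= current[1]:
            self.cells[(row_key, column)] = (value, timestamp)

    def flush(self):
        """Return the cells that would go to disk, then start empty again."""
        flushed = sorted((r, c, v, ts) for (r, c), (v, ts) in self.cells.items())
        self.cells = {}
        return flushed


memtable = ToyMemtable()
ids = ["id-1", "id-2", "id-1", "id-1", "id-3", "id-2"]
for ts, some_id in enumerate(ids):
    # Empty value, as in the byte[0] columns from the original question.
    memtable.write("day-239", some_id, b"", ts)

flushed = memtable.flush()
print(f"{memtable.writes_seen} writes -> {len(flushed)} cells flushed")
```

In this model, the longer the buffer lives before a flush, the more repeated
writes of the same column collapse into a single cell.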


-- 
Best Regards,
Chen Xinli

Re: Read before Write

Posted by Aaron Morton <aa...@thelastpickle.com>.

If you are reading and then making decisions about what to write, just remember that there are no transactions. You are essentially running at a Read Uncommitted level of transaction isolation with regard to batch mutations (a mutation for a single row is atomic).

If you can, it may be less of a headache to write without looking first.
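
The hazard with deciding on a read can be sketched with a toy store that, like
the situation described above, has no transactions. This is a hand-written
interleaving of two hypothetical clients, not Cassandra client code; the class
and key names are made up for the example.

```python
class ToyStore:
    """Toy key/value store with no transactions or locking."""

    def __init__(self):
        self.columns = {}

    def read(self, key):
        return self.columns.get(key)

    def write(self, key, value):
        self.columns[key] = value


store = ToyStore()

# Two clients both want to "create the column only if it is absent".
# Interleaving: both read before either one writes.
seen_by_a = store.read("some-id")
seen_by_b = store.read("some-id")

winners = []
if seen_by_a is None:
    store.write("some-id", "written-by-a")
    winners.append("a")
if seen_by_b is None:  # stale: client a has already written
    store.write("some-id", "written-by-b")
    winners.append("b")

print("clients that believe they created the column:", winners)
print("stored value:", store.read("some-id"))
```

Both clients act on the same stale "absent" result. When every writer stores
the same empty value, as in the original question, a blind write gives the same
final state regardless of interleaving, which is why skipping the read is the
simpler option there.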

Aaron
