
Posted to dev@cassandra.apache.org by Radim Kolar <hs...@filez.com> on 2013/02/11 19:53:17 UTC

real leveldb vs cassandra leveldb

Real LevelDB is better in a lot of areas:

- L0 sstables are 1/10 the size of L1 sstables.
- Tables can be promoted to higher-numbered levels if no merging is needed
  (that is, if there is a hole in the key range of the target level).
- A variable number of sstables per level: it tries to keep 1:10:100
  sstable ratios, but this is not a hard requirement.
- Very important: a better hash function. Murmur and MD5 hashes are
  unsuitable for LevelDB because they turn a key sequence into more or less
  random noise. Changing the hash function to the LevelDB one gives about an
  8x speed increase during sequential writes, because far fewer table merges
  are needed.
- A better merge policy: merge 1 table plus up to 10 tables from the next
  level into the next + 2 level.
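The hash-function point can be illustrated with a toy simulation. This is a sketch under stated assumptions, not Cassandra or LevelDB code: it writes 10,000 sequential keys in flushes of 1,000 keys each, orders every flushed table either by an MD5-derived token (an assumed stand-in for a hash partitioner) or by the key itself (LevelDB-style sorted order), and counts how many pairs of flushed tables have overlapping key ranges. Overlapping tables have to be merged; disjoint ones could simply be promoted. The class name, key format and flush size are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical): how key ordering affects overlap between successively flushed tables. */
final class FlushOverlap {

    /** First 8 bytes of MD5(key) as a signed long; an assumed stand-in for a hash-based token. */
    static long md5Token(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
            long t = 0;
            for (int i = 0; i < 8; i++) t = (t << 8) | (d[i] & 0xFF);
            return t;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Order-preserving "token": the key's sequence number itself. */
    static long naturalToken(long seq) {
        return seq;
    }

    /** Returns the {min, max} token range of each flushed table. */
    static List<long[]> flushRanges(int totalKeys, int keysPerFlush, boolean hashed) {
        List<long[]> ranges = new ArrayList<>();
        for (int start = 0; start < totalKeys; start += keysPerFlush) {
            long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
            for (int seq = start; seq < Math.min(start + keysPerFlush, totalKeys); seq++) {
                String key = String.format("key%08d", seq); // zero-padded: string order == numeric order
                long token = hashed ? md5Token(key) : naturalToken(seq);
                min = Math.min(min, token);
                max = Math.max(max, token);
            }
            ranges.add(new long[] {min, max});
        }
        return ranges;
    }

    /** Counts pairs of tables whose [min, max] ranges intersect, i.e. pairs that would need merging. */
    static int overlappingPairs(List<long[]> ranges) {
        int pairs = 0;
        for (int i = 0; i < ranges.size(); i++)
            for (int j = i + 1; j < ranges.size(); j++)
                if (ranges.get(i)[0] <= ranges.get(j)[1] && ranges.get(j)[0] <= ranges.get(i)[1])
                    pairs++;
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println("natural order: " + overlappingPairs(flushRanges(10_000, 1_000, false)));
        System.out.println("MD5 order:     " + overlappingPairs(flushRanges(10_000, 1_000, true)));
    }
}
```

With natural ordering each flushed table covers a disjoint slice of the key space, so it could be moved down a level without being rewritten; with MD5 ordering every table spans almost the whole token space and overlaps every other table.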

Re: real leveldb vs cassandra leveldb

Posted by Michael Kjellman <mk...@barracuda.com>.

I can promise you it is not a matter of not being interested in performance enhancements, but there is a trade-off between stability, backwards compatibility, etc. Some of the changes you are proposing have merit, but they fundamentally change some decisions that have been made, and Jonathan needs to weigh those trade-offs for the project.

Most importantly, I'd file a bug with your proposed patches and have a positive discussion there.

Best,
Michael

On Feb 22, 2013, at 4:04 AM, "Radim Kolar" <hs...@filez.com> wrote:

> [...]
> I do not understand why you are not interested in performance
> optimizations.
> [...]


Re: real leveldb vs cassandra leveldb

Posted by Jonathan Ellis <jb...@gmail.com>.

If you legitimately want to move the ball forward, here's how to do it:

- Create one jira ticket per idea
- Attach a patch
- Post your benchmark results

It's important to consider changes one at a time so we have the
clearest picture possible about what we are gaining.

On Fri, Feb 22, 2013 at 4:00 AM, Radim Kolar <hs...@filez.com> wrote:
> [...]
> I do not understand why you are not interested in performance
> optimizations.
> [...]



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced

Re: real leveldb vs cassandra leveldb

Posted by Radim Kolar <hs...@filez.com>.

On 13.2.2013 16:32, Jonathan Ellis wrote:
> The only point here that would make a difference in practice is
> leveldb using a worse hash function.
How do you know that it would not make a difference in practice? I have
implemented some optimizations from LevelDB in Cassandra: a different L0
level (12 tables, each 1/10 the size of an L1 table), faster promotion to
the next level, and a variable number of sstables per level, and it
performs faster for a write-heavy workload.
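A variable number of sstables per level, with a target ratio instead of a fixed count, is commonly implemented by scoring each level against a size budget and compacting the level with the highest score. The sketch below is modeled loosely on LevelDB's scheme (L0 scored by table count against a trigger, deeper levels by total bytes against a budget that grows 10x per level). The class name, the 10 MiB L1 budget and the 4-table L0 trigger are illustrative assumptions, not Cassandra's implementation or the author's patch.

```java
import java.util.List;

/** Hypothetical level picker: compacts whichever level is furthest over its budget. */
final class LevelScorer {
    static final int L0_TRIGGER_TABLES = 4;          // assumption: L0 compacts once it holds 4 tables
    static final long L1_BUDGET_BYTES = 10L << 20;   // assumption: 10 MiB for L1, x10 per deeper level

    /** Byte budget for level >= 1: 10 MiB, 100 MiB, 1000 MiB, ... (a 1:10:100 ratio). */
    static long budget(int level) {
        long b = L1_BUDGET_BYTES;
        for (int i = 1; i < level; i++) b *= 10;
        return b;
    }

    /**
     * Score of one level; >= 1.0 means it is over budget. The number of tables in a
     * level is not fixed: only L0's table count and deeper levels' total bytes matter.
     */
    static double score(int level, List<Long> tableSizes) {
        if (level == 0) return tableSizes.size() / (double) L0_TRIGGER_TABLES;
        long total = 0;
        for (long s : tableSizes) total += s;
        return total / (double) budget(level);
    }

    /** Returns the level to compact next (highest score, shallowest on ties), or -1 if none is over budget. */
    static int pickLevel(List<List<Long>> levels) {
        int best = -1;
        double bestScore = -1.0;
        for (int level = 0; level < levels.size(); level++) {
            double s = score(level, levels.get(level));
            if (s > bestScore) {
                best = level;
                bestScore = s;
            }
        }
        return bestScore >= 1.0 ? best : -1;
    }
}
```

For example, with two tables in L0 (score 0.5), three 5 MiB tables in L1 (15 MiB against 10 MiB, score 1.5) and ten 8 MiB tables in L2 (80 MiB against 100 MiB, score 0.8), this picker chooses L1.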

I do not understand why you are not interested in performance
optimizations. For example, yesterday I implemented buffered sstable
writing and it turned out to be a significant performance advantage: using
a 1 MB buffer changed the test time from 1m50s to 0m40s. Another
significant gain comes from using read-ahead in compactions and replacing
bucket-based size-tiered compaction with the compaction strategy used in
Lucene, which does not produce immortal sstables like Cassandra does.
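Buffered sstable writing amounts to batching many small writes into large sequential ones. A minimal sketch using the standard `java.io.BufferedOutputStream` with a 1 MiB buffer, as in the message. The class name `SSTableSketchWriter` and the record layout (length-prefixed key, then length-prefixed value) are hypothetical and are not Cassandra's on-disk format or the author's patch.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical writer: appends length-prefixed key/value records through a 1 MiB buffer. */
final class SSTableSketchWriter implements AutoCloseable {
    static final int BUFFER_BYTES = 1 << 20; // 1 MiB, the buffer size mentioned in the thread

    private final DataOutputStream out;

    SSTableSketchWriter(String path) throws IOException {
        // The BufferedOutputStream collects small writeInt/write calls and hands them to the
        // file in chunks of up to BUFFER_BYTES, instead of issuing one OS write per call.
        out = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(path), BUFFER_BYTES));
    }

    /** Appends one record: 4-byte key length, key bytes, 4-byte value length, value bytes. */
    void append(String key, byte[] value) throws IOException {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        out.writeInt(k.length);
        out.write(k);
        out.writeInt(value.length);
        out.write(value);
    }

    @Override
    public void close() throws IOException {
        out.close(); // flushes whatever is still buffered, then closes the file
    }
}
```

Usage: `try (SSTableSketchWriter w = new SSTableSketchWriter(path)) { w.append("k0", bytes); }`. The buffer size is the only tuning knob here; larger buffers mean fewer system calls at the cost of memory per open writer.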

I have quite high demands for performance: my test environment for a new
project has about 1 billion new rows per day, and in production it will be
about 30 times higher, so what is worth coding for me might not be worth
coding for you.

Re: real leveldb vs cassandra leveldb

Posted by Jonathan Ellis <jb...@gmail.com>.

You have to use the same function if you want to do streaming I/O for
repair and node movement instead of random I/O.
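Why the two functions must match can be shown with a toy count. Assume an sstable's rows are stored sorted by some local order, and a repair or node move needs every row whose partitioner token falls in one range. The sketch counts how many contiguous runs those rows occupy on disk: one run means one sequential read, many runs mean seeking. The MD5-based token, the key format and the chosen range are assumptions for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy model (hypothetical): how many contiguous runs one token range occupies in a sorted sstable. */
final class StreamingRuns {

    /** Partitioner token: first 8 bytes of MD5(key), an assumed stand-in for a hash partitioner. */
    static long token(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
            long t = 0;
            for (int i = 0; i < 8; i++) t = (t << 8) | (d[i] & 0xFF);
            return t;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /**
     * Lays the rows out in {@code localOrder} (the sstable's on-disk order) and counts the
     * maximal runs of adjacent rows whose partitioner token lies in [lo, hi).
     */
    static int runsForRange(List<String> keys, Comparator<String> localOrder, long lo, long hi) {
        List<String> onDisk = new ArrayList<>(keys);
        onDisk.sort(localOrder);
        int runs = 0;
        boolean inRun = false;
        for (String k : onDisk) {
            long t = token(k);
            boolean wanted = t >= lo && t < hi;
            if (wanted && !inRun) runs++; // a new run starts here
            inRun = wanted;
        }
        return runs;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) keys.add(String.format("key%08d", i));
        long lo = 0L, hi = Long.MAX_VALUE; // roughly the upper half of the signed 64-bit token space
        // Same function for placement and on-disk order: the range is one contiguous run.
        System.out.println(runsForRange(keys, Comparator.comparingLong(StreamingRuns::token), lo, hi));
        // A second, local order (plain key order): the same rows are scattered across many runs.
        System.out.println(runsForRange(keys, Comparator.naturalOrder(), lo, hi));
    }
}
```

When the on-disk order is the partitioner's token order, any token range maps to a single slice of the file; with a separate local order, roughly half the rows are wanted but they are interleaved with unwanted ones, giving thousands of short runs.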

On Thu, Feb 14, 2013 at 7:48 AM, Radim Kolar <hs...@filez.com> wrote:
> [...]
> Then use two hash functions: one for splitting rows across nodes and a
> second for the index inside LevelDB.
> [...]

Re: real leveldb vs cassandra leveldb

Posted by Radim Kolar <hs...@filez.com>.

On 13.2.2013 16:32, Jonathan Ellis wrote:
> The only point here that would make a difference in practice is
> leveldb using a worse hash function.  For us it's not worth making
> partitioning worse to make compaction better.
Then use two hash functions: one for splitting rows across nodes and a
second for the index inside LevelDB.

Too many compactions is the major problem of the LevelDB implementation in
Cassandra. Cassandra implements the initial LevelDB design, ignoring the
improvements made to LevelDB over roughly the last 2 years. You can try it
yourself by benchmarking the original and the current LevelDB.

Re: real leveldb vs cassandra leveldb

Posted by Jonathan Ellis <jb...@gmail.com>.

The only point here that would make a difference in practice is
leveldb using a worse hash function.  For us it's not worth making
partitioning worse to make compaction better.
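The partitioning side of this trade-off can be sketched as well. Assume four nodes that each own an equal slice of the placement space and a workload that writes a batch of sequential keys. With a hash token the batch spreads evenly; with an order-preserving placement the whole batch lands on one node. The node count, key space, batch position and the MD5 token are illustrative assumptions, not Cassandra's partitioner code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

/** Toy model (hypothetical): how a batch of sequential keys spreads across nodes. */
final class PlacementSkew {

    /** First 8 bytes of MD5(key) as a signed 64-bit token (an assumed hash partitioner). */
    static long md5Token(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
            long t = 0;
            for (int i = 0; i < 8; i++) t = (t << 8) | (d[i] & 0xFF);
            return t;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Hash placement: the signed token space is cut into `nodes` equal contiguous ranges. */
    static int hashedOwner(String key, int nodes) {
        if (nodes == 1) return 0;
        long offset = md5Token(key) ^ Long.MIN_VALUE;     // token - Long.MIN_VALUE, read as unsigned
        long width = Long.divideUnsigned(-1L, nodes) + 1; // (2^64 - 1) / nodes + 1
        return (int) Long.divideUnsigned(offset, width);
    }

    /** Order-preserving placement: the key space [0, keySpace) is cut into `nodes` equal slices. */
    static int orderedOwner(long seq, long keySpace, int nodes) {
        return (int) (seq * nodes / keySpace);
    }

    /** Keys received per node when keys firstSeq .. firstSeq + count - 1 are written. */
    static int[] load(long firstSeq, int count, long keySpace, int nodes, boolean hashed) {
        int[] perNode = new int[nodes];
        for (long seq = firstSeq; seq < firstSeq + count; seq++) {
            int node = hashed
                    ? hashedOwner(String.format("key%08d", seq), nodes)
                    : orderedOwner(seq, keySpace, nodes);
            perNode[node]++;
        }
        return perNode;
    }

    public static void main(String[] args) {
        // 4 nodes, a key space of one million, and a batch of 10,000 sequential writes
        System.out.println("ordered: " + Arrays.toString(load(500_000, 10_000, 1_000_000, 4, false)));
        System.out.println("hashed:  " + Arrays.toString(load(500_000, 10_000, 1_000_000, 4, true)));
    }
}
```

This is the other half of the picture in the first example: the same scattering that makes hashed keys overlap in every flushed table is what spreads write load evenly across the cluster.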

On Mon, Feb 11, 2013 at 12:53 PM, Radim Kolar <hs...@filez.com> wrote:
> [...]
> Very important: a better hash function. Murmur and MD5 hashes are
> unsuitable for LevelDB because they turn a key sequence into more or
> less random noise.
> [...]