Posted to dev@cassandra.apache.org by Radim Kolar <hs...@filez.com> on 2013/02/11 19:53:17 UTC
real leveldb vs cassandra leveldb
Real leveldb is better in a lot of areas:
- L0 sstables are 1/10 the size of L1 sstables.
- Tables can be promoted to upper levels if no merging is needed (there is
  a hole).
- A variable number of sstables per level, though it tries to keep a
  1:10:100 sstable ratio (not a hard requirement).
- Very important: a better hash function. Murmur and MD5 hashes are
  unsuitable for leveldb because they turn a key sequence into more or less
  random noise. Changing the hash function to the leveldb one gives about an
  8x speed increase during sequential writes because far fewer table merges
  are needed.
- A better merge policy: merge 1 table plus up to 10 tables from the next
  level into the next + 2 level.
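The hash-function item is the easiest one to see in code. Below is a minimal Java sketch, assuming MD5 as a stand-in for the partitioner hash (Murmur would scatter keys the same way) and reducing an sstable to the min/max token of one flushed batch; the class and method names are hypothetical, not Cassandra or leveldb code. It writes keys in sequence, flushes every 1000 keys, and counts how many pairs of flushed sstables overlap in token order.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration, not Cassandra or leveldb code: how the token
// function changes sstable overlap when keys are written in sequence.
public class SequentialWriteOverlap {

    // MD5-based token (RandomPartitioner-style): digest read as an unsigned integer.
    static BigInteger md5Token(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Write keys 0, 1, 2, ... and "flush" every batchSize keys into one sstable;
    // return each sstable's [min, max] token range under the given token function.
    static List<BigInteger[]> flush(int tables, int batchSize, Function<Long, BigInteger> token) {
        List<BigInteger[]> ranges = new ArrayList<>();
        long key = 0;
        for (int t = 0; t < tables; t++) {
            BigInteger min = null, max = null;
            for (int i = 0; i < batchSize; i++, key++) {
                BigInteger tok = token.apply(key);
                if (min == null || tok.compareTo(min) < 0) min = tok;
                if (max == null || tok.compareTo(max) > 0) max = tok;
            }
            ranges.add(new BigInteger[] {min, max});
        }
        return ranges;
    }

    // Count pairs of sstables whose token ranges overlap; overlapping tables
    // have to be merged, disjoint ones could simply be moved between levels.
    static int overlappingPairs(List<BigInteger[]> ranges) {
        int count = 0;
        for (int i = 0; i < ranges.size(); i++) {
            for (int j = i + 1; j < ranges.size(); j++) {
                BigInteger[] a = ranges.get(i), b = ranges.get(j);
                if (a[0].compareTo(b[1]) <= 0 && b[0].compareTo(a[1]) <= 0) count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int tables = 10, batchSize = 1000;
        // Order-preserving token: the key itself.
        int ordered = overlappingPairs(flush(tables, batchSize, BigInteger::valueOf));
        // Hashed token: MD5 of the key's decimal string.
        int hashed = overlappingPairs(flush(tables, batchSize, k -> md5Token(Long.toString(k))));
        System.out.println("overlapping pairs, key order: " + ordered); // 0: each batch is a disjoint key range
        System.out.println("overlapping pairs, MD5 order: " + hashed);  // out of 45 possible pairs
    }
}
```

With the key itself as the token, consecutive batches never overlap, so each flushed table could be promoted without a merge; with MD5, each batch spans nearly the whole token range. The sketch shows only the mechanism and does not reproduce the 8x figure quoted in the message.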
Re: real leveldb vs cassandra leveldb
Posted by Michael Kjellman <mk...@barracuda.com>.
I can promise you it is not a matter of not being interested in performance enhancements, but there is a trade-off between stability, backwards compatibility, etc. Some of the changes you are proposing have merit, but they fundamentally change some decisions that have been made, and Jonathan needs to weigh those trade-offs for the project.
Most importantly, I'd file a bug with your proposed patches and have a positive discussion there.
Best,
Michael
On Feb 22, 2013, at 4:04 AM, "Radim Kolar" <hs...@filez.com> wrote:
> On 13.2.2013 16:32, Jonathan Ellis wrote:
>> The only point here that would make a difference in practice is
>> leveldb using a worse hash function.
> How do you know that it would not make a difference in practice? I have
> implemented some optimizations from leveldb in Cassandra: a different L0
> level (12 tables, each 1/10 the size of an L1 table), faster promotion to
> the next level, and a variable number of sstables per level, and it
> performs faster for a write-heavy workload.
>
> I do not understand why you are not interested in performance
> optimizations. For example, yesterday I implemented buffered sstable
> writing and it turned out to be a significant performance advantage:
> using a 1 MB buffer changed the test time from 1m50s to 0m40s. Another
> significant gain is to use read-ahead in compactions and to replace
> bucket-based size-tiered compaction with the compaction strategy used in
> Lucene, which does not produce immortal sstables the way Cassandra does.
>
> I have quite high demands for performance: my test environment for a new
> project has about 1 billion new rows per day, and in production it will be
> about 30 times higher, so what is worth coding for me might not be worth
> coding for you.
Re: real leveldb vs cassandra leveldb
Posted by Jonathan Ellis <jb...@gmail.com>.
If you legitimately want to move the ball forward, here's how to do it:
- Create one jira ticket per idea
- Attach a patch
- Post your benchmark results
It's important to consider changes one at a time so we have the
clearest picture possible about what we are gaining.
On Fri, Feb 22, 2013 at 4:00 AM, Radim Kolar <hs...@filez.com> wrote:
> On 13.2.2013 16:32, Jonathan Ellis wrote:
>> The only point here that would make a difference in practice is
>> leveldb using a worse hash function.
>
> How do you know that it would not make a difference in practice? I have
> implemented some optimizations from leveldb in Cassandra: a different L0
> level (12 tables, each 1/10 the size of an L1 table), faster promotion to
> the next level, and a variable number of sstables per level, and it
> performs faster for a write-heavy workload.
>
> I do not understand why you are not interested in performance
> optimizations. For example, yesterday I implemented buffered sstable
> writing and it turned out to be a significant performance advantage:
> using a 1 MB buffer changed the test time from 1m50s to 0m40s. Another
> significant gain is to use read-ahead in compactions and to replace
> bucket-based size-tiered compaction with the compaction strategy used in
> Lucene, which does not produce immortal sstables the way Cassandra does.
>
> I have quite high demands for performance: my test environment for a new
> project has about 1 billion new rows per day, and in production it will be
> about 30 times higher, so what is worth coding for me might not be worth
> coding for you.
--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced
Re: real leveldb vs cassandra leveldb
Posted by Radim Kolar <hs...@filez.com>.
On 13.2.2013 16:32, Jonathan Ellis wrote:
> The only point here that would make a difference in practice is
> leveldb using a worse hash function.
How do you know that it would not make a difference in practice? I have
implemented some optimizations from leveldb in Cassandra: a different L0
level (12 tables, each 1/10 the size of an L1 table), faster promotion to
the next level, and a variable number of sstables per level, and it
performs faster for a write-heavy workload.
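The L0 and promotion changes described in this paragraph can be sketched as follows. This is a minimal Java model under stated assumptions: a 10 MB L1 sstable size, L0 tables 1/10 that size, an L0 trigger of 12 tables, and leveldb-style per-level byte targets growing 10x per level; the class, record, and method names are hypothetical, not Cassandra's or leveldb's code. Because each level is sized by bytes rather than by a fixed table count, the number of sstables per level can vary while staying near the 1:10:100 ratio.

```java
import java.util.List;

// Hypothetical sketch of leveled-compaction bookkeeping; names and sizes are
// assumptions for illustration, not Cassandra's or leveldb's actual code.
public class LeveledSketch {

    // An sstable reduced to its key range and size on disk.
    record Table(long firstKey, long lastKey, long bytes) {}

    static final long L1_TABLE_BYTES = 10L << 20;           // assumed L1 sstable size: 10 MB
    static final long L0_TABLE_BYTES = L1_TABLE_BYTES / 10; // L0 tables are 1/10 of an L1 table
    static final int L0_TABLE_LIMIT = 12;                   // compact L0 once it holds 12 tables

    // Target size of level >= 1: ten L1-sized tables in L1, then 10x per level.
    // Only total bytes are targeted, so the table count per level can vary.
    static long targetBytes(int level) {
        long target = 10 * L1_TABLE_BYTES;
        for (int i = 1; i < level; i++) target *= 10;
        return target;
    }

    // leveldb-style compaction score: a level needs compaction when its score is >= 1.
    // L0 is scored by table count, other levels by total bytes against their target.
    static double score(int level, List<Table> tables) {
        if (level == 0) return tables.size() / (double) L0_TABLE_LIMIT;
        long total = 0;
        for (Table t : tables) total += t.bytes();
        return total / (double) targetBytes(level);
    }

    // Promotion without merging: allowed when the table's key range falls in a
    // "hole", i.e. overlaps no table in the next level.
    static boolean canPromoteWithoutMerge(Table t, List<Table> nextLevel) {
        for (Table n : nextLevel) {
            if (t.firstKey() <= n.lastKey() && n.firstKey() <= t.lastKey()) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<Table> level2 = List.of(
                new Table(0, 99, L1_TABLE_BYTES),
                new Table(200, 299, L1_TABLE_BYTES));
        System.out.println(canPromoteWithoutMerge(new Table(120, 180, L1_TABLE_BYTES), level2)); // true
        System.out.println(canPromoteWithoutMerge(new Table(90, 150, L1_TABLE_BYTES), level2));  // false
        System.out.println(score(1, List.of(new Table(0, 1, 5 * L1_TABLE_BYTES))));               // 0.5
        System.out.println(score(0, List.of(new Table(0, 1, L0_TABLE_BYTES))));                   // 1/12: below the L0 trigger
    }
}
```

leveldb's real trivial-move check also limits overlap with the level after next; this sketch leaves that out.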
I do not understand why you are not interested in performance
optimizations. For example, yesterday I implemented buffered sstable
writing and it turned out to be a significant performance advantage:
using a 1 MB buffer changed the test time from 1m50s to 0m40s. Another
significant gain is to use read-ahead in compactions and to replace
bucket-based size-tiered compaction with the compaction strategy used in
Lucene, which does not produce immortal sstables the way Cassandra does.
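Buffered writing and read-ahead are the simplest of these ideas to illustrate. A minimal Java sketch, assuming a compaction-like loop that reads and writes in small row-sized chunks; the 1 MB buffer size is from the message, while the class name and the 100-byte row size are hypothetical. `BufferedInputStream` supplies user-space read-ahead on the input and `BufferedOutputStream` coalesces small writes into large ones; this is not Cassandra's actual SSTableWriter or compaction code.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch of buffered, read-ahead I/O for a compaction-style copy;
// not Cassandra's SSTableWriter or compaction code.
public class BufferedCompactionIo {
    static final int BUFFER_BYTES = 1 << 20; // 1 MB, the buffer size mentioned in the thread
    static final int ROW_BYTES = 100;        // assumed small row size: many tiny reads/writes

    // Stream one file into another row by row. The input buffer reads ahead 1 MB at
    // a time; the output buffer turns many small row writes into 1 MB writes.
    static long copyRows(Path in, Path out) throws IOException {
        long total = 0;
        try (InputStream src = new BufferedInputStream(Files.newInputStream(in), BUFFER_BYTES);
             OutputStream dst = new BufferedOutputStream(Files.newOutputStream(out), BUFFER_BYTES)) {
            byte[] row = new byte[ROW_BYTES];
            int n;
            while ((n = src.read(row)) != -1) {
                dst.write(row, 0, n);
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path in = Files.createTempFile("compaction-in", ".db");
        Path out = Files.createTempFile("compaction-out", ".db");
        byte[] data = new byte[3 * BUFFER_BYTES + 123];
        new Random(42).nextBytes(data);
        Files.write(in, data);
        long copied = copyRows(in, out);
        System.out.println(copied == data.length && Arrays.equals(data, Files.readAllBytes(out))); // true
        Files.delete(in);
        Files.delete(out);
    }
}
```

The sketch only shows where the buffers sit; the 1m50s to 0m40s timing comes from Radim's own test and is not reproduced here.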
I have quite high demands for performance: my test environment for a new
project has about 1 billion new rows per day, and in production it will be
about 30 times higher, so what is worth coding for me might not be worth
coding for you.
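For scale, the volumes in this paragraph translate into average write rates as follows. A small Java sketch, assuming the load is spread evenly over 24 hours (real traffic would peak higher); the class and method names are hypothetical.

```java
// Back-of-the-envelope average rates for the volumes quoted in the thread:
// 1 billion rows/day in test, about 30x that in production.
public class IngestRate {

    // Average rows per second, assuming the daily volume is spread evenly.
    static double rowsPerSecond(double rowsPerDay) {
        return rowsPerDay / (24 * 60 * 60); // 86,400 seconds per day
    }

    public static void main(String[] args) {
        double test = 1e9;
        double production = 30 * test;
        System.out.printf("test:       %.0f rows/s%n", rowsPerSecond(test));       // 11574
        System.out.printf("production: %.0f rows/s%n", rowsPerSecond(production)); // 347222
    }
}
```

That is roughly 11.6 thousand rows per second in test and about 347 thousand rows per second in production, before any peak-to-average factor.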
Re: real leveldb vs cassandra leveldb
Posted by Jonathan Ellis <jb...@gmail.com>.
You have to use the same function if you want to do streaming I/O for
repair and node movement instead of random I/O.
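The streaming argument can be made concrete by counting seeks. A minimal Java sketch, assuming a single sstable whose rows are either sorted by partitioner token (as Cassandra sstables are) or, under the two-hash proposal, by an unrelated local hash, modeled here as a random shuffle of the same rows; class and method names are hypothetical, not Cassandra code. It counts how many contiguous runs must be read to stream one token range to another node.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration of why on-disk order must match the partitioner's
// token order for sequential streaming; not Cassandra code.
public class StreamingOrder {

    // Count the contiguous runs of rows whose token lies in [lo, hi) when the file is
    // read in the given layout. One run = one seek plus a sequential read.
    static int runsToStream(long[] tokensInFileOrder, long lo, long hi) {
        int runs = 0;
        boolean inRun = false;
        for (long t : tokensInFileOrder) {
            boolean wanted = t >= lo && t < hi;
            if (wanted && !inRun) runs++;
            inRun = wanted;
        }
        return runs;
    }

    public static void main(String[] args) {
        Random rnd = new Random(7);
        long[] tokens = new long[10_000];
        for (int i = 0; i < tokens.length; i++) tokens[i] = rnd.nextLong();

        // Layout A: rows sorted by partitioner token.
        long[] byToken = tokens.clone();
        Arrays.sort(byToken);

        // Layout B: rows sorted by a second, local-only hash, modeled as a random
        // permutation of the same rows (Fisher-Yates shuffle).
        long[] bySecondHash = tokens.clone();
        for (int i = bySecondHash.length - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            long tmp = bySecondHash[i];
            bySecondHash[i] = bySecondHash[j];
            bySecondHash[j] = tmp;
        }

        // The token range handed to another node during repair or a move.
        long lo = 0, hi = Long.MAX_VALUE / 4;
        System.out.println("runs, token order:       " + runsToStream(byToken, lo, hi)); // 1
        System.out.println("runs, second-hash order: " + runsToStream(bySecondHash, lo, hi));
    }
}
```

In token order the range is one contiguous slice of the file; in the second-hash order its rows are scattered, so streaming them turns into many small random reads.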
On Thu, Feb 14, 2013 at 7:48 AM, Radim Kolar <hs...@filez.com> wrote:
> On 13.2.2013 16:32, Jonathan Ellis wrote:
>
>> The only point here that would make a difference in practice is
>> leveldb using a worse hash function. For us it's not worth making
>> partitioning worse to make compaction better.
>
> Then use two hash functions: one for splitting rows across nodes and a
> second for the index inside leveldb.
>
> Too much compaction is the major problem of the leveldb implementation in
> Cassandra. Cassandra implements the initial leveldb design, ignoring the
> improvements made to leveldb over roughly the last 2 years. You can try it
> yourself by benchmarking the original and the current leveldb.
Re: real leveldb vs cassandra leveldb
Posted by Radim Kolar <hs...@filez.com>.
On 13.2.2013 16:32, Jonathan Ellis wrote:
> The only point here that would make a difference in practice is
> leveldb using a worse hash function. For us it's not worth making
> partitioning worse to make compaction better.
Then use two hash functions: one for splitting rows across nodes and a
second for the index inside leveldb.
Too much compaction is the major problem of the leveldb implementation in
Cassandra. Cassandra implements the initial leveldb design, ignoring the
improvements made to leveldb over roughly the last 2 years. You can try it
yourself by benchmarking the original and the current leveldb.
Re: real leveldb vs cassandra leveldb
Posted by Jonathan Ellis <jb...@gmail.com>.
The only point here that would make a difference in practice is
leveldb using a worse hash function. For us it's not worth making
partitioning worse to make compaction better.
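The partitioning side of this trade-off can be sketched too. A minimal Java model, assuming a hypothetical 4-node ring whose token space is split into equal slices and a key space of 100 million keys, comparing where a sequential burst of 10,000 keys lands under an MD5-based partitioner versus an order-preserving one; the class and method names are hypothetical, not Cassandra's partitioner API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical illustration: a 4-node ring with equal token slices is an assumption.
public class PartitionBalance {
    static final int NODES = 4;

    // Owner under an MD5-based partitioner: the top digest byte picks one of NODES equal slices.
    static int hashedOwner(String key) throws NoSuchAlgorithmException {
        byte[] d = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
        return (d[0] & 0xFF) * NODES / 256;
    }

    // Owner under an order-preserving partitioner: keys 0..keySpace-1 split into NODES equal slices.
    static int orderedOwner(long key, long keySpace) {
        return (int) (key * NODES / keySpace);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        long keySpace = 100_000_000L; // assumed key space
        int[] hashed = new int[NODES], ordered = new int[NODES];
        // A sequential burst of writes: keys 0..9999, e.g. an id-ordered import.
        for (long k = 0; k < 10_000; k++) {
            hashed[hashedOwner(Long.toString(k))]++;
            ordered[orderedOwner(k, keySpace)]++;
        }
        System.out.println("hashed:  " + Arrays.toString(hashed));  // roughly even across nodes
        System.out.println("ordered: " + Arrays.toString(ordered)); // [10000, 0, 0, 0]: one hot node
    }
}
```

This is the tension in the thread: a key order that keeps sequential writes together helps leveled compaction on each node, but if the same function also partitions rows, a sequential burst concentrates on a single node.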
On Mon, Feb 11, 2013 at 12:53 PM, Radim Kolar <hs...@filez.com> wrote:
> Real leveldb is better in a lot of areas:
>
> - L0 sstables are 1/10 the size of L1 sstables.
> - Tables can be promoted to upper levels if no merging is needed (there is
>   a hole).
> - A variable number of sstables per level, though it tries to keep a
>   1:10:100 sstable ratio (not a hard requirement).
> - Very important: a better hash function. Murmur and MD5 hashes are
>   unsuitable for leveldb because they turn a key sequence into more or less
>   random noise. Changing the hash function to the leveldb one gives about an
>   8x speed increase during sequential writes because far fewer table merges
>   are needed.
> - A better merge policy: merge 1 table plus up to 10 tables from the next
>   level into the next + 2 level.