Posted to user@cassandra.apache.org by Dani Traphagen <da...@datastax.com> on 2015/12/16 01:38:15 UTC

Re: scylladb

You'll be the first Carlos.

[image: Inline image 1]

Had any rain lately? Curious how this went, if so.

On Thu, Nov 12, 2015 at 4:36 AM, Jack Krupansky <ja...@gmail.com>
wrote:

> I just did a Twitter search on scylladb and did not see any tweets about
> actual use, so far.
>
>
> -- Jack Krupansky
>
> On Wed, Nov 11, 2015 at 10:54 AM, Carlos Alonso <in...@mrcalonso.com>
> wrote:
>
>> Any update about this?
>>
>> @Carlos Rolo, did you try it? Thoughts?
>>
>> Carlos Alonso | Software Engineer | @calonso
>> <https://twitter.com/calonso>
>>
>> On 5 November 2015 at 14:07, Carlos Rolo <ro...@pythian.com> wrote:
>>
>>> Something to do on an expected rainy weekend. Thanks for the information.
>>>
>>> Regards,
>>>
>>> Carlos Juzarte Rolo
>>> Cassandra Consultant
>>>
>>> Pythian - Love your data
>>>
>>> rolo@pythian | Twitter: @cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo
>>> <http://linkedin.com/in/carlosjuzarterolo>*
>>> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>>> www.pythian.com
>>>
>>> On Thu, Nov 5, 2015 at 12:07 PM, Dani Traphagen <
>>> dani.traphagen@datastax.com> wrote:
>>>
>>>> As of two days ago, they say they've got it @cjrolo.
>>>>
>>>> https://github.com/scylladb/scylla/wiki/RELEASE-Scylla-0.11-Beta
>>>>
>>>>
>>>> On Thursday, November 5, 2015, Carlos Rolo <ro...@pythian.com> wrote:
>>>>
>>>>> I will not try it until multi-DC is implemented. More than a month has
>>>>> passed since I last looked for it, so it could be in place by now; if so,
>>>>> I may take some time to test it.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Carlos Juzarte Rolo
>>>>> Cassandra Consultant
>>>>>
>>>>> Pythian - Love your data
>>>>>
>>>>> rolo@pythian | Twitter: @cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo
>>>>> <http://linkedin.com/in/carlosjuzarterolo>*
>>>>> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>>>>> www.pythian.com
>>>>>
>>>>> On Thu, Nov 5, 2015 at 9:37 AM, Jon Haddad <jo...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Nope, no one I know. Let me know if you try it; I'd love to hear your
>>>>>> feedback.
>>>>>>
>>>>>> > On Nov 5, 2015, at 9:22 AM, tommaso barbugli <tb...@gmail.com>
>>>>>> wrote:
>>>>>> >
>>>>>> > Hi guys,
>>>>>> >
>>>>>> > Has anyone already tried ScyllaDB (yet another fastest NoSQL database
>>>>>> > in town)? Any thoughts or hands-on experience to share?
>>>>>> >
>>>>>> > Cheers,
>>>>>> > Tommaso
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> Sent from mobile -- apologizes for brevity or errors.
>>>>
>>>
>>>
>>> --
>>>
>>>
>>>
>>>
>>
>


-- 
[image: datastax_logo.png] <http://www.datastax.com/>

DANI TRAPHAGEN

Technical Enablement Lead | dani.traphagen@datastax.com

[image: twitter.png] <https://twitter.com/dtrapezoid> [image: linkedin.png]
<https://www.linkedin.com/pub/dani-traphagen/31/93b/b85>
<https://github.com/dtrapezoid>

Re: scylladb

Posted by "Richard L. Burton III" <mr...@gmail.com>.

LWTs are not currently supported in ScyllaDB. That said, they are in scope,
and I expect ScyllaDB to support them. The exact date is uncertain.

On Thu, Mar 9, 2017 at 4:22 PM, Kant Kodali <ka...@peernova.com> wrote:

> Does ScyllaDB have LWTs yet?
>
> On Thu, Mar 9, 2017 at 11:21 AM, daemeon reiydelle <da...@gmail.com>
> wrote:
>
>> The comparison is fair, and conservative. I did substantial performance
>> comparisons for two clients; both returned throughputs that were faster
>> than the published comparisons (15x, as I recall). At that time the
>> client preferred to use a Cass COTS solution and a caching solution
>> for OLA compliance.
>>
>>
>> .......
>>
>> Daemeon C.M. Reiydelle
>> USA (+1) 415.501.0198
>> London (+44) (0) 20 8144 9872
>>
>> On Thu, Mar 9, 2017 at 11:04 AM, Robin Verlangen <ro...@us2.nl> wrote:
>>
>>> I was wondering how people feel about the comparison that's made here
>>> between Cassandra and ScyllaDB:
>>> http://www.scylladb.com/technology/ycsb-cassandra-scylla/#results-of-3-scylla-nodes-vs-30-cassandra-nodes
>>>
>>> They are claiming a 10x improvement. Is that a fair comparison, or maybe
>>> a somewhat coloured view of a (micro)benchmark in a specific setup? Any
>>> pros/cons known?
>>>
>>> Best regards,
>>>
>>> Robin Verlangen
>>> *Chief Data Architect*
>>>
>>> Disclaimer: The information contained in this message and attachments is
>>> intended solely for the attention and use of the named addressee and may be
>>> confidential. If you are not the intended recipient, you are reminded that
>>> the information remains the property of the sender. You must not use,
>>> disclose, distribute, copy, print or rely on this e-mail. If you have
>>> received this message in error, please contact the sender immediately and
>>> irrevocably delete this message and any copies.
>>>
>>> On Wed, Dec 16, 2015 at 11:52 AM, Carlos Rolo <ro...@pythian.com> wrote:
>>>
>>>> No rain at all! I almost had it running last weekend but stopped
>>>> short of installing it. Let's see if this one is for real!
>>>>
>>>> Regards,
>>>>
>>>> Carlos Juzarte Rolo
>>>> Cassandra Consultant
>>>>
>>>> Pythian - Love your data
>>>>
>>>> rolo@pythian | Twitter: @cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo
>>>> <http://linkedin.com/in/carlosjuzarterolo>*
>>>> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>>>> www.pythian.com
>>>>


-- 
-Richard L. Burton III
@rburton

Re: scylladb

Posted by Kant Kodali <ka...@peernova.com>.

Does ScyllaDB have LWTs yet?


Re: scylladb

Posted by Bhuvan Rawal <bh...@gmail.com>.

I'd say the benchmark would be complete only when the necessary system
benchmarks are provided at the point of inflexion.

Looking at the ScyllaDB report, it is unclear which system parameter was
the bottleneck. Also, an observation: the report mentions that they are
using 1 KB rows and probably the default compression settings, so this
could be a bottleneck (every time, a 64K object would be picked off and
decompressed even though the record is 1/64th the size):
https://groups.google.com/forum/#!topic/nosql-databases/9pett319cgs
This would really cripple performance if that's the case.
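
To put a rough number on that concern, here is a minimal Python sketch of the read amplification, assuming a 64 KB compression chunk and 1 KB records as described above. The function name is made up for illustration, the model ignores records that straddle a chunk boundary, and nothing here is taken from the ScyllaDB report itself.

```python
def read_amplification(chunk_bytes: int, row_bytes: int) -> float:
    """Bytes decompressed per byte of row data when a point read
    has to decompress whole chunks to return a single row.

    Simplifying assumption: the row starts at a chunk boundary.
    """
    if chunk_bytes <= 0 or row_bytes <= 0:
        raise ValueError("sizes must be positive")
    # A row larger than one chunk spans several chunks; round up.
    chunks_touched = -(-row_bytes // chunk_bytes)
    return chunks_touched * chunk_bytes / row_bytes


print(read_amplification(64 * 1024, 1024))  # 64 KB chunk, 1 KB row
print(read_amplification(4 * 1024, 1024))   # hypothetical smaller 4 KB chunk
```

Under these assumptions the first call gives 64.0 and the second 4.0, which is the "1/64th the size" point in numbers: a smaller chunk would shrink the work done per point read, at the cost of a worse compression ratio.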

Tuning the 99th percentile would be tricky in the case of Java because of
background GC; that really comes down to how the GC parameters are tuned
for the specific workload.
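
As a rough illustration of why the 99th percentile is so sensitive to pauses, the Python sketch below uses made-up numbers: a 1 ms baseline latency and a hypothetical 50 ms stall that hits every N-th request. It is a toy model, not a measurement of Cassandra or of any real GC.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least
    pct percent of the samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def simulate(n_requests, paused_every, base_ms=1.0, pause_ms=50.0):
    """Assume every `paused_every`-th request stalls behind a pause."""
    return [
        base_ms + (pause_ms if i % paused_every == 0 else 0.0)
        for i in range(1, n_requests + 1)
    ]


rare = simulate(10_000, paused_every=200)     # 0.5% of requests stall
frequent = simulate(10_000, paused_every=50)  # 2% of requests stall

print(percentile(rare, 50), percentile(rare, 99))
print(percentile(frequent, 50), percentile(frequent, 99))
```

In this toy model the median stays at 1.0 ms in both runs, while the 99th percentile jumps from 1.0 ms to 51.0 ms once more than 1% of requests are affected. That is why throughput and median latency alone say little about how well the GC is tuned.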

I believe it's pertinent to evaluate the Cassandra defaults: the recommended
100 MB per core of new heap, and the compression size, which can cause trouble.

On Fri, Mar 10, 2017 at 5:12 AM, Kant Kodali <ka...@peernova.com> wrote:

> I don't think ScyllaDB's performance is because of C++. The design decisions
> in ScyllaDB are indeed different from Cassandra's, such as getting rid of SEDA
> and moving to TPC and so on.
>
> If someone thinks it is because of C++, then just show the benchmarks that
> prove it is indeed C++ that gave the 10X performance boost ScyllaDB
> claims, instead of merely stating it.
>
>
> On Thu, Mar 9, 2017 at 3:22 PM, Richard L. Burton III <mr...@gmail.com>
> wrote:
>
>> They spend an enormous amount of time focusing on performance. You can
>> expect them to continue on with their optimization and keep crushing it.
>>
>> P.S., I don't work for ScyllaDB.
>>
>> On Thu, Mar 9, 2017 at 6:02 PM, Rakesh Kumar <ra...@outlook.com>
>> wrote:
>>
>>> In all of their presentations they keep harping on the fact that ScyllaDB
>>> is written in C++ and does not carry the overhead of Java. Still, the
>>> difference looks staggering.
>>
>>
>> --
>> -Richard L. Burton III
>> @rburton
>>
>
>

Re: scylladb

Posted by Avi Kivity <av...@scylladb.com>.

Here's a test (by Samsung MSL) comparing Scylla to Cassandra 3.9:

http://www.scylladb.com/2017/02/15/scylladb-vs-cassandra-performance-benchmark-samsung/

There's a link at the end to the original report.

On 03/11/2017 09:08 PM, Bhuvan Rawal wrote:
> "Lastly, why don't you test Scylla yourself?  It's pretty easy to set 
> up, there's nothing to tune."
>  - The details are indeed compelling enough to go ahead and test it
> for a specific use case.
>
> If it works out well, it could cut infra costs considerably, mean
> fewer servers to manage, and probably reduce the time needed to
> bootstrap and decommission nodes!
>
> It would also be interesting to see a benchmark against Cassandra 3
> as well, since the new storage engine is said to have better
> performance:
> https://www.datastax.com/2015/12/storage-engine-30
>
> Regards,
> Bhuvan
>
> On Sat, Mar 11, 2017 at 2:59 PM, Avi Kivity <avi@scylladb.com 
> <ma...@scylladb.com>> wrote:
>
>     There is no magic 10X bullet.  It's a mix of multiple factors,
>     which can come up to less than 10X in some circumstances and more
>     than 10X in others, as has been reported on this thread by others.
>
>     TPC doesn't give _any_ advantage when you have just one core, and
>     can give more than 10X on a machine with a large number of cores. 
>     These are becoming more and more common, think of the recent AMD
>     Naples announcement; with 32 cores per socket you can have 128
>     logical cores in a two-socket server; or the AWS i3.16xlarge
>     instance with 32 cores / 64 vcpus.
>
>     You're welcome to browse our site to learn more about the
>     architecture, or watch this technical talk [1] I gave in QConSF
>     that highlights some of the techniques we use.
>
>     Of course it's possible to mistune Cassandra to give bad results,
>     that is why we spent a lot more time tuning Cassandra and
>     documenting everything than we spent on Scylla.  You can read the
>     report in [2], it is very detailed, and provides a wealth of
>     metrics like you'd expect.
>
>     I'm not going to comment about the Aerospike numbers, I haven't
>     studied them in detail.  And no, you can't multiply results like
>     that unless they were done with very similar configurations and
>     test harnesses.
>
>     Lastly, why don't you test Scylla yourself?  It's pretty easy to
>     set up, there's nothing to tune.
>
>     Avi
>
>     [1] https://www.infoq.com/presentations/scylladb
>     [2] http://www.scylladb.com/technology/cassandra-vs-scylla-benchmark-cluster-1/
>
>
>
>     On 03/10/2017 06:58 PM, Bhuvan Rawal wrote:
>>     Agreed, C++ gives an added advantage in talking to the underlying
>>     hardware more efficiently. It sounds good, but can a piece of
>>     code written in C++ give 1000% of the throughput of a Java app? Is the
>>     TPC design 10X more performant than the SEDA architecture?
>>
>>     And if C/C++ is indeed that fast, how can Aerospike (which is
>>     itself written in C) claim to be 10X faster than Scylla here:
>>     http://www.aerospike.com/benchmarks/scylladb-initial/ ?
>>     (Combining your benchmarks and Aerospike's, it appears that
>>     Aerospike is 100X more performant than C*. I highly doubt that!)
>>
>>     For a moment, let's forget about evaluating two different databases.
>>     One can observe a 10X performance difference between a mistuned
>>     Cassandra cluster and one that's tuned to its data model; there
>>     are so many tunables in the yaml as well as in the table configs.
>>
>>     The idea is: in order to strengthen your claim, you need to provide
>>     complete system metrics (disk, CPU, network) at the point where the
>>     ops increase starts to decay, along with the configs used. Having
>>     plain ops per second and 99p latency is a black box.
>>
>>     Regards,
>>     Bhuvan
>>
>>     On Fri, Mar 10, 2017 at 12:47 PM, Avi Kivity <avi@scylladb.com
>>     <ma...@scylladb.com>> wrote:
>>
>>         ScyllaDB engineer here.
>>
>>         C++ is really an enabling technology here. It is directly
>>         responsible for a small fraction of the gain by executing
>>         faster than Java.  But it is indirectly responsible for the
>>         gain by allowing us direct control over memory and
>>         threading.  Just as an example, Scylla starts by taking over
>>         almost all of the machine's memory, and dynamically assigning
>>         it to memtables, cache, and working memory needed to handle
>>         requests in flight.  Memory is statically partitioned across
>>         cores, allowing us to exploit NUMA fully.  You can't do these
>>         things in Java.
>>
>>         I would say the major contributors to Scylla performance are:
>>          - thread-per-core design
>>          - replacement of the page cache with a row cache
>>          - careful attention to many small details, each contributing
>>         a little, but with a large overall impact
>>
>>         While I'm here I can say that performance is not the only
>>         goal here, it is stable and predictable performance over
>>         varying loads and during maintenance operations like repair,
>>         without any special tuning.  We measure the amount of CPU and
>>         I/O spent on foreground (user) and background (maintenance)
>>         tasks and divide them fairly.  This work is not complete but
>>         already makes operating Scylla a lot simpler.
>>
>>
>>         On 03/10/2017 01:42 AM, Kant Kodali wrote:
>>>         I dont think ScyllaDB performance is because of C++. The
>>>         design decisions in scylladb are indeed different from
>>>         Cassandra such as getting rid of SEDA and moving to TPC and
>>>         so on.
>>>
>>>         If someone thinks it is because of C++ then just show the
>>>         benchmarks that proves it is indeed the C++ which gave 10X
>>>         performance boost as ScyllaDB claims instead of stating it.
>>>
>>>
>>>         On Thu, Mar 9, 2017 at 3:22 PM, Richard L. Burton III
>>>         <mrburton@gmail.com <ma...@gmail.com>> wrote:
>>>
>>>             They spend an enormous amount of time focusing on
>>>             performance. You can expect them to continue on with
>>>             their optimization and keep crushing it.
>>>
>>>             P.S., I don't work for ScyllaDB.
>>>
>>>             On Thu, Mar 9, 2017 at 6:02 PM, Rakesh Kumar
>>>             <rakeshkumar464@outlook.com
>>>             <ma...@outlook.com>> wrote:
>>>
>>>                 In all of their presentation they keep harping on
>>>                 the fact that scylladb is written in C++ and does
>>>                 not carry the overhead of Java.  Still the
>>>                 difference looks staggering.
>>>                 ________________________________________
>>>                 From: daemeon reiydelle <daemeonr@gmail.com
>>>                 <ma...@gmail.com>>
>>>                 Sent: Thursday, March 9, 2017 14:21
>>>                 To: user@cassandra.apache.org
>>>                 <ma...@cassandra.apache.org>
>>>                 Subject: Re: scylladb
>>>
>>>                 The comparison is fair, and conservative. Did
>>>                 substantial performance comparisons for two clients,
>>>                 both results returned throughputs that were faster
>>>                 than the published comparisons (15x as I recall). At
>>>                 that time the client preferred to utilize a Cass
>>>                 COTS solution and use a caching solution for OLA
>>>                 compliance.
>>>
>>>
>>>                 .......
>>>
>>>                 Daemeon C.M. Reiydelle
>>>                 USA (+1) 415.501.0198 <tel:%28%2B1%29%20415.501.0198>
>>>                 London (+44) (0) 20 8144 9872
>>>                 <tel:%28%2B44%29%20%280%29%2020%208144%209872>
>>>
>>>                 On Thu, Mar 9, 2017 at 11:04 AM, Robin Verlangen
>>>                 <robin@us2.nl
>>>                 <ma...@us2.nl><mailto:robin@us2.nl
>>>                 <ma...@us2.nl>>> wrote:
>>>                 I was wondering how people feel about the comparison
>>>                 that's made here between Cassandra and ScyllaDB :
>>>                 http://www.scylladb.com/technology/ycsb-cassandra-scylla/#results-of-3-scylla-nodes-vs-30-cassandra-nodes
>>>                 <http://www.scylladb.com/technology/ycsb-cassandra-scylla/#results-of-3-scylla-nodes-vs-30-cassandra-nodes>
>>>
>>>                 They are claiming a 10x improvement, is that a fair
>>>                 comparison or maybe a somewhat coloured view of a
>>>                 (micro)benchmark in a specific setup? Any pros/cons
>>>                 known?
>>>
>>>                 Best regards,
>>>
>>>                 Robin Verlangen
>>>                 Chief Data Architect
>>>
>>>
>>>                 On Wed, Dec 16, 2015 at 11:52 AM, Carlos Rolo
>>>                 <rolo@pythian.com> wrote:
>>>                 No rain at all! But I almost had it running last
>>>                 weekend, but stopped short of installing it. Let's
>>>                 see if this one is for real!
>>>
>>>                 Regards,
>>>
>>>                 Carlos Juzarte Rolo
>>>                 Cassandra Consultant
>>>
>>>                 Pythian - Love your data
>>>
>>>                 rolo@pythian | Twitter: @cjrolo | Linkedin:
>>>                 linkedin.com/in/carlosjuzarterolo
>>>                 Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>>>                 www.pythian.com
>>>
>>>                 On Wed, Dec 16, 2015 at 12:38 AM, Dani Traphagen
>>>                 <dani.traphagen@datastax.com> wrote:
>>>                 You'll be the first Carlos.
>>>
>>>                 [Inline image 1]
>>>
>>>                 Had any rain lately? Curious how this went, if so.
>>>
>>>                 On Thu, Nov 12, 2015 at 4:36 AM, Jack Krupansky
>>>                 <jack.krupansky@gmail.com> wrote:
>>>                 I just did a Twitter search on scylladb and did not
>>>                 see any tweets about actual use, so far.
>>>
>>>
>>>                 -- Jack Krupansky
>>>
>>>                 On Wed, Nov 11, 2015 at 10:54 AM, Carlos Alonso
>>>                 <info@mrcalonso.com> wrote:
>>>                 Any update about this?
>>>
>>>                 @Carlos Rolo, did you try it? Thoughts?
>>>
>>>                 Carlos Alonso | Software Engineer | @calonso
>>>                 <https://twitter.com/calonso>
>>>
>>>                 On 5 November 2015 at 14:07, Carlos Rolo
>>>                 <rolo@pythian.com> wrote:
>>>                 Something to do on an expected rainy weekend. Thanks
>>>                 for the information.
>>>
>>>                 Regards,
>>>
>>>                 Carlos Juzarte Rolo
>>>                 Cassandra Consultant
>>>
>>>                 Pythian - Love your data
>>>
>>>                 rolo@pythian | Twitter: @cjrolo | Linkedin:
>>>                 linkedin.com/in/carlosjuzarterolo
>>>                 Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>>>                 www.pythian.com
>>>
>>>                 On Thu, Nov 5, 2015 at 12:07 PM, Dani Traphagen
>>>                 <dani.traphagen@datastax.com> wrote:
>>>                 As of two days ago, they say they've got it @cjrolo.
>>>
>>>                 https://github.com/scylladb/scylla/wiki/RELEASE-Scylla-0.11-Beta
>>>
>>>
>>>                 On Thursday, November 5, 2015, Carlos Rolo
>>>                 <rolo@pythian.com> wrote:
>>>                 I will not try until multi-DC is implemented. More
>>>                 than a month has passed since I looked for it, so
>>>                 it could possibly be in place; if so, I may take some
>>>                 time to test it.
>>>
>>>                 Regards,
>>>
>>>                 Carlos Juzarte Rolo
>>>                 Cassandra Consultant
>>>
>>>                 Pythian - Love your data
>>>
>>>                 rolo@pythian | Twitter: @cjrolo | Linkedin:
>>>                 linkedin.com/in/carlosjuzarterolo
>>>                 Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>>>                 www.pythian.com
>>>
>>>                 On Thu, Nov 5, 2015 at 9:37 AM, Jon Haddad
>>>                 <jonathan.haddad@gmail.com> wrote:
>>>                 Nope, no one I know.  Let me know if you try it; I'd
>>>                 love to hear your feedback.
>>>
>>>                 > On Nov 5, 2015, at 9:22 AM, tommaso barbugli
>>>                 > <tbarbugli@gmail.com> wrote:
>>>                 >
>>>                 > Hi guys,
>>>                 >
>>>                 > did anyone already try Scylladb (yet another
>>>                 > fastest NoSQL database in town) and has some
>>>                 > thoughts/hands-on experience to share?
>>>                 >
>>>                 > Cheers,
>>>                 > Tommaso
>>>
>>>                 --
>>>                 DANI TRAPHAGEN
>>>                 Technical Enablement Lead | dani.traphagen@datastax.com
>>>
>>>             -- 
>>>             -Richard L. Burton III
>>>             @rburton
>>>
>>>
>>
>>
>
>

Re: scylladb

Posted by Dor Laor <do...@scylladb.com>.

On Sat, Mar 11, 2017 at 2:19 PM, Kant Kodali <ka...@peernova.com> wrote:

> My response is inline.
>
> On Sat, Mar 11, 2017 at 1:43 PM, Avi Kivity <av...@scylladb.com> wrote:
>
>> There are several issues at play here.
>>
>> First, a database runs a large number of concurrent operations, each of
>> which only consumes a small amount of CPU. The high concurrency is needed to
>> hide latency: disk latency, or the latency of contacting a remote node.
>>
>
> *Ok so you are talking about hiding I/O latency.  If all these I/O
> operations are non-blocking system calls, then a thread per core and a
> callback mechanism should suffice, shouldn't they?*
>

In general, yes, but in practice it's more complicated.
Each such thread runs different tasks, so you need a mechanism to switch
between them; in our case this is the Seastar continuation engine. Beyond
that, we found that we need a CPU scheduler that takes into account the
priority of different tasks, such as repair, compaction, streaming, read
operations and write operations. We always prioritize foreground operations
over background ones, so even when we repair TBs of data, latency is still
very low (this feature is coming in Scylla 1.8).
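
To make the idea of a priority-aware task scheduler concrete, here is a minimal sketch in Python. It is not Seastar's or Scylla's implementation: the class name, the two priority classes and the strict foreground-first policy are all assumptions chosen for illustration, and tasks are modeled as generators that give up the CPU at each yield.

```python
from collections import deque

FOREGROUND, BACKGROUND = 0, 1  # hypothetical priority classes


class PriorityScheduler:
    """Runs generator-based tasks on one thread, foreground first."""

    def __init__(self):
        self.queues = {FOREGROUND: deque(), BACKGROUND: deque()}

    def spawn(self, task, priority):
        self.queues[priority].append(task)

    def run(self):
        order = []  # names of tasks, in the order their steps ran
        while any(self.queues.values()):
            # Strict priority (an assumption of this sketch): background
            # work runs only when no foreground task is runnable.
            prio = FOREGROUND if self.queues[FOREGROUND] else BACKGROUND
            task = self.queues[prio].popleft()
            try:
                order.append(next(task))        # run until the task yields
                self.queues[prio].append(task)  # still runnable: requeue it
            except StopIteration:
                pass                            # task finished
        return order


def task(name, steps):
    for _ in range(steps):
        yield name  # each yield is a point where the task gives up the CPU


sched = PriorityScheduler()
sched.spawn(task("repair", 2), BACKGROUND)
sched.spawn(task("read", 2), FOREGROUND)
sched.spawn(task("write", 1), FOREGROUND)
trace = sched.run()
print(trace)
```

In this toy run the repair steps execute only after the read and write tasks have drained, even though repair was queued first. A production scheduler would more likely use weighted shares so that background work is not starved indefinitely.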



>
>
>> This means that the scheduler will need to switch contexts very often. A
>> kernel thread scheduler knows very little about the application, so it has
>> to switch a lot of context.  A user level scheduler is tightly bound to the
>> application, so it can perform the switching faster.
>>
>
> *sure but this applies in other direction as well. A user level scheduler
> has no idea about kernel level scheduler either.  There is literally no
> coordination between kernel level scheduler and user level scheduler in
> linux or any major OS. It may be possible with OS's *
>

Correct. That's why we let the OS scheduler run just one thread per core
and we bind the thread to the CPU. Inside, we do our own stuff with the
Seastar scheduler and the OS doesn't know and doesn't care.
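
As a rough illustration of "one thread per core, bound to its CPU", the sketch below starts one Python thread per available CPU and pins each one with `os.sched_setaffinity`, which is available on Linux only. The helper names are hypothetical, and Python threads are used purely to show the pinning step: they are not a model of Seastar's shards and would not give real parallelism for CPU-bound work.

```python
import os
import threading


def available_cpus():
    # os.sched_getaffinity exists on Linux; fall back to CPU 0 elsewhere.
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))
    return [0]


def pinned_worker(cpu, results):
    # On Linux, pid 0 means "the calling thread".
    try:
        os.sched_setaffinity(0, {cpu})
        results[cpu] = sorted(os.sched_getaffinity(0))
    except (AttributeError, OSError):
        results[cpu] = None  # pinning is not supported or not permitted here
    # A real shard would now run its own user-level event loop on this core.


cpus = available_cpus()
results = {}
threads = [threading.Thread(target=pinned_worker, args=(c, results)) for c in cpus]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

Once every core has exactly one runnable, pinned thread, the kernel scheduler has nothing left to decide, which is the point made above.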

More below


> *that support scheduler activation(LWP's) and upcall mechanism. Even then
> it is hard to say if it is all worth it (The research shows performance may
> not outweigh the complexity). Golang problem is exactly this if one creates
> 1000 go routines/green threads where each of them is making a blocking
> system call then it would create 1000 kernel threads underneath because it
> has no way to know that the kernel thread is blocked (no upcall). And in
> non-blocking case I still don't even see a significant performance gain when
> compared to few kernel threads with callback mechanism.  If you are saying
> user level scheduling is the Future (perhaps I would just let the
> researchers argue about it) As of today that is not the case, else languages
> would have had it natively instead of using third party frameworks or
> libraries. *
>

That's why we do not run blocking system calls at all. We had to limit
ourselves to the XFS filesystem only, since the others did not have good
AIO support. Recently we bypassed some of the issues that made EXT4 block,
and it may be OK with our AIO pattern.

We even wrote a DNS implementation that doesn't block and doesn't lock (for
us, even a library that uses spin locks under the hood is bad).

Bear in mind that the whole thing is simple to run and the user doesn't
need to know anything about this complexity.




>
>
>> There are also implications on the concurrency primitives in use (locks
>> etc.) -- they will be much faster for the user-level scheduler, because
>> they cooperate with the scheduler.  For example, no atomic
>> read-modify-write instructions need to be executed.
>>
>
>
>      Second, how many (kernel) threads should you run?* This question one
> will always have. If there are 10K user level threads that maps to only one
> kernel thread then they cannot exploit parallelism. so there is no right
> answer but a thread per core is a reasonable/good choice. *
>

+1


>
>
>> If you run too few threads, then you will not be able to saturate the CPU
>> resources.  This is a common problem with Cassandra -- it's very hard to
>> get it to consume all of the CPU power on even a moderately large machine.
>> On the other hand, if you have too many threads, you will see latency rise
>> very quickly, because kernel scheduling granularity is on the order of
>> milliseconds.  User-level scheduling, because it leaves control in the hand
>> of the application, allows you to both saturate the CPU and maintain low
>> latency.
>>
>
>     *For my workload, and probably others I have seen, Cassandra has always
> been CPU bound.*
>

Could be. However, try to make it CPU bound on 10 cores, 20 cores and more.
The more cores you use, the fewer nodes you need and the overall overhead
decreases.


>
>> There are other factors, like NUMA-friendliness, but in the end it all
>> boils down to efficiency and control.
>>
>> None of this is new btw, it's pretty common in the storage world.
>>
>> Avi
>>
>>
>> On 03/11/2017 11:18 PM, Kant Kodali wrote:
>>
>> Here is the Java version http://docs.paralleluniverse.co/quasar/ but I
>> still don't see how user level scheduling can be beneficial (This is a well
>> debated problem)? How can this add to the performance? or say why is user
>> level scheduling necessary Given the Thread per core design and the
>> callback mechanism?
>>
>> On Sat, Mar 11, 2017 at 12:51 PM, Avi Kivity <av...@scylladb.com> wrote:
>>
>>> Scylla uses the Seastar framework, which provides for both user-level
>>> thread scheduling and simple run-to-completion tasks.
>>>
>>> Huge pages are limited to 2MB (and 1GB, but these aren't available as
>>> transparent hugepages).
>>>
>>>
>>> On 03/11/2017 10:26 PM, Kant Kodali wrote:
>>>
>>> @Dor
>>>
>>> 1) You guys have a CPU scheduler? you mean user level thread Scheduler
>>> that maps user level threads to kernel level threads? I thought C++ by
>>> default creates native kernel threads but sure nothing will stop someone to
>>> create a user level scheduling library if that's what you are talking about?
>>> 2) How can one create THP of size 1KB? According to this post
>>> <https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-memory-transhuge.html> it
>>> looks like the valid values 2MB and 1GB.
>>>
>>> Thanks,
>>> kant
>>>
>>> On Sat, Mar 11, 2017 at 11:41 AM, Avi Kivity <av...@scylladb.com> wrote:
>>>
>>>> Agreed, I'd recommend to treat benchmarks as a rough guide to see where
>>>> there is potential, and follow through with your own tests.
>>>>
>>>> On 03/11/2017 09:37 PM, Edward Capriolo wrote:
>>>>
>>>>
>>>> Benchmarks are great for FUDly blog posts. Real world work loads matter
>>>> more. Every NoSQL vendor wins their benchmarks.
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>

Re: scylladb

Posted by Kant Kodali <ka...@peernova.com>.

One more thing. Pretty much every database that is written in C++ or Java
uses native kernel threads for non-blocking I/O as well. They didn't use
Seastar or Quasar, but anyway I am going to read up on Seastar and see what
it really does.

On Sun, Mar 12, 2017 at 3:48 AM, Kant Kodali <ka...@peernova.com> wrote:

>
> If you have thread-per-core and N (logical) cores, and have M tasks
>> running concurrently where M > N, then you need a scheduler to decide which
>> of those M tasks gets to run on those N kernel threads.  Whether those M
>> tasks are user-level threads, or callbacks, or a mix of the two is
>> immaterial.  In such cases a scheduler always exists, even if it is a
>> simple FIFO queue.
>>
>
>
>> Yes, of course a scheduler is needed. But what you call immaterial is
>> exactly where I see the devil, or rather where our disagreement really
>> lies. Let the kernel thread per core deal with callbacks rather than
>> having to build a user-level thread library, its scheduling mechanisms
>> and the mapping between them. This sounds like more overhead in general,
>> but it may work in a specific case.
>>
>
>

Re: scylladb

Posted by Kant Kodali <ka...@peernova.com>.

> If you have thread-per-core and N (logical) cores, and have M tasks
> running concurrently where M > N, then you need a scheduler to decide which
> of those M tasks gets to run on those N kernel threads.  Whether those M
> tasks are user-level threads, or callbacks, or a mix of the two is
> immaterial.  In such cases a scheduler always exists, even if it is a
> simple FIFO queue.
>


> Yes, of course a scheduler is needed. But what you call immaterial is
> exactly where I see the devil, or rather where our disagreement really
> lies. Let the kernel thread per core deal with callbacks rather than
> having to build a user-level thread library, its scheduling mechanisms
> and the mapping between them. This sounds like more overhead in general,
> but it may work in a specific case.
>

Re: scylladb

Posted by Avi Kivity <av...@scylladb.com>.

If you have thread-per-core and N (logical) cores, and have M tasks 
running concurrently where M > N, then you need a scheduler to decide 
which of those M tasks gets to run on those N kernel threads.  Whether 
those M tasks are user-level threads, or callbacks, or a mix of the two 
is immaterial.  In such cases a scheduler always exists, even if it is a 
simple FIFO queue.


Scheduling happens either voluntarily (the task issues I/O) or 
involuntarily (the scheduler decides it needs to run another task to 
satisfy latency SLA), but it has to happen.  The only case where it 
doesn't need to happen is if M<=N, in which case your server will be 
underutilized whenever your task has to wait.
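
The point about M tasks sharing one kernel thread can be sketched with a toy simulation in Python. Everything here is an assumption made for illustration (the tick-based clock, the `request` and `run_fifo` names, a fixed I/O latency of 10 ticks); it is not how Seastar is implemented. Each task voluntarily yields when it issues I/O, and the scheduler is nothing more than a FIFO run queue plus a list of pending completions.

```python
import heapq
from collections import deque


def request(name, io_ticks, log):
    log.append(f"{name}: issue I/O")
    yield io_ticks              # voluntary switch: wait for the I/O to complete
    log.append(f"{name}: done")


def run_fifo(tasks):
    ready = deque(tasks)        # the scheduler: a simple FIFO run queue
    waiting = []                # min-heap of (wake_time, seq, task)
    now = seq = 0
    while ready or waiting:
        if not ready:
            # Nothing is runnable: the core idles until the next completion.
            now = max(now, waiting[0][0])
        while waiting and waiting[0][0] <= now:
            ready.append(heapq.heappop(waiting)[2])
        task = ready.popleft()
        try:
            delay = next(task)  # run the task until it issues I/O
            seq += 1
            heapq.heappush(waiting, (now + delay, seq, task))
        except StopIteration:
            pass                # task finished
        now += 1                # each scheduling step costs one tick of CPU
    return now


log = []
total = run_fifo([request(n, 10, log) for n in "abc"])
serial = sum(run_fifo([request(n, 10, [])]) for n in "abc")
print(total, serial)
```

With three tasks multiplexed on the one thread, the waits overlap and the simulated run takes 13 ticks, against 33 ticks when the same tasks run one after another. With M <= N there is nothing to overlap, and the core simply idles during each wait.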


On 03/12/2017 12:17 PM, Kant Kodali wrote:
> @Avi
>
> I don't disagree with thread per core design and in fact I said that 
> is a reasonable/good choice. But I am having a hard time seeing 
> through how user level scheduling can make a significant difference 
> even in Non-blocking I/O case. My question really is that if you 
> already have TPC why do you need user level scheduling ? And if the 
> answer is to switch between user level tasks then I am simply trying 
> to say "concurrency is not parallelism" (just because one was able to 
> switch between user level threads doesn't mean they are running in 
> parallel underneath). Why not simply schedule those on kernel threads
> running on those cores and have a callback mechanism? Why would one
> need to deal with user level scheduling overhead and all the problems
> that come with it? This to me just sounds like a difference in the
> design paradigm but doesn't seem to add much to the performance.
>
> Seastar sounds very similar to Quasar. And I am not seeing great
> benefits from it.
>
>
>
>
> On Sun, Mar 12, 2017 at 1:48 AM, Avi Kivity <avi@scylladb.com> wrote:
>
>     We already quantified it, the result is Scylla. Now, Scylla's
>     performance is only in part due to the threading model, so I can't
>     give you a number that quantifies how much just this aspect of the
>     design is worth.  Removing it (or adding it to Cassandra) is a
>     multi-man-year effort that I can't justify for this conversation.
>
>
>     If you want to continue to use kernel threads for your
>     applications, by all means continue to do so.  They're the right
>     choice for all but the most I/O intensive applications.  But for
>     these I/O intensive applications thread-per-core is the right
>     choice, regardless of the points you raise.
>
>
>     I encourage you to study the seastar code base [1] and
>     documentation [2] to see how we handled those problems. I'll also
>     comment a bit below.
>
>
>     [1] https://github.com/scylladb/seastar
>
>     [2] http://www.seastar-project.org/
>
>
>     On 03/12/2017 11:07 AM, Kant Kodali wrote:
>>     @Avi
>>
>>     "User-level scheduling is great for high performance I/O
>>     intensive applications like databases and file systems." This is
>>     generally a claim made by people who want to use user-level
>>     threads, but I have rarely seen any significant performance gain.
>>     Since you are claiming that you do, it would be great if you can
>>     quantify that. The other day I saw a benchmark of a Golang
>>     server which supports user level threads/green threads natively
>>     and it was able to handle 10K concurrent requests. Even Nginx,
>>     which is written in C and uses kernel threads, can handle that
>>     many with non-blocking I/O. We all know concurrency is not
>>     parallelism.
>>
>>     You may have to pay for something which could be any of the
>>     following.
>>
>>     *Duplication of the schedulers*
>>     M:N requires two schedulers which basically do same work, one at
>>     user level and one in kernel. This is undesirable. It requires
>>     frequent data communications between kernel and user space for
>>     scheduling information transference.
>>
>>     Duplication takes more space in both Dcache and Icache for
>>     scheduling than a single scheduler. It is highly undesirable if
>>     cache misses are caused by the schedulers rather than the
>>     application, because an L2 cache miss could be more expensive than
>>     a kernel thread switch. Then the additional scheduler might become
>>     a troublemaker! In this case, saving kernel traps does not
>>     justify a user-level scheduler, which is even more true when
>>     processors are providing faster and faster kernel trap execution.
>
>
>     That's not a problem, at least in my experience. The kernel
>     scheduler needs to schedule only one thread, and that very
>     infrequently. It is completely out of any hot path.
>
>>
>>     *Thread local data maintenance*
>>     M:N has to maintain thread specific data, which are already
>>     provided by kernel for kernel thread, such as the TLS data, error
>>     number. To provide the same feature for user threads is not
>>     straightforward, because, for example, the error number is
>>     returned for system call failure and supported by kernel.
>>     User-level support degrades system performance and increases
>>     system complexity.
>
>     This is also not a problem, we capture error codes in exceptions
>     immediately after a system call and so we don't need to rely on
>     TLS for errno.
>
>>
>>     *System info oblivious*
>>     Kernel scheduler is close to underlying platform and
>>     architecture. It can take advantage of their features. This is
>>     difficult for user thread library because it's a layer at user
>>     level. User threads are second-order entities in the system. If a
>>     kernel thread uses a GDT slot for TLS data, a user thread perhaps
>>     can only use an LDT slot for TLS data. With increasingly more
>>     supports available from the new processors for
>>     threading/scheduling (Hyperthreading, NUMA, many-core), the
>>     second order nature seriously limits the ability of M:N threading.
>
>     Those are non-issues, in my experience.  In fact it's the other
>     way around, the kernel scheduler cannot assume anything about the
>     threads it is preempting and so has to save more state.  The
>     threads being preempted also cannot assume anything about the
>     kernel scheduler, and so have to use atomic read-modify-write
>     instructions for synchronization, and to perform a system call
>     whenever they need to block or wake another thread.
>
>
>
>
>>
>>     On Sun, Mar 12, 2017 at 1:05 AM, Avi Kivity <avi@scylladb.com> wrote:
>>
>>         btw, for an example of how user-level tasks can be scheduled
>>         in a way that cannot be done with kernel threads, see this
>>         pair of blog posts:
>>
>>
>>         http://www.scylladb.com/2016/04/14/io-scheduler-1/
>>
>>         http://www.scylladb.com/2016/04/29/io-scheduler-2/
>>
>>
>>         There's simply no way to get this kind of control when you
>>         rely on the kernel for scheduling and page cache management. 
>>         As a result you have to overprovision your node and then you
>>         mostly underutilize it.
>>
>>
>>         On 03/12/2017 10:23 AM, Avi Kivity wrote:
>>>
>>>
>>>
>>>         On 03/12/2017 12:19 AM, Kant Kodali wrote:
>>>>         My response is inline.
>>>>
>>>>         On Sat, Mar 11, 2017 at 1:43 PM, Avi Kivity
>>>>         <avi@scylladb.com> wrote:
>>>>
>>>>             There are several issues at play here.
>>>>
>>>>             First, a database runs a large number of concurrent
>>>>             operations, each of which only consumes a small amount
>>>>             of CPU. The high concurrency is needed to hide latency:
>>>>             disk latency, or the latency of contacting a remote node.
>>>>
>>>>         *Ok so you are talking about hiding I/O latency. If all
>>>>         these I/O are non-blocking system calls then a thread per
>>>>         core and callback mechanism should suffice isn't it?*
>>>
>>>         Scylla uses a mix of user-level threads and callbacks. Most
>>>         of the code uses callbacks (fronted by a future/promise
>>>         API). SSTable writers (memtable flush, compaction) use a
>>>         user-level thread (internally implemented using callbacks). 
>>>         The important bit is multiplexing many concurrent operations
>>>         onto a single kernel thread.
>>>
>>>
>>>>             This means that the scheduler will need to switch
>>>>             contexts very often. A kernel thread scheduler knows
>>>>             very little about the application, so it has to switch
>>>>             a lot of context.  A user level scheduler is tightly
>>>>             bound to the application, so it can perform the
>>>>             switching faster.
>>>>
>>>>
>>>>         *sure but this applies in other direction as well. A user
>>>>         level scheduler has no idea about kernel level scheduler
>>>>         either.  There is literally no coordination between kernel
>>>>         level scheduler and user level scheduler in linux or any
>>>>         major OS. It may be possible with OS's that support
>>>>         scheduler activation(LWP's) and upcall mechanism. *
>>>
>>>         There is no need for coordination, because the kernel
>>>         scheduler has no scheduling decisions to make.  With one
>>>         thread per core, bound to its core, the kernel scheduler
>>>         can't make the wrong decision because it has just one choice.
>>>
>>>
>>>>         *Even then it is hard to say if it is all worth it (The
>>>>         research shows performance may not outweigh the
>>>>         complexity). Golang problem is exactly this if one creates
>>>>         1000 go routines/green threads where each of them is making
>>>>         a blocking system call then it would create 1000 kernel
>>>>         threads underneath because it has no way to know that the
>>>>         kernel thread is blocked (no upcall). *
>>>
>>>         All of the significant system calls we issue are through the
>>>         main thread, either asynchronous or non-blocking.
>>>
>>>>         *And in non-blocking case I still don't even see a
>>>>         significant performance gain when compared to few kernel
>>>>         threads with callback mechanism.*
>>>
>>>         We do.
>>>
>>>>         *  If you are saying user level scheduling is the Future
>>>>         (perhaps I would just let the researchers argue about it)
>>>>         As of today that is not case else languages would have had
>>>>         it natively instead of using third party frameworks or
>>>>         libraries.
>>>>         *
>>>
>>>         User-level scheduling is great for high performance I/O
>>>         intensive applications like databases and file systems. 
>>>         It's not a general solution, and it involves a lot of effort
>>>         to set up the infrastructure. However, for our use case, it
>>>         was worth it.
>>>
>>>>             There are also implications on the concurrency
>>>>             primitives in use (locks etc.) -- they will be much
>>>>             faster for the user-level scheduler, because they
>>>>             cooperate with the scheduler.  For example, no atomic
>>>>             read-modify-write instructions need to be executed.
>>>>
>>>>
>>>>              Second, how many (kernel) threads should you run?*This
>>>>         question one will always have. If there are 10K user level
>>>>         threads that maps to only one kernel thread then they
>>>>         cannot exploit parallelism. so there is no right answer but
>>>>         a thread per core is a reasonable/good choice.
>>>>         *
>>>
>>>         Only if you can multiplex many operations on top of each of
>>>         those threads.  Otherwise, the CPUs end up underutilized.
>>>
>>>>             If you run too few threads, then you will not be able
>>>>             to saturate the CPU resources.  This is a common
>>>>             problem with Cassandra -- it's very hard to get it to
>>>>             consume all of the CPU power on even a moderately large
>>>>             machine. On the other hand, if you have too many
>>>>             threads, you will see latency rise very quickly,
>>>>             because kernel scheduling granularity is on the order
>>>>             of milliseconds. User-level scheduling, because it
>>>>             leaves control in the hand of the application, allows
>>>>             you to both saturate the CPU and maintain low latency.
>>>>
>>>>
>>>>             *For my workload, and probably others I have seen,
>>>>         Cassandra has always been CPU bound.*
>>>>
>>>>
>>>
>>>
>>>         Yes, but does it consume 100% of all of the cores on your
>>>         machine? Cassandra generally doesn't (on a larger machine),
>>>         and when you profile it, you see it spending much of its
>>>         time in atomic operations, or parking/unparking threads --
>>>         fighting with itself.  It doesn't scale within the machine. 
>>>         Scylla will happily utilize all of the cores that it is
>>>         assigned (all of them by default in most configurations),
>>>         and the bigger the machine you give it, the happier it will be.
>>>
>>>>             There are other factors, like NUMA-friendliness, but in
>>>>             the end it all boils down to efficiency and control.
>>>>
>>>>             None of this is new btw, it's pretty common in the
>>>>             storage world.
>>>>
>>>>             Avi
>>>>
>>>>
>>>>             On 03/11/2017 11:18 PM, Kant Kodali wrote:
>>>>>             Here is the Java version
>>>>>             http://docs.paralleluniverse.co/quasar/ but I still
>>>>>             don't see how user level scheduling can be beneficial
>>>>>             (This is a well debated problem)? How can this add to
>>>>>             the performance? or say why is user level scheduling
>>>>>             necessary Given the Thread per core design and the
>>>>>             callback mechanism?
>>>>>
>>>>>             On Sat, Mar 11, 2017 at 12:51 PM, Avi Kivity
>>>>>             <avi@scylladb.com <ma...@scylladb.com>> wrote:
>>>>>
>>>>>                 Scylla uses the Seastar framework, which
>>>>>                 provides for both user-level thread scheduling and
>>>>>                 simple run-to-completion tasks.
>>>>>
>>>>>                 Huge pages are limited to 2MB (and 1GB, but these
>>>>>                 aren't available as transparent hugepages).
>>>>>
>>>>>
>>>>>                 On 03/11/2017 10:26 PM, Kant Kodali wrote:
>>>>>>                 @Dor
>>>>>>
>>>>>>                 1) You guys have a CPU scheduler? you mean user
>>>>>>                 level thread Scheduler that maps user level
>>>>>>                 threads to kernel level threads? I thought C++ by
>>>>>>                 default creates native kernel threads but sure
>>>>>>                 nothing will stop someone to create a user level
>>>>>>                 scheduling library if that's what you are talking
>>>>>>                 about?
>>>>>>                 2) How can one create THP of size 1KB? According
>>>>>>                 to this post
>>>>>>                 <https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-memory-transhuge.html> it
>>>>>>                 looks like the valid values 2MB and 1GB.
>>>>>>
>>>>>>                 Thanks,
>>>>>>                 kant
>>>>>>
>>>>>>                 On Sat, Mar 11, 2017 at 11:41 AM, Avi Kivity
>>>>>>                 <avi@scylladb.com <ma...@scylladb.com>> wrote:
>>>>>>
>>>>>>                     Agreed, I'd recommend to treat benchmarks as
>>>>>>                     a rough guide to see where there is
>>>>>>                     potential, and follow through with your own
>>>>>>                     tests.
>>>>>>
>>>>>>                     On 03/11/2017 09:37 PM, Edward Capriolo wrote:
>>>>>>>
>>>>>>>                     Benchmarks are great for FUDly blog posts.
>>>>>>>                     Real world work loads matter more. Every
>>>>>>>                     NoSQL vendor wins their benchmarks.
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>
>

Re: scylladb

Posted by Bhuvan Rawal <bh...@gmail.com>.



On Sun, Mar 12, 2017 at 2:42 PM, Bhuvan Rawal <bh...@gmail.com> wrote:

> Looking at the cost of cloud instances, the cost of CPU clearly dictates
> the overall cost of the instance. Doubling the number of cores nearly
> doubles the cost, with other things kept the same, as can be seen below
> as an example:
>
> (C3 may have a slightly better processor, but not more than a 10-15%
> performance increase.)
>
> Optimising for fewer CPU cycles will invariably reduce costs by a large
> factor. On a modern machine with SSDs, where data density per node can be
> high and more requests can be assumed to be served from a single node,
> things get CPU bound. Perhaps that is because Cassandra was designed at a
> time when SSDs did not exist. If we look closely, many of Cassandra's
> defaults assume the disk is rotational: the number of flush writers,
> concurrent compactors, etc. The design suggests that too (using sequential
> I/O as far as possible; in fact, that is the underlying philosophy behind
> sequential SSTable flushes and sequential commitlog files, to avoid random
> I/O). Perhaps if it were designed today, things would look radically
> different.
>
> Comparing an average hard disk (~200 IOPS) with an SSD (~40K IOPS), that is
> approximately a 200-fold increase, which effectively raises the expectation
> on the processor to serve significantly more operations per second.
>
> To extract the best from a modern node, Cassandra may need significant
> changes such as the one below:
> https://issues.apache.org/jira/browse/CASSANDRA-10989
> Going forward, the number of cores per node is likely only going to
> increase, as it has over the last 5-6 years. In a way, that suggests a
> significant change in design, and possibly that is what ScyllaDB is up to.
>
> "We found that we need a cpu scheduler which takes into account the
> priority of different tasks, such as repair, compaction, streaming, read
> operations and write operations."
> From my understanding, in Cassandra as well, compaction threads run at low
> (nice) priority; I am not sure about repair/streaming.
> http://grokbase.com/t/cassandra/user/14a85xpce7/significant-nice-cpu-usage
>
> Regards,
>
> On Sun, Mar 12, 2017 at 2:35 PM, Avi Kivity <av...@scylladb.com> wrote:
>
>> btw, for an example of how user-level tasks can be scheduled in a way
>> that cannot be done with kernel threads, see this pair of blog posts:
>>
>>
>>   http://www.scylladb.com/2016/04/14/io-scheduler-1/
>>
>>   http://www.scylladb.com/2016/04/29/io-scheduler-2/
>>
>>
>> There's simply no way to get this kind of control when you rely on the
>> kernel for scheduling and page cache management.  As a result you have to
>> overprovision your node and then you mostly underutilize it.
>>
>> On 03/12/2017 10:23 AM, Avi Kivity wrote:
>>
>>
>>
>> On 03/12/2017 12:19 AM, Kant Kodali wrote:
>>
>> My response is inline.
>>
>> On Sat, Mar 11, 2017 at 1:43 PM, Avi Kivity <av...@scylladb.com> wrote:
>>
>>> There are several issues at play here.
>>>
>>> First, a database runs a large number of concurrent operations, each of
>>> which only consumes a small amount of CPU. The high concurrency is needed to
>>> hide latency: disk latency, or the latency of contacting a remote node.
>>>
>>
>> *Ok so you are talking about hiding I/O latency.  If all these I/O are
>> non-blocking system calls then a thread per core and callback mechanism
>> should suffice isn't it?*
>>
>>
>>
>> Scylla uses a mix of user-level threads and callbacks. Most of the code
>> uses callbacks (fronted by a future/promise API). SSTable writers
>> (memtable flush, compaction) use a user-level thread (internally
>> implemented using callbacks).  The important bit is multiplexing many
>> concurrent operations onto a single kernel thread.
>>
>>
>>> This means that the scheduler will need to switch contexts very often. A
>>> kernel thread scheduler knows very little about the application, so it has
>>> to switch a lot of context.  A user level scheduler is tightly bound to the
>>> application, so it can perform the switching faster.
>>>
>>
>> *sure but this applies in other direction as well. A user level scheduler
>> has no idea about kernel level scheduler either.  There is literally no
>> coordination between kernel level scheduler and user level scheduler in
>> linux or any major OS. It may be possible with OS's that support scheduler
>> activation(LWP's) and upcall mechanism. *
>>
>>
>> There is no need for coordination, because the kernel scheduler has no
>> scheduling decisions to make.  With one thread per core, bound to its core,
>> the kernel scheduler can't make the wrong decision because it has just one
>> choice.
>>
>>
>> *Even then it is hard to say if it is all worth it (The research shows
>> performance may not outweigh the complexity). Golang problem is exactly
>> this if one creates 1000 go routines/green threads where each of them is
>> making a blocking system call then it would create 1000 kernel threads
>> underneath because it has no way to know that the kernel thread is blocked
>> (no upcall). *
>>
>>
>> All of the significant system calls we issue are through the main thread,
>> either asynchronous or non-blocking.
>>
>> *And in non-blocking case I still don't even see a significant
>> performance when compared to few kernel threads with callback mechanism.*
>>
>>
>> We do.
>>
>>
>> *  If you are saying user level scheduling is the Future (perhaps I would
>> just let the researchers argue about it) As of today that is not case else
>> languages would have had it natively instead of using third party
>> frameworks or libraries. *
>>
>>
>> User-level scheduling is great for high performance I/O intensive
>> applications like databases and file systems.  It's not a general solution,
>> and it involves a lot of effort to set up the infrastructure. However, for
>> our use case, it was worth it.
>>
>>
>>
>>> There are also implications on the concurrency primitives in use (locks
>>> etc.) -- they will be much faster for the user-level scheduler, because
>>> they cooperate with the scheduler.  For example, no atomic
>>> read-modify-write instructions need to be executed.
>>>
>>
>>
>>> Second, how many (kernel) threads should you run?
>> * This question one will always have. If there are 10K user level threads
>> that maps to only one kernel thread then they cannot exploit parallelism.
>> so there is no right answer but a thread per core is a reasonable/good
>> choice. *
>>
>>
>> Only if you can multiplex many operations on top of each of those
>> threads.  Otherwise, the CPUs end up underutilized.
>>
>>
>>
>>> If you run too few threads, then you will not be able to saturate the
>>> CPU resources.  This is a common problem with Cassandra -- it's very hard
>>> to get it to consume all of the CPU power on even a moderately large
>>> machine. On the other hand, if you have too many threads, you will see
>>> latency rise very quickly, because kernel scheduling granularity is on the
>>> order of milliseconds.  User-level scheduling, because it leaves control in
>>> the hand of the application, allows you to both saturate the CPU and
>>> maintain low latency.
>>>
>>
>> *For my workload, and probably others I have seen, Cassandra has always
>> been CPU bound.*
>>
>>>
>>>
>>
>> Yes, but does it consume 100% of all of the cores on your machine?
>> Cassandra generally doesn't (on a larger machine), and when you profile it,
>> you see it spending much of its time in atomic operations, or
>> parking/unparking threads -- fighting with itself.  It doesn't scale within
>> the machine.  Scylla will happily utilize all of the cores that it is
>> assigned (all of them by default in most configurations), and the bigger
>> the machine you give it, the happier it will be.
>>
>>> There are other factors, like NUMA-friendliness, but in the end it all
>>> boils down to efficiency and control.
>>>
>>> None of this is new btw, it's pretty common in the storage world.
>>>
>>> Avi
>>>
>>>
>>> On 03/11/2017 11:18 PM, Kant Kodali wrote:
>>>
>>> Here is the Java version http://docs.paralleluniverse.co/quasar/ but I
>>> still don't see how user level scheduling can be beneficial (This is a well
>>> debated problem)? How can this add to the performance? or say why is user
>>> level scheduling necessary Given the Thread per core design and the
>>> callback mechanism?
>>>
>>> On Sat, Mar 11, 2017 at 12:51 PM, Avi Kivity <av...@scylladb.com> wrote:
>>>
>>>> Scylla uses the Seastar framework, which provides for both user-level
>>>> thread scheduling and simple run-to-completion tasks.
>>>>
>>>> Huge pages are limited to 2MB (and 1GB, but these aren't available as
>>>> transparent hugepages).
>>>>
>>>>
>>>> On 03/11/2017 10:26 PM, Kant Kodali wrote:
>>>>
>>>> @Dor
>>>>
>>>> 1) You guys have a CPU scheduler? you mean user level thread Scheduler
>>>> that maps user level threads to kernel level threads? I thought C++ by
>>>> default creates native kernel threads but sure nothing will stop someone to
>>>> create a user level scheduling library if that's what you are talking about?
>>>> 2) How can one create THP of size 1KB? According to this post
>>>> <https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-memory-transhuge.html> it
>>>> looks like the valid values 2MB and 1GB.
>>>>
>>>> Thanks,
>>>> kant
>>>>
>>>> On Sat, Mar 11, 2017 at 11:41 AM, Avi Kivity <av...@scylladb.com> wrote:
>>>>
>>>>> Agreed, I'd recommend to treat benchmarks as a rough guide to see
>>>>> where there is potential, and follow through with your own tests.
>>>>>
>>>>> On 03/11/2017 09:37 PM, Edward Capriolo wrote:
>>>>>
>>>>>
>>>>> Benchmarks are great for FUDly blog posts. Real world work loads
>>>>> matter more. Every NoSQL vendor wins their benchmarks.
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>

Re: scylladb

Posted by Bhuvan Rawal <bh...@gmail.com>.

Looking at the cost of cloud instances, the cost of CPU clearly dictates the
overall cost of the instance. Doubling the number of cores nearly doubles the
cost, with other things kept the same, as can be seen below as an
example:

(C3 may have a slightly better processor, but not more than a 10-15%
performance increase.)

Optimising for fewer CPU cycles will invariably reduce costs by a large
factor. On a modern machine with SSDs, where data density per node can be
high and more requests can be assumed to be served from a single node,
things get CPU bound. Perhaps that is because Cassandra was designed at a
time when SSDs did not exist. If we look closely, many of Cassandra's
defaults assume the disk is rotational: the number of flush writers,
concurrent compactors, etc. The design suggests that too (using sequential
I/O as far as possible; in fact, that is the underlying philosophy behind
sequential SSTable flushes and sequential commitlog files, to avoid random
I/O). Perhaps if it were designed today, things would look radically
different.

Comparing an average hard disk (~200 IOPS) with an SSD (~40K IOPS), that is
approximately a 200-fold increase, which effectively raises the expectation
on the processor to serve significantly more operations per second.
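
To make the arithmetic concrete, here is a rough sketch in Python. The two IOPS figures are the ballpark numbers quoted above, not measurements, and the helper names (iops_ratio, cpu_budget_per_op_us) are made up for illustration. It shows the ratio between the two devices and how much CPU time a single core could spend per operation if it had to keep the device saturated.

```python
# Ballpark figures from the message above; real devices vary widely.
HDD_IOPS = 200
SSD_IOPS = 40_000


def iops_ratio(ssd_iops: int, hdd_iops: int) -> float:
    """How many times more I/O operations per second the SSD can serve."""
    return ssd_iops / hdd_iops


def cpu_budget_per_op_us(iops: int, cores: int = 1) -> float:
    """Microseconds of CPU time available per operation when the device
    is kept saturated and the work is spread over `cores` cores."""
    return cores * 1_000_000 / iops


print(iops_ratio(SSD_IOPS, HDD_IOPS))   # 200.0
print(cpu_budget_per_op_us(HDD_IOPS))   # 5000.0
print(cpu_budget_per_op_us(SSD_IOPS))   # 25.0
```

Under these assumed numbers, the per-operation CPU budget on one core drops from about 5 ms to about 25 microseconds, which is why the disk stops being the bottleneck and the CPU becomes one.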

To extract the best from a modern node, Cassandra may need significant
changes such as the one below:
https://issues.apache.org/jira/browse/CASSANDRA-10989
Going forward, the number of cores per node is likely only going to
increase, as it has over the last 5-6 years. In a way, that suggests a
significant change in design, and possibly that is what ScyllaDB is up to.

"We found that we need a cpu scheduler which takes into account the
priority of different tasks, such as repair, compaction, streaming, read
operations and write operations."
From my understanding, in Cassandra as well, compaction threads run at low
(nice) priority; I am not sure about repair/streaming.
http://grokbase.com/t/cassandra/user/14a85xpce7/significant-nice-cpu-usage

Regards,

On Sun, Mar 12, 2017 at 2:35 PM, Avi Kivity <av...@scylladb.com> wrote:

> btw, for an example of how user-level tasks can be scheduled in a way that
> cannot be done with kernel threads, see this pair of blog posts:
>
>
>   http://www.scylladb.com/2016/04/14/io-scheduler-1/
>
>   http://www.scylladb.com/2016/04/29/io-scheduler-2/
>
>
> There's simply no way to get this kind of control when you rely on the
> kernel for scheduling and page cache management.  As a result you have to
> overprovision your node and then you mostly underutilize it.
>
> On 03/12/2017 10:23 AM, Avi Kivity wrote:
>
>
>
> On 03/12/2017 12:19 AM, Kant Kodali wrote:
>
> My response is inline.
>
> On Sat, Mar 11, 2017 at 1:43 PM, Avi Kivity <av...@scylladb.com> wrote:
>
>> There are several issues at play here.
>>
>> First, a database runs a large number of concurrent operations, each of
>> which only consumes a small amount of CPU. The high concurrency is needed to
>> hide latency: disk latency, or the latency of contacting a remote node.
>>
>
> *Ok so you are talking about hiding I/O latency.  If all these I/O are
> non-blocking system calls then a thread per core and callback mechanism
> should suffice isn't it?*
>
>
>
> Scylla uses a mix of user-level threads and callbacks. Most of the code
> uses callbacks (fronted by a future/promise API). SSTable writers
> (memtable flush, compaction) use a user-level thread (internally
> implemented using callbacks).  The important bit is multiplexing many
> concurrent operations onto a single kernel thread.
>
>
>> This means that the scheduler will need to switch contexts very often. A
>> kernel thread scheduler knows very little about the application, so it has
>> to switch a lot of context.  A user level scheduler is tightly bound to the
>> application, so it can perform the switching faster.
>>
>
> *sure but this applies in other direction as well. A user level scheduler
> has no idea about kernel level scheduler either.  There is literally no
> coordination between kernel level scheduler and user level scheduler in
> linux or any major OS. It may be possible with OS's that support scheduler
> activation(LWP's) and upcall mechanism. *
>
>
> There is no need for coordination, because the kernel scheduler has no
> scheduling decisions to make.  With one thread per core, bound to its core,
> the kernel scheduler can't make the wrong decision because it has just one
> choice.
>
>
> *Even then it is hard to say if it is all worth it (The research shows
> performance may not outweigh the complexity). Golang problem is exactly
> this if one creates 1000 go routines/green threads where each of them is
> making a blocking system call then it would create 1000 kernel threads
> underneath because it has no way to know that the kernel thread is blocked
> (no upcall). *
>
>
> All of the significant system calls we issue are through the main thread,
> either asynchronous or non-blocking.
>
> *And in non-blocking case I still don't even see a significant performance
> when compared to few kernel threads with callback mechanism.*
>
>
> We do.
>
>
> *  If you are saying user level scheduling is the Future (perhaps I would
> just let the researchers argue about it) As of today that is not case else
> languages would have had it natively instead of using third party
> frameworks or libraries. *
>
>
> User-level scheduling is great for high performance I/O intensive
> applications like databases and file systems.  It's not a general solution,
> and it involves a lot of effort to set up the infrastructure. However, for
> our use case, it was worth it.
>
>
>
>> There are also implications on the concurrency primitives in use (locks
>> etc.) -- they will be much faster for the user-level scheduler, because
>> they cooperate with the scheduler.  For example, no atomic
>> read-modify-write instructions need to be executed.
>>
>
>
>> Second, how many (kernel) threads should you run?
> * This question one will always have. If there are 10K user level threads
> that maps to only one kernel thread then they cannot exploit parallelism.
> so there is no right answer but a thread per core is a reasonable/good
> choice. *
>
>
> Only if you can multiplex many operations on top of each of those
> threads.  Otherwise, the CPUs end up underutilized.
>
>
>
>> If you run too few threads, then you will not be able to saturate the CPU
>> resources.  This is a common problem with Cassandra -- it's very hard to
>> get it to consume all of the CPU power on even a moderately large machine.
>> On the other hand, if you have too many threads, you will see latency rise
>> very quickly, because kernel scheduling granularity is on the order of
>> milliseconds.  User-level scheduling, because it leaves control in the hand
>> of the application, allows you to both saturate the CPU and maintain low
>> latency.
>>
>
> *For my workload, and probably others I have seen, Cassandra has always
> been CPU bound.*
>
>>
>>
>
> Yes, but does it consume 100% of all of the cores on your machine?
> Cassandra generally doesn't (on a larger machine), and when you profile it,
> you see it spending much of its time in atomic operations, or
> parking/unparking threads -- fighting with itself.  It doesn't scale within
> the machine.  Scylla will happily utilize all of the cores that it is
> assigned (all of them by default in most configurations), and the bigger
> the machine you give it, the happier it will be.
>
>> There are other factors, like NUMA-friendliness, but in the end it all
>> boils down to efficiency and control.
>>
>> None of this is new btw, it's pretty common in the storage world.
>>
>> Avi
>>
>>
>> On 03/11/2017 11:18 PM, Kant Kodali wrote:
>>
>> Here is the Java version http://docs.paralleluniverse.co/quasar/ but I
>> still don't see how user level scheduling can be beneficial (This is a well
>> debated problem)? How can this add to the performance? or say why is user
>> level scheduling necessary Given the Thread per core design and the
>> callback mechanism?
>>
>> On Sat, Mar 11, 2017 at 12:51 PM, Avi Kivity <av...@scylladb.com> wrote:
>>
>>> Scylla uses the Seastar framework, which provides for both user-level
>>> thread scheduling and simple run-to-completion tasks.
>>>
>>> Huge pages are limited to 2MB (and 1GB, but these aren't available as
>>> transparent hugepages).
>>>
>>>
>>> On 03/11/2017 10:26 PM, Kant Kodali wrote:
>>>
>>> @Dor
>>>
>>> 1) You guys have a CPU scheduler? you mean user level thread Scheduler
>>> that maps user level threads to kernel level threads? I thought C++ by
>>> default creates native kernel threads but sure nothing will stop someone to
>>> create a user level scheduling library if that's what you are talking about?
>>> 2) How can one create THP of size 1KB? According to this post
>>> <https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-memory-transhuge.html> it
>>> looks like the valid values 2MB and 1GB.
>>>
>>> Thanks,
>>> kant
>>>
>>> On Sat, Mar 11, 2017 at 11:41 AM, Avi Kivity <av...@scylladb.com> wrote:
>>>
>>>> Agreed, I'd recommend to treat benchmarks as a rough guide to see where
>>>> there is potential, and follow through with your own tests.
>>>>
>>>> On 03/11/2017 09:37 PM, Edward Capriolo wrote:
>>>>
>>>>
>>>> Benchmarks are great for FUDly blog posts. Real world work loads matter
>>>> more. Every NoSQL vendor wins their benchmarks.
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
>

Re: scylladb

Posted by Kant Kodali <ka...@peernova.com>.

@Avi

I don't disagree with the thread-per-core design, and in fact I said it is a
reasonable/good choice. But I am having a hard time seeing how user-level
scheduling can make a significant difference, even in the non-blocking I/O
case. My question really is: if you already have TPC, why do you need
user-level scheduling? And if the answer is to switch between user-level
tasks, then I am simply trying to say "concurrency is not parallelism" (just
because one is able to switch between user-level threads doesn't mean they
are running in parallel underneath). Why not simply schedule those on
kernel threads running on those cores and have a callback mechanism? Why
would one need to deal with user-level scheduling overhead and all the
problems that come with it? This to me just sounds like a difference in
design paradigm that doesn't seem to add much to performance.

Seastar sounds very similar to Quasar, and I am not seeing great benefits
from it.
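
For readers following the argument, the point under debate can be sketched in a few lines of Python. This is only an illustration of multiplexing several operations onto one kernel thread with a cooperative, user-level scheduler; it is not how Seastar or Quasar is implemented, and the names (operation, run_on_one_thread) are hypothetical.

```python
from collections import deque
import threading


def operation(name, steps, log):
    """A hypothetical operation that yields wherever it would wait on I/O."""
    for step in range(steps):
        log.append((name, step, threading.get_ident()))
        yield  # hand control back to the user-level scheduler


def run_on_one_thread(tasks):
    """Round-robin over ready tasks, all on the calling kernel thread."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run the task until its next yield
            ready.append(task)  # still has work left: requeue it
        except StopIteration:
            pass                # task finished: drop it


log = []
run_on_one_thread([operation("read", 2, log), operation("write", 2, log)])

order = [(name, step) for name, step, _ in log]
kernel_threads = {tid for _, _, tid in log}
print(order)                # [('read', 0), ('write', 0), ('read', 1), ('write', 1)]
print(len(kernel_threads))  # 1
```

The two operations interleave (concurrency), yet only one kernel thread ever runs them (no parallelism). Parallelism in a thread-per-core design would come from running one such loop per core.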




On Sun, Mar 12, 2017 at 1:48 AM, Avi Kivity <av...@scylladb.com> wrote:

> We already quantified it, the result is Scylla. Now, Scylla's performance
> is only in part due to the threading model, so I can't give you a number
> that quantifies how much just this aspect of the design is worth.  Removing
> it (or adding it to Cassandra) is a multi-man-year effort that I can't
> justify for this conversation.
>
>
> If you want to continue to use kernel threads for your applications, by all
> means continue to do so.  They're the right choice for all but the most I/O
> intensive applications.  But for these I/O intensive applications
> thread-per-core is the right choice, regardless of the points you raise.
>
>
> I encourage you to study the seastar code base [1] and documentation [2]
> to see how we handled those problems.  I'll also comment a bit below.
>
>
> [1] https://github.com/scylladb/seastar
>
> [2] http://www.seastar-project.org/
>
> On 03/12/2017 11:07 AM, Kant Kodali wrote:
>
> @Avi
>
> "User-level scheduling is great for high performance I/O intensive
> applications like databases and file systems." This is generally a claim
> made by people who want to use user-level threads, but I have rarely seen any
> significant performance gain. Since you are claiming that you do, it would
> be great if you could quantify that. The other day I saw a benchmark of
> a Golang server, which supports user-level threads/green threads natively,
> and it was able to handle 10K concurrent requests. Even Nginx, which is
> written in C and uses kernel threads, can handle that many with
> non-blocking I/O. We all know concurrency is not parallelism.
>
> You may have to pay for something which could be any of the following.
>
> *Duplication of the schedulers*
> M:N requires two schedulers which basically do same work, one at user
> level and one in kernel. This is undesirable. It requires frequent data
> communications between kernel and user space for scheduling information
> transference.
>
> Duplication takes more space in both Dcache and Icache for scheduling than
> a single scheduler. It is highly undesirable if cache misses are caused by
> the schedulers rather than the application, because an L2 cache miss could be
> more expensive than a kernel thread switch. Then the additional scheduler
> might become a troublemaker! In this case, saving kernel traps does not
> justify a user-level scheduler, which is even more true as processors
> provide faster and faster kernel trap execution.
>
>
>
> That's not a problem, at least in my experience. The kernel scheduler
> needs to schedule only one thread, and that very infrequently. It is
> completely out of any hot path.
>
>
> *Thread local data maintenance*
> M:N has to maintain thread specific data, which are already provided by
> kernel for kernel thread, such as the TLS data, error number. To provide
> the same feature for user threads is not straightforward, because, for
> example, the error number is returned for system call failure and supported
> by kernel. User-level support degrades system performance and increases
> system complexity.
>
>
> This is also not a problem, we capture error codes in exceptions
> immediately after a system call and so we don't need to rely on TLS for
> errno.
>
>
> *System info oblivious*
> Kernel scheduler is close to underlying platform and architecture. It can
> take advantage of their features. This is difficult for user thread library
> because it's a layer at user level. User threads are second-order entities
> in the system. If a kernel thread uses a GDT slot for TLS data, a user
> thread perhaps can only use an LDT slot for TLS data. With increasingly
> more supports available from the new processors for threading/scheduling
> (Hyperthreading, NUMA, many-core), the second order nature seriously limits
> the ability of M:N threading.
>
>
> Those are non-issues, in my experience.  In fact it's the other way
> around, the kernel scheduler cannot assume anything about the threads it is
> preempting and so has to save more state.  The threads being preempted also
> cannot assume anything about the kernel scheduler, and so have to use
> atomic read-modify-write instructions for synchronization, and to perform a
> system call whenever they need to block or wake another thread.
>
>
>
>
>
> On Sun, Mar 12, 2017 at 1:05 AM, Avi Kivity <av...@scylladb.com> wrote:
>
>> btw, for an example of how user-level tasks can be scheduled in a way
>> that cannot be done with kernel threads, see this pair of blog posts:
>>
>>
>>   http://www.scylladb.com/2016/04/14/io-scheduler-1/
>>
>>   http://www.scylladb.com/2016/04/29/io-scheduler-2/
>>
>>
>> There's simply no way to get this kind of control when you rely on the
>> kernel for scheduling and page cache management.  As a result you have to
>> overprovision your node and then you mostly underutilize it.
>>
>> On 03/12/2017 10:23 AM, Avi Kivity wrote:
>>
>>
>>
>> On 03/12/2017 12:19 AM, Kant Kodali wrote:
>>
>> My response is inline.
>>
>> On Sat, Mar 11, 2017 at 1:43 PM, Avi Kivity <av...@scylladb.com> wrote:
>>
>>> There are several issues at play here.
>>>
>>> First, a database runs a large number of concurrent operations, each of
>>> which only consumes a small amount of CPU. The high concurrency is needed to
>>> hide latency: disk latency, or the latency of contacting a remote node.
>>>
>>
>> *Ok so you are talking about hiding I/O latency.  If all these I/O are
>> non-blocking system calls then a thread per core and callback mechanism
>> should suffice isn't it?*
>>
>>
>>
>> Scylla uses a mix of user-level threads and callbacks. Most of the code
>> uses callbacks (fronted by a future/promise API). SSTable writers
>> (memtable flush, compaction) use a user-level thread (internally
>> implemented using callbacks).  The important bit is multiplexing many
>> concurrent operations onto a single kernel thread.
>>
>>
>>> This means that the scheduler will need to switch contexts very often. A
>>> kernel thread scheduler knows very little about the application, so it has
>>> to switch a lot of context.  A user level scheduler is tightly bound to the
>>> application, so it can perform the switching faster.
>>>
>>
>> *sure but this applies in other direction as well. A user level scheduler
>> has no idea about kernel level scheduler either.  There is literally no
>> coordination between kernel level scheduler and user level scheduler in
>> linux or any major OS. It may be possible with OS's that support scheduler
>> activation(LWP's) and upcall mechanism. *
>>
>>
>> There is no need for coordination, because the kernel scheduler has no
>> scheduling decisions to make.  With one thread per core, bound to its core,
>> the kernel scheduler can't make the wrong decision because it has just one
>> choice.
>>
>>
>> *Even then it is hard to say if it is all worth it (The research shows
>> performance may not outweigh the complexity). Golang problem is exactly
>> this if one creates 1000 go routines/green threads where each of them is
>> making a blocking system call then it would create 1000 kernel threads
>> underneath because it has no way to know that the kernel thread is blocked
>> (no upcall). *
>>
>>
>> All of the significant system calls we issue are through the main thread,
>> either asynchronous or non-blocking.
>>
>> *And in non-blocking case I still don't even see a significant
>> performance when compared to few kernel threads with callback mechanism.*
>>
>>
>> We do.
>>
>>
>> *  If you are saying user level scheduling is the Future (perhaps I would
>> just let the researchers argue about it) As of today that is not case else
>> languages would have had it natively instead of using third party
>> frameworks or libraries. *
>>
>>
>> User-level scheduling is great for high performance I/O intensive
>> applications like databases and file systems.  It's not a general solution,
>> and it involves a lot of effort to set up the infrastructure. However, for
>> our use case, it was worth it.
>>
>>
>>
>>> There are also implications on the concurrency primitives in use (locks
>>> etc.) -- they will be much faster for the user-level scheduler, because
>>> they cooperate with the scheduler.  For example, no atomic
>>> read-modify-write instructions need to be executed.
>>>
>>
>>
>>> Second, how many (kernel) threads should you run?
>> * This question one will always have. If there are 10K user level threads
>> that maps to only one kernel thread then they cannot exploit parallelism.
>> so there is no right answer but a thread per core is a reasonable/good
>> choice. *
>>
>>
>> Only if you can multiplex many operations on top of each of those
>> threads.  Otherwise, the CPUs end up underutilized.
>>
>>
>>
>>> If you run too few threads, then you will not be able to saturate the
>>> CPU resources.  This is a common problem with Cassandra -- it's very hard
>>> to get it to consume all of the CPU power on even a moderately large
>>> machine. On the other hand, if you have too many threads, you will see
>>> latency rise very quickly, because kernel scheduling granularity is on the
>>> order of milliseconds.  User-level scheduling, because it leaves control in
>>> the hand of the application, allows you to both saturate the CPU and
>>> maintain low latency.
>>>
>>
>>     F*or my workload and probably others I had seen Cassandra was always
>> been CPU bound.*
>>
>>>
>>>
>>
>> Yes, but does it consume 100% of all of the cores on your machine?
>> Cassandra generally doesn't (on a larger machine), and when you profile it,
>> you see it spending much of its time in atomic operations, or
>> parking/unparking threads -- fighting with itself.  It doesn't scale within
>> the machine.  Scylla will happily utilize all of the cores that it is
>> assigned (all of them by default in most configurations), and the bigger
>> the machine you give it, the happier it will be.
>>
>> There are other factors, like NUMA-friendliness, but in the end it all
>>> boils down to efficiency and control.
>>>
>>> None of this is new btw, it's pretty common in the storage world.
>>>
>>> Avi
>>>
>>>
>>> On 03/11/2017 11:18 PM, Kant Kodali wrote:
>>>
>>> Here is the Java version http://docs.paralleluniverse.co/quasar/ but I
>>> still don't see how user level scheduling can be beneficial (This is a well
>>> debated problem)? How can this add to the performance? or say why is user
>>> level scheduling necessary Given the Thread per core design and the
>>> callback mechanism?
>>>
>>> On Sat, Mar 11, 2017 at 12:51 PM, Avi Kivity <av...@scylladb.com> wrote:
>>>
>>>> Scylla uses a the seastar framework, which provides for both user-level
>>>> thread scheduling and simple run-to-completion tasks.
>>>>
>>>> Huge pages are limited to 2MB (and 1GB, but these aren't available as
>>>> transparent hugepages).
>>>>
>>>>
>>>> On 03/11/2017 10:26 PM, Kant Kodali wrote:
>>>>
>>>> @Dor
>>>>
>>>> 1) You guys have a CPU scheduler? you mean user level thread Scheduler
>>>> that maps user level threads to kernel level threads? I thought C++ by
>>>> default creates native kernel threads but sure nothing will stop someone to
>>>> create a user level scheduling library if that's what you are talking about?
>>>> 2) How can one create THP of size 1KB? According to this post
>>>> <https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-memory-transhuge.html> it
>>>> looks like the valid values 2MB and 1GB.
>>>>
>>>> Thanks,
>>>> kant
>>>>
>>>> On Sat, Mar 11, 2017 at 11:41 AM, Avi Kivity <av...@scylladb.com> wrote:
>>>>
>>>>> Agreed, I'd recommend to treat benchmarks as a rough guide to see
>>>>> where there is potential, and follow through with your own tests.
>>>>>
>>>>> On 03/11/2017 09:37 PM, Edward Capriolo wrote:
>>>>>
>>>>>
>>>>> Benchmarks are great for FUDly blog posts. Real world work loads
>>>>> matter more. Every NoSQL vendor wins their benchmarks.
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>
>

Re: scylladb

Posted by Avi Kivity <av...@scylladb.com>.

We already quantified it: the result is Scylla. That said, Scylla's
performance is only in part due to the threading model, so I can't give
you a number that quantifies how much just this aspect of the design is
worth.  Removing it (or adding it to Cassandra) is a multi-man-year
effort that I can't justify for this conversation.


If you want to continue to use kernel threads for your applications, by
all means continue to do so.  They're the right choice for all but the 
most I/O intensive applications.  But for these I/O intensive 
applications thread-per-core is the right choice, regardless of the 
points you raise.


I encourage you to study the seastar code base [1] and documentation [2] 
to see how we handled those problems.  I'll also comment a bit below.


[1] https://github.com/scylladb/seastar

[2] http://www.seastar-project.org/


On 03/12/2017 11:07 AM, Kant Kodali wrote:
> @Avi
>
> "User-level scheduling is great for high performance I/O intensive
> applications like databases and file systems." This is generally a
> claim made by people who want to use user-level threads, but I have
> rarely seen any significant performance gain. Since you are claiming
> that you do, it would be great if you could quantify that. The other
> day I saw a benchmark of a Golang server, which supports user-level
> threads/green threads natively, and it was able to handle 10K
> concurrent requests. Even Nginx, which is written in C and uses kernel
> threads, can handle that many with non-blocking I/O. We all know
> concurrency is not parallelism.
>
> You may have to pay for something which could be any of the following.
>
> *Duplication of the schedulers*
> M:N requires two schedulers which basically do the same work, one at
> user level and one in the kernel. This is undesirable. It requires
> frequent communication between kernel and user space to transfer
> scheduling information.
>
> Duplication takes more space in both Dcache and Icache for scheduling
> than a single scheduler. It is highly undesirable if cache misses are
> caused by the schedulers rather than the application, because an L2
> cache miss could be more expensive than a kernel thread switch. Then
> the additional scheduler might become a troublemaker! In this case,
> saving kernel traps does not justify a user-level scheduler, which is
> even more true now that processors are providing faster and faster
> kernel trap execution.


That's not a problem, at least in my experience. The kernel scheduler 
needs to schedule only one thread, and that very infrequently. It is 
completely out of any hot path.
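
For illustration only, here is a minimal sketch of binding a thread to a
single core on Linux, which is what leaves the kernel scheduler with just
one candidate to run there. It assumes Linux with glibc (sched_setaffinity,
sched_getaffinity); the helper names are hypothetical and this is not
Seastar's code.

```cpp
#include <cassert>
#include <sched.h>   // sched_setaffinity, sched_getaffinity, cpu_set_t (Linux/glibc)

// Hypothetical helper: returns the lowest-numbered CPU the calling thread
// is currently allowed to run on, or -1 if the affinity mask can't be read.
inline int first_allowed_cpu() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) {
        return -1;
    }
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &set)) {
            return cpu;
        }
    }
    return -1;
}

// Hypothetical helper: restricts the calling thread (pid 0 means "this
// thread") to exactly one CPU. Returns true if the kernel accepted the mask.
inline bool pin_current_thread_to(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}
```

A thread-per-core program would start one such thread per allowed CPU and
pin each to its own core; once that is done, the kernel has no placement
decision left to make for those threads.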

>
> *Thread local data maintenance*
> M:N has to maintain thread specific data, which are already provided 
> by kernel for kernel thread, such as the TLS data, error number. To 
> provide the same feature for user threads is not straightforward, 
> because, for example, the error number is returned for system call 
> failure and supported by kernel. User-level support degrades system 
> performance and increases system complexity.

This is also not a problem: we capture error codes in exceptions
immediately after a system call, so we don't need to rely on TLS for
errno.
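
A minimal sketch of that pattern, assuming a POSIX system: the error code
is copied out of errno right after the call returns and travels inside the
exception from then on. The names throw_on_error and
open_missing_file_error are hypothetical, chosen for this example, and are
not Seastar's API.

```cpp
#include <cassert>
#include <cerrno>
#include <system_error>
#include <fcntl.h>    // open, O_RDONLY
#include <unistd.h>   // close

// Hypothetical helper: if a system call reported failure (-1), copy errno
// immediately and throw it inside a std::system_error. After this point
// nothing depends on the thread-local errno any more.
inline int throw_on_error(int result, const char* what) {
    if (result == -1) {
        int saved = errno;  // captured before any other call can overwrite it
        throw std::system_error(saved, std::system_category(), what);
    }
    return result;
}

// Tries to open a path that should not exist and returns the error code
// carried by the exception, or 0 if the open unexpectedly succeeded.
inline int open_missing_file_error() {
    try {
        int fd = throw_on_error(
            ::open("/this-path-should-not-exist-demo/file", O_RDONLY), "open");
        ::close(fd);
        return 0;
    } catch (const std::system_error& e) {
        return e.code().value();
    }
}
```

Because the code lives in the exception object, a user-level task can be
suspended and resumed later without having to save and restore errno.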

>
> *System info oblivious*
> Kernel scheduler is close to underlying platform and architecture. It 
> can take advantage of their features. This is difficult for user 
> thread library because it's a layer at user level. User threads are 
> second-order entities in the system. If a kernel thread uses a GDT 
> slot for TLS data, a user thread perhaps can only use an LDT slot for 
> TLS data. With increasingly more supports available from the new 
> processors for threading/scheduling (Hyperthreading, NUMA, many-core), 
> the second order nature seriously limits the ability of M:N threading.

Those are non-issues, in my experience.  In fact it's the other way
around: the kernel scheduler cannot assume anything about the threads it
is preempting and so has to save more state.  The threads being
preempted also cannot assume anything about the kernel scheduler, and so
have to use atomic read-modify-write instructions for synchronization
and perform a system call whenever they need to block or wake another
thread.
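
To illustrate the synchronization point, here is a sketch of a semaphore
for a cooperative, single-threaded scheduler. Tasks run to completion and
are never preempted, so the counter is a plain int and blocking is just
"park the continuation in a queue": no atomic instructions and no system
calls. The classes run_queue and coop_semaphore are hypothetical and much
simpler than anything in Seastar.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical single-threaded run queue: tasks run one at a time, each to
// completion, on the thread that calls run().
class run_queue {
    std::deque<std::function<void()>> tasks_;
public:
    void schedule(std::function<void()> task) { tasks_.push_back(std::move(task)); }
    void run() {
        while (!tasks_.empty()) {
            auto task = std::move(tasks_.front());
            tasks_.pop_front();
            task();
        }
    }
};

// Hypothetical cooperative semaphore. Only one task runs at a time on this
// thread, so units_ needs no atomic access and waiting needs no futex.
class coop_semaphore {
    run_queue& rq_;
    int units_;
    std::deque<std::function<void()>> waiters_;
public:
    coop_semaphore(run_queue& rq, int units) : rq_(rq), units_(units) {}

    // Runs `then` once a unit is available.
    void acquire(std::function<void()> then) {
        if (units_ > 0) {
            --units_;
            rq_.schedule(std::move(then));
        } else {
            waiters_.push_back(std::move(then));  // "block": park the continuation
        }
    }

    // Hands the unit to the oldest waiter, or returns it to the pool.
    void release() {
        if (!waiters_.empty()) {
            rq_.schedule(std::move(waiters_.front()));
            waiters_.pop_front();
        } else {
            ++units_;
        }
    }
};

// Three tasks contend for a one-unit semaphore; returns the order they ran in.
inline std::vector<int> run_example() {
    run_queue rq;
    coop_semaphore sem(rq, 1);
    std::vector<int> order;
    for (int id = 1; id <= 3; ++id) {
        sem.acquire([&order, &sem, id] {
            order.push_back(id);
            sem.release();
        });
    }
    rq.run();
    return order;
}
```

The equivalent with kernel threads would need an atomic counter plus a
futex (or similar) wait and wake, which is the cost described above.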



>
> On Sun, Mar 12, 2017 at 1:05 AM, Avi Kivity <avi@scylladb.com 
> <ma...@scylladb.com>> wrote:
>
>     btw, for an example of how user-level tasks can be scheduled in a
>     way that cannot be done with kernel threads, see this pair of blog
>     posts:
>
>
>     http://www.scylladb.com/2016/04/14/io-scheduler-1/
>     <http://www.scylladb.com/2016/04/14/io-scheduler-1/>
>
>     http://www.scylladb.com/2016/04/29/io-scheduler-2/
>     <http://www.scylladb.com/2016/04/29/io-scheduler-2/>
>
>
>     There's simply no way to get this kind of control when you rely on
>     the kernel for scheduling and page cache management.  As a result
>     you have to overprovision your node and then you mostly
>     underutilize it.
>
>
>     On 03/12/2017 10:23 AM, Avi Kivity wrote:
>>
>>
>>
>>     On 03/12/2017 12:19 AM, Kant Kodali wrote:
>>>     My response is inline.
>>>
>>>     On Sat, Mar 11, 2017 at 1:43 PM, Avi Kivity <avi@scylladb.com
>>>     <ma...@scylladb.com>> wrote:
>>>
>>>         There are several issues at play here.
>>>
>>>         First, a database runs a large number of concurrent
>>>         operations, each of which only consumes a small amount of
>>>         CPU. The high concurrency is need to hide latency: disk
>>>         latency, or the latency of contacting a remote node.
>>>
>>>     *Ok so you are talking about hiding I/O latency.  If all these
>>>     I/O are non-blocking system calls then a thread per core and
>>>     callback mechanism should suffice isn't it?*
>>
>>     Scylla uses a mix of user-level threads and callbacks. Most of
>>     the code uses callbacks (fronted by a future/promise API).
>>     SSTable writers  (memtable flush, compaction) use a user-level
>>     thread (internally implemented using callbacks).  The important
>>     bit is multiplexing many concurrent operations onto a single
>>     kernel thread.
>>
>>
>>>         This means that the scheduler will need to switch contexts
>>>         very often. A kernel thread scheduler knows very little
>>>         about the application, so it has to switch a lot of
>>>         context.  A user level scheduler is tightly bound to the
>>>         application, so it can perform the switching faster.
>>>
>>>
>>>     *sure but this applies in other direction as well. A user level
>>>     scheduler has no idea about kernel level scheduler either. 
>>>     There is literally no coordination between kernel level
>>>     scheduler and user level scheduler in linux or any major OS. It
>>>     may be possible with OS's that support scheduler
>>>     activation(LWP's) and upcall mechanism. *
>>
>>     There is no need for coordination, because the kernel scheduler
>>     has no scheduling decisions to make.  With one thread per core,
>>     bound to its core, the kernel scheduler can't make the wrong
>>     decision because it has just one choice.
>>
>>
>>>     *Even then it is hard to say if it is all worth it (The research
>>>     shows performance may not outweigh the complexity). Golang
>>>     problem is exactly this if one creates 1000 go routines/green
>>>     threads where each of them is making a blocking system call then
>>>     it would create 1000 kernel threads underneath because it has no
>>>     way to know that the kernel thread is blocked (no upcall). *
>>
>>     All of the significant system calls we issue are through the main
>>     thread, either asynchronous or non-blocking.
>>
>>>     *And in non-blocking case I still don't even see a significant
>>>     performance when compared to few kernel threads with callback
>>>     mechanism.*
>>
>>     We do.
>>
>>>     *  If you are saying user level scheduling is the Future
>>>     (perhaps I would just let the researchers argue about it) As of
>>>     today that is not case else languages would have had it natively
>>>     instead of using third party frameworks or libraries.
>>>     *
>>
>>     User-level scheduling is great for high performance I/O intensive
>>     applications like databases and file systems.  It's not a general
>>     solution, and it involves a lot of effort to set up the
>>     infrastructure. However, for our use case, it was worth it.
>>
>>>         There are also implications on the concurrency primitives in
>>>         use (locks etc.) -- they will be much faster for the
>>>         user-level scheduler, because they cooperate with the
>>>         scheduler.  For example, no atomic read-modify-write
>>>         instructions need to be executed.
>>>
>>>
>>>          Second, how many (kernel) threads should you run?*This
>>>     question one will always have. If there are 10K user level
>>>     threads that maps to only one kernel thread then they cannot
>>>     exploit parallelism. so there is no right answer but a thread
>>>     per core is a reasonable/good choice.
>>>     *
>>
>>     Only if you can multiplex many operations on top of each of those
>>     threads.  Otherwise, the CPUs end up underutilized.
>>
>>>         If you run too few threads, then you will not be able to
>>>         saturate the CPU resources.  This is a common problem with
>>>         Cassandra -- it's very hard to get it to consume all of the
>>>         CPU power on even a moderately large machine. On the other
>>>         hand, if you have too many threads, you will see latency
>>>         rise very quickly, because kernel scheduling granularity is
>>>         on the order of milliseconds.  User-level scheduling,
>>>         because it leaves control in the hand of the application,
>>>         allows you to both saturate the CPU and maintain low latency.
>>>
>>>
>>>         F*or my workload and probably others I had seen Cassandra
>>>     was always been CPU bound.*
>>>
>>>
>>
>>
>>     Yes, but does it consume 100% of all of the cores on your
>>     machine?  Cassandra generally doesn't (on a larger machine), and
>>     when you profile it, you see it spending much of its time in
>>     atomic operations, or parking/unparking threads -- fighting with
>>     itself. It doesn't scale within the machine.  Scylla will happily
>>     utilize all of the cores that it is assigned (all of them by
>>     default in most configurations), and the bigger the machine you
>>     give it, the happier it will be.
>>
>>>         There are other factors, like NUMA-friendliness, but in the
>>>         end it all boils down to efficiency and control.
>>>
>>>         None of this is new btw, it's pretty common in the storage
>>>         world.
>>>
>>>         Avi
>>>
>>>
>>>         On 03/11/2017 11:18 PM, Kant Kodali wrote:
>>>>         Here is the Java version
>>>>         http://docs.paralleluniverse.co/quasar/
>>>>         <http://docs.paralleluniverse.co/quasar/> but I still don't
>>>>         see how user level scheduling can be beneficial (This is a
>>>>         well debated problem)? How can this add to the performance?
>>>>         or say why is user level scheduling necessary Given the
>>>>         Thread per core design and the callback mechanism?
>>>>
>>>>         On Sat, Mar 11, 2017 at 12:51 PM, Avi Kivity
>>>>         <avi@scylladb.com <ma...@scylladb.com>> wrote:
>>>>
>>>>             Scylla uses a the seastar framework, which provides for
>>>>             both user-level thread scheduling and simple
>>>>             run-to-completion tasks.
>>>>
>>>>             Huge pages are limited to 2MB (and 1GB, but these
>>>>             aren't available as transparent hugepages).
>>>>
>>>>
>>>>             On 03/11/2017 10:26 PM, Kant Kodali wrote:
>>>>>             @Dor
>>>>>
>>>>>             1) You guys have a CPU scheduler? you mean user level
>>>>>             thread Scheduler that maps user level threads to
>>>>>             kernel level threads? I thought C++ by default creates
>>>>>             native kernel threads but sure nothing will stop
>>>>>             someone to create a user level scheduling library if
>>>>>             that's what you are talking about?
>>>>>             2) How can one create THP of size 1KB? According to
>>>>>             this post
>>>>>             <https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-memory-transhuge.html> it
>>>>>             looks like the valid values 2MB and 1GB.
>>>>>
>>>>>             Thanks,
>>>>>             kant
>>>>>
>>>>>             On Sat, Mar 11, 2017 at 11:41 AM, Avi Kivity
>>>>>             <avi@scylladb.com <ma...@scylladb.com>> wrote:
>>>>>
>>>>>                 Agreed, I'd recommend to treat benchmarks as a
>>>>>                 rough guide to see where there is potential, and
>>>>>                 follow through with your own tests.
>>>>>
>>>>>                 On 03/11/2017 09:37 PM, Edward Capriolo wrote:
>>>>>>
>>>>>>                 Benchmarks are great for FUDly blog posts. Real
>>>>>>                 world work loads matter more. Every NoSQL vendor
>>>>>>                 wins their benchmarks.
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>
>

Re: scylladb

Posted by Kant Kodali <ka...@peernova.com>.

@Avi

"User-level scheduling is great for high performance I/O intensive
applications like databases and file systems." This is generally a claim
made by people who want to use user-level threads, but I have rarely seen
any significant performance gain. Since you are claiming that you do, it
would be great if you could quantify that. The other day I saw a benchmark
of a Golang server, which supports user-level threads/green threads
natively, and it was able to handle 10K concurrent requests. Even Nginx,
which is written in C and uses kernel threads, can handle that many with
non-blocking I/O. We all know concurrency is not parallelism.

You may have to pay for something which could be any of the following.

*Duplication of the schedulers*
M:N requires two schedulers which basically do the same work, one at user
level and one in the kernel. This is undesirable. It requires frequent
communication between kernel and user space to transfer scheduling
information.

Duplication takes more space in both Dcache and Icache for scheduling than
a single scheduler. It is highly undesirable if cache misses are caused by
the schedulers rather than the application, because an L2 cache miss could
be more expensive than a kernel thread switch. Then the additional
scheduler might become a troublemaker! In this case, saving kernel traps
does not justify a user-level scheduler, which is even more true now that
processors are providing faster and faster kernel trap execution.

*Thread local data maintenance*
M:N has to maintain thread-specific data that the kernel already provides
for kernel threads, such as TLS data and the error number. Providing the
same feature for user threads is not straightforward because, for example,
the error number is returned on system call failure and is supported by
the kernel. User-level support degrades system performance and increases
system complexity.

*System info oblivious*
The kernel scheduler is close to the underlying platform and architecture.
It can take advantage of their features. This is difficult for a user
thread library because it's a layer at user level. User threads are
second-order entities in the system. If a kernel thread uses a GDT slot
for TLS data, a user thread perhaps can only use an LDT slot for TLS data.
With more and more support for threading/scheduling available from new
processors (Hyperthreading, NUMA, many-core), the second-order nature
seriously limits the ability of M:N threading.

On Sun, Mar 12, 2017 at 1:05 AM, Avi Kivity <av...@scylladb.com> wrote:

> btw, for an example of how user-level tasks can be scheduled in a way that
> cannot be done with kernel threads, see this pair of blog posts:
>
>
>   http://www.scylladb.com/2016/04/14/io-scheduler-1/
>
>   http://www.scylladb.com/2016/04/29/io-scheduler-2/
>
>
> There's simply no way to get this kind of control when you rely on the
> kernel for scheduling and page cache management.  As a result you have to
> overprovision your node and then you mostly underutilize it.
>
> On 03/12/2017 10:23 AM, Avi Kivity wrote:
>
>
>
> On 03/12/2017 12:19 AM, Kant Kodali wrote:
>
> My response is inline.
>
> On Sat, Mar 11, 2017 at 1:43 PM, Avi Kivity <av...@scylladb.com> wrote:
>
>> There are several issues at play here.
>>
>> First, a database runs a large number of concurrent operations, each of
>> which only consumes a small amount of CPU. The high concurrency is need to
>> hide latency: disk latency, or the latency of contacting a remote node.
>>
>
> *Ok so you are talking about hiding I/O latency.  If all these I/O are
> non-blocking system calls then a thread per core and callback mechanism
> should suffice isn't it?*
>
>
>
> Scylla uses a mix of user-level threads and callbacks. Most of the code
> uses callbacks (fronted by a future/promise API). SSTable writers
> (memtable flush, compaction) use a user-level thread (internally
> implemented using callbacks).  The important bit is multiplexing many
> concurrent operations onto a single kernel thread.
>
>
> This means that the scheduler will need to switch contexts very often. A
>> kernel thread scheduler knows very little about the application, so it has
>> to switch a lot of context.  A user level scheduler is tightly bound to the
>> application, so it can perform the switching faster.
>>
>
> *sure but this applies in other direction as well. A user level scheduler
> has no idea about kernel level scheduler either.  There is literally no
> coordination between kernel level scheduler and user level scheduler in
> linux or any major OS. It may be possible with OS's that support scheduler
> activation(LWP's) and upcall mechanism. *
>
>
> There is no need for coordination, because the kernel scheduler has no
> scheduling decisions to make.  With one thread per core, bound to its core,
> the kernel scheduler can't make the wrong decision because it has just one
> choice.
>
>
> *Even then it is hard to say if it is all worth it (The research shows
> performance may not outweigh the complexity). Golang problem is exactly
> this if one creates 1000 go routines/green threads where each of them is
> making a blocking system call then it would create 1000 kernel threads
> underneath because it has no way to know that the kernel thread is blocked
> (no upcall). *
>
>
> All of the significant system calls we issue are through the main thread,
> either asynchronous or non-blocking.
>
> *And in non-blocking case I still don't even see a significant performance
> when compared to few kernel threads with callback mechanism.*
>
>
> We do.
>
>
> *  If you are saying user level scheduling is the Future (perhaps I would
> just let the researchers argue about it) As of today that is not case else
> languages would have had it natively instead of using third party
> frameworks or libraries. *
>
>
> User-level scheduling is great for high performance I/O intensive
> applications like databases and file systems.  It's not a general solution,
> and it involves a lot of effort to set up the infrastructure. However, for
> our use case, it was worth it.
>
>
>
>> There are also implications on the concurrency primitives in use (locks
>> etc.) -- they will be much faster for the user-level scheduler, because
>> they cooperate with the scheduler.  For example, no atomic
>> read-modify-write instructions need to be executed.
>>
>
>
>      Second, how many (kernel) threads should you run?
> * This question one will always have. If there are 10K user level threads
> that maps to only one kernel thread then they cannot exploit parallelism.
> so there is no right answer but a thread per core is a reasonable/good
> choice. *
>
>
> Only if you can multiplex many operations on top of each of those
> threads.  Otherwise, the CPUs end up underutilized.
>
>
>
>> If you run too few threads, then you will not be able to saturate the CPU
>> resources.  This is a common problem with Cassandra -- it's very hard to
>> get it to consume all of the CPU power on even a moderately large machine.
>> On the other hand, if you have too many threads, you will see latency rise
>> very quickly, because kernel scheduling granularity is on the order of
>> milliseconds.  User-level scheduling, because it leaves control in the hand
>> of the application, allows you to both saturate the CPU and maintain low
>> latency.
>>
>
>     F*or my workload and probably others I had seen Cassandra was always
> been CPU bound.*
>
>>
>>
>
> Yes, but does it consume 100% of all of the cores on your machine?
> Cassandra generally doesn't (on a larger machine), and when you profile it,
> you see it spending much of its time in atomic operations, or
> parking/unparking threads -- fighting with itself.  It doesn't scale within
> the machine.  Scylla will happily utilize all of the cores that it is
> assigned (all of them by default in most configurations), and the bigger
> the machine you give it, the happier it will be.
>
> There are other factors, like NUMA-friendliness, but in the end it all
>> boils down to efficiency and control.
>>
>> None of this is new btw, it's pretty common in the storage world.
>>
>> Avi
>>
>>
>> On 03/11/2017 11:18 PM, Kant Kodali wrote:
>>
>> Here is the Java version http://docs.paralleluniverse.co/quasar/ but I
>> still don't see how user level scheduling can be beneficial (This is a well
>> debated problem)? How can this add to the performance? or say why is user
>> level scheduling necessary Given the Thread per core design and the
>> callback mechanism?
>>
>> On Sat, Mar 11, 2017 at 12:51 PM, Avi Kivity <av...@scylladb.com> wrote:
>>
>>> Scylla uses a the seastar framework, which provides for both user-level
>>> thread scheduling and simple run-to-completion tasks.
>>>
>>> Huge pages are limited to 2MB (and 1GB, but these aren't available as
>>> transparent hugepages).
>>>
>>>
>>> On 03/11/2017 10:26 PM, Kant Kodali wrote:
>>>
>>> @Dor
>>>
>>> 1) You guys have a CPU scheduler? you mean user level thread Scheduler
>>> that maps user level threads to kernel level threads? I thought C++ by
>>> default creates native kernel threads but sure nothing will stop someone to
>>> create a user level scheduling library if that's what you are talking about?
>>> 2) How can one create THP of size 1KB? According to this post
>>> <https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-memory-transhuge.html> it
>>> looks like the valid values 2MB and 1GB.
>>>
>>> Thanks,
>>> kant
>>>
>>> On Sat, Mar 11, 2017 at 11:41 AM, Avi Kivity <av...@scylladb.com> wrote:
>>>
>>>> Agreed, I'd recommend to treat benchmarks as a rough guide to see where
>>>> there is potential, and follow through with your own tests.
>>>>
>>>> On 03/11/2017 09:37 PM, Edward Capriolo wrote:
>>>>
>>>>
>>>> Benchmarks are great for FUDly blog posts. Real world work loads matter
>>>> more. Every NoSQL vendor wins their benchmarks.
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
>

Re: scylladb

Posted by Avi Kivity <av...@scylladb.com>.

btw, for an example of how user-level tasks can be scheduled in a way 
that cannot be done with kernel threads, see this pair of blog posts:


   http://www.scylladb.com/2016/04/14/io-scheduler-1/

   http://www.scylladb.com/2016/04/29/io-scheduler-2/


There's simply no way to get this kind of control when you rely on the 
kernel for scheduling and page cache management.  As a result you have 
to overprovision your node and then you mostly underutilize it.
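
A rough sketch of the kind of control meant here, under the assumption
that the application keeps its own queue in front of the disk: it caps the
number of requests in flight and chooses which class of work goes next
according to configured shares. The class name io_queue, the share values
and the in-flight limit are invented for this example and are not taken
from the blog posts or from Seastar.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical user-level I/O queue: requests wait in user space, at most
// max_in_flight are handed to the disk at once, and the class that has
// consumed the least (relative to its shares) is served first.
class io_queue {
    struct io_class {
        std::string name;
        unsigned shares;
        double consumed;          // requests dispatched so far, divided by shares
        std::deque<int> pending;  // request ids still waiting in user space
    };
    std::vector<io_class> classes_;
    std::size_t max_in_flight_;
    std::size_t in_flight_ = 0;
public:
    struct dispatched {
        std::string cls;
        int request_id;
    };

    explicit io_queue(std::size_t max_in_flight) : max_in_flight_(max_in_flight) {}

    std::size_t add_class(std::string name, unsigned shares) {
        classes_.push_back(io_class{std::move(name), shares, 0.0, {}});
        return classes_.size() - 1;
    }

    void submit(std::size_t cls, int request_id) {
        classes_[cls].pending.push_back(request_id);
    }

    // Called when the disk finishes a request, freeing one in-flight slot.
    void complete() {
        if (in_flight_ > 0) {
            --in_flight_;
        }
    }

    // Hands out requests until the in-flight limit is reached or nothing waits.
    std::vector<dispatched> dispatch() {
        std::vector<dispatched> out;
        while (in_flight_ < max_in_flight_) {
            io_class* next = nullptr;
            for (auto& c : classes_) {
                if (!c.pending.empty() && (!next || c.consumed < next->consumed)) {
                    next = &c;
                }
            }
            if (!next) {
                break;
            }
            out.push_back(dispatched{next->name, next->pending.front()});
            next->pending.pop_front();
            next->consumed += 1.0 / next->shares;
            ++in_flight_;
        }
        return out;
    }
};

// Two classes with a 2:1 share ratio and room for three requests in flight;
// returns the class names of the first batch in dispatch order.
inline std::vector<std::string> first_batch_example() {
    io_queue q(3);
    auto query = q.add_class("query", 2);
    auto compaction = q.add_class("compaction", 1);
    for (int id = 0; id < 4; ++id) {
        q.submit(query, id);
        q.submit(compaction, 100 + id);
    }
    std::vector<std::string> names;
    for (const auto& d : q.dispatch()) {
        names.push_back(d.cls);
    }
    return names;
}
```

With these made-up numbers the first batch holds two "query" requests and
one "compaction" request, and everything else stays queued in user space
until complete() frees a slot. A kernel that only sees anonymous threads
issuing reads and writes has no equivalent notion of per-class shares.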


On 03/12/2017 10:23 AM, Avi Kivity wrote:
>
>
>
> On 03/12/2017 12:19 AM, Kant Kodali wrote:
>> My response is inline.
>>
>> On Sat, Mar 11, 2017 at 1:43 PM, Avi Kivity <avi@scylladb.com 
>> <ma...@scylladb.com>> wrote:
>>
>>     There are several issues at play here.
>>
>>     First, a database runs a large number of concurrent operations,
>>     each of which only consumes a small amount of CPU. The high
>>     concurrency is need to hide latency: disk latency, or the latency
>>     of contacting a remote node.
>>
>> *Ok so you are talking about hiding I/O latency. If all these I/O are 
>> non-blocking system calls then a thread per core and callback 
>> mechanism should suffice isn't it?*
>
> Scylla uses a mix of user-level threads and callbacks. Most of the 
> code uses callbacks (fronted by a future/promise API). SSTable 
> writers  (memtable flush, compaction) use a user-level thread 
> (internally implemented using callbacks).  The important bit is 
> multiplexing many concurrent operations onto a single kernel thread.
>
>
>>     This means that the scheduler will need to switch contexts very
>>     often. A kernel thread scheduler knows very little about the
>>     application, so it has to switch a lot of context.  A user level
>>     scheduler is tightly bound to the application, so it can perform
>>     the switching faster.
>>
>>
>> *sure but this applies in other direction as well. A user level 
>> scheduler has no idea about kernel level scheduler either.  There is 
>> literally no coordination between kernel level scheduler and user 
>> level scheduler in linux or any major OS. It may be possible with 
>> OS's that support scheduler activation(LWP's) and upcall mechanism. *
>
> There is no need for coordination, because the kernel scheduler has no 
> scheduling decisions to make.  With one thread per core, bound to its 
> core, the kernel scheduler can't make the wrong decision because it 
> has just one choice.
>
>
>> *Even then it is hard to say if it is all worth it (The research 
>> shows performance may not outweigh the complexity). Golang problem is 
>> exactly this if one creates 1000 go routines/green threads where each 
>> of them is making a blocking system call then it would create 1000 
>> kernel threads underneath because it has no way to know that the 
>> kernel thread is blocked (no upcall). *
>
> All of the significant system calls we issue are through the main 
> thread, either asynchronous or non-blocking.
>
>> *And in non-blocking case I still don't even see a significant 
>> performance when compared to few kernel threads with callback mechanism.*
>
> We do.
>
>> *  If you are saying user level scheduling is the Future (perhaps I 
>> would just let the researchers argue about it) As of today that is 
>> not case else languages would have had it natively instead of using 
>> third party frameworks or libraries.
>> *
>
> User-level scheduling is great for high performance I/O intensive 
> applications like databases and file systems.  It's not a general 
> solution, and it involves a lot of effort to set up the 
> infrastructure. However, for our use case, it was worth it.
>
>>     There are also implications on the concurrency primitives in use
>>     (locks etc.) -- they will be much faster for the user-level
>>     scheduler, because they cooperate with the scheduler.  For
>>     example, no atomic read-modify-write instructions need to be
>>     executed.
>>
>>
>>      Second, how many (kernel) threads should you run?*This question 
>> one will always have. If there are 10K user level threads that maps 
>> to only one kernel thread then they cannot exploit parallelism. so 
>> there is no right answer but a thread per core is a reasonable/good 
>> choice.
>> *
>
> Only if you can multiplex many operations on top of each of those 
> threads.  Otherwise, the CPUs end up underutilized.
>
>>     If you run too few threads, then you will not be able to saturate
>>     the CPU resources.  This is a common problem with Cassandra --
>>     it's very hard to get it to consume all of the CPU power on even
>>     a moderately large machine. On the other hand, if you have too
>>     many threads, you will see latency rise very quickly, because
>>     kernel scheduling granularity is on the order of milliseconds. 
>>     User-level scheduling, because it leaves control in the hand of
>>     the application, allows you to both saturate the CPU and maintain
>>     low latency.
>>
>>
>>     F*or my workload and probably others I had seen Cassandra was 
>> always been CPU bound.*
>>
>>
>
>
> Yes, but does it consume 100% of all of the cores on your machine?  
> Cassandra generally doesn't (on a larger machine), and when you 
> profile it, you see it spending much of its time in atomic operations, 
> or parking/unparking threads -- fighting with itself.  It doesn't 
> scale within the machine.  Scylla will happily utilize all of the 
> cores that it is assigned (all of them by default in most 
> configurations), and the bigger the machine you give it, the happier 
> it will be.
>
>>     There are other factors, like NUMA-friendliness, but in the end
>>     it all boils down to efficiency and control.
>>
>>     None of this is new btw, it's pretty common in the storage world.
>>
>>     Avi
>>
>>
>>     On 03/11/2017 11:18 PM, Kant Kodali wrote:
>>>     Here is the Java version http://docs.paralleluniverse.co/quasar/
>>>     <http://docs.paralleluniverse.co/quasar/> but I still don't see
>>>     how user level scheduling can be beneficial (This is a well
>>>     debated problem)? How can this add to the performance? or say
>>>     why is user level scheduling necessary Given the Thread per core
>>>     design and the callback mechanism?
>>>
>>>     On Sat, Mar 11, 2017 at 12:51 PM, Avi Kivity <avi@scylladb.com
>>>     <ma...@scylladb.com>> wrote:
>>>
>>>         Scylla uses a the seastar framework, which provides for both
>>>         user-level thread scheduling and simple run-to-completion tasks.
>>>
>>>         Huge pages are limited to 2MB (and 1GB, but these aren't
>>>         available as transparent hugepages).
>>>
>>>
>>>         On 03/11/2017 10:26 PM, Kant Kodali wrote:
>>>>         @Dor
>>>>
>>>>         1) You guys have a CPU scheduler? you mean user level
>>>>         thread Scheduler that maps user level threads to kernel
>>>>         level threads? I thought C++ by default creates native
>>>>         kernel threads but sure nothing will stop someone to create
>>>>         a user level scheduling library if that's what you are
>>>>         talking about?
>>>>         2) How can one create THP of size 1KB? According to this
>>>>         post
>>>>         <https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-memory-transhuge.html> it
>>>>         looks like the valid values 2MB and 1GB.
>>>>
>>>>         Thanks,
>>>>         kant
>>>>
>>>>         On Sat, Mar 11, 2017 at 11:41 AM, Avi Kivity
>>>>         <avi@scylladb.com <ma...@scylladb.com>> wrote:
>>>>
>>>>             Agreed, I'd recommend to treat benchmarks as a rough
>>>>             guide to see where there is potential, and follow
>>>>             through with your own tests.
>>>>
>>>>             On 03/11/2017 09:37 PM, Edward Capriolo wrote:
>>>>>
>>>>>             Benchmarks are great for FUDly blog posts. Real world
>>>>>             work loads matter more. Every NoSQL vendor wins their
>>>>>             benchmarks.
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>

Re: scylladb

Posted by Kant Kodali <ka...@peernova.com>.

On Sun, Mar 12, 2017 at 12:23 AM, Avi Kivity <av...@scylladb.com> wrote:

>
>
> On 03/12/2017 12:19 AM, Kant Kodali wrote:
>
> My response is inline.
>
> On Sat, Mar 11, 2017 at 1:43 PM, Avi Kivity <av...@scylladb.com> wrote:
>
>> There are several issues at play here.
>>
>> First, a database runs a large number of concurrent operations, each of
>> which only consumes a small amount of CPU. The high concurrency is need to
>> hide latency: disk latency, or the latency of contacting a remote node.
>>
>
> *Ok so you are talking about hiding I/O latency.  If all these I/O are
> non-blocking system calls then a thread per core and callback mechanism
> should suffice isn't it?*
>
>
>
> Scylla uses a mix of user-level threads and callbacks. Most of the code
> uses callbacks (fronted by a future/promise API). SSTable writers
> (memtable flush, compaction) use a user-level thread (internally
> implemented using callbacks).  The important bit is multiplexing many
> concurrent operations onto a single kernel thread.
>
>
> This means that the scheduler will need to switch contexts very often. A
>> kernel thread scheduler knows very little about the application, so it has
>> to switch a lot of context.  A user level scheduler is tightly bound to the
>> application, so it can perform the switching faster.
>>
>
> *sure but this applies in other direction as well. A user level scheduler
> has no idea about kernel level scheduler either.  There is literally no
> coordination between kernel level scheduler and user level scheduler in
> linux or any major OS. It may be possible with OS's that support scheduler
> activation(LWP's) and upcall mechanism. *
>
>
> There is no need for coordination, because the kernel scheduler has no
> scheduling decisions to make.  With one thread per core, bound to its core,
> the kernel scheduler can't make the wrong decision because it has just one
> choice.
>
>
> *Even then it is hard to say if it is all worth it (The research shows
> performance may not outweigh the complexity). Golang problem is exactly
> this if one creates 1000 go routines/green threads where each of them is
> making a blocking system call then it would create 1000 kernel threads
> underneath because it has no way to know that the kernel thread is blocked
> (no upcall). *
>
>
> All of the significant system calls we issue are through the main thread,
> either asynchronous or non-blocking.
>
> *And in non-blocking case I still don't even see a significant performance
> when compared to few kernel threads with callback mechanism.*
>
>
> We do.
>
>
> *  If you are saying user level scheduling is the Future (perhaps I would
> just let the researchers argue about it) As of today that is not case else
> languages would have had it natively instead of using third party
> frameworks or libraries. *
>
>
> User-level scheduling is great for high performance I/O intensive
> applications like databases and file systems.  It's not a general solution,
> and it involves a lot of effort to set up the infrastructure. However, for
> our use case, it was worth it.
>

*Even with I/O-intensive applications it is very much debatable. The
numbers I have seen aren't convincing at all.*

>
>
>
>
>> There are also implications on the concurrency primitives in use (locks
>> etc.) -- they will be much faster for the user-level scheduler, because
>> they cooperate with the scheduler.  For example, no atomic
>> read-modify-write instructions need to be executed.
>>
>
>
>> Second, how many (kernel) threads should you run?
>>
>
> *One will always have this question. If there are 10K user-level threads
> that map to only one kernel thread, then they cannot exploit parallelism.
> So there is no right answer, but a thread per core is a reasonable
> choice.*
>
>
> Only if you can multiplex many operations on top of each of those
> threads.  Otherwise, the CPUs end up underutilized.
>

*Yes, that's exactly my point regarding your question, "how many (kernel)
threads should you run?", so I will repeat myself here. One will always
have this question: even if they prefer user-level thread scheduling, they
still need to know how many kernel threads to map onto, so they end up with
the same question of how many kernel threads to create. If there are 10K
user-level threads that map to only one kernel thread, then they cannot
exploit parallelism. So there is no right answer, but a thread per core is
a reasonable choice.*


>
>
>
>
>> If you run too few threads, then you will not be able to saturate the CPU
>> resources.  This is a common problem with Cassandra -- it's very hard to
>> get it to consume all of the CPU power on even a moderately large machine.
>> On the other hand, if you have too many threads, you will see latency rise
>> very quickly, because kernel scheduling granularity is on the order of
>> milliseconds.  User-level scheduling, because it leaves control in the hand
>> of the application, allows you to both saturate the CPU and maintain low
>> latency.
>>
>
> *For my workload, and probably others I have seen, Cassandra has always
> been CPU bound.*
>
>>
>>
>
> Yes, but does it consume 100% of all of the cores on your machine?
> Cassandra generally doesn't (on a larger machine), and when you profile it,
> you see it spending much of its time in atomic operations, or
> parking/unparking threads -- fighting with itself.  It doesn't scale within
> the machine.  Scylla will happily utilize all of the cores that it is
> assigned (all of them by default in most configurations), and the bigger
> the machine you give it, the happier it will be.
>

*In my case all our writes are LWTs, and when I ran it on a c3.4xlarge it
was able to saturate all 16 cores. I can also send you a screenshot of top
if needed.*

>
>
> There are other factors, like NUMA-friendliness, but in the end it all
>> boils down to efficiency and control.
>>
>> None of this is new btw, it's pretty common in the storage world.
>>
>> Avi
>>
>>
>> On 03/11/2017 11:18 PM, Kant Kodali wrote:
>>
>> Here is the Java version http://docs.paralleluniverse.co/quasar/ but I
>> still don't see how user level scheduling can be beneficial (This is a well
>> debated problem)? How can this add to the performance? or say why is user
>> level scheduling necessary Given the Thread per core design and the
>> callback mechanism?
>>
>> On Sat, Mar 11, 2017 at 12:51 PM, Avi Kivity <av...@scylladb.com> wrote:
>>
>>> Scylla uses a the seastar framework, which provides for both user-level
>>> thread scheduling and simple run-to-completion tasks.
>>>
>>> Huge pages are limited to 2MB (and 1GB, but these aren't available as
>>> transparent hugepages).
>>>
>>>
>>> On 03/11/2017 10:26 PM, Kant Kodali wrote:
>>>
>>> @Dor
>>>
>>> 1) You guys have a CPU scheduler? you mean user level thread Scheduler
>>> that maps user level threads to kernel level threads? I thought C++ by
>>> default creates native kernel threads but sure nothing will stop someone to
>>> create a user level scheduling library if that's what you are talking about?
>>> 2) How can one create THP of size 1KB? According to this post
>>> <https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-memory-transhuge.html> it
>>> looks like the valid values 2MB and 1GB.
>>>
>>> Thanks,
>>> kant
>>>
>>> On Sat, Mar 11, 2017 at 11:41 AM, Avi Kivity <av...@scylladb.com> wrote:
>>>
>>>> Agreed, I'd recommend to treat benchmarks as a rough guide to see where
>>>> there is potential, and follow through with your own tests.
>>>>
>>>> On 03/11/2017 09:37 PM, Edward Capriolo wrote:
>>>>
>>>>
>>>> Benchmarks are great for FUDly blog posts. Real world work loads matter
>>>> more. Every NoSQL vendor wins their benchmarks.
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>
>

Re: scylladb

Posted by Avi Kivity <av...@scylladb.com>.


On 03/12/2017 12:19 AM, Kant Kodali wrote:
> My response is inline.
>
> On Sat, Mar 11, 2017 at 1:43 PM, Avi Kivity <avi@scylladb.com 
> <ma...@scylladb.com>> wrote:
>
>     There are several issues at play here.
>
>     First, a database runs a large number of concurrent operations,
>     each of which only consumes a small amount of CPU. The high
>     concurrency is need to hide latency: disk latency, or the latency
>     of contacting a remote node.
>
> *Ok so you are talking about hiding I/O latency.  If all these I/O are 
> non-blocking system calls then a thread per core and callback 
> mechanism should suffice isn't it?*

Scylla uses a mix of user-level threads and callbacks. Most of the code 
uses callbacks (fronted by a future/promise API). SSTable writers  
(memtable flush, compaction) use a user-level thread (internally 
implemented using callbacks).  The important bit is multiplexing many 
concurrent operations onto a single kernel thread.
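
As a rough illustration of multiplexing many concurrent operations onto one kernel thread, here is a minimal sketch using Python's asyncio rather than Seastar's C++ future/promise API. The function names and the simulated latency are assumptions made for illustration; this is not Scylla code.

```python
import asyncio
import threading


async def simulated_request(request_id: int, latency_s: float) -> tuple[int, int]:
    # Stand-in for a disk read or a remote-node round trip. The await yields
    # control back to the event loop instead of blocking the kernel thread.
    await asyncio.sleep(latency_s)
    return request_id, threading.get_ident()


async def run_many(count: int, latency_s: float) -> list[tuple[int, int]]:
    # Start every operation at once; the loop interleaves them on one thread.
    return await asyncio.gather(
        *(simulated_request(i, latency_s) for i in range(count))
    )


def threads_used(count: int = 1000, latency_s: float = 0.01) -> set[int]:
    # Returns the set of kernel thread ids that served the operations.
    results = asyncio.run(run_many(count, latency_s))
    return {thread_id for _, thread_id in results}


if __name__ == "__main__":
    print(len(threads_used()), "kernel thread served all operations")
```

All of the operations overlap their waiting time, so the total run time is close to a single latency period rather than the sum of all of them.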


>     This means that the scheduler will need to switch contexts very
>     often. A kernel thread scheduler knows very little about the
>     application, so it has to switch a lot of context.  A user level
>     scheduler is tightly bound to the application, so it can perform
>     the switching faster.
>
>
> *sure but this applies in other direction as well. A user level 
> scheduler has no idea about kernel level scheduler either.  There is 
> literally no coordination between kernel level scheduler and user 
> level scheduler in linux or any major OS. It may be possible with OS's 
> that support scheduler activation(LWP's) and upcall mechanism. *

There is no need for coordination, because the kernel scheduler has no 
scheduling decisions to make.  With one thread per core, bound to its 
core, the kernel scheduler can't make the wrong decision because it has 
just one choice.
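
A minimal sketch of the "one thread per core, bound to its core" layout, assuming Linux and using Python's `os.sched_setaffinity` rather than anything from Scylla or Seastar. The helper names are hypothetical, and on platforms without the affinity calls the sketch only records the intended core.

```python
import os
import threading


def allowed_cores() -> list[int]:
    # Cores this process may run on. sched_getaffinity is Linux-only,
    # so fall back to a plain core count elsewhere.
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))
    return list(range(os.cpu_count() or 1))


def pinned_worker(core: int, seen: dict[int, set[int]]) -> None:
    try:
        # On Linux, pid 0 refers to the calling thread, so only this
        # worker is bound; the main thread keeps its original mask.
        os.sched_setaffinity(0, {core})
        seen[core] = set(os.sched_getaffinity(0))
    except (AttributeError, OSError):
        # Pinning is unavailable here; record the intended core only.
        seen[core] = {core}
    # A real shard would run its own event loop at this point.


def start_one_thread_per_core() -> dict[int, set[int]]:
    seen: dict[int, set[int]] = {}
    threads = [
        threading.Thread(target=pinned_worker, args=(core, seen))
        for core in allowed_cores()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return seen
```

With each thread restricted to a single core, the kernel has exactly one runnable candidate per core, which is the situation described above.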


> *Even then it is hard to say if it is all worth it (The research shows 
> performance may not outweigh the complexity). Golang problem is 
> exactly this if one creates 1000 go routines/green threads where each 
> of them is making a blocking system call then it would create 1000 
> kernel threads underneath because it has no way to know that the 
> kernel thread is blocked (no upcall). *

All of the significant system calls we issue are through the main 
thread, either asynchronous or non-blocking.

> *And in non-blocking case I still don't even see a significant 
> performance when compared to few kernel threads with callback mechanism.*

We do.

> *  If you are saying user level scheduling is the Future (perhaps I 
> would just let the researchers argue about it) As of today that is not 
> case else languages would have had it natively instead of using third 
> party frameworks or libraries.
> *

User-level scheduling is great for high performance I/O intensive 
applications like databases and file systems.  It's not a general 
solution, and it involves a lot of effort to set up the infrastructure. 
However, for our use case, it was worth it.

>     There are also implications on the concurrency primitives in use
>     (locks etc.) -- they will be much faster for the user-level
>     scheduler, because they cooperate with the scheduler.  For
>     example, no atomic read-modify-write instructions need to be executed.
>
>
>     Second, how many (kernel) threads should you run?
>
> *One will always have this question. If there are 10K user-level threads
> that map to only one kernel thread, then they cannot exploit parallelism.
> So there is no right answer, but a thread per core is a reasonable
> choice.*

Only if you can multiplex many operations on top of each of those 
threads.  Otherwise, the CPUs end up underutilized.

>     If you run too few threads, then you will not be able to saturate
>     the CPU resources.  This is a common problem with Cassandra --
>     it's very hard to get it to consume all of the CPU power on even a
>     moderately large machine. On the other hand, if you have too many
>     threads, you will see latency rise very quickly, because kernel
>     scheduling granularity is on the order of milliseconds. 
>     User-level scheduling, because it leaves control in the hand of
>     the application, allows you to both saturate the CPU and maintain
>     low latency.
>
>
> *For my workload, and probably others I have seen, Cassandra has always
> been CPU bound.*
>
>


Yes, but does it consume 100% of all of the cores on your machine? 
Cassandra generally doesn't (on a larger machine), and when you profile 
it, you see it spending much of its time in atomic operations, or 
parking/unparking threads -- fighting with itself. It doesn't scale 
within the machine.  Scylla will happily utilize all of the cores that 
it is assigned (all of them by default in most configurations), and the 
bigger the machine you give it, the happier it will be.

>     There are other factors, like NUMA-friendliness, but in the end it
>     all boils down to efficiency and control.
>
>     None of this is new btw, it's pretty common in the storage world.
>
>     Avi
>
>
>     On 03/11/2017 11:18 PM, Kant Kodali wrote:
>>     Here is the Java version http://docs.paralleluniverse.co/quasar/
>>     <http://docs.paralleluniverse.co/quasar/> but I still don't see
>>     how user level scheduling can be beneficial (This is a well
>>     debated problem)? How can this add to the performance? or say why
>>     is user level scheduling necessary Given the Thread per core
>>     design and the callback mechanism?
>>
>>     On Sat, Mar 11, 2017 at 12:51 PM, Avi Kivity <avi@scylladb.com
>>     <ma...@scylladb.com>> wrote:
>>
>>         Scylla uses a the seastar framework, which provides for both
>>         user-level thread scheduling and simple run-to-completion tasks.
>>
>>         Huge pages are limited to 2MB (and 1GB, but these aren't
>>         available as transparent hugepages).
>>
>>
>>         On 03/11/2017 10:26 PM, Kant Kodali wrote:
>>>         @Dor
>>>
>>>         1) You guys have a CPU scheduler? you mean user level thread
>>>         Scheduler that maps user level threads to kernel level
>>>         threads? I thought C++ by default creates native kernel
>>>         threads but sure nothing will stop someone to create a user
>>>         level scheduling library if that's what you are talking about?
>>>         2) How can one create THP of size 1KB? According to this
>>>         post
>>>         <https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-memory-transhuge.html> it
>>>         looks like the valid values 2MB and 1GB.
>>>
>>>         Thanks,
>>>         kant
>>>
>>>         On Sat, Mar 11, 2017 at 11:41 AM, Avi Kivity
>>>         <avi@scylladb.com <ma...@scylladb.com>> wrote:
>>>
>>>             Agreed, I'd recommend to treat benchmarks as a rough
>>>             guide to see where there is potential, and follow
>>>             through with your own tests.
>>>
>>>             On 03/11/2017 09:37 PM, Edward Capriolo wrote:
>>>>
>>>>             Benchmarks are great for FUDly blog posts. Real world
>>>>             work loads matter more. Every NoSQL vendor wins their
>>>>             benchmarks.
>>>
>>>
>>>
>>
>>
>>
>>
>
>

Re: scylladb

Posted by Kant Kodali <ka...@peernova.com>.

My response is inline.

On Sat, Mar 11, 2017 at 1:43 PM, Avi Kivity <av...@scylladb.com> wrote:

> There are several issues at play here.
>
> First, a database runs a large number of concurrent operations, each of
> which only consumes a small amount of CPU. The high concurrency is need to
> hide latency: disk latency, or the latency of contacting a remote node.
>

*OK, so you are talking about hiding I/O latency. If all these I/O
operations are non-blocking system calls, then a thread per core and a
callback mechanism should suffice, shouldn't they?*


> This means that the scheduler will need to switch contexts very often. A
> kernel thread scheduler knows very little about the application, so it has
> to switch a lot of context.  A user level scheduler is tightly bound to the
> application, so it can perform the switching faster.
>

*Sure, but this applies in the other direction as well. A user-level
scheduler has no idea about the kernel-level scheduler either. There is
literally no coordination between the kernel-level scheduler and the
user-level scheduler in Linux or any major OS. It may be possible with OSes
that support scheduler activations (LWPs) and an upcall mechanism. Even then
it is hard to say whether it is all worth it (the research shows performance
may not outweigh the complexity). Golang's problem is exactly this: if one
creates 1000 goroutines/green threads, each making a blocking system call,
then it would create 1000 kernel threads underneath, because it has no way
to know that the kernel thread is blocked (no upcall). And in the
non-blocking case I still don't see a significant performance gain compared
to a few kernel threads with a callback mechanism. If you are saying
user-level scheduling is the future (perhaps I would just let the
researchers argue about it), as of today that is not the case, or else
languages would have it natively instead of relying on third-party
frameworks or libraries.*


> There are also implications on the concurrency primitives in use (locks
> etc.) -- they will be much faster for the user-level scheduler, because
> they cooperate with the scheduler.  For example, no atomic
> read-modify-write instructions need to be executed.
>


> Second, how many (kernel) threads should you run?
>

*One will always have this question. If there are 10K user-level threads
that map to only one kernel thread, then they cannot exploit parallelism.
So there is no right answer, but a thread per core is a reasonable choice.*


> If you run too few threads, then you will not be able to saturate the CPU
> resources.  This is a common problem with Cassandra -- it's very hard to
> get it to consume all of the CPU power on even a moderately large machine.
> On the other hand, if you have too many threads, you will see latency rise
> very quickly, because kernel scheduling granularity is on the order of
> milliseconds.  User-level scheduling, because it leaves control in the hand
> of the application, allows you to both saturate the CPU and maintain low
> latency.
>

*For my workload, and probably others I have seen, Cassandra has always
been CPU bound.*

>
> There are other factors, like NUMA-friendliness, but in the end it all
> boils down to efficiency and control.
>
> None of this is new btw, it's pretty common in the storage world.
>
> Avi
>
>
> On 03/11/2017 11:18 PM, Kant Kodali wrote:
>
> Here is the Java version http://docs.paralleluniverse.co/quasar/ but I
> still don't see how user level scheduling can be beneficial (This is a well
> debated problem)? How can this add to the performance? or say why is user
> level scheduling necessary Given the Thread per core design and the
> callback mechanism?
>
> On Sat, Mar 11, 2017 at 12:51 PM, Avi Kivity <av...@scylladb.com> wrote:
>
>> Scylla uses a the seastar framework, which provides for both user-level
>> thread scheduling and simple run-to-completion tasks.
>>
>> Huge pages are limited to 2MB (and 1GB, but these aren't available as
>> transparent hugepages).
>>
>>
>> On 03/11/2017 10:26 PM, Kant Kodali wrote:
>>
>> @Dor
>>
>> 1) You guys have a CPU scheduler? you mean user level thread Scheduler
>> that maps user level threads to kernel level threads? I thought C++ by
>> default creates native kernel threads but sure nothing will stop someone to
>> create a user level scheduling library if that's what you are talking about?
>> 2) How can one create THP of size 1KB? According to this post
>> <https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-memory-transhuge.html> it
>> looks like the valid values 2MB and 1GB.
>>
>> Thanks,
>> kant
>>
>> On Sat, Mar 11, 2017 at 11:41 AM, Avi Kivity <av...@scylladb.com> wrote:
>>
>>> Agreed, I'd recommend to treat benchmarks as a rough guide to see where
>>> there is potential, and follow through with your own tests.
>>>
>>> On 03/11/2017 09:37 PM, Edward Capriolo wrote:
>>>
>>>
>>> Benchmarks are great for FUDly blog posts. Real world work loads matter
>>> more. Every NoSQL vendor wins their benchmarks.
>>>
>>>
>>>
>>
>>
>>
>>
>
>

Re: scylladb

Posted by Avi Kivity <av...@scylladb.com>.

There are several issues at play here.

First, a database runs a large number of concurrent operations, each of 
which only consumes a small amount of CPU. The high concurrency is needed
to hide latency: disk latency, or the latency of contacting a remote 
node. This means that the scheduler will need to switch contexts very 
often. A kernel thread scheduler knows very little about the 
application, so it has to switch a lot of context.  A user level 
scheduler is tightly bound to the application, so it can perform the 
switching faster.  There are also implications on the concurrency 
primitives in use (locks etc.) -- they will be much faster for the 
user-level scheduler, because they cooperate with the scheduler.  For 
example, no atomic read-modify-write instructions need to be executed.
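
To make the point about concurrency primitives concrete, here is a minimal sketch in Python's asyncio (not Seastar): tasks scheduled cooperatively on one thread switch only at explicit yield points, so a plain read-modify-write on shared state needs neither a lock nor an atomic instruction. The class and function names are made up for illustration.

```python
import asyncio


class Counter:
    def __init__(self) -> None:
        self.value = 0


async def worker(counter: Counter, iterations: int) -> None:
    for _ in range(iterations):
        # Plain read-modify-write. No other task can run between these two
        # statements, because tasks switch only at an await.
        current = counter.value
        counter.value = current + 1
        # Explicit yield point: other tasks may run here.
        await asyncio.sleep(0)


async def run_workers(workers: int, iterations: int) -> int:
    counter = Counter()
    await asyncio.gather(*(worker(counter, iterations) for _ in range(workers)))
    return counter.value


def total(workers: int = 10, iterations: int = 100) -> int:
    # Every increment is preserved even though no lock is taken.
    return asyncio.run(run_workers(workers, iterations))
```

The same unlocked read-modify-write would need synchronization if the workers ran on preemptively scheduled kernel threads, since a switch could then happen between the read and the write.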

Second, how many (kernel) threads should you run?  If you run too few 
threads, then you will not be able to saturate the CPU resources.  This 
is a common problem with Cassandra -- it's very hard to get it to 
consume all of the CPU power on even a moderately large machine.  On the 
other hand, if you have too many threads, you will see latency rise very 
quickly, because kernel scheduling granularity is on the order of 
milliseconds. User-level scheduling, because it leaves control in the 
hands of the application, allows you to both saturate the CPU and 
maintain low latency.

There are other factors, like NUMA-friendliness, but in the end it all 
boils down to efficiency and control.

None of this is new btw, it's pretty common in the storage world.

Avi

On 03/11/2017 11:18 PM, Kant Kodali wrote:
> Here is the Java version http://docs.paralleluniverse.co/quasar/ but I 
> still don't see how user level scheduling can be beneficial (This is a 
> well debated problem)? How can this add to the performance? or say why 
> is user level scheduling necessary Given the Thread per core design 
> and the callback mechanism?
>
> On Sat, Mar 11, 2017 at 12:51 PM, Avi Kivity <avi@scylladb.com 
> <ma...@scylladb.com>> wrote:
>
>     Scylla uses a the seastar framework, which provides for both
>     user-level thread scheduling and simple run-to-completion tasks.
>
>     Huge pages are limited to 2MB (and 1GB, but these aren't available
>     as transparent hugepages).
>
>
>     On 03/11/2017 10:26 PM, Kant Kodali wrote:
>>     @Dor
>>
>>     1) You guys have a CPU scheduler? you mean user level thread
>>     Scheduler that maps user level threads to kernel level threads? I
>>     thought C++ by default creates native kernel threads but sure
>>     nothing will stop someone to create a user level scheduling
>>     library if that's what you are talking about?
>>     2) How can one create THP of size 1KB? According to this post
>>     <https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-memory-transhuge.html> it
>>     looks like the valid values 2MB and 1GB.
>>
>>     Thanks,
>>     kant
>>
>>     On Sat, Mar 11, 2017 at 11:41 AM, Avi Kivity <avi@scylladb.com
>>     <ma...@scylladb.com>> wrote:
>>
>>         Agreed, I'd recommend to treat benchmarks as a rough guide to
>>         see where there is potential, and follow through with your
>>         own tests.
>>
>>         On 03/11/2017 09:37 PM, Edward Capriolo wrote:
>>>
>>>         Benchmarks are great for FUDly blog posts. Real world work
>>>         loads matter more. Every NoSQL vendor wins their benchmarks.
>>
>>
>>
>
>
>
>

Re: scylladb

Posted by Kant Kodali <ka...@peernova.com>.

Here is the Java version, http://docs.paralleluniverse.co/quasar/, but I
still don't see how user-level scheduling can be beneficial (this is a
well-debated problem). How can it add to performance? Or, put differently,
why is user-level scheduling necessary given the thread-per-core design and
the callback mechanism?

On Sat, Mar 11, 2017 at 12:51 PM, Avi Kivity <av...@scylladb.com> wrote:

> Scylla uses a the seastar framework, which provides for both user-level
> thread scheduling and simple run-to-completion tasks.
>
> Huge pages are limited to 2MB (and 1GB, but these aren't available as
> transparent hugepages).
>
>
> On 03/11/2017 10:26 PM, Kant Kodali wrote:
>
> @Dor
>
> 1) You guys have a CPU scheduler? you mean user level thread Scheduler
> that maps user level threads to kernel level threads? I thought C++ by
> default creates native kernel threads but sure nothing will stop someone to
> create a user level scheduling library if that's what you are talking about?
> 2) How can one create THP of size 1KB? According to this post
> <https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-memory-transhuge.html> it
> looks like the valid values 2MB and 1GB.
>
> Thanks,
> kant
>
> On Sat, Mar 11, 2017 at 11:41 AM, Avi Kivity <av...@scylladb.com> wrote:
>
>> Agreed, I'd recommend to treat benchmarks as a rough guide to see where
>> there is potential, and follow through with your own tests.
>>
>> On 03/11/2017 09:37 PM, Edward Capriolo wrote:
>>
>>
>> Benchmarks are great for FUDly blog posts. Real world work loads matter
>> more. Every NoSQL vendor wins their benchmarks.
>>
>>
>>
>
>
>
>

Re: scylladb

Posted by Avi Kivity <av...@scylladb.com>.

Scylla uses the Seastar framework, which provides for both user-level 
thread scheduling and simple run-to-completion tasks.

Huge pages are limited to 2MB (and 1GB, but these aren't available as 
transparent hugepages).
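
For readers who want to check their own machine, here is a small sketch that reads the transparent hugepage mode and the default huge page size on Linux. The two paths are the standard Linux sysfs and procfs locations; the helper names are hypothetical, and on x86-64 the reported size is typically 2048 kB.

```python
import re
from pathlib import Path
from typing import Optional

THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")
MEMINFO = Path("/proc/meminfo")


def parse_thp_mode(text: str) -> Optional[str]:
    # The kernel marks the active mode with brackets,
    # for example "always [madvise] never".
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def parse_hugepage_kib(meminfo: str) -> Optional[int]:
    # /proc/meminfo reports the default huge page size in kB.
    match = re.search(r"^Hugepagesize:\s+(\d+)\s+kB", meminfo, re.MULTILINE)
    return int(match.group(1)) if match else None


def report() -> tuple[Optional[str], Optional[int]]:
    # Returns (THP mode, huge page size in kB); None where unavailable.
    mode = parse_thp_mode(THP_ENABLED.read_text()) if THP_ENABLED.exists() else None
    size = parse_hugepage_kib(MEMINFO.read_text()) if MEMINFO.exists() else None
    return mode, size
```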

On 03/11/2017 10:26 PM, Kant Kodali wrote:
> @Dor
>
> 1) You guys have a CPU scheduler? you mean user level thread Scheduler 
> that maps user level threads to kernel level threads? I thought C++ by 
> default creates native kernel threads but sure nothing will stop 
> someone to create a user level scheduling library if that's what you 
> are talking about?
> 2) How can one create THP of size 1KB? According to this post 
> <https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-memory-transhuge.html> it 
> looks like the valid values 2MB and 1GB.
>
> Thanks,
> kant
>
> On Sat, Mar 11, 2017 at 11:41 AM, Avi Kivity <avi@scylladb.com 
> <ma...@scylladb.com>> wrote:
>
>     Agreed, I'd recommend to treat benchmarks as a rough guide to see
>     where there is potential, and follow through with your own tests.
>
>     On 03/11/2017 09:37 PM, Edward Capriolo wrote:
>>
>>     Benchmarks are great for FUDly blog posts. Real world work loads
>>     matter more. Every NoSQL vendor wins their benchmarks.
>
>
>

Re: scylladb

Posted by Kant Kodali <ka...@peernova.com>.

@Dor

1) You guys have a CPU scheduler? Do you mean a user-level thread scheduler
that maps user-level threads to kernel-level threads? I thought C++ creates
native kernel threads by default, but sure, nothing stops someone from
creating a user-level scheduling library, if that's what you are talking
about.
2) How can one create a THP of size 1KB? According to this post
<https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-memory-transhuge.html>
it looks like the valid values are 2MB and 1GB.

Thanks,
kant

On Sat, Mar 11, 2017 at 11:41 AM, Avi Kivity <av...@scylladb.com> wrote:

> Agreed, I'd recommend to treat benchmarks as a rough guide to see where
> there is potential, and follow through with your own tests.
>
> On 03/11/2017 09:37 PM, Edward Capriolo wrote:
>
>
> Benchmarks are great for FUDly blog posts. Real world work loads matter
> more. Every NoSQL vendor wins their benchmarks.
>
>
>

Re: scylladb

Posted by Avi Kivity <av...@scylladb.com>.

Agreed, I'd recommend treating benchmarks as a rough guide to see where 
there is potential, and following through with your own tests.

On 03/11/2017 09:37 PM, Edward Capriolo wrote:
>
> Benchmarks are great for FUDly blog posts. Real world work loads 
> matter more. Every NoSQL vendor wins their benchmarks.

RE: scylladb

Posted by Jacques-Henri Berthemet <ja...@genesys.com>.

Will you support custom secondary indexes, triggers and UDFs?
I checked the index code, but it’s just a couple of files with commented-out Java code. I’m curious to test ScyllaDB, but our application uses LWT and custom secondary indexes; I understand LWT is coming (soon?).

--
Jacques-Henri Berthemet

From: sfescape@gmail.com [mailto:sfescape@gmail.com]
Sent: dimanche 12 mars 2017 09:23
To: user@cassandra.apache.org
Subject: Re: scylladb

On Sat, Mar 11, 2017 at 1:52 AM Avi Kivity <av...@scylladb.com>> wrote:

Lastly, why don't you test Scylla yourself?  It's pretty easy to set up, there's nothing to tune.

Avi

 I'll look seriously at Scylla when it is 3.0.12 compatible.

Re: scylladb

Posted by "sfescape@gmail.com" <sf...@gmail.com>.

On Sat, Mar 11, 2017 at 1:52 AM Avi Kivity <av...@scylladb.com> wrote:

>
>
> Lastly, why don't you test Scylla yourself?  It's pretty easy to set up,
> there's nothing to tune.
>
> Avi
>

 I'll look seriously at Scylla when it is 3.0.12 compatible.

Re: scylladb

Posted by Edward Capriolo <ed...@gmail.com>.

On Sat, Mar 11, 2017 at 2:08 PM, Bhuvan Rawal <bh...@gmail.com> wrote:

> "Lastly, why don't you test Scylla yourself?  It's pretty easy to set up,
> there's nothing to tune."
>  - The details are indeed compelling to have a go ahead and test it for
> specific use case.
>
> If it works out good it can lead to good cost cut in infra costs as well
> as having to manage less servers plus probably less time to bootstrap &
> decommission nodes!
>
> It will also be interesting to have a benchmark with Cassandra 3 version
> as well, as the new storage engine is said to have better performance:
> https://www.datastax.com/2015/12/storage-engine-30
>
> Regards,
> Bhuvan
>
> On Sat, Mar 11, 2017 at 2:59 PM, Avi Kivity <av...@scylladb.com> wrote:
>
>> There is no magic 10X bullet.  It's a mix of multiple factors, which can
>> come up to less than 10X in some circumstances and more than 10X in others,
>> as has been reported on this thread by others.
>>
>> TPC doesn't give _any_ advantage when you have just one core, and can
>> give more than 10X on a machine with a large number of cores.  These are
>> becoming more and more common, think of the recent AMD Naples announcement;
>> with 32 cores per socket you can have 128 logical cores in a two-socket
>> server; or the AWS i3.16xlarge instance with 32 cores / 64 vcpus.
>>
>> You're welcome to browse our site to learn more about the architecture,
>> or watch this technical talk [1] I gave in QConSF that highlights some of
>> the techniques we use.
>>
>> Of course it's possible to mistune Cassandra to give bad results, that is
>> why we spent a lot more time tuning Cassandra and documenting everything
>> than we spent on Scylla.  You can read the report in [2], it is very
>> detailed, and provides a wealth of metrics like you'd expect.
>>
>> I'm not going to comment about the Aerospike numbers, I haven't studied
>> them in detail.  And no, you can't multiply results like that unless they
>> were done with very similar configurations and test harnesses.
>>
>> Lastly, why don't you test Scylla yourself?  It's pretty easy to set up,
>> there's nothing to tune.
>>
>> Avi
>>
>> [1] https://www.infoq.com/presentations/scylladb
>> [2] http://www.scylladb.com/technology/cassandra-vs-scylla-benchmark-cluster-1/
>>
>>
>> On 03/10/2017 06:58 PM, Bhuvan Rawal wrote:
>>
>> Agreed C++ gives an added advantage to talk to underlying hardware with
>> better efficiency, it sound good but can a pice of code written in C++ give
>> 1000% throughput than a Java app? Is TPC design 10X more performant than
>> SEDA arch?
>>
>> And if C/C++ is indeed that fast how can Aerospike (which is itself
>> written in C) claim to be 10X faster than Scylla here
>> http://www.aerospike.com/benchmarks/scylladb-initial/ ? (Combining
>> your's and aerospike's benchmarks it appears that Aerospike is 100X
>> performant than C* - I highly doubt that!! )
>>
>> For a moment lets forget about evaluating 2 different databases, one can
>> observe 10X performance difference between a mistuned cassandra cluster and
>> one thats tuned as per data model - there are so many Tunables in yaml as
>> well as table configs.
>>
>> Idea is - in order to strengthen your claim, you need to provide complete
>> system metrics (Disk, CPU, Network), the OPS increase starts to decay along
>> with the configs used. Having plain ops per second and 99p latency is
>> blackbox.
>>
>> Regards,
>> Bhuvan
>>
>> On Fri, Mar 10, 2017 at 12:47 PM, Avi Kivity <av...@scylladb.com> wrote:
>>
>>> ScyllaDB engineer here.
>>>
>>> C++ is really an enabling technology here. It is directly responsible
>>> for a small fraction of the gain by executing faster than Java.  But it is
>>> indirectly responsible for the gain by allowing us direct control over
>>> memory and threading.  Just as an example, Scylla starts by taking over
>>> almost all of the machine's memory, and dynamically assigning it to
>>> memtables, cache, and working memory needed to handle requests in flight.
>>> Memory is statically partitioned across cores, allowing us to exploit NUMA
>>> fully.  You can't do these things in Java.
>>>
>>> I would say the major contributors to Scylla performance are:
>>>  - thread-per-core design
>>>  - replacement of the page cache with a row cache
>>>  - careful attention to many small details, each contributing a little,
>>> but with a large overall impact
>>>
>>> While I'm here I can say that performance is not the only goal here, it
>>> is stable and predictable performance over varying loads and during
>>> maintenance operations like repair, without any special tuning.  We measure
>>> the amount of CPU and I/O spent on foreground (user) and background
>>> (maintenance) tasks and divide them fairly.  This work is not complete but
>>> already makes operating Scylla a lot simpler.
>>>
>>>
>>> On 03/10/2017 01:42 AM, Kant Kodali wrote:
>>>
>>> I dont think ScyllaDB performance is because of C++. The design
>>> decisions in scylladb are indeed different from Cassandra such as getting
>>> rid of SEDA and moving to TPC and so on.
>>>
>>> If someone thinks it is because of C++ then just show the benchmarks
>>> that proves it is indeed the C++ which gave 10X performance boost as
>>> ScyllaDB claims instead of stating it.
>>>
>>>
>>> On Thu, Mar 9, 2017 at 3:22 PM, Richard L. Burton III <
>>> mrburton@gmail.com> wrote:
>>>
>>>> They spend an enormous amount of time focusing on performance. You can
>>>> expect them to continue on with their optimization and keep crushing it.
>>>>
>>>> P.S., I don't work for ScyllaDB.
>>>>
>>>> On Thu, Mar 9, 2017 at 6:02 PM, Rakesh Kumar <
>>>> rakeshkumar464@outlook.com> wrote:
>>>>
>>>>> In all of their presentation they keep harping on the fact that
>>>>> scylladb is written in C++ and does not carry the overhead of Java.  Still
>>>>> the difference looks staggering.
>>>>> ________________________________________
>>>>> From: daemeon reiydelle <da...@gmail.com>
>>>>> Sent: Thursday, March 9, 2017 14:21
>>>>> To: user@cassandra.apache.org
>>>>> Subject: Re: scylladb
>>>>>
>>>>> The comparison is fair, and conservative. Did substantial performance
>>>>> comparisons for two clients, both results returned throughputs that were
>>>>> faster than the published comparisons (15x as I recall). At that time the
>>>>> client preferred to utilize a Cass COTS solution and use a caching solution
>>>>> for OLA compliance.
>>>>>
>>>>>
>>>>> .......
>>>>>
>>>>> Daemeon C.M. Reiydelle
>>>>> USA (+1) 415.501.0198
>>>>> London (+44) (0) 20 8144 9872
>>>>>
>>>>> On Thu, Mar 9, 2017 at 11:04 AM, Robin Verlangen <robin@us2.nl> wrote:
>>>>> I was wondering how people feel about the comparison that's made here
>>>>> between Cassandra and ScyllaDB:
>>>>> http://www.scylladb.com/technology/ycsb-cassandra-scylla/#results-of-3-scylla-nodes-vs-30-cassandra-nodes
>>>>>
>>>>> They are claiming a 10x improvement, is that a fair comparison or
>>>>> maybe a somewhat coloured view of a (micro)benchmark in a specific setup?
>>>>> Any pros/cons known?
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Robin Verlangen
>>>>> Chief Data Architect
>>>>>
>>>>> Disclaimer: The information contained in this message and attachments
>>>>> is intended solely for the attention and use of the named addressee and may
>>>>> be confidential. If you are not the intended recipient, you are reminded
>>>>> that the information remains the property of the sender. You must not use,
>>>>> disclose, distribute, copy, print or rely on this e-mail. If you have
>>>>> received this message in error, please contact the sender immediately and
>>>>> irrevocably delete this message and any copies.
>>>>>
>>>>> On Wed, Dec 16, 2015 at 11:52 AM, Carlos Rolo <rolo@pythian.com
>>>>> <ma...@pythian.com>> wrote:
>>>>> No rain at all! But I almost had it running last weekend, but stopped
>>>>> short of installing it. Let's see if this one is for real!
>>>>>
>>>>> Regards,
>>>>>
>>>>> Carlos Juzarte Rolo
>>>>> Cassandra Consultant
>>>>>
>>>>> Pythian - Love your data
>>>>>
>>>>> rolo@pythian | Twitter: @cjrolo | Linkedin:
>>>>> linkedin.com/in/carlosjuzarterolo
>>>>> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>>>>> www.pythian.com
>>>>>
>>>>> On Wed, Dec 16, 2015 at 12:38 AM, Dani Traphagen <
>>>>> dani.traphagen@datastax.com<ma...@datastax.com>>
>>>>> wrote:
>>>>> You'll be the first Carlos.
>>>>>
>>>>> [Inline image 1]
>>>>>
>>>>> Had any rain lately? Curious how this went, if so.
>>>>>
>>>>> On Thu, Nov 12, 2015 at 4:36 AM, Jack Krupansky <
>>>>> jack.krupansky@gmail.com<ma...@gmail.com>> wrote:
>>>>> I just did a Twitter search on scylladb and did not see any tweets
>>>>> about actual use, so far.
>>>>>
>>>>>
>>>>> -- Jack Krupansky
>>>>>
>>>>> On Wed, Nov 11, 2015 at 10:54 AM, Carlos Alonso <info@mrcalonso.com
>>>>> <ma...@mrcalonso.com>> wrote:
>>>>> Any update about this?
>>>>>
>>>>> @Carlos Rolo, did you tried it? Thoughts?
>>>>>
>>>>> Carlos Alonso | Software Engineer | @calonso<https://twitter.com/c
>>>>> alonso>
>>>>>
>>>>> On 5 November 2015 at 14:07, Carlos Rolo <rolo@pythian.com<mailto:
>>>>> rolo@pythian.com>> wrote:
>>>>> Something to do on a expected rainy weekend. Thanks for the
>>>>> information.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Carlos Juzarte Rolo
>>>>> Cassandra Consultant
>>>>>
>>>>> Pythian - Love your data
>>>>>
>>>>> rolo@pythian | Twitter: @cjrolo | Linkedin:
>>>>> linkedin.com/in/carlosjuzarterolo
>>>>> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>>>>> www.pythian.com
>>>>>
>>>>> On Thu, Nov 5, 2015 at 12:07 PM, Dani Traphagen <
>>>>> dani.traphagen@datastax.com<ma...@datastax.com>>
>>>>> wrote:
>>>>> As of two days ago, they say they've got it @cjrolo.
>>>>>
>>>>> https://github.com/scylladb/scylla/wiki/RELEASE-Scylla-0.11-Beta
>>>>>
>>>>>
>>>>> On Thursday, November 5, 2015, Carlos Rolo <rolo@pythian.com<mailto:
>>>>> rolo@pythian.com>> wrote:
>>>>> I will not try until multi-DC is implemented. More than an month has
>>>>> passed since I looked for it, so it could possibly be in place, if so I may
>>>>> take some time to test it.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Carlos Juzarte Rolo
>>>>> Cassandra Consultant
>>>>>
>>>>> Pythian - Love your data
>>>>>
>>>>> rolo@pythian | Twitter: @cjrolo | Linkedin:
>>>>> linkedin.com/in/carlosjuzarterolo
>>>>> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>>>>> www.pythian.com
>>>>>
>>>>> On Thu, Nov 5, 2015 at 9:37 AM, Jon Haddad <jo...@gmail.com>
>>>>> wrote:
>>>>> Nope, no one I know.  Let me know if you try it I'd love to hear your
>>>>> feedback.
>>>>>
>>>>> > On Nov 5, 2015, at 9:22 AM, tommaso barbugli <tb...@gmail.com>
>>>>> wrote:
>>>>> >
>>>>> > Hi guys,
>>>>> >
>>>>> > did anyone already try Scylladb (yet another fastest NoSQL database
>>>>> in town) and has some thoughts/hands-on experience to share?
>>>>> >
>>>>> > Cheers,
>>>>> > Tommaso
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Sent from mobile -- apologizes for brevity or errors.
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> [datastax_logo.png]<http://www.datastax.com/>
>>>>>
>>>>> DANI TRAPHAGEN
>>>>>
>>>>> Technical Enablement Lead | dani.traphagen@datastax.com<mailto:
>>>>> dani.traphagen@datastax.com>
>>>>>
>>>>> Twitter: https://twitter.com/dtrapezoid
>>>>> LinkedIn: https://www.linkedin.com/pub/dani-traphagen/31/93b/b85
>>>>> GitHub: https://github.com/dtrapezoid
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> -Richard L. Burton III
>>>> @rburton
>>>>
>>>
>>>
>>>
>>
>>
Benchmarks are great for FUDly blog posts. Real world work loads matter
more. Every NoSQL vendor wins their benchmarks.

Re: scylladb

Posted by Bhuvan Rawal <bh...@gmail.com>.

"Lastly, why don't you test Scylla yourself?  It's pretty easy to set up,
there's nothing to tune."
 - The details are indeed compelling enough to go ahead and test it for a
specific use case.

If it works out well, it can lead to a good cut in infra costs, as well as
having fewer servers to manage, plus probably less time to bootstrap &
decommission nodes!

It would also be interesting to have a benchmark against a Cassandra 3
version, as the new storage engine is said to have better performance:
https://www.datastax.com/2015/12/storage-engine-30

Regards,
Bhuvan

On Sat, Mar 11, 2017 at 2:59 PM, Avi Kivity <av...@scylladb.com> wrote:

> There is no magic 10X bullet.  It's a mix of multiple factors, which can
> come up to less than 10X in some circumstances and more than 10X in others,
> as has been reported on this thread by others.
>
> TPC doesn't give _any_ advantage when you have just one core, and can give
> more than 10X on a machine with a large number of cores.  These are
> becoming more and more common, think of the recent AMD Naples announcement;
> with 32 cores per socket you can have 128 logical cores in a two-socket
> server; or the AWS i3.16xlarge instance with 32 cores / 64 vcpus.
>
> You're welcome to browse our site to learn more about the architecture, or
> watch this technical talk [1] I gave in QConSF that highlights some of the
> techniques we use.
>
> Of course it's possible to mistune Cassandra to give bad results, that is
> why we spent a lot more time tuning Cassandra and documenting everything
> than we spent on Scylla.  You can read the report in [2], it is very
> detailed, and provides a wealth of metrics like you'd expect.
>
> I'm not going to comment about the Aerospike numbers, I haven't studied
> them in detail.  And no, you can't multiply results like that unless they
> were done with very similar configurations and test harnesses.
>
> Lastly, why don't you test Scylla yourself?  It's pretty easy to set up,
> there's nothing to tune.
>
> Avi
>
> [1] https://www.infoq.com/presentations/scylladb
> [2] http://www.scylladb.com/technology/cassandra-vs-scylla-benchmark-cluster-1/
>
>
> On 03/10/2017 06:58 PM, Bhuvan Rawal wrote:
>
> Agreed C++ gives an added advantage to talk to underlying hardware with
> better efficiency, it sound good but can a pice of code written in C++ give
> 1000% throughput than a Java app? Is TPC design 10X more performant than
> SEDA arch?
>
> And if C/C++ is indeed that fast how can Aerospike (which is itself
> written in C) claim to be 10X faster than Scylla here
> http://www.aerospike.com/benchmarks/scylladb-initial/ ? (Combining your's
> and aerospike's benchmarks it appears that Aerospike is 100X performant
> than C* - I highly doubt that!! )
>
> For a moment lets forget about evaluating 2 different databases, one can
> observe 10X performance difference between a mistuned cassandra cluster and
> one thats tuned as per data model - there are so many Tunables in yaml as
> well as table configs.
>
> Idea is - in order to strengthen your claim, you need to provide complete
> system metrics (Disk, CPU, Network), the OPS increase starts to decay along
> with the configs used. Having plain ops per second and 99p latency is
> blackbox.
>
> Regards,
> Bhuvan
>
> On Fri, Mar 10, 2017 at 12:47 PM, Avi Kivity <av...@scylladb.com> wrote:
>
>> ScyllaDB engineer here.
>>
>> C++ is really an enabling technology here. It is directly responsible for
>> a small fraction of the gain by executing faster than Java.  But it is
>> indirectly responsible for the gain by allowing us direct control over
>> memory and threading.  Just as an example, Scylla starts by taking over
>> almost all of the machine's memory, and dynamically assigning it to
>> memtables, cache, and working memory needed to handle requests in flight.
>> Memory is statically partitioned across cores, allowing us to exploit NUMA
>> fully.  You can't do these things in Java.
>>
>> I would say the major contributors to Scylla performance are:
>>  - thread-per-core design
>>  - replacement of the page cache with a row cache
>>  - careful attention to many small details, each contributing a little,
>> but with a large overall impact
>>
>> While I'm here I can say that performance is not the only goal here, it
>> is stable and predictable performance over varying loads and during
>> maintenance operations like repair, without any special tuning.  We measure
>> the amount of CPU and I/O spent on foreground (user) and background
>> (maintenance) tasks and divide them fairly.  This work is not complete but
>> already makes operating Scylla a lot simpler.
>>
>>
>> On 03/10/2017 01:42 AM, Kant Kodali wrote:
>>
>> I dont think ScyllaDB performance is because of C++. The design decisions
>> in scylladb are indeed different from Cassandra such as getting rid of SEDA
>> and moving to TPC and so on.
>>
>> If someone thinks it is because of C++ then just show the benchmarks that
>> proves it is indeed the C++ which gave 10X performance boost as ScyllaDB
>> claims instead of stating it.
>>
>>
>> On Thu, Mar 9, 2017 at 3:22 PM, Richard L. Burton III <mrburton@gmail.com
>> > wrote:
>>
>>> They spend an enormous amount of time focusing on performance. You can
>>> expect them to continue on with their optimization and keep crushing it.
>>>
>>> P.S., I don't work for ScyllaDB.
>>>
>>> On Thu, Mar 9, 2017 at 6:02 PM, Rakesh Kumar <rakeshkumar464@outlook.com
>>> > wrote:
>>>
>>>> In all of their presentation they keep harping on the fact that
>>>> scylladb is written in C++ and does not carry the overhead of Java.  Still
>>>> the difference looks staggering.
>>>> ________________________________________
>>>> From: daemeon reiydelle <da...@gmail.com>
>>>> Sent: Thursday, March 9, 2017 14:21
>>>> To: user@cassandra.apache.org
>>>> Subject: Re: scylladb
>>>>
>>>> The comparison is fair, and conservative. Did substantial performance
>>>> comparisons for two clients, both results returned throughputs that were
>>>> faster than the published comparisons (15x as I recall). At that time the
>>>> client preferred to utilize a Cass COTS solution and use a caching solution
>>>> for OLA compliance.
>>>>
>>>>
>>>> .......
>>>>
>>>> Daemeon C.M. Reiydelle
>>>> USA (+1) 415.501.0198
>>>> London (+44) (0) 20 8144 9872
>>>>
>>>> On Thu, Mar 9, 2017 at 11:04 AM, Robin Verlangen <robin@us2.nl> wrote:
>>>> I was wondering how people feel about the comparison that's made here
>>>> between Cassandra and ScyllaDB:
>>>> http://www.scylladb.com/technology/ycsb-cassandra-scylla/#results-of-3-scylla-nodes-vs-30-cassandra-nodes
>>>>
>>>> They are claiming a 10x improvement, is that a fair comparison or maybe
>>>> a somewhat coloured view of a (micro)benchmark in a specific setup? Any
>>>> pros/cons known?
>>>>
>>>> Best regards,
>>>>
>>>> Robin Verlangen
>>>> Chief Data Architect
>>>>
>>>> Disclaimer: The information contained in this message and attachments
>>>> is intended solely for the attention and use of the named addressee and may
>>>> be confidential. If you are not the intended recipient, you are reminded
>>>> that the information remains the property of the sender. You must not use,
>>>> disclose, distribute, copy, print or rely on this e-mail. If you have
>>>> received this message in error, please contact the sender immediately and
>>>> irrevocably delete this message and any copies.
>>>>
>>>> On Wed, Dec 16, 2015 at 11:52 AM, Carlos Rolo <rolo@pythian.com<mailto:
>>>> rolo@pythian.com>> wrote:
>>>> No rain at all! But I almost had it running last weekend, but stopped
>>>> short of installing it. Let's see if this one is for real!
>>>>
>>>> Regards,
>>>>
>>>> Carlos Juzarte Rolo
>>>> Cassandra Consultant
>>>>
>>>> Pythian - Love your data
>>>>
>>>> rolo@pythian | Twitter: @cjrolo | Linkedin:
>>>> linkedin.com/in/carlosjuzarterolo
>>>> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>>>> www.pythian.com
>>>>
>>>> On Wed, Dec 16, 2015 at 12:38 AM, Dani Traphagen <
>>>> dani.traphagen@datastax.com<ma...@datastax.com>> wrote:
>>>> You'll be the first Carlos.
>>>>
>>>> [Inline image 1]
>>>>
>>>> Had any rain lately? Curious how this went, if so.
>>>>
>>>> On Thu, Nov 12, 2015 at 4:36 AM, Jack Krupansky <
>>>> jack.krupansky@gmail.com<ma...@gmail.com>> wrote:
>>>> I just did a Twitter search on scylladb and did not see any tweets
>>>> about actual use, so far.
>>>>
>>>>
>>>> -- Jack Krupansky
>>>>
>>>> On Wed, Nov 11, 2015 at 10:54 AM, Carlos Alonso <info@mrcalonso.com
>>>> <ma...@mrcalonso.com>> wrote:
>>>> Any update about this?
>>>>
>>>> @Carlos Rolo, did you tried it? Thoughts?
>>>>
>>>> Carlos Alonso | Software Engineer | @calonso<https://twitter.com/c
>>>> alonso>
>>>>
>>>> On 5 November 2015 at 14:07, Carlos Rolo <rolo@pythian.com<mailto:rolo@
>>>> pythian.com>> wrote:
>>>> Something to do on a expected rainy weekend. Thanks for the information.
>>>>
>>>> Regards,
>>>>
>>>> Carlos Juzarte Rolo
>>>> Cassandra Consultant
>>>>
>>>> Pythian - Love your data
>>>>
>>>> rolo@pythian | Twitter: @cjrolo | Linkedin:
>>>> linkedin.com/in/carlosjuzarterolo
>>>> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>>>> www.pythian.com
>>>>
>>>> On Thu, Nov 5, 2015 at 12:07 PM, Dani Traphagen <
>>>> dani.traphagen@datastax.com<ma...@datastax.com>> wrote:
>>>> As of two days ago, they say they've got it @cjrolo.
>>>>
>>>> https://github.com/scylladb/scylla/wiki/RELEASE-Scylla-0.11-Beta
>>>>
>>>>
>>>> On Thursday, November 5, 2015, Carlos Rolo <rolo@pythian.com<mailto:
>>>> rolo@pythian.com>> wrote:
>>>> I will not try until multi-DC is implemented. More than an month has
>>>> passed since I looked for it, so it could possibly be in place, if so I may
>>>> take some time to test it.
>>>>
>>>> Regards,
>>>>
>>>> Carlos Juzarte Rolo
>>>> Cassandra Consultant
>>>>
>>>> Pythian - Love your data
>>>>
>>>> rolo@pythian | Twitter: @cjrolo | Linkedin:
>>>> linkedin.com/in/carlosjuzarterolo
>>>> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>>>> www.pythian.com
>>>>
>>>> On Thu, Nov 5, 2015 at 9:37 AM, Jon Haddad <jo...@gmail.com>
>>>> wrote:
>>>> Nope, no one I know.  Let me know if you try it I'd love to hear your
>>>> feedback.
>>>>
>>>> > On Nov 5, 2015, at 9:22 AM, tommaso barbugli <tb...@gmail.com>
>>>> wrote:
>>>> >
>>>> > Hi guys,
>>>> >
>>>> > did anyone already try Scylladb (yet another fastest NoSQL database
>>>> in town) and has some thoughts/hands-on experience to share?
>>>> >
>>>> > Cheers,
>>>> > Tommaso
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Sent from mobile -- apologizes for brevity or errors.
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> [datastax_logo.png]<http://www.datastax.com/>
>>>>
>>>> DANI TRAPHAGEN
>>>>
>>>> Technical Enablement Lead | dani.traphagen@datastax.com<mailto:
>>>> dani.traphagen@datastax.com>
>>>>
>>>> Twitter: https://twitter.com/dtrapezoid
>>>> LinkedIn: https://www.linkedin.com/pub/dani-traphagen/31/93b/b85
>>>> GitHub: https://github.com/dtrapezoid
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> -Richard L. Burton III
>>> @rburton
>>>
>>
>>
>>
>
>

Re: scylladb

Posted by Avi Kivity <av...@scylladb.com>.

There is no magic 10X bullet.  It's a mix of multiple factors, which can 
come up to less than 10X in some circumstances and more than 10X in 
others, as has been reported on this thread by others.

TPC doesn't give _any_ advantage when you have just one core, and can 
give more than 10X on a machine with a large number of cores. These are 
becoming more and more common, think of the recent AMD Naples 
announcement; with 32 cores per socket you can have 128 logical cores in 
a two-socket server; or the AWS i3.16xlarge instance with 32 cores / 64 
vcpus.
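
For anyone checking the core counts above, here is a minimal sketch of the arithmetic. It assumes 2 hardware threads per core (SMT), which is what turns 32 cores per socket into 128 logical cores on a two-socket box; the function name and the single-socket layout used for the i3.16xlarge are illustrative assumptions, not something taken from a vendor spec.

```python
def logical_cores(sockets: int, cores_per_socket: int, threads_per_core: int = 2) -> int:
    """Logical CPUs seen by the OS: sockets x cores per socket x threads per core.

    threads_per_core=2 is an assumption (SMT enabled); pass 1 if SMT is off.
    """
    return sockets * cores_per_socket * threads_per_core


# Two-socket server with 32 cores per socket, SMT assumed
two_socket = logical_cores(sockets=2, cores_per_socket=32)

# i3.16xlarge as described above: 32 cores, modeled here as a single socket
i3_16xlarge = logical_cores(sockets=1, cores_per_socket=32)

print(two_socket, i3_16xlarge)  # 128 64
```

With a thread-per-core design each of those logical cores would get its own shard, so the count is also a rough upper bound on how far that design can spread work on one machine.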

You're welcome to browse our site to learn more about the architecture, 
or watch this technical talk [1] I gave in QConSF that highlights some 
of the techniques we use.

Of course it's possible to mistune Cassandra to give bad results, that 
is why we spent a lot more time tuning Cassandra and documenting 
everything than we spent on Scylla.  You can read the report in [2], it 
is very detailed, and provides a wealth of metrics like you'd expect.

I'm not going to comment about the Aerospike numbers, I haven't studied 
them in detail.  And no, you can't multiply results like that unless 
they were done with very similar configurations and test harnesses.

Lastly, why don't you test Scylla yourself?  It's pretty easy to set up, 
there's nothing to tune.

Avi

[1] https://www.infoq.com/presentations/scylladb
[2] http://www.scylladb.com/technology/cassandra-vs-scylla-benchmark-cluster-1/

On 03/10/2017 06:58 PM, Bhuvan Rawal wrote:
> Agreed C++ gives an added advantage to talk to underlying hardware 
> with better efficiency, it sound good but can a pice of code written 
> in C++ give 1000% throughput than a Java app? Is TPC design 10X more 
> performant than SEDA arch?
>
> And if C/C++ is indeed that fast how can Aerospike (which is itself 
> written in C) claim to be 10X faster than Scylla here 
> http://www.aerospike.com/benchmarks/scylladb-initial/ ? (Combining 
> your's and aerospike's benchmarks it appears that Aerospike is 100X 
> performant than C* - I highly doubt that!! )
>
> For a moment lets forget about evaluating 2 different databases, one 
> can observe 10X performance difference between a mistuned cassandra 
> cluster and one thats tuned as per data model - there are so many 
> Tunables in yaml as well as table configs.
>
> Idea is - in order to strengthen your claim, you need to provide 
> complete system metrics (Disk, CPU, Network), the OPS increase starts 
> to decay along with the configs used. Having plain ops per second and 
> 99p latency is blackbox.
>
> Regards,
> Bhuvan
>
> On Fri, Mar 10, 2017 at 12:47 PM, Avi Kivity <avi@scylladb.com 
> <ma...@scylladb.com>> wrote:
>
>     ScyllaDB engineer here.
>
>     C++ is really an enabling technology here. It is directly
>     responsible for a small fraction of the gain by executing faster
>     than Java.  But it is indirectly responsible for the gain by
>     allowing us direct control over memory and threading.  Just as an
>     example, Scylla starts by taking over almost all of the machine's
>     memory, and dynamically assigning it to memtables, cache, and
>     working memory needed to handle requests in flight.  Memory is
>     statically partitioned across cores, allowing us to exploit NUMA
>     fully.  You can't do these things in Java.
>
>     I would say the major contributors to Scylla performance are:
>      - thread-per-core design
>      - replacement of the page cache with a row cache
>      - careful attention to many small details, each contributing a
>     little, but with a large overall impact
>
>     While I'm here I can say that performance is not the only goal
>     here, it is stable and predictable performance over varying loads
>     and during maintenance operations like repair, without any special
>     tuning.  We measure the amount of CPU and I/O spent on foreground
>     (user) and background (maintenance) tasks and divide them fairly.
>     This work is not complete but already makes operating Scylla a lot
>     simpler.
>
>
>     On 03/10/2017 01:42 AM, Kant Kodali wrote:
>>     I dont think ScyllaDB performance is because of C++. The design
>>     decisions in scylladb are indeed different from Cassandra such as
>>     getting rid of SEDA and moving to TPC and so on.
>>
>>     If someone thinks it is because of C++ then just show the
>>     benchmarks that proves it is indeed the C++ which gave 10X
>>     performance boost as ScyllaDB claims instead of stating it.
>>
>>
>>     On Thu, Mar 9, 2017 at 3:22 PM, Richard L. Burton III
>>     <mrburton@gmail.com <ma...@gmail.com>> wrote:
>>
>>         They spend an enormous amount of time focusing on
>>         performance. You can expect them to continue on with their
>>         optimization and keep crushing it.
>>
>>         P.S., I don't work for ScyllaDB.
>>
>>         On Thu, Mar 9, 2017 at 6:02 PM, Rakesh Kumar
>>         <rakeshkumar464@outlook.com
>>         <ma...@outlook.com>> wrote:
>>
>>             In all of their presentation they keep harping on the
>>             fact that scylladb is written in C++ and does not carry
>>             the overhead of Java.  Still the difference looks staggering.
>>             ________________________________________
>>             From: daemeon reiydelle <daemeonr@gmail.com
>>             <ma...@gmail.com>>
>>             Sent: Thursday, March 9, 2017 14:21
>>             To: user@cassandra.apache.org
>>             <ma...@cassandra.apache.org>
>>             Subject: Re: scylladb
>>
>>             The comparison is fair, and conservative. Did substantial
>>             performance comparisons for two clients, both results
>>             returned throughputs that were faster than the published
>>             comparisons (15x as I recall). At that time the client
>>             preferred to utilize a Cass COTS solution and use a
>>             caching solution for OLA compliance.
>>
>>
>>             .......
>>
>>             Daemeon C.M. Reiydelle
>>             USA (+1) 415.501.0198
>>             London (+44) (0) 20 8144 9872
>>
>>             On Thu, Mar 9, 2017 at 11:04 AM, Robin Verlangen
>>             <robin@us2.nl> wrote:
>>             I was wondering how people feel about the comparison
>>             that's made here between Cassandra and ScyllaDB:
>>             http://www.scylladb.com/technology/ycsb-cassandra-scylla/#results-of-3-scylla-nodes-vs-30-cassandra-nodes
>>
>>             They are claiming a 10x improvement, is that a fair
>>             comparison or maybe a somewhat coloured view of a
>>             (micro)benchmark in a specific setup? Any pros/cons known?
>>
>>             Best regards,
>>
>>             Robin Verlangen
>>             Chief Data Architect
>>
>>             Disclaimer: The information contained in this message and
>>             attachments is intended solely for the attention and use
>>             of the named addressee and may be confidential. If you
>>             are not the intended recipient, you are reminded that the
>>             information remains the property of the sender. You must
>>             not use, disclose, distribute, copy, print or rely on
>>             this e-mail. If you have received this message in error,
>>             please contact the sender immediately and irrevocably
>>             delete this message and any copies.
>>
>>             On Wed, Dec 16, 2015 at 11:52 AM, Carlos Rolo
>>             <rolo@pythian.com
>>             <ma...@pythian.com><mailto:rolo@pythian.com
>>             <ma...@pythian.com>>> wrote:
>>             No rain at all! But I almost had it running last weekend,
>>             but stopped short of installing it. Let's see if this one
>>             is for real!
>>
>>             Regards,
>>
>>             Carlos Juzarte Rolo
>>             Cassandra Consultant
>>
>>             Pythian - Love your data
>>
>>             rolo@pythian | Twitter: @cjrolo | Linkedin:
>>             linkedin.com/in/carlosjuzarterolo
>>             <http://linkedin.com/in/carlosjuzarterolo><http://linkedin.com/in/carlosjuzarterolo
>>             <http://linkedin.com/in/carlosjuzarterolo>>
>>             Mobile: +351 91 891 81 00
>>             <tel:%2B351%2091%20891%2081%2000><tel:+351%20918%20918%20100>
>>             | Tel: +1 613 565 8696 x1649
>>             <tel:%2B1%20613%20565%208696%20x1649><tel:+1%20613-565-8696>
>>             www.pythian.com
>>             <http://www.pythian.com><http://www.pythian.com/
>>             <http://www.pythian.com/>>
>>
>>             On Wed, Dec 16, 2015 at 12:38 AM, Dani Traphagen
>>             <dani.traphagen@datastax.com
>>             <ma...@datastax.com><mailto:dani.traphagen@datastax.com
>>             <ma...@datastax.com>>> wrote:
>>             You'll be the first Carlos.
>>
>>             [Inline image 1]
>>
>>             Had any rain lately? Curious how this went, if so.
>>
>>             On Thu, Nov 12, 2015 at 4:36 AM, Jack Krupansky
>>             <jack.krupansky@gmail.com
>>             <ma...@gmail.com><mailto:jack.krupansky@gmail.com
>>             <ma...@gmail.com>>> wrote:
>>             I just did a Twitter search on scylladb and did not see
>>             any tweets about actual use, so far.
>>
>>
>>             -- Jack Krupansky
>>
>>             On Wed, Nov 11, 2015 at 10:54 AM, Carlos Alonso
>>             <info@mrcalonso.com
>>             <ma...@mrcalonso.com><mailto:info@mrcalonso.com
>>             <ma...@mrcalonso.com>>> wrote:
>>             Any update about this?
>>
>>             @Carlos Rolo, did you tried it? Thoughts?
>>
>>             Carlos Alonso | Software Engineer |
>>             @calonso<https://twitter.com/calonso
>>             <https://twitter.com/calonso>>
>>
>>             On 5 November 2015 at 14:07, Carlos Rolo
>>             <rolo@pythian.com
>>             <ma...@pythian.com><mailto:rolo@pythian.com
>>             <ma...@pythian.com>>> wrote:
>>             Something to do on a expected rainy weekend. Thanks for
>>             the information.
>>
>>             Regards,
>>
>>             Carlos Juzarte Rolo
>>             Cassandra Consultant
>>
>>             Pythian - Love your data
>>
>>             rolo@pythian | Twitter: @cjrolo | Linkedin:
>>             linkedin.com/in/carlosjuzarterolo
>>             <http://linkedin.com/in/carlosjuzarterolo><http://linkedin.com/in/carlosjuzarterolo
>>             <http://linkedin.com/in/carlosjuzarterolo>>
>>             Mobile: +351 91 891 81 00
>>             <tel:%2B351%2091%20891%2081%2000><tel:%2B351%2091%20891%2081%2000>
>>             | Tel: +1 613 565 8696 x1649
>>             <tel:%2B1%20613%20565%208696%20x1649><tel:%2B1%20613%20565%208696%20x1649>
>>             www.pythian.com
>>             <http://www.pythian.com><http://www.pythian.com/
>>             <http://www.pythian.com/>>
>>
>>             On Thu, Nov 5, 2015 at 12:07 PM, Dani Traphagen
>>             <dani.traphagen@datastax.com
>>             <ma...@datastax.com><mailto:dani.traphagen@datastax.com
>>             <ma...@datastax.com>>> wrote:
>>             As of two days ago, they say they've got it @cjrolo.
>>
>>             https://github.com/scylladb/scylla/wiki/RELEASE-Scylla-0.11-Beta
>>
>>
>>             On Thursday, November 5, 2015, Carlos Rolo
>>             <rolo@pythian.com
>>             <ma...@pythian.com><mailto:rolo@pythian.com
>>             <ma...@pythian.com>>> wrote:
>>             I will not try until multi-DC is implemented. More than
>>             an month has passed since I looked for it, so it could
>>             possibly be in place, if so I may take some time to test it.
>>
>>             Regards,
>>
>>             Carlos Juzarte Rolo
>>             Cassandra Consultant
>>
>>             Pythian - Love your data
>>
>>             rolo@pythian | Twitter: @cjrolo | Linkedin:
>>             linkedin.com/in/carlosjuzarterolo
>>             <http://linkedin.com/in/carlosjuzarterolo><http://linkedin.com/in/carlosjuzarterolo
>>             <http://linkedin.com/in/carlosjuzarterolo>>
>>             Mobile: +351 91 891 81 00
>>             <tel:%2B351%2091%20891%2081%2000><tel:%2B351%2091%20891%2081%2000>
>>             | Tel: +1 613 565 8696 x1649
>>             <tel:%2B1%20613%20565%208696%20x1649><tel:%2B1%20613%20565%208696%20x1649>
>>             www.pythian.com
>>             <http://www.pythian.com><http://www.pythian.com/
>>             <http://www.pythian.com/>>
>>
>>             On Thu, Nov 5, 2015 at 9:37 AM, Jon Haddad
>>             <jonathan.haddad@gmail.com
>>             <ma...@gmail.com>> wrote:
>>             Nope, no one I know.  Let me know if you try it I'd love
>>             to hear your feedback.
>>
>>             > On Nov 5, 2015, at 9:22 AM, tommaso barbugli
>>             <tbarbugli@gmail.com <ma...@gmail.com>> wrote:
>>             >
>>             > Hi guys,
>>             >
>>             > did anyone already try Scylladb (yet another fastest
>>             NoSQL database in town) and has some thoughts/hands-on
>>             experience to share?
>>             >
>>             > Cheers,
>>             > Tommaso
>>
>>
>>
>>
>>             --
>>
>>
>>
>>
>>             --
>>             Sent from mobile -- apologizes for brevity or errors.
>>
>>
>>
>>             --
>>
>>
>>
>>
>>
>>
>>
>>             --
>>             [datastax_logo.png]<http://www.datastax.com/
>>             <http://www.datastax.com/>>
>>
>>             DANI TRAPHAGEN
>>
>>             Technical Enablement Lead | dani.traphagen@datastax.com
>>             <ma...@datastax.com><mailto:dani.traphagen@datastax.com
>>             <ma...@datastax.com>>
>>
>>             [twitter.png]<https://twitter.com/dtrapezoid
>>             <https://twitter.com/dtrapezoid>> [linkedin.png]
>>             <https://www.linkedin.com/pub/dani-traphagen/31/93b/b85
>>             <https://www.linkedin.com/pub/dani-traphagen/31/93b/b85>>
>>             [https://lh5.googleusercontent.com/WcFJcWZHKXnxu01V6zJIQapcGonoazqsv8O7_DtfhW-qbTRHxDjfX2owDNmQhgojRx5Y4mLEc-KiAeeTJjT0VmKiiIld8UP86AgQPJDK2o6oC6BhTmub4NLZ_MO9-E7l9Q
>>             <https://lh5.googleusercontent.com/WcFJcWZHKXnxu01V6zJIQapcGonoazqsv8O7_DtfhW-qbTRHxDjfX2owDNmQhgojRx5Y4mLEc-KiAeeTJjT0VmKiiIld8UP86AgQPJDK2o6oC6BhTmub4NLZ_MO9-E7l9Q>]
>>             <https://github.com/dtrapezoid>
>>
>>             [http://datastax.com/all/images/cs_logo_color_sm.png
>>             <http://datastax.com/all/images/cs_logo_color_sm.png>]
>>
>>
>>
>>             --
>>
>>
>>
>>
>>
>>
>>
>>         -- 
>>         -Richard L. Burton III
>>         @rburton
>>
>>
>
>

Re: scylladb

Posted by Dor Laor <do...@scylladb.com>.

On Sun, Mar 12, 2017 at 6:40 AM, Stefan Podkowinski <sp...@apache.org> wrote:

> If someone would create a benchmark showing that Cassandra is 10x faster
> than Aerospike, would that mean Cassandra is 100x faster than ScyllaDB?
>
> Joking aside, I personally don't pay a lot of attention to any published
> benchmarks and look at them as pure marketing material. What I'm interested
> in instead is to learn why exactly one solution is faster than the other
> and I have to say that Avi is doing a really good job explaining the design
> motivations behind ScyllaDB in his presentations.
>
> But the Aerospike comparison also makes a good point by showing that you
> will probably always be able to find a solution that is faster for a
> certain workload. Therefore, the most important step when looking for the
> fastest datastore is to first really understand your workload
> characteristics. Unfortunately this is something people tend to skip and
> instead get lost in controversial benchmark discussions, which are more fun
> than thinking about your data model and talking to people about projected
> long term load. Because if you do, you might realize that those benchmark
> test scenarios (e.g. insert 1TB as fast as possible and measure compaction
> times) aren't actually that relevant for your application.
>
Agreed. However, a benchmark does let you see what a real workload will
suffer from, which is why we also measured a 'read while heavily writing'
result. In addition, we measured small, medium, and large datasets for
read-only workloads. Still, benchmarks are not a real workload, and we
always advise using our detailed Prometheus metrics to check whether the
hardware is fully utilized and to understand where the bottleneck is.
Scylla implements CQL tracing and can run slow-query tracing all of the
time with a low performance impact.
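
To illustrate the advice about checking utilization (this is not Scylla's
tooling, just a minimal Python sketch): given per-resource utilization
readings, report which resource, if any, looks saturated. The function
name, the resource names, the 0.9 threshold, and the sample readings are
all assumptions made up for the example; real numbers would come from
whatever metrics system you run.

```python
def find_bottleneck(utilization, saturation_threshold=0.9):
    """Return the most utilized resource if it is at or above the
    threshold, otherwise None.

    `utilization` maps a resource name to a fraction between 0.0 and 1.0.
    Both the mapping and the threshold are hypothetical inputs.
    """
    if not utilization:
        raise ValueError("need at least one utilization reading")
    # Pick the resource with the highest utilization fraction.
    name, value = max(utilization.items(), key=lambda item: item[1])
    return name if value >= saturation_threshold else None


# Made-up sample readings, for illustration only.
readings = {"cpu": 0.45, "disk": 0.97, "network": 0.30}
print(find_bottleneck(readings))
```

If nothing is near saturation while throughput has stopped growing, the
limit is probably somewhere else (the client, the data model, or the
configuration), which is something a bare ops/sec figure cannot show.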



>
> On 03/10/2017 05:58 PM, Bhuvan Rawal wrote:
>
> Agreed, C++ gives an added advantage in talking to the underlying hardware
> with better efficiency. That sounds good, but can a piece of code written in
> C++ give 1000% of the throughput of a Java app? Is the TPC design 10X more
> performant than the SEDA architecture?
>
> And if C/C++ is indeed that fast, how can Aerospike (which is itself
> written in C) claim to be 10X faster than Scylla here
> http://www.aerospike.com/benchmarks/scylladb-initial/ ? (Combining yours
> and Aerospike's benchmarks, it appears that Aerospike is 100X more
> performant than C*. I highly doubt that!!)
>
> For a moment let's forget about evaluating 2 different databases. One can
> observe a 10X performance difference between a mistuned Cassandra cluster
> and one that's tuned as per the data model; there are so many tunables in
> the yaml as well as in the table configs.
>
> The idea is: in order to strengthen your claim, you need to provide complete
> system metrics (disk, CPU, network) and the point where the OPS increase
> starts to decay, along with the configs used. Having plain ops per second
> and 99p latency is a black box.
>
> Regards,
> Bhuvan
>
> On Fri, Mar 10, 2017 at 12:47 PM, Avi Kivity <av...@scylladb.com> wrote:
>
>> ScyllaDB engineer here.
>>
>> C++ is really an enabling technology here. It is directly responsible for
>> a small fraction of the gain by executing faster than Java.  But it is
>> indirectly responsible for the gain by allowing us direct control over
>> memory and threading.  Just as an example, Scylla starts by taking over
>> almost all of the machine's memory, and dynamically assigning it to
>> memtables, cache, and working memory needed to handle requests in flight.
>> Memory is statically partitioned across cores, allowing us to exploit NUMA
>> fully.  You can't do these things in Java.
>>
>> I would say the major contributors to Scylla performance are:
>>  - thread-per-core design
>>  - replacement of the page cache with a row cache
>>  - careful attention to many small details, each contributing a little,
>> but with a large overall impact
>>
>> While I'm here I can say that performance is not the only goal here, it
>> is stable and predictable performance over varying loads and during
>> maintenance operations like repair, without any special tuning.  We measure
>> the amount of CPU and I/O spent on foreground (user) and background
>> (maintenance) tasks and divide them fairly.  This work is not complete but
>> already makes operating Scylla a lot simpler.
>>
>>
>> On 03/10/2017 01:42 AM, Kant Kodali wrote:
>>
>> I don't think ScyllaDB's performance is because of C++. The design
>> decisions in ScyllaDB are indeed different from Cassandra's, such as
>> getting rid of SEDA and moving to TPC and so on.
>>
>> If someone thinks it is because of C++, then just show the benchmarks that
>> prove it is indeed C++ that gave the 10X performance boost ScyllaDB
>> claims, instead of just stating it.
>>
>>
>> On Thu, Mar 9, 2017 at 3:22 PM, Richard L. Burton III <mrburton@gmail.com
>> > wrote:
>>
>>> They spend an enormous amount of time focusing on performance. You can
>>> expect them to continue on with their optimization and keep crushing it.
>>>
>>> P.S., I don't work for ScyllaDB.
>>>
>>> On Thu, Mar 9, 2017 at 6:02 PM, Rakesh Kumar <rakeshkumar464@outlook.com
>>> > wrote:
>>>
>>>> In all of their presentations they keep harping on the fact that
>>>> ScyllaDB is written in C++ and does not carry the overhead of Java.  Still
>>>> the difference looks staggering.
>>>> ________________________________________
>>>> From: daemeon reiydelle <da...@gmail.com>
>>>> Sent: Thursday, March 9, 2017 14:21
>>>> To: user@cassandra.apache.org
>>>> Subject: Re: scylladb
>>>>
>>>> The comparison is fair, and conservative. Did substantial performance
>>>> comparisons for two clients, both results returned throughputs that were
>>>> faster than the published comparisons (15x as I recall). At that time the
>>>> client preferred to utilize a Cass COTS solution and use a caching solution
>>>> for OLA compliance.
>>>>
>>>>
>>>> .......
>>>>
>>>> Daemeon C.M. Reiydelle
>>>> USA (+1) 415.501.0198 <%28%2B1%29%20415.501.0198>
>>>> London (+44) (0) 20 8144 9872
>>>> <%28%2B44%29%20%280%29%2020%208144%209872>
>>>>
>>>> On Thu, Mar 9, 2017 at 11:04 AM, Robin Verlangen <robin@us2.nl> wrote:
>>>> I was wondering how people feel about the comparison that's made here
>>>> between Cassandra and ScyllaDB:
>>>> http://www.scylladb.com/technology/ycsb-cassandra-scylla/#results-of-3-scylla-nodes-vs-30-cassandra-nodes
>>>>
>>>> They are claiming a 10x improvement, is that a fair comparison or maybe
>>>> a somewhat coloured view of a (micro)benchmark in a specific setup? Any
>>>> pros/cons known?
>>>>
>>>> Best regards,
>>>>
>>>> Robin Verlangen
>>>> Chief Data Architect
>>>>
>>>> Disclaimer: The information contained in this message and attachments
>>>> is intended solely for the attention and use of the named addressee and may
>>>> be confidential. If you are not the intended recipient, you are reminded
>>>> that the information remains the property of the sender. You must not use,
>>>> disclose, distribute, copy, print or rely on this e-mail. If you have
>>>> received this message in error, please contact the sender immediately and
>>>> irrevocably delete this message and any copies.
>>>>
>>>> On Wed, Dec 16, 2015 at 11:52 AM, Carlos Rolo <rolo@pythian.com<mailto:
>>>> rolo@pythian.com>> wrote:
>>>> No rain at all! But I almost had it running last weekend, but stopped
>>>> short of installing it. Let's see if this one is for real!
>>>>
>>>> Regards,
>>>>
>>>> Carlos Juzarte Rolo
>>>> Cassandra Consultant
>>>>
>>>> Pythian - Love your data
>>>>
>>>> rolo@pythian | Twitter: @cjrolo | Linkedin:
>>>> linkedin.com/in/carlosjuzarterolo<http://linkedin.com/in/car
>>>> losjuzarterolo>
>>>> Mobile: +351 91 891 81 00<tel:+351%20918%20918%20100> | Tel: +1 613
>>>> 565 8696 x1649 <%2B1%20613%20565%208696%20x1649><tel:+1%20613-565-8696>
>>>> www.pythian.com<http://www.pythian.com/>
>>>>
>>>> On Wed, Dec 16, 2015 at 12:38 AM, Dani Traphagen <
>>>> dani.traphagen@datastax.com<ma...@datastax.com>> wrote:
>>>> You'll be the first Carlos.
>>>>
>>>> [Inline image 1]
>>>>
>>>> Had any rain lately? Curious how this went, if so.
>>>>
>>>> On Thu, Nov 12, 2015 at 4:36 AM, Jack Krupansky <
>>>> jack.krupansky@gmail.com<ma...@gmail.com>> wrote:
>>>> I just did a Twitter search on scylladb and did not see any tweets
>>>> about actual use, so far.
>>>>
>>>>
>>>> -- Jack Krupansky
>>>>
>>>> On Wed, Nov 11, 2015 at 10:54 AM, Carlos Alonso <info@mrcalonso.com
>>>> <ma...@mrcalonso.com>> wrote:
>>>> Any update about this?
>>>>
>>>> @Carlos Rolo, did you tried it? Thoughts?
>>>>
>>>> Carlos Alonso | Software Engineer | @calonso<https://twitter.com/c
>>>> alonso>
>>>>
>>>> On 5 November 2015 at 14:07, Carlos Rolo <rolo@pythian.com<mailto:rolo@
>>>> pythian.com>> wrote:
>>>> Something to do on a expected rainy weekend. Thanks for the information.
>>>>
>>>> Regards,
>>>>
>>>> Carlos Juzarte Rolo
>>>> Cassandra Consultant
>>>>
>>>> Pythian - Love your data
>>>>
>>>> rolo@pythian | Twitter: @cjrolo | Linkedin:
>>>> linkedin.com/in/carlosjuzarterolo<http://linkedin.com/in/car
>>>> losjuzarterolo>
>>>> Mobile: +351 91 891 81 00<tel:%2B351%2091%20891%2081%2000> | Tel: +1
>>>> 613 565 8696 x1649 <%2B1%20613%20565%208696%20x1649>
>>>> <tel:%2B1%20613%20565%208696%20x1649>
>>>> www.pythian.com<http://www.pythian.com/>
>>>>
>>>> On Thu, Nov 5, 2015 at 12:07 PM, Dani Traphagen <
>>>> dani.traphagen@datastax.com<ma...@datastax.com>> wrote:
>>>> As of two days ago, they say they've got it @cjrolo.
>>>>
>>>> https://github.com/scylladb/scylla/wiki/RELEASE-Scylla-0.11-Beta
>>>>
>>>>
>>>> On Thursday, November 5, 2015, Carlos Rolo <rolo@pythian.com<mailto:
>>>> rolo@pythian.com>> wrote:
>>>> I will not try until multi-DC is implemented. More than an month has
>>>> passed since I looked for it, so it could possibly be in place, if so I may
>>>> take some time to test it.
>>>>
>>>> Regards,
>>>>
>>>> Carlos Juzarte Rolo
>>>> Cassandra Consultant
>>>>
>>>> Pythian - Love your data
>>>>
>>>> rolo@pythian | Twitter: @cjrolo | Linkedin:
>>>> linkedin.com/in/carlosjuzarterolo<http://linkedin.com/in/car
>>>> losjuzarterolo>
>>>> Mobile: +351 91 891 81 00<tel:%2B351%2091%20891%2081%2000> | Tel: +1
>>>> 613 565 8696 x1649 <%2B1%20613%20565%208696%20x1649>
>>>> <tel:%2B1%20613%20565%208696%20x1649>
>>>> www.pythian.com<http://www.pythian.com/>
>>>>
>>>> On Thu, Nov 5, 2015 at 9:37 AM, Jon Haddad <jo...@gmail.com>
>>>> wrote:
>>>> Nope, no one I know.  Let me know if you try it I'd love to hear your
>>>> feedback.
>>>>
>>>> > On Nov 5, 2015, at 9:22 AM, tommaso barbugli <tb...@gmail.com>
>>>> wrote:
>>>> >
>>>> > Hi guys,
>>>> >
>>>> > did anyone already try Scylladb (yet another fastest NoSQL database
>>>> in town) and has some thoughts/hands-on experience to share?
>>>> >
>>>> > Cheers,
>>>> > Tommaso
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Sent from mobile -- apologizes for brevity or errors.
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> [datastax_logo.png]<http://www.datastax.com/>
>>>>
>>>> DANI TRAPHAGEN
>>>>
>>>> Technical Enablement Lead | dani.traphagen@datastax.com<mailto:
>>>> dani.traphagen@datastax.com>
>>>>
>>>> [twitter.png]<https://twitter.com/dtrapezoid> [linkedin.png] <
>>>> https://www.linkedin.com/pub/dani-traphagen/31/93b/b85>  [
>>>> https://lh5.googleusercontent.com/WcFJcWZHKXnxu01V6zJIQapcG
>>>> onoazqsv8O7_DtfhW-qbTRHxDjfX2owDNmQhgojRx5Y4mLEc-KiAeeTJjT0V
>>>> mKiiIld8UP86AgQPJDK2o6oC6BhTmub4NLZ_MO9-E7l9Q] <
>>>> https://github.com/dtrapezoid>
>>>>
>>>> [http://datastax.com/all/images/cs_logo_color_sm.png]
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> -Richard L. Burton III
>>> @rburton
>>>
>>
>>
>>
>
>

Re: scylladb

Posted by Stefan Podkowinski <sp...@apache.org>.

If someone would create a benchmark showing that Cassandra is 10x faster
than Aerospike, would that mean Cassandra is 100x faster than ScyllaDB?

Joking aside, I personally don't pay a lot of attention to any published
benchmarks and look at them as pure marketing material. What I'm
interested in instead is to learn why exactly one solution is faster
than the other and I have to say that Avi is doing a really good job
explaining the design motivations behind ScyllaDB in his presentations.

But the Aerospike comparison also makes a good point by showing that you
will probably always be able to find a solution that is faster for a
certain workload. Therefore, the most important step when looking for the
fastest datastore is to first really understand your workload
characteristics. Unfortunately this is something people tend to skip and
instead get lost in controversial benchmark discussions, which are more
fun than thinking about your data model and talking to people about
projected long term load. Because if you do, you might realize that
those benchmark test scenarios (e.g. insert 1TB as fast as possible and
measure compaction times) aren't actually that relevant for your
application.
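
As a rough illustration of what "understanding your workload" can mean in
practice, here is a minimal Python sketch that summarizes a log of
operations into a read/write mix and payload sizes. Everything in it is an
assumption for the sake of the example: the (operation, payload size)
record format, the function name, and the sample data are made up and do
not come from any benchmark or tool discussed in this thread.

```python
from collections import Counter
from statistics import median


def summarize_workload(ops):
    """Summarize a list of (op_type, payload_bytes) tuples.

    op_type is expected to be "read" or "write"; this record format is a
    hypothetical one chosen for the sketch.
    """
    if not ops:
        raise ValueError("need at least one operation")
    counts = Counter(op_type for op_type, _ in ops)
    total = sum(counts.values())
    sizes = sorted(size for _, size in ops)
    # Nearest-rank style index for the 99th percentile, using integer math.
    p99_index = min(len(sizes) - 1, (99 * len(sizes)) // 100)
    return {
        "read_fraction": counts["read"] / total,
        "write_fraction": counts["write"] / total,
        "median_payload_bytes": median(sizes),
        "p99_payload_bytes": sizes[p99_index],
    }


# Made-up sample: mostly small reads, a few writes, one large write.
sample = [("read", 200)] * 90 + [("write", 1000)] * 9 + [("write", 50000)]
summary = summarize_workload(sample)
print(summary)
```

A profile like this (90% small reads, with rare large writes) looks very
different from an "insert 1TB as fast as possible" test, which is the
kind of mismatch worth checking before trusting any published number.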


On 03/10/2017 05:58 PM, Bhuvan Rawal wrote:
> Agreed, C++ gives an added advantage in talking to the underlying
> hardware with better efficiency. That sounds good, but can a piece of
> code written in C++ give 1000% of the throughput of a Java app? Is the
> TPC design 10X more performant than the SEDA architecture?
>
> And if C/C++ is indeed that fast, how can Aerospike (which is itself
> written in C) claim to be 10X faster than Scylla
> here http://www.aerospike.com/benchmarks/scylladb-initial/ ?
> (Combining yours and Aerospike's benchmarks, it appears that Aerospike
> is 100X more performant than C*. I highly doubt that!!)
>
> For a moment let's forget about evaluating 2 different databases. One
> can observe a 10X performance difference between a mistuned Cassandra
> cluster and one that's tuned as per the data model; there are so many
> tunables in the yaml as well as in the table configs.
>
> The idea is: in order to strengthen your claim, you need to provide
> complete system metrics (disk, CPU, network) and the point where the
> OPS increase starts to decay, along with the configs used. Having plain
> ops per second and 99p latency is a black box.
>
> Regards,
> Bhuvan
>
> On Fri, Mar 10, 2017 at 12:47 PM, Avi Kivity <avi@scylladb.com
> <ma...@scylladb.com>> wrote:
>
>     ScyllaDB engineer here.
>
>     C++ is really an enabling technology here. It is directly
>     responsible for a small fraction of the gain by executing faster
>     than Java.  But it is indirectly responsible for the gain by
>     allowing us direct control over memory and threading.  Just as an
>     example, Scylla starts by taking over almost all of the machine's
>     memory, and dynamically assigning it to memtables, cache, and
>     working memory needed to handle requests in flight.  Memory is
>     statically partitioned across cores, allowing us to exploit NUMA
>     fully.  You can't do these things in Java.
>
>     I would say the major contributors to Scylla performance are:
>      - thread-per-core design
>      - replacement of the page cache with a row cache
>      - careful attention to many small details, each contributing a
>     little, but with a large overall impact
>
>     While I'm here I can say that performance is not the only goal
>     here, it is stable and predictable performance over varying loads
>     and during maintenance operations like repair, without any special
>     tuning.  We measure the amount of CPU and I/O spent on foreground
>     (user) and background (maintenance) tasks and divide them fairly. 
>     This work is not complete but already makes operating Scylla a lot
>     simpler.
>
>
>     On 03/10/2017 01:42 AM, Kant Kodali wrote:
>>     I don't think ScyllaDB's performance is because of C++. The design
>>     decisions in ScyllaDB are indeed different from Cassandra's, such as
>>     getting rid of SEDA and moving to TPC and so on.
>>
>>     If someone thinks it is because of C++, then just show the
>>     benchmarks that prove it is indeed C++ that gave the 10X
>>     performance boost ScyllaDB claims, instead of just stating it.
>>
>>
>>     On Thu, Mar 9, 2017 at 3:22 PM, Richard L. Burton III
>>     <mrburton@gmail.com <ma...@gmail.com>> wrote:
>>
>>         They spend an enormous amount of time focusing on
>>         performance. You can expect them to continue on with their
>>         optimization and keep crushing it.
>>
>>         P.S., I don't work for ScyllaDB.  
>>
>>         On Thu, Mar 9, 2017 at 6:02 PM, Rakesh Kumar
>>         <rakeshkumar464@outlook.com
>>         <ma...@outlook.com>> wrote:
>>
>>             In all of their presentations they keep harping on the
>>             fact that ScyllaDB is written in C++ and does not carry
>>             the overhead of Java.  Still the difference looks staggering.
>>             ________________________________________
>>             From: daemeon reiydelle <daemeonr@gmail.com
>>             <ma...@gmail.com>>
>>             Sent: Thursday, March 9, 2017 14:21
>>             To: user@cassandra.apache.org
>>             <ma...@cassandra.apache.org>
>>             Subject: Re: scylladb
>>
>>             The comparison is fair, and conservative. Did substantial
>>             performance comparisons for two clients, both results
>>             returned throughputs that were faster than the published
>>             comparisons (15x as I recall). At that time the client
>>             preferred to utilize a Cass COTS solution and use a
>>             caching solution for OLA compliance.
>>
>>
>>             .......
>>
>>             Daemeon C.M. Reiydelle
>>             USA (+1) 415.501.0198 <tel:%28%2B1%29%20415.501.0198>
>>             London (+44) (0) 20 8144 9872
>>             <tel:%28%2B44%29%20%280%29%2020%208144%209872>
>>
>>             On Thu, Mar 9, 2017 at 11:04 AM, Robin Verlangen
>>             <robin@us2.nl> wrote:
>>             I was wondering how people feel about the comparison
>>             that's made here between Cassandra and ScyllaDB :
>>             http://www.scylladb.com/technology/ycsb-cassandra-scylla/#results-of-3-scylla-nodes-vs-30-cassandra-nodes
>>
>>             They are claiming a 10x improvement, is that a fair
>>             comparison or maybe a somewhat coloured view of a
>>             (micro)benchmark in a specific setup? Any pros/cons known?
>>
>>             Best regards,
>>
>>             Robin Verlangen
>>             Chief Data Architect
>>
>>             Disclaimer: The information contained in this message and
>>             attachments is intended solely for the attention and use
>>             of the named addressee and may be confidential. If you
>>             are not the intended recipient, you are reminded that the
>>             information remains the property of the sender. You must
>>             not use, disclose, distribute, copy, print or rely on
>>             this e-mail. If you have received this message in error,
>>             please contact the sender immediately and irrevocably
>>             delete this message and any copies.
>>
>>             On Wed, Dec 16, 2015 at 11:52 AM, Carlos Rolo
>>             <rolo@pythian.com
>>             <ma...@pythian.com><mailto:rolo@pythian.com
>>             <ma...@pythian.com>>> wrote:
>>             No rain at all! But I almost had it running last weekend,
>>             but stopped short of installing it. Let's see if this one
>>             is for real!
>>
>>             Regards,
>>
>>             Carlos Juzarte Rolo
>>             Cassandra Consultant
>>
>>             Pythian - Love your data
>>
>>             rolo@pythian | Twitter: @cjrolo | Linkedin:
>>             linkedin.com/in/carlosjuzarterolo
>>             <http://linkedin.com/in/carlosjuzarterolo><http://linkedin.com/in/carlosjuzarterolo
>>             <http://linkedin.com/in/carlosjuzarterolo>>
>>             Mobile: +351 91 891 81 00
>>             <tel:%2B351%2091%20891%2081%2000><tel:+351%20918%20918%20100>
>>             | Tel: +1 613 565 8696 x1649
>>             <tel:%2B1%20613%20565%208696%20x1649><tel:+1%20613-565-8696>
>>             www.pythian.com
>>             <http://www.pythian.com><http://www.pythian.com/
>>             <http://www.pythian.com/>>
>>
>>             On Wed, Dec 16, 2015 at 12:38 AM, Dani Traphagen
>>             <dani.traphagen@datastax.com
>>             <ma...@datastax.com><mailto:dani.traphagen@datastax.com
>>             <ma...@datastax.com>>> wrote:
>>             You'll be the first Carlos.
>>
>>             [Inline image 1]
>>
>>             Had any rain lately? Curious how this went, if so.
>>
>>             On Thu, Nov 12, 2015 at 4:36 AM, Jack Krupansky
>>             <jack.krupansky@gmail.com
>>             <ma...@gmail.com><mailto:jack.krupansky@gmail.com
>>             <ma...@gmail.com>>> wrote:
>>             I just did a Twitter search on scylladb and did not see
>>             any tweets about actual use, so far.
>>
>>
>>             -- Jack Krupansky
>>
>>             On Wed, Nov 11, 2015 at 10:54 AM, Carlos Alonso
>>             <info@mrcalonso.com
>>             <ma...@mrcalonso.com><mailto:info@mrcalonso.com
>>             <ma...@mrcalonso.com>>> wrote:
>>             Any update about this?
>>
>>             @Carlos Rolo, did you tried it? Thoughts?
>>
>>             Carlos Alonso | Software Engineer |
>>             @calonso<https://twitter.com/calonso
>>             <https://twitter.com/calonso>>
>>
>>             On 5 November 2015 at 14:07, Carlos Rolo
>>             <rolo@pythian.com
>>             <ma...@pythian.com><mailto:rolo@pythian.com
>>             <ma...@pythian.com>>> wrote:
>>             Something to do on a expected rainy weekend. Thanks for
>>             the information.
>>
>>             Regards,
>>
>>             Carlos Juzarte Rolo
>>             Cassandra Consultant
>>
>>             Pythian - Love your data
>>
>>             rolo@pythian | Twitter: @cjrolo | Linkedin:
>>             linkedin.com/in/carlosjuzarterolo
>>             <http://linkedin.com/in/carlosjuzarterolo><http://linkedin.com/in/carlosjuzarterolo
>>             <http://linkedin.com/in/carlosjuzarterolo>>
>>             Mobile: +351 91 891 81 00
>>             <tel:%2B351%2091%20891%2081%2000><tel:%2B351%2091%20891%2081%2000>
>>             | Tel: +1 613 565 8696 x1649
>>             <tel:%2B1%20613%20565%208696%20x1649><tel:%2B1%20613%20565%208696%20x1649>
>>             www.pythian.com
>>             <http://www.pythian.com><http://www.pythian.com/
>>             <http://www.pythian.com/>>
>>
>>             On Thu, Nov 5, 2015 at 12:07 PM, Dani Traphagen
>>             <dani.traphagen@datastax.com
>>             <ma...@datastax.com><mailto:dani.traphagen@datastax.com
>>             <ma...@datastax.com>>> wrote:
>>             As of two days ago, they say they've got it @cjrolo.
>>
>>             https://github.com/scylladb/scylla/wiki/RELEASE-Scylla-0.11-Beta
>>
>>
>>             On Thursday, November 5, 2015, Carlos Rolo
>>             <rolo@pythian.com
>>             <ma...@pythian.com><mailto:rolo@pythian.com
>>             <ma...@pythian.com>>> wrote:
>>             I will not try until multi-DC is implemented. More than
>>             an month has passed since I looked for it, so it could
>>             possibly be in place, if so I may take some time to test it.
>>
>>             Regards,
>>
>>             Carlos Juzarte Rolo
>>             Cassandra Consultant
>>
>>             Pythian - Love your data
>>
>>             rolo@pythian | Twitter: @cjrolo | Linkedin:
>>             linkedin.com/in/carlosjuzarterolo
>>             <http://linkedin.com/in/carlosjuzarterolo><http://linkedin.com/in/carlosjuzarterolo
>>             <http://linkedin.com/in/carlosjuzarterolo>>
>>             Mobile: +351 91 891 81 00
>>             <tel:%2B351%2091%20891%2081%2000><tel:%2B351%2091%20891%2081%2000>
>>             | Tel: +1 613 565 8696 x1649
>>             <tel:%2B1%20613%20565%208696%20x1649><tel:%2B1%20613%20565%208696%20x1649>
>>             www.pythian.com
>>             <http://www.pythian.com><http://www.pythian.com/
>>             <http://www.pythian.com/>>
>>
>>             On Thu, Nov 5, 2015 at 9:37 AM, Jon Haddad
>>             <jonathan.haddad@gmail.com
>>             <ma...@gmail.com>> wrote:
>>             Nope, no one I know.  Let me know if you try it I'd love
>>             to hear your feedback.
>>
>>             > On Nov 5, 2015, at 9:22 AM, tommaso barbugli
>>             > <tbarbugli@gmail.com> wrote:
>>             >
>>             > Hi guys,
>>             >
>>             > did anyone already try Scylladb (yet another fastest
>>             > NoSQL database in town) and has some thoughts/hands-on
>>             > experience to share?
>>             >
>>             > Cheers,
>>             > Tommaso
>>
>>             --
>>             Sent from mobile -- apologizes for brevity or errors.
>>
>>             --
>>             DANI TRAPHAGEN
>>
>>             Technical Enablement Lead | dani.traphagen@datastax.com
>>
>>             Twitter: https://twitter.com/dtrapezoid
>>             LinkedIn: https://www.linkedin.com/pub/dani-traphagen/31/93b/b85
>>             GitHub: https://github.com/dtrapezoid
>>
>>         -- 
>>         -Richard L. Burton III
>>         @rburton
>>
>>
>
>

Re: scylladb

Posted by Edward Capriolo <ed...@gmail.com>.

On Sat, Mar 11, 2017 at 9:41 PM, daemeon reiydelle <da...@gmail.com>
wrote:

> Recall that garbage collection on a busy node can occur minutes or seconds
> apart. Note that stop the world GC also happens as frequently as every
> couple of minutes on every node. Remove that and do the simple arithmetic.
>
>
> sent from my mobile
> Daemeon Reiydelle
> skype daemeon.c.m.reiydelle
> USA 415.501.0198
>
> On Mar 10, 2017 8:59 AM, "Bhuvan Rawal" <bh...@gmail.com> wrote:
>
>> Agreed, C++ gives an added advantage in talking to the underlying hardware
>> with better efficiency. It sounds good, but can a piece of code written in
>> C++ give 1000% of the throughput of a Java app? Is the TPC design 10X more
>> performant than the SEDA architecture?
>>
>> And if C/C++ is indeed that fast, how can Aerospike (which is itself
>> written in C) claim to be 10X faster than Scylla here:
>> http://www.aerospike.com/benchmarks/scylladb-initial/ ? (Combining
>> your and Aerospike's benchmarks, it appears that Aerospike is 100X more
>> performant than C* - I highly doubt that!!)
>>
>> For a moment let's forget about evaluating 2 different databases: one can
>> observe a 10X performance difference between a mistuned Cassandra cluster
>> and one that's tuned as per the data model - there are so many tunables in
>> the yaml as well as in the table configs.
>>
>> The idea is: in order to strengthen your claim, you need to provide
>> complete system metrics (disk, CPU, network) and the point where the OPS
>> increase starts to decay, along with the configs used. Having plain ops per
>> second and 99p latency is a black box.
>>
>> Regards,
>> Bhuvan
>>
>> On Fri, Mar 10, 2017 at 12:47 PM, Avi Kivity <av...@scylladb.com> wrote:
>>
>>> ScyllaDB engineer here.
>>>
>>> C++ is really an enabling technology here. It is directly responsible
>>> for a small fraction of the gain by executing faster than Java.  But it is
>>> indirectly responsible for the gain by allowing us direct control over
>>> memory and threading.  Just as an example, Scylla starts by taking over
>>> almost all of the machine's memory, and dynamically assigning it to
>>> memtables, cache, and working memory needed to handle requests in flight.
>>> Memory is statically partitioned across cores, allowing us to exploit NUMA
>>> fully.  You can't do these things in Java.
>>>
>>> I would say the major contributors to Scylla performance are:
>>>  - thread-per-core design
>>>  - replacement of the page cache with a row cache
>>>  - careful attention to many small details, each contributing a little,
>>> but with a large overall impact
>>>
>>> While I'm here I can say that performance is not the only goal here, it
>>> is stable and predictable performance over varying loads and during
>>> maintenance operations like repair, without any special tuning.  We measure
>>> the amount of CPU and I/O spent on foreground (user) and background
>>> (maintenance) tasks and divide them fairly.  This work is not complete but
>>> already makes operating Scylla a lot simpler.
>>>
>>>
>>> On 03/10/2017 01:42 AM, Kant Kodali wrote:
>>>
>>> I don't think ScyllaDB's performance is because of C++. The design
>>> decisions in ScyllaDB are indeed different from Cassandra's, such as
>>> getting rid of SEDA and moving to TPC and so on.
>>>
>>> If someone thinks it is because of C++, then just show the benchmarks
>>> that prove it is indeed C++ which gave the 10X performance boost that
>>> ScyllaDB claims, instead of just stating it.
>>>
>>>
>>> On Thu, Mar 9, 2017 at 3:22 PM, Richard L. Burton III <
>>> mrburton@gmail.com> wrote:
>>>
>>>> They spend an enormous amount of time focusing on performance. You can
>>>> expect them to continue on with their optimization and keep crushing it.
>>>>
>>>> P.S., I don't work for ScyllaDB.
>>>>
>>>> On Thu, Mar 9, 2017 at 6:02 PM, Rakesh Kumar <
>>>> rakeshkumar464@outlook.com> wrote:
>>>>
>>>>> In all of their presentation they keep harping on the fact that
>>>>> scylladb is written in C++ and does not carry the overhead of Java.  Still
>>>>> the difference looks staggering.
>>>>> ________________________________________
>>>>> From: daemeon reiydelle <da...@gmail.com>
>>>>> Sent: Thursday, March 9, 2017 14:21
>>>>> To: user@cassandra.apache.org
>>>>> Subject: Re: scylladb
>>>>>
>>>>> The comparison is fair, and conservative. Did substantial performance
>>>>> comparisons for two clients, both results returned throughputs that were
>>>>> faster than the published comparisons (15x as I recall). At that time the
>>>>> client preferred to utilize a Cass COTS solution and use a caching solution
>>>>> for OLA compliance.
>>>>>
>>>>>
>>>>> .......
>>>>>
>>>>> Daemeon C.M. Reiydelle
>>>>> USA (+1) 415.501.0198
>>>>> London (+44) (0) 20 8144 9872
>>>>>
>>>>> On Thu, Mar 9, 2017 at 11:04 AM, Robin Verlangen <robin@us2.nl> wrote:
>>>>> I was wondering how people feel about the comparison that's made here
>>>>> between Cassandra and ScyllaDB:
>>>>> http://www.scylladb.com/technology/ycsb-cassandra-scylla/#results-of-3-scylla-nodes-vs-30-cassandra-nodes
>>>>>
>>>>> They are claiming a 10x improvement, is that a fair comparison or
>>>>> maybe a somewhat coloured view of a (micro)benchmark in a specific setup?
>>>>> Any pros/cons known?
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Robin Verlangen
>>>>> Chief Data Architect
>>>>>
>>>>> Disclaimer: The information contained in this message and attachments
>>>>> is intended solely for the attention and use of the named addressee and may
>>>>> be confidential. If you are not the intended recipient, you are reminded
>>>>> that the information remains the property of the sender. You must not use,
>>>>> disclose, distribute, copy, print or rely on this e-mail. If you have
>>>>> received this message in error, please contact the sender immediately and
>>>>> irrevocably delete this message and any copies.
>>>>>
>>>>> On Wed, Dec 16, 2015 at 11:52 AM, Carlos Rolo <rolo@pythian.com> wrote:
>>>>> No rain at all! But I almost had it running last weekend, but stopped
>>>>> short of installing it. Let's see if this one is for real!
>>>>>
>>>>> Regards,
>>>>>
>>>>> Carlos Juzarte Rolo
>>>>> Cassandra Consultant
>>>>>
>>>>> Pythian - Love your data
>>>>>
>>>>> rolo@pythian | Twitter: @cjrolo | Linkedin:
>>>>> linkedin.com/in/carlosjuzarterolo
>>>>> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>>>>> www.pythian.com
>>>>>
>>>>> On Wed, Dec 16, 2015 at 12:38 AM, Dani Traphagen
>>>>> <dani.traphagen@datastax.com> wrote:
>>>>> You'll be the first Carlos.
>>>>>
>>>>> [Inline image 1]
>>>>>
>>>>> Had any rain lately? Curious how this went, if so.
>>>>>
>>>>> On Thu, Nov 12, 2015 at 4:36 AM, Jack Krupansky
>>>>> <jack.krupansky@gmail.com> wrote:
>>>>> I just did a Twitter search on scylladb and did not see any tweets
>>>>> about actual use, so far.
>>>>>
>>>>>
>>>>> -- Jack Krupansky
>>>>>
>>>>> On Wed, Nov 11, 2015 at 10:54 AM, Carlos Alonso <info@mrcalonso.com>
>>>>> wrote:
>>>>> Any update about this?
>>>>>
>>>>> @Carlos Rolo, did you try it? Thoughts?
>>>>>
>>>>> Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>
>>>>>
>>>>> On 5 November 2015 at 14:07, Carlos Rolo <rolo@pythian.com> wrote:
>>>>> Something to do on an expected rainy weekend. Thanks for the
>>>>> information.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Carlos Juzarte Rolo
>>>>> Cassandra Consultant
>>>>>
>>>>> Pythian - Love your data
>>>>>
>>>>> rolo@pythian | Twitter: @cjrolo | Linkedin:
>>>>> linkedin.com/in/carlosjuzarterolo
>>>>> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>>>>> www.pythian.com
>>>>>
>>>>> On Thu, Nov 5, 2015 at 12:07 PM, Dani Traphagen
>>>>> <dani.traphagen@datastax.com> wrote:
>>>>> As of two days ago, they say they've got it @cjrolo.
>>>>>
>>>>> https://github.com/scylladb/scylla/wiki/RELEASE-Scylla-0.11-Beta
>>>>>
>>>>>
>>>>> On Thursday, November 5, 2015, Carlos Rolo <rolo@pythian.com> wrote:
>>>>> I will not try it until multi-DC is implemented. More than a month has
>>>>> passed since I looked for it, so it could possibly be in place; if so, I
>>>>> may take some time to test it.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Carlos Juzarte Rolo
>>>>> Cassandra Consultant
>>>>>
>>>>> Pythian - Love your data
>>>>>
>>>>> rolo@pythian | Twitter: @cjrolo | Linkedin:
>>>>> linkedin.com/in/carlosjuzarterolo
>>>>> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>>>>> www.pythian.com
>>>>>
>>>>> On Thu, Nov 5, 2015 at 9:37 AM, Jon Haddad <jo...@gmail.com>
>>>>> wrote:
>>>>> Nope, no one I know.  Let me know if you try it; I'd love to hear your
>>>>> feedback.
>>>>>
>>>>> > On Nov 5, 2015, at 9:22 AM, tommaso barbugli <tb...@gmail.com>
>>>>> > wrote:
>>>>> >
>>>>> > Hi guys,
>>>>> >
>>>>> > did anyone already try Scylladb (yet another fastest NoSQL database
>>>>> > in town) and has some thoughts/hands-on experience to share?
>>>>> >
>>>>> > Cheers,
>>>>> > Tommaso
>>>>>
>>>>> --
>>>>> Sent from mobile -- apologizes for brevity or errors.
>>>>>
>>>>> --
>>>>> DANI TRAPHAGEN
>>>>>
>>>>> Technical Enablement Lead | dani.traphagen@datastax.com
>>>>>
>>>>> Twitter: https://twitter.com/dtrapezoid
>>>>> LinkedIn: https://www.linkedin.com/pub/dani-traphagen/31/93b/b85
>>>>> GitHub: https://github.com/dtrapezoid
>>>>>
>>>>
>>>>
>>>> --
>>>> -Richard L. Burton III
>>>> @rburton
>>>>
>>>
>>>
>>>
>>
Makes you wonder why http://www.hypertable.org/ has not taken over the world
yet.

For the average person running Cassandra on Docker, on DC/OS, on Xen, on
Amazon, where the machines are not even on the same physical switch and fsync
is virtualized, what really matters? Unless you can drop in Scylla as a
replacement for ANY version of Cassandra and try it with your workload, you're
not proving much. More time should be spent understanding whether
leveldb/log-structured merge actually makes sense in your use case.

Re: scylladb

Posted by daemeon reiydelle <da...@gmail.com>.

Recall that garbage collection on a busy node can occur minutes or seconds
apart. Note that stop the world GC also happens as frequently as every
couple of minutes on every node. Remove that and do the simple arithmetic.


sent from my mobile
Daemeon Reiydelle
skype daemeon.c.m.reiydelle
USA 415.501.0198

On Mar 10, 2017 8:59 AM, "Bhuvan Rawal" <bh...@gmail.com> wrote:

> Agreed C++ gives an added advantage to talk to underlying hardware with
> better efficiency, it sound good but can a pice of code written in C++ give
> 1000% throughput than a Java app? Is TPC design 10X more performant than
> SEDA arch?
>
> And if C/C++ is indeed that fast how can Aerospike (which is itself
> written in C) claim to be 10X faster than Scylla here
> http://www.aerospike.com/benchmarks/scylladb-initial/ ? (Combining your's
> and aerospike's benchmarks it appears that Aerospike is 100X performant
> than C* - I highly doubt that!! )
>
> For a moment lets forget about evaluating 2 different databases, one can
> observe 10X performance difference between a mistuned cassandra cluster and
> one thats tuned as per data model - there are so many Tunables in yaml as
> well as table configs.
>
> Idea is - in order to strengthen your claim, you need to provide complete
> system metrics (Disk, CPU, Network), the OPS increase starts to decay along
> with the configs used. Having plain ops per second and 99p latency is
> blackbox.
>
> Regards,
> Bhuvan
>
> On Fri, Mar 10, 2017 at 12:47 PM, Avi Kivity <av...@scylladb.com> wrote:
>
>> ScyllaDB engineer here.
>>
>> C++ is really an enabling technology here. It is directly responsible for
>> a small fraction of the gain by executing faster than Java.  But it is
>> indirectly responsible for the gain by allowing us direct control over
>> memory and threading.  Just as an example, Scylla starts by taking over
>> almost all of the machine's memory, and dynamically assigning it to
>> memtables, cache, and working memory needed to handle requests in flight.
>> Memory is statically partitioned across cores, allowing us to exploit NUMA
>> fully.  You can't do these things in Java.
>>
>> I would say the major contributors to Scylla performance are:
>>  - thread-per-core design
>>  - replacement of the page cache with a row cache
>>  - careful attention to many small details, each contributing a little,
>> but with a large overall impact
>>
>> While I'm here I can say that performance is not the only goal here, it
>> is stable and predictable performance over varying loads and during
>> maintenance operations like repair, without any special tuning.  We measure
>> the amount of CPU and I/O spent on foreground (user) and background
>> (maintenance) tasks and divide them fairly.  This work is not complete but
>> already makes operating Scylla a lot simpler.
>>
>>
>> On 03/10/2017 01:42 AM, Kant Kodali wrote:
>>
>> I dont think ScyllaDB performance is because of C++. The design decisions
>> in scylladb are indeed different from Cassandra such as getting rid of SEDA
>> and moving to TPC and so on.
>>
>> If someone thinks it is because of C++ then just show the benchmarks that
>> proves it is indeed the C++ which gave 10X performance boost as ScyllaDB
>> claims instead of stating it.
>>
>>
>> On Thu, Mar 9, 2017 at 3:22 PM, Richard L. Burton III <mrburton@gmail.com
>> > wrote:
>>
>>> They spend an enormous amount of time focusing on performance. You can
>>> expect them to continue on with their optimization and keep crushing it.
>>>
>>> P.S., I don't work for ScyllaDB.
>>>
>>> On Thu, Mar 9, 2017 at 6:02 PM, Rakesh Kumar <rakeshkumar464@outlook.com
>>> > wrote:
>>>
>>>> In all of their presentation they keep harping on the fact that
>>>> scylladb is written in C++ and does not carry the overhead of Java.  Still
>>>> the difference looks staggering.
>>>> ________________________________________
>>>> From: daemeon reiydelle <da...@gmail.com>
>>>> Sent: Thursday, March 9, 2017 14:21
>>>> To: user@cassandra.apache.org
>>>> Subject: Re: scylladb
>>>>
>>>> The comparison is fair, and conservative. Did substantial performance
>>>> comparisons for two clients, both results returned throughputs that were
>>>> faster than the published comparisons (15x as I recall). At that time the
>>>> client preferred to utilize a Cass COTS solution and use a caching solution
>>>> for OLA compliance.
>>>>
>>>>
>>>> .......
>>>>
>>>> Daemeon C.M. Reiydelle
>>>> USA (+1) 415.501.0198
>>>> London (+44) (0) 20 8144 9872
>>>>
>>>> On Thu, Mar 9, 2017 at 11:04 AM, Robin Verlangen <robin@us2.nl> wrote:
>>>> I was wondering how people feel about the comparison that's made here
>>>> between Cassandra and ScyllaDB:
>>>> http://www.scylladb.com/technology/ycsb-cassandra-scylla/#results-of-3-scylla-nodes-vs-30-cassandra-nodes
>>>>
>>>> They are claiming a 10x improvement, is that a fair comparison or maybe
>>>> a somewhat coloured view of a (micro)benchmark in a specific setup? Any
>>>> pros/cons known?
>>>>
>>>> Best regards,
>>>>
>>>> Robin Verlangen
>>>> Chief Data Architect
>>>>
>>>> Disclaimer: The information contained in this message and attachments
>>>> is intended solely for the attention and use of the named addressee and may
>>>> be confidential. If you are not the intended recipient, you are reminded
>>>> that the information remains the property of the sender. You must not use,
>>>> disclose, distribute, copy, print or rely on this e-mail. If you have
>>>> received this message in error, please contact the sender immediately and
>>>> irrevocably delete this message and any copies.
>>>>
>>>
>>>
>>> --
>>> -Richard L. Burton III
>>> @rburton
>>>
>>
>>
>>
>

Re: scylladb

Posted by benjamin roth <br...@gmail.com>.

Thanks a lot for your detailed explanation!
I am very curious about the future development of Scylladb! Especially
about mvs and lwt!

Am 11.03.2017 02:05 schrieb "Dor Laor" <do...@scylladb.com>:

> On Fri, Mar 10, 2017 at 4:45 PM, Kant Kodali <ka...@peernova.com> wrote:
>
>> http://performanceterracotta.blogspot.com/2012/09/numa-java.html
>> http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html
>> http://openjdk.java.net/jeps/163
>>
>>
> Java can exploit NUMA, but not as efficiently as can be done in C++.
> Andrea Arcangeli is the engineer behind Linux transparent huge pages (THP);
> he reported to me, and the idea belongs to Avi. We did it for KVM's sake,
> but it was designed for any long-running process like Cassandra.
> However, the entire software stack should be aware of it. If you get a huge
> page (2MB) but keep only 1KB in it, you waste lots of memory. On top of
> this, threads need to touch their data structures, and these need to be well
> aligned; otherwise the memory page will bounce between the different cores.
> With Cassandra it gets more complicated, since there is both heap and
> off-heap data.
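
To put a number on the huge-page point above, here is a minimal sketch of the arithmetic in Python. The 2 MB huge page and the 1 KB of live data come from the text; the 4 KB base page size and the name `wasted_fraction` are assumptions made only for illustration.

```python
HUGE_PAGE_BYTES = 2 * 1024 * 1024   # 2 MB transparent huge page (from the text)
BASE_PAGE_BYTES = 4 * 1024          # 4 KB base page (assumed, typical on x86-64)


def wasted_fraction(live_bytes: int, page_bytes: int) -> float:
    """Return the fraction of a page that holds no live data."""
    if not 0 <= live_bytes <= page_bytes:
        raise ValueError("live_bytes must fit inside the page")
    return 1 - live_bytes / page_bytes


if __name__ == "__main__":
    live = 1024  # 1 KB actually in use, as in the example above
    for name, size in (("huge", HUGE_PAGE_BYTES), ("base", BASE_PAGE_BYTES)):
        print(f"{name} page: {wasted_fraction(live, size):.2%} unused")
```

With 1 KB live, roughly 99.95% of a 2 MB page sits unused, against 75% of a 4 KB page, which is why the allocator and data layout have to cooperate with THP.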
>
> Do programmers really track their data alignment? I doubt it.
> Do users run C* with the JVM NUMA options and the right Linux THP options?
> Again, I doubt it.
>
> Scylla, on the other hand, is designed for NUMA. We have 2-level sharding.
> The inner shards are transparent to the user and are per-core (hyperthread).
> Such a shard accesses RAM only within its NUMA node. Memory is bound to
> each thread/NUMA node. We have our own malloc allocator built for this
> scheme.
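
A rough sketch of the two-level idea in Python: a key's token first picks a node on the ring, then picks a per-core shard inside that node. Everything here is an assumption for illustration. The md5-based `token` merely stands in for murmur3, and `token % cores_per_node` is a placeholder, not Scylla's actual shard-selection algorithm.

```python
import hashlib
from bisect import bisect_left


def token(partition_key: str) -> int:
    """Stand-in 64-bit token (assumption: md5 instead of murmur3)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def owning_node(tok: int, ring: list) -> str:
    """Level 1: first node whose ring token is >= tok, wrapping around.

    `ring` is a list of (token, node_name) pairs sorted by token.
    """
    tokens = [t for t, _ in ring]
    idx = bisect_left(tokens, tok) % len(ring)
    return ring[idx][1]


def owning_shard(tok: int, cores_per_node: int) -> int:
    """Level 2: per-core shard inside the node (placeholder mapping)."""
    return tok % cores_per_node


def route(partition_key: str, ring: list, cores_per_node: int) -> tuple:
    """Return (node, shard) for a key; the shard is invisible to the client."""
    tok = token(partition_key)
    return owning_node(tok, ring), owning_shard(tok, cores_per_node)


if __name__ == "__main__":
    # Hypothetical three-node ring with 8 cores per node.
    ring = [(2**62, "node-a"), (2**63, "node-b"), (2**64 - 1, "node-c")]
    print(route("user:42", ring, cores_per_node=8))
```

Because the same key always lands on the same core, that core can own the key's memory outright, which is what makes the per-NUMA-node allocation described above possible.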
>
>
>
>> If scyllaDB has efficient Secondary indexes, LWT and MV's then that is
>> something. I would be glad to see how they perform.
>>
>>
> MV will be in 1.8; we haven't measured its performance yet. We did measure
> our counter implementation and it looks promising (4X better throughput and
> 4X better latency on an 8-core machine).
> The not-yet-written LWT will kick a** since our fully async engine is
> ideal for the larger number of round trips that LWT needs.
>
> This is with the Linux TCP stack; once we use our DPDK one, performance
> will improve further ;)
>
>
>
>>
>> On Fri, Mar 10, 2017 at 10:45 AM, Dor Laor <do...@scylladb.com> wrote:
>>
>>> Scylla isn't just about performance either.
>>>
>>> First, a disclaimer: I am a Scylla co-founder. I respect open source a
>>> lot, so you guys are welcome to shush me out of this thread. I only
>>> participate to provide value if I can (this is a thread about Scylla, and
>>> our users are on our mailing list).
>>>
>>> Scylla is all about what Cassandra is, plus:
>>>  - Efficient hardware utilization (scale-up, performance)
>>>  - Low tail latency
>>>  - Auto/dynamic tuning (no JVM tuning; we tune the OS ourselves, we have
>>>    a CPU scheduler, a userspace I/O scheduler and more to come)
>>>  - SLA between compaction, repair, streaming and your r/w operations
>>>
>>> We started with a great foundation (C*) and wish to improve almost every
>>> aspect of it. Admittedly, we're way behind C* in terms of adoption. One
>>> needs to start somewhere. However, users such as AppNexus run Scylla in
>>> production with 47 physical nodes across 5 datacenters, and their VP
>>> estimates that C* would have at least doubled the size. So this is
>>> equivalent to a 100-node C* cluster. Since we have the same gossip,
>>> murmur3 hash and CQL, nothing stops us from scaling to 1,000 nodes.
>>> Another user (Mogujie) runs 10s of TBs per node(!) in production.
>>>
>>> Also, since we try to compare Scylla and C* in a fair way, we invested a
>>> great deal of time in running C*. I can say it's not simple at all.
>>> Lastly, in a couple of months we'll reach parity in functionality with
>>> C* (counters are in 1.7 as experimental; in 1.8 counters will be stable
>>> and we'll have MV as experimental; LWT will come in the summer). We hope
>>> to collaborate with the C* community on the development of future
>>> features.
>>>
>>> Dor
>>>
>>>
>>> On Fri, Mar 10, 2017 at 10:19 AM, Jacques-Henri Berthemet <
>>> jacques-henri.berthemet@genesys.com> wrote:
>>>
>>>> Cassandra is not about pure performance; there are many other DBs that
>>>> are much faster than Cassandra. Cassandra's strength is all about
>>>> scalability: performance increases in a linear way as you add more nodes.
>>>> During Cassandra Summit 2014 Apple said they have a 10k node cluster. The
>>>> usual limiting factor is your disk write speed and latency; I don’t see how
>>>> C++ changes anything in this regard unless you can cache all your data in
>>>> memory.
>>>>
>>>>
>>>>
>>>> I’d be curious to know how ScyllaDB performs with a 100+ node cluster
>>>> with PBs of data compared to Cassandra.
>>>>
>>>> *--*
>>>>
>>>> *Jacques-Henri Berthemet*
>>>>
>>>>
>>>>
>>>> *From:* Rakesh Kumar [mailto:rakeshkumar464@outlook.com]
>>>> *Sent:* vendredi 10 mars 2017 09:58
>>>>
>>>> *To:* user@cassandra.apache.org
>>>> *Subject:* Re: scylladb
>>>>
>>>>
>>>>
>>>> Cassandra vs Scylla is a valid comparison because they are both
>>>> compatible.  Scylla is a drop-in replacement for Cassandra.
>>>> Is Aerospike a drop-in replacement for Cassandra? If yes, and only if
>>>> yes, then the comparison with Scylla is valid.
>>>>
>>>>
>>>> ------------------------------
>>>>
>>>> *From:* Bhuvan Rawal <bh...@gmail.com>
>>>> *To:* user@cassandra.apache.org
>>>> *Sent:* Friday, March 10, 2017 11:59 AM
>>>> *Subject:* Re: scylladb
>>>>
>>>>
>>>>
>>>> Agreed C++ gives an added advantage to talk to underlying hardware with
>>>> better efficiency, it sound good but can a pice of code written in C++ give
>>>> 1000% throughput than a Java app? Is TPC design 10X more performant than
>>>> SEDA arch?
>>>>
>>>>
>>>>
>>>> And if C/C++ is indeed that fast how can Aerospike (which is itself
>>>> written in C) claim to be 10X faster than Scylla here
>>>> http://www.aerospike.com/benchmarks/scylladb-initial/ ? (Combining
>>>> your's and aerospike's benchmarks it appears that Aerospike is 100X
>>>> performant than C* - I highly doubt that!! )
>>>>
>>>>
>>>>
>>>> For a moment lets forget about evaluating 2 different databases, one
>>>> can observe 10X performance difference between a mistuned cassandra cluster
>>>> and one thats tuned as per data model - there are so many Tunables in yaml
>>>> as well as table configs.
>>>>
>>>>
>>>>
>>>> Idea is - in order to strengthen your claim, you need to provide
>>>> complete system metrics (Disk, CPU, Network), the OPS increase starts to
>>>> decay along with the configs used. Having plain ops per second and 99p
>>>> latency is blackbox.
>>>>
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Bhuvan
>>>>
>>>>
>>>>
>>>> On Fri, Mar 10, 2017 at 12:47 PM, Avi Kivity <av...@scylladb.com> wrote:
>>>>
>>>> ScyllaDB engineer here.
>>>>
>>>> C++ is really an enabling technology here. It is directly responsible
>>>> for a small fraction of the gain by executing faster than Java.  But it is
>>>> indirectly responsible for the gain by allowing us direct control over
>>>> memory and threading.  Just as an example, Scylla starts by taking over
>>>> almost all of the machine's memory, and dynamically assigning it to
>>>> memtables, cache, and working memory needed to handle requests in flight.
>>>> Memory is statically partitioned across cores, allowing us to exploit NUMA
>>>> fully.  You can't do these things in Java.
>>>>
>>>> I would say the major contributors to Scylla performance are:
>>>>  - thread-per-core design
>>>>  - replacement of the page cache with a row cache
>>>>  - careful attention to many small details, each contributing a little,
>>>> but with a large overall impact
>>>>
>>>> While I'm here I can say that performance is not the only goal here, it
>>>> is stable and predictable performance over varying loads and during
>>>> maintenance operations like repair, without any special tuning.  We measure
>>>> the amount of CPU and I/O spent on foreground (user) and background
>>>> (maintenance) tasks and divide them fairly.  This work is not complete but
>>>> already makes operating Scylla a lot simpler.
>>>>
>>>>
>>>>
>>>> On 03/10/2017 01:42 AM, Kant Kodali wrote:
>>>>
>>>> I dont think ScyllaDB performance is because of C++. The design
>>>> decisions in scylladb are indeed different from Cassandra such as getting
>>>> rid of SEDA and moving to TPC and so on.
>>>>
>>>>
>>>>
>>>> If someone thinks it is because of C++ then just show the benchmarks
>>>> that proves it is indeed the C++ which gave 10X performance boost as
>>>> ScyllaDB claims instead of stating it.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Mar 9, 2017 at 3:22 PM, Richard L. Burton III <
>>>> mrburton@gmail.com> wrote:
>>>>
>>>> They spend an enormous amount of time focusing on performance. You can
>>>> expect them to continue on with their optimization and keep crushing it.
>>>>
>>>>
>>>>
>>>> P.S., I don't work for ScyllaDB.
>>>>
>>>>
>>>>
>>>> On Thu, Mar 9, 2017 at 6:02 PM, Rakesh Kumar <
>>>> rakeshkumar464@outlook.com> wrote:
>>>>
>>>> In all of their presentations they keep harping on the fact that
>>>> ScyllaDB is written in C++ and does not carry the overhead of Java.  Still,
>>>> the difference looks staggering.
>>>> ______________________________ __________
>>>> From: daemeon reiydelle <da...@gmail.com>
>>>> Sent: Thursday, March 9, 2017 14:21
>>>> To: user@cassandra.apache.org
>>>> Subject: Re: scylladb
>>>>
>>>> The comparison is fair, and conservative. Did substantial performance
>>>> comparisons for two clients, both results returned throughputs that were
>>>> faster than the published comparisons (15x as I recall). At that time the
>>>> client preferred to utilize a Cass COTS solution and use a caching solution
>>>> for OLA compliance.
>>>>
>>>>
>>>> .......
>>>>
>>>> Daemeon C.M. Reiydelle
>>>> USA (+1) 415.501.0198
>>>> London (+44) (0) 20 8144 9872
>>>>
>>>> On Thu, Mar 9, 2017 at 11:04 AM, Robin Verlangen <ro...@us2.nl> wrote:
>>>> I was wondering how people feel about the comparison that's made here
>>>> between Cassandra and ScyllaDB:
>>>> http://www.scylladb.com/technology/ycsb-cassandra-scylla/#results-of-3-scylla-nodes-vs-30-cassandra-nodes
>>>>
>>>> They are claiming a 10x improvement, is that a fair comparison or maybe
>>>> a somewhat coloured view of a (micro)benchmark in a specific setup? Any
>>>> pros/cons known?
>>>>
>>>> Best regards,
>>>>
>>>> Robin Verlangen
>>>> Chief Data Architect
>>>>
>>>> Disclaimer: The information contained in this message and attachments
>>>> is intended solely for the attention and use of the named addressee and may
>>>> be confidential. If you are not the intended recipient, you are reminded
>>>> that the information remains the property of the sender. You must not use,
>>>> disclose, distribute, copy, print or rely on this e-mail. If you have
>>>> received this message in error, please contact the sender immediately and
>>>> irrevocably delete this message and any copies.
>>>>
>>>
>>>
>>
>

Re: scylladb

Posted by Dor Laor <do...@scylladb.com>.

On Fri, Mar 10, 2017 at 4:45 PM, Kant Kodali <ka...@peernova.com> wrote:

> http://performanceterracotta.blogspot.com/2012/09/numa-java.html
> http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html
> http://openjdk.java.net/jeps/163
>
>
Java can exploit NUMA, but not as efficiently as can be done in C++.
Andrea Arcangeli is the engineer behind Linux transparent huge pages (THP);
he reported to me, and the idea belongs to Avi. We did it for KVM's sake, but
it was designed for any long-running process like Cassandra.
However, the entire software stack should be aware of it. If you get a huge
page (2MB) but keep only 1KB in it, you waste lots of memory. On top of this,
threads need to touch their data structures, and those need to be well
aligned; otherwise the memory page will bounce between the different cores.
With Cassandra it gets more complicated, since there is both heap and
off-heap data.
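
To put rough numbers on the two points above, here is a minimal sketch in
Python. It is illustrative arithmetic only, not anything from Scylla or the
kernel: the 2MB and 1KB figures come from the paragraph above, while the
64-byte cache-line size and the function names are assumptions made for the
example.

```python
# Illustrative arithmetic only. The 2MB / 1KB figures are from the message;
# the 64-byte cache line and all names below are assumptions for the sketch.
HUGE_PAGE_BYTES = 2 * 1024 * 1024   # one 2MB transparent huge page
CACHE_LINE_BYTES = 64               # assumed; typical on x86-64


def wasted_fraction(live_bytes: int, page_bytes: int = HUGE_PAGE_BYTES) -> float:
    """Fraction of a page that is allocated but holds no live data."""
    if not 0 <= live_bytes <= page_bytes:
        raise ValueError("live_bytes must fit in one page")
    return (page_bytes - live_bytes) / page_bytes


def blocks_spanned(offset: int, size: int, block: int) -> int:
    """How many fixed-size blocks (pages or cache lines) an object touches."""
    if size <= 0 or block <= 0 or offset < 0:
        raise ValueError("offset must be >= 0; size and block must be > 0")
    first = offset // block
    last = (offset + size - 1) // block
    return last - first + 1


if __name__ == "__main__":
    # 1KB of live data in a 2MB page
    print(f"wasted fraction: {wasted_fraction(1024):.4%}")
    # a 64-byte object, aligned vs. starting mid-line
    print("aligned:   ", blocks_spanned(0, 64, CACHE_LINE_BYTES), "line(s)")
    print("misaligned:", blocks_spanned(32, 64, CACHE_LINE_BYTES), "line(s)")
```

An object that spans more blocks than it needs is more likely to share a
block with data owned by another thread, which is where the bouncing between
cores comes from.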

Do programmers really track their data alignment? I doubt it.
Do users run C* with the JVM NUMA options and the right Linux THP options?
Again, I doubt it.

Scylla, on the other hand, is designed for NUMA. We have 2-level sharding.
The inner shards are transparent to the user and are per-core (hyperthread).
Such a shard accesses RAM only within its NUMA node. Memory is bound to each
thread/NUMA node. We have our own malloc allocator built for this scheme.
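
As a rough illustration of what 2-level sharding can look like, here is a
hypothetical sketch in Python. It is NOT Scylla's actual placement algorithm:
the Node type, the modulo-based choices, and the use of blake2b in place of
murmur3 are all assumptions made so the example runs with only the standard
library.

```python
# Hypothetical two-level placement, for illustration only.
# This is NOT Scylla's real algorithm; every name here is made up.
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    cores: int                # one inner shard per core (hyperthread)
    cores_per_numa_node: int  # assumed to divide `cores` evenly


def token_for(key: bytes) -> int:
    """Stand-in 64-bit token. The thread mentions murmur3; blake2b is used
    here only because it ships with the Python standard library."""
    return int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "big")


def place(key: bytes, nodes: list[Node]) -> tuple[str, int, int]:
    """Return (node name, per-core shard, NUMA node that shard lives on)."""
    if not nodes:
        raise ValueError("need at least one node")
    token = token_for(key)
    node = nodes[token % len(nodes)]            # outer level: which machine
    shard = (token // len(nodes)) % node.cores  # inner level: which core
    numa = shard // node.cores_per_numa_node    # shard's memory stays here
    return node.name, shard, numa


if __name__ == "__main__":
    cluster = [Node("n1", cores=8, cores_per_numa_node=4),
               Node("n2", cores=8, cores_per_numa_node=4)]
    for k in (b"user:1", b"user:2", b"user:3"):
        print(k, place(k, cluster))
```

The point of the sketch is only that a key maps deterministically to one
core, and that core maps to one NUMA node, so a request never needs memory
owned by another shard.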



> If ScyllaDB has efficient secondary indexes, LWT, and MVs, then that is
> something. I would be glad to see how they perform.
>
>
MV will be in 1.8; we haven't measured its performance yet. We did measure
our counter implementation and it looks promising (4X better throughput and
4X better latency on an 8-core machine).
The not-yet-written LWT will kick-a**, since our fully async engine is ideal
for the larger number of round trips that LWT needs.

This is with the Linux TCP stack; once we use our DPDK one, performance
will improve further ;)



>
> On Fri, Mar 10, 2017 at 10:45 AM, Dor Laor <do...@scylladb.com> wrote:
>
>> Scylla isn't just about performance, either.
>>
>> First, a disclaimer: I am a Scylla co-founder. I respect open source a lot,
>> so you guys are welcome to shush me out of this thread. I only participate
>> to provide value if I can (this is a thread about Scylla and our users are
>> on our mailing list).
>>
>> Scylla is all about what Cassandra is, plus:
>>  - Efficient hardware utilization (scale-up, performance)
>>  - Low tail latency
>>  - Auto/dynamic tuning (no JVM tuning; we tune the OS ourselves, and we have
>>    a CPU scheduler, a userspace I/O scheduler, and more to come)
>>  - SLA between compaction, repair, streaming and your r/w operations
>>
>> We started with a great foundation (C*) and wish to improve almost every
>> aspect of it.
>> Admittedly, we're way behind C* in terms of adoption. One needs to start
>> somewhere.
>> However, users such as AppNexus run Scylla in production with 47 physical
>> nodes across 5 datacenters, and their VP estimates that C* would have at
>> least doubled the size. So this is equivalent to a 100-node C* cluster.
>> Since we have the same gossip, murmur3 hash, and CQL, nothing stops us from
>> scaling to 1,000 nodes. Another user (Mogujie) runs 10s of TBs per node(!)
>> in production.
>>
>> Also, since we try to compare Scylla and C* in a fair way, we invested a
>> great deal of time in running C*. I can say it's not simple at all.
>> Lastly, in a couple of months we'll reach parity in functionality with C*
>> (counters are in 1.7 as experimental; in 1.8 counters will be stable and
>> we'll have MV as experimental; LWT will come in the summer). We hope to
>> collaborate with the C* community on the development of future features.
>>
>> Dor
>>
>>
>> On Fri, Mar 10, 2017 at 10:19 AM, Jacques-Henri Berthemet <
>> jacques-henri.berthemet@genesys.com> wrote:
>>
>>> Cassandra is not about pure performance; there are many other DBs that
>>> are much faster than Cassandra. Cassandra’s strength is all about
>>> scalability: performance increases linearly as you add more nodes.
>>> During Cassandra Summit 2014, Apple said they have a 10k-node cluster. The
>>> usual limiting factor is your disk write speed and latency. I don’t see how
>>> C++ changes anything in this regard unless you can cache all your data in
>>> memory.
>>>
>>> I’d be curious to know how ScyllaDB performs with a 100+ node cluster
>>> with PBs of data compared to Cassandra.
>>>
>>> *--*
>>>
>>> *Jacques-Henri Berthemet*
>>>
>>>
>>>
>>> *From:* Rakesh Kumar [mailto:rakeshkumar464@outlook.com]
>>> *Sent:* Friday, 10 March 2017 09:58
>>>
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Re: scylladb
>>>
>>>
>>>
>>> Cassandra vs. Scylla is a valid comparison because the two are
>>> compatible.  Scylla is a drop-in replacement for Cassandra.
>>> Is Aerospike a drop-in replacement for Cassandra? If yes, and only if
>>> yes, then the comparison with Scylla is valid.
>>>
>>>
>>> ------------------------------
>>>
>>> *From:* Bhuvan Rawal <bh...@gmail.com>
>>> *To:* user@cassandra.apache.org
>>> *Sent:* Friday, March 10, 2017 11:59 AM
>>> *Subject:* Re: scylladb
>>>
>>>
>>>
>>> Agreed, C++ gives an added advantage in talking to the underlying hardware
>>> with better efficiency. It sounds good, but can a piece of code written in
>>> C++ give 1000% of the throughput of a Java app? Is the TPC design 10X more
>>> performant than the SEDA architecture?
>>>
>>> And if C/C++ is indeed that fast, how can Aerospike (which is itself
>>> written in C) claim to be 10X faster than Scylla here:
>>> http://www.aerospike.com/benchmarks/scylladb-initial/ ? (Combining
>>> yours and Aerospike's benchmarks, it appears that Aerospike is 100X more
>>> performant than C*. I highly doubt that!)
>>>
>>> For a moment, let's forget about evaluating two different databases. One
>>> can observe a 10X performance difference between a mistuned Cassandra
>>> cluster and one that's tuned per the data model; there are so many
>>> tunables in the yaml as well as in the table configs.
>>>
>>> The idea is: in order to strengthen your claim, you need to provide
>>> complete system metrics (disk, CPU, network), where the OPS increase
>>> starts to decay, and the configs used. Having plain ops per second and
>>> 99p latency is a black box.
>>>
>>>
>>>
>>> Regards,
>>>
>>> Bhuvan
>>>
>>>
>>>
>>> On Fri, Mar 10, 2017 at 12:47 PM, Avi Kivity <av...@scylladb.com> wrote:
>>>
>>> ScyllaDB engineer here.
>>>
>>> C++ is really an enabling technology here. It is directly responsible
>>> for a small fraction of the gain by executing faster than Java.  But it is
>>> indirectly responsible for the gain by allowing us direct control over
>>> memory and threading.  Just as an example, Scylla starts by taking over
>>> almost all of the machine's memory, and dynamically assigning it to
>>> memtables, cache, and working memory needed to handle requests in flight.
>>> Memory is statically partitioned across cores, allowing us to exploit NUMA
>>> fully.  You can't do these things in Java.
>>>
>>> I would say the major contributors to Scylla performance are:
>>>  - thread-per-core design
>>>  - replacement of the page cache with a row cache
>>>  - careful attention to many small details, each contributing a little,
>>> but with a large overall impact
>>>
>>> While I'm here I can say that performance is not the only goal here, it
>>> is stable and predictable performance over varying loads and during
>>> maintenance operations like repair, without any special tuning.  We measure
>>> the amount of CPU and I/O spent on foreground (user) and background
>>> (maintenance) tasks and divide them fairly.  This work is not complete but
>>> already makes operating Scylla a lot simpler.
>>>
>>>
>>>
>>> On 03/10/2017 01:42 AM, Kant Kodali wrote:
>>>
>>> I don't think ScyllaDB's performance is because of C++. The design
>>> decisions in ScyllaDB are indeed different from Cassandra's, such as
>>> getting rid of SEDA and moving to TPC, and so on.
>>>
>>> If someone thinks it is because of C++, then just show the benchmarks
>>> that prove it is indeed C++ that gave the 10X performance boost
>>> ScyllaDB claims, instead of merely stating it.
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Mar 9, 2017 at 3:22 PM, Richard L. Burton III <
>>> mrburton@gmail.com> wrote:
>>>
>>> They spend an enormous amount of time focusing on performance. You can
>>> expect them to continue on with their optimization and keep crushing it.
>>>
>>>
>>>
>>> P.S., I don't work for ScyllaDB.
>>>
>>>
>>>
>>> On Thu, Mar 9, 2017 at 6:02 PM, Rakesh Kumar <ra...@outlook.com>
>>> wrote:
>>>
>>> In all of their presentations they keep harping on the fact that ScyllaDB
>>> is written in C++ and does not carry the overhead of Java.  Still, the
>>> difference looks staggering.
>>> ______________________________ __________
>>> From: daemeon reiydelle <da...@gmail.com>
>>> Sent: Thursday, March 9, 2017 14:21
>>> To: user@cassandra.apache.org
>>> Subject: Re: scylladb
>>>
>>> The comparison is fair, and conservative. Did substantial performance
>>> comparisons for two clients, both results returned throughputs that were
>>> faster than the published comparisons (15x as I recall). At that time the
>>> client preferred to utilize a Cass COTS solution and use a caching solution
>>> for OLA compliance.
>>>
>>>
>>> .......
>>>
>>> Daemeon C.M. Reiydelle
>>> USA (+1) 415.501.0198
>>> London (+44) (0) 20 8144 9872
>>>
>>> On Thu, Mar 9, 2017 at 11:04 AM, Robin Verlangen <ro...@us2.nl> wrote:
>>> I was wondering how people feel about the comparison that's made here
>>> between Cassandra and ScyllaDB:
>>> http://www.scylladb.com/technology/ycsb-cassandra-scylla/#results-of-3-scylla-nodes-vs-30-cassandra-nodes
>>>
>>> They are claiming a 10x improvement, is that a fair comparison or maybe
>>> a somewhat coloured view of a (micro)benchmark in a specific setup? Any
>>> pros/cons known?
>>>
>>> Best regards,
>>>
>>> Robin Verlangen
>>> Chief Data Architect
>>>
>>> Disclaimer: The information contained in this message and attachments is
>>> intended solely for the attention and use of the named addressee and may be
>>> confidential. If you are not the intended recipient, you are reminded that
>>> the information remains the property of the sender. You must not use,
>>> disclose, distribute, copy, print or rely on this e-mail. If you have
>>> received this message in error, please contact the sender immediately and
>>> irrevocably delete this message and any copies.
>>>
>>
>>
>

Re: scylladb

Posted by Kant Kodali <ka...@peernova.com>.

http://performanceterracotta.blogspot.com/2012/09/numa-java.html
http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html
http://openjdk.java.net/jeps/163

If ScyllaDB has efficient secondary indexes, LWT, and MVs, then that is
something. I would be glad to see how they perform.


On Fri, Mar 10, 2017 at 10:45 AM, Dor Laor <do...@scylladb.com> wrote:

> Scylla isn't just about performance, either.
>
> First, a disclaimer: I am a Scylla co-founder. I respect open source a lot,
> so you guys are welcome to shush me out of this thread. I only participate
> to provide value if I can (this is a thread about Scylla and our users are
> on our mailing list).
>
> Scylla is all about what Cassandra is, plus:
>  - Efficient hardware utilization (scale-up, performance)
>  - Low tail latency
>  - Auto/dynamic tuning (no JVM tuning; we tune the OS ourselves, and we have
>    a CPU scheduler, a userspace I/O scheduler, and more to come)
>  - SLA between compaction, repair, streaming and your r/w operations
>
> We started with a great foundation (C*) and wish to improve almost every
> aspect of it.
> Admittedly, we're way behind C* in terms of adoption. One needs to start
> somewhere.
> However, users such as AppNexus run Scylla in production with 47 physical
> nodes across 5 datacenters, and their VP estimates that C* would have at
> least doubled the size. So this is equivalent to a 100-node C* cluster.
> Since we have the same gossip, murmur3 hash, and CQL, nothing stops us from
> scaling to 1,000 nodes. Another user (Mogujie) runs 10s of TBs per node(!)
> in production.
>
> Also, since we try to compare Scylla and C* in a fair way, we invested a
> great deal of time in running C*. I can say it's not simple at all.
> Lastly, in a couple of months we'll reach parity in functionality with C*
> (counters are in 1.7 as experimental; in 1.8 counters will be stable and
> we'll have MV as experimental; LWT will come in the summer). We hope to
> collaborate with the C* community on the development of future features.
>
> Dor
>
>
> On Fri, Mar 10, 2017 at 10:19 AM, Jacques-Henri Berthemet <
> jacques-henri.berthemet@genesys.com> wrote:
>
>> Cassandra is not about pure performance; there are many other DBs that
>> are much faster than Cassandra. Cassandra’s strength is all about
>> scalability: performance increases linearly as you add more nodes.
>> During Cassandra Summit 2014, Apple said they have a 10k-node cluster. The
>> usual limiting factor is your disk write speed and latency. I don’t see how
>> C++ changes anything in this regard unless you can cache all your data in
>> memory.
>>
>> I’d be curious to know how ScyllaDB performs with a 100+ node cluster
>> with PBs of data compared to Cassandra.
>>
>> *--*
>>
>> *Jacques-Henri Berthemet*
>>
>>
>>
>> *From:* Rakesh Kumar [mailto:rakeshkumar464@outlook.com]
>> *Sent:* Friday, 10 March 2017 09:58
>>
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: scylladb
>>
>>
>>
>> Cassandra vs. Scylla is a valid comparison because the two are
>> compatible.  Scylla is a drop-in replacement for Cassandra.
>> Is Aerospike a drop-in replacement for Cassandra? If yes, and only if
>> yes, then the comparison with Scylla is valid.
>>
>>
>> ------------------------------
>>
>> *From:* Bhuvan Rawal <bh...@gmail.com>
>> *To:* user@cassandra.apache.org
>> *Sent:* Friday, March 10, 2017 11:59 AM
>> *Subject:* Re: scylladb
>>
>>
>>
>> Agreed, C++ gives an added advantage in talking to the underlying hardware
>> with better efficiency. It sounds good, but can a piece of code written in
>> C++ give 1000% of the throughput of a Java app? Is the TPC design 10X more
>> performant than the SEDA architecture?
>>
>> And if C/C++ is indeed that fast, how can Aerospike (which is itself
>> written in C) claim to be 10X faster than Scylla here:
>> http://www.aerospike.com/benchmarks/scylladb-initial/ ? (Combining
>> yours and Aerospike's benchmarks, it appears that Aerospike is 100X more
>> performant than C*. I highly doubt that!)
>>
>> For a moment, let's forget about evaluating two different databases. One
>> can observe a 10X performance difference between a mistuned Cassandra
>> cluster and one that's tuned per the data model; there are so many
>> tunables in the yaml as well as in the table configs.
>>
>> The idea is: in order to strengthen your claim, you need to provide
>> complete system metrics (disk, CPU, network), where the OPS increase
>> starts to decay, and the configs used. Having plain ops per second and
>> 99p latency is a black box.
>>
>>
>>
>> Regards,
>>
>> Bhuvan
>>
>>
>>
>> On Fri, Mar 10, 2017 at 12:47 PM, Avi Kivity <av...@scylladb.com> wrote:
>>
>> ScyllaDB engineer here.
>>
>> C++ is really an enabling technology here. It is directly responsible for
>> a small fraction of the gain by executing faster than Java.  But it is
>> indirectly responsible for the gain by allowing us direct control over
>> memory and threading.  Just as an example, Scylla starts by taking over
>> almost all of the machine's memory, and dynamically assigning it to
>> memtables, cache, and working memory needed to handle requests in flight.
>> Memory is statically partitioned across cores, allowing us to exploit NUMA
>> fully.  You can't do these things in Java.
>>
>> I would say the major contributors to Scylla performance are:
>>  - thread-per-core design
>>  - replacement of the page cache with a row cache
>>  - careful attention to many small details, each contributing a little,
>> but with a large overall impact
>>
>> While I'm here I can say that performance is not the only goal here, it
>> is stable and predictable performance over varying loads and during
>> maintenance operations like repair, without any special tuning.  We measure
>> the amount of CPU and I/O spent on foreground (user) and background
>> (maintenance) tasks and divide them fairly.  This work is not complete but
>> already makes operating Scylla a lot simpler.
>>
>>
>>
>> On 03/10/2017 01:42 AM, Kant Kodali wrote:
>>
>> I don't think ScyllaDB's performance is because of C++. The design
>> decisions in ScyllaDB are indeed different from Cassandra's, such as
>> getting rid of SEDA and moving to TPC, and so on.
>>
>> If someone thinks it is because of C++, then just show the benchmarks
>> that prove it is indeed C++ that gave the 10X performance boost
>> ScyllaDB claims, instead of merely stating it.
>>
>>
>>
>>
>>
>> On Thu, Mar 9, 2017 at 3:22 PM, Richard L. Burton III <mr...@gmail.com>
>> wrote:
>>
>> They spend an enormous amount of time focusing on performance. You can
>> expect them to continue on with their optimization and keep crushing it.
>>
>>
>>
>> P.S., I don't work for ScyllaDB.
>>
>>
>>
>> On Thu, Mar 9, 2017 at 6:02 PM, Rakesh Kumar <ra...@outlook.com>
>> wrote:
>>
>> In all of their presentations they keep harping on the fact that ScyllaDB
>> is written in C++ and does not carry the overhead of Java.  Still, the
>> difference looks staggering.
>> ______________________________ __________
>> From: daemeon reiydelle <da...@gmail.com>
>> Sent: Thursday, March 9, 2017 14:21
>> To: user@cassandra.apache.org
>> Subject: Re: scylladb
>>
>> The comparison is fair, and conservative. Did substantial performance
>> comparisons for two clients, both results returned throughputs that were
>> faster than the published comparisons (15x as I recall). At that time the
>> client preferred to utilize a Cass COTS solution and use a caching solution
>> for OLA compliance.
>>
>>
>> .......
>>
>> Daemeon C.M. Reiydelle
>> USA (+1) 415.501.0198
>> London (+44) (0) 20 8144 9872
>>
>> On Thu, Mar 9, 2017 at 11:04 AM, Robin Verlangen <robin@us2.nl> wrote:
>> I was wondering how people feel about the comparison that's made here
>> between Cassandra and ScyllaDB:
>> http://www.scylladb.com/technology/ycsb-cassandra-scylla/#results-of-3-scylla-nodes-vs-30-cassandra-nodes
>>
>> They are claiming a 10x improvement, is that a fair comparison or maybe a
>> somewhat coloured view of a (micro)benchmark in a specific setup? Any
>> pros/cons known?
>>
>> Best regards,
>>
>> Robin Verlangen
>> Chief Data Architect
>>
>> Disclaimer: The information contained in this message and attachments is
>> intended solely for the attention and use of the named addressee and may be
>> confidential. If you are not the intended recipient, you are reminded that
>> the information remains the property of the sender. You must not use,
>> disclose, distribute, copy, print or rely on this e-mail. If you have
>> received this message in error, please contact the sender immediately and
>> irrevocably delete this message and any copies.
>>
>> On Wed, Dec 16, 2015 at 11:52 AM, Carlos Rolo <rolo@pythian.com> wrote:
>> No rain at all! But I almost had it running last weekend, but stopped
>> short of installing it. Let's see if this one is for real!
>>
>> Regards,
>>
>> Carlos Juzarte Rolo
>> Cassandra Consultant
>>
>> Pythian - Love your data
>>
>> rolo@pythian | Twitter: @cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
>> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>> www.pythian.com
>>
>> On Wed, Dec 16, 2015 at 12:38 AM, Dani Traphagen <
>> dani.traphagen@datastax.com> wrote:
>> You'll be the first Carlos.
>>
>> [Inline image 1]
>>
>> Had any rain lately? Curious how this went, if so.
>>
>> On Thu, Nov 12, 2015 at 4:36 AM, Jack Krupansky <jack.krupansky@gmail.com> wrote:
>> I just did a Twitter search on scylladb and did not see any tweets about
>> actual use, so far.
>>
>>
>> -- Jack Krupansky
>>
>> On Wed, Nov 11, 2015 at 10:54 AM, Carlos Alonso <info@mrcalonso.com> wrote:
>> Any update about this?
>>
>> @Carlos Rolo, did you tried it? Thoughts?
>>
>> Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>
>>
>> On 5 November 2015 at 14:07, Carlos Rolo <rolo@pythian.com> wrote:
>> Something to do on a expected rainy weekend. Thanks for the information.
>>
>> Regards,
>>
>> Carlos Juzarte Rolo
>> Cassandra Consultant
>>
>> Pythian - Love your data
>>
>> rolo@pythian | Twitter: @cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
>> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>> www.pythian.com
>>
>> On Thu, Nov 5, 2015 at 12:07 PM, Dani Traphagen <
>> dani.traphagen@datastax.com> wrote:
>> As of two days ago, they say they've got it @cjrolo.
>>
>> https://github.com/scylladb/scylla/wiki/RELEASE-Scylla-0.11-Beta
>>
>>
>> On Thursday, November 5, 2015, Carlos Rolo <rolo@pythian.com> wrote:
>> I will not try until multi-DC is implemented. More than an month has
>> passed since I looked for it, so it could possibly be in place, if so I may
>> take some time to test it.
>>
>> Regards,
>>
>> Carlos Juzarte Rolo
>> Cassandra Consultant
>>
>> Pythian - Love your data
>>
>> rolo@pythian | Twitter: @cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
>> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>> www.pythian.com
>>
>> On Thu, Nov 5, 2015 at 9:37 AM, Jon Haddad <jo...@gmail.com>
>> wrote:
>> Nope, no one I know.  Let me know if you try it I'd love to hear your
>> feedback.
>>
>> > On Nov 5, 2015, at 9:22 AM, tommaso barbugli <tb...@gmail.com> wrote:
>> >
>> > Hi guys,
>> >
>> > did anyone already try Scylladb (yet another fastest NoSQL database in
>> > town) and has some thoughts/hands-on experience to share?
>> >
>> > Cheers,
>> > Tommaso
>>
>> --
>> Sent from mobile -- apologizes for brevity or errors.
>>
>> --
>> DANI TRAPHAGEN
>> Technical Enablement Lead | dani.traphagen@datastax.com
>> https://twitter.com/dtrapezoid | https://www.linkedin.com/pub/dani-traphagen/31/93b/b85 | https://github.com/dtrapezoid
>>
>> --
>> -Richard L. Burton III
>> @rburton
>
>

Re: scylladb

Posted by Dor Laor <do...@scylladb.com>.

Scylla isn't just about performance, either.

First, a disclaimer: I am a Scylla co-founder. I respect open source a lot,
so you guys are welcome to shush me out of this thread. I only participate
to provide value if I can (this is a thread about Scylla, and our users are
on our own mailing list).

Scylla is all about what Cassandra is, plus:
 - Efficient hardware utilization (scale-up, performance)
 - Low tail latency
 - Auto/dynamic tuning (no JVM tuning; we tune the OS ourselves, and we have
   a CPU scheduler, a userspace I/O scheduler, and more to come)
 - SLA between compaction, repair, streaming and your r/w operations

We started with a great foundation (C*) and wish to improve almost every
aspect of it. Admittedly, we're way behind C* in terms of adoption; one
needs to start somewhere. However, users such as AppNexus run Scylla in
production with 47 physical nodes across 5 datacenters, and their VP
estimates that C* would have at least doubled the size, so this is
equivalent to a roughly 100-node C* cluster. Since we have the same gossip,
murmur3 hash, and CQL, nothing stops us from scaling to 1,000 nodes. Another
user (Mogujie) runs 10s of TBs per node(!) in production.

Also, since we try to compare Scylla and C* in a fair way, we invested a
great deal of time in running C*. I can say it's not simple at all.

Lastly, in a couple of months we'll reach parity in functionality with C*
(counters are in 1.7 as experimental; in 1.8 counters will be stable and
we'll have MV as experimental; LWT will come in the summer). We hope to
collaborate with the C* community on the development of future features.

Dor


On Fri, Mar 10, 2017 at 10:19 AM, Jacques-Henri Berthemet <
jacques-henri.berthemet@genesys.com> wrote:

> Cassandra is not about pure performance, there are many other DBs that are
> much faster than Cassandra. Cassandra strength is all about scalability,
> performance increases in a linear way as you add more nodes. During
> Cassandra summit 2014 Apple said they have a 10k node cluster. The usual
> limiting factor is your disk write speed and latency, I don’t see how C++
> changes anything in this regard unless you can cache all your data in
> memory.
>
>
>
> I’d be curious to know how ScyllaDB performs with a 100+ nodes cluster
> with PBs of data compared to Cassandra.
>
> *--*
>
> *Jacques-Henri Berthemet*
>
>
>
> *From:* Rakesh Kumar [mailto:rakeshkumar464@outlook.com]
> *Sent:* Friday, 10 March 2017 09:58
>
> *To:* user@cassandra.apache.org
> *Subject:* Re: scylladb
>
>
>
> Cassandra vs Scylla is a valid comparison because they both are
> compatible.  Scylla is a drop-in replacement for Cassandra.
> Is Aerospike a drop-in replacement for Cassandra? If yes, and only if yes,
> then the comparison is valid with Scylla.
>
>
> ------------------------------
>
> *From:* Bhuvan Rawal <bh...@gmail.com>
> *To:* user@cassandra.apache.org
> *Sent:* Friday, March 10, 2017 11:59 AM
> *Subject:* Re: scylladb
>
>
>
> Agreed C++ gives an added advantage to talk to underlying hardware with
> better efficiency, it sound good but can a pice of code written in C++ give
> 1000% throughput than a Java app? Is TPC design 10X more performant than
> SEDA arch?
>
>
>
> And if C/C++ is indeed that fast how can Aerospike (which is itself
> written in C) claim to be 10X faster than Scylla here
> http://www.aerospike.com/benchmarks/scylladb-initial/ ? (Combining your's
> and aerospike's benchmarks it appears that Aerospike is 100X performant
> than C* - I highly doubt that!! )
>
>
>
> For a moment lets forget about evaluating 2 different databases, one can
> observe 10X performance difference between a mistuned cassandra cluster and
> one thats tuned as per data model - there are so many Tunables in yaml as
> well as table configs.
>
>
>
> Idea is - in order to strengthen your claim, you need to provide complete
> system metrics (Disk, CPU, Network), the OPS increase starts to decay along
> with the configs used. Having plain ops per second and 99p latency is
> blackbox.
>
>
>
> Regards,
>
> Bhuvan
>
>
>
> On Fri, Mar 10, 2017 at 12:47 PM, Avi Kivity <av...@scylladb.com> wrote:
>
> ScyllaDB engineer here.
>
> C++ is really an enabling technology here. It is directly responsible for
> a small fraction of the gain by executing faster than Java.  But it is
> indirectly responsible for the gain by allowing us direct control over
> memory and threading.  Just as an example, Scylla starts by taking over
> almost all of the machine's memory, and dynamically assigning it to
> memtables, cache, and working memory needed to handle requests in flight.
> Memory is statically partitioned across cores, allowing us to exploit NUMA
> fully.  You can't do these things in Java.
>
> I would say the major contributors to Scylla performance are:
>  - thread-per-core design
>  - replacement of the page cache with a row cache
>  - careful attention to many small details, each contributing a little,
> but with a large overall impact
>
> While I'm here I can say that performance is not the only goal here, it is
> stable and predictable performance over varying loads and during
> maintenance operations like repair, without any special tuning.  We measure
> the amount of CPU and I/O spent on foreground (user) and background
> (maintenance) tasks and divide them fairly.  This work is not complete but
> already makes operating Scylla a lot simpler.
>
>
>
> On 03/10/2017 01:42 AM, Kant Kodali wrote:
>
> I dont think ScyllaDB performance is because of C++. The design decisions
> in scylladb are indeed different from Cassandra such as getting rid of SEDA
> and moving to TPC and so on.
>
>
>
> If someone thinks it is because of C++ then just show the benchmarks that
> proves it is indeed the C++ which gave 10X performance boost as ScyllaDB
> claims instead of stating it.
>
>
>
>
>
> On Thu, Mar 9, 2017 at 3:22 PM, Richard L. Burton III <mr...@gmail.com>
> wrote:
>
> They spend an enormous amount of time focusing on performance. You can
> expect them to continue on with their optimization and keep crushing it.
>
>
>
> P.S., I don't work for ScyllaDB.
>
>
>
> On Thu, Mar 9, 2017 at 6:02 PM, Rakesh Kumar <ra...@outlook.com>
> wrote:
>
> In all of their presentation they keep harping on the fact that scylladb
> is written in C++ and does not carry the overhead of Java.  Still the
> difference looks staggering.
> ______________________________ __________
> From: daemeon reiydelle <da...@gmail.com>
> Sent: Thursday, March 9, 2017 14:21
> To: user@cassandra.apache.org
> Subject: Re: scylladb
>
> The comparison is fair, and conservative. Did substantial performance
> comparisons for two clients, both results returned throughputs that were
> faster than the published comparisons (15x as I recall). At that time the
> client preferred to utilize a Cass COTS solution and use a caching solution
> for OLA compliance.
>
>
> .......
>
> Daemeon C.M. Reiydelle
> USA (+1) 415.501.0198
> London (+44) (0) 20 8144 9872
>
> On Thu, Mar 9, 2017 at 11:04 AM, Robin Verlangen <robin@us2.nl> wrote:
> I was wondering how people feel about the comparison that's made here
> between Cassandra and ScyllaDB:
> http://www.scylladb.com/technology/ycsb-cassandra-scylla/#results-of-3-scylla-nodes-vs-30-cassandra-nodes
>
> They are claiming a 10x improvement, is that a fair comparison or maybe a
> somewhat coloured view of a (micro)benchmark in a specific setup? Any
> pros/cons known?
>
> Best regards,
>
> Robin Verlangen
> Chief Data Architect
>
> Disclaimer: The information contained in this message and attachments is
> intended solely for the attention and use of the named addressee and may be
> confidential. If you are not the intended recipient, you are reminded that
> the information remains the property of the sender. You must not use,
> disclose, distribute, copy, print or rely on this e-mail. If you have
> received this message in error, please contact the sender immediately and
> irrevocably delete this message and any copies.
>
> On Wed, Dec 16, 2015 at 11:52 AM, Carlos Rolo <rolo@pythian.com> wrote:
> No rain at all! But I almost had it running last weekend, but stopped
> short of installing it. Let's see if this one is for real!
>
> Regards,
>
> Carlos Juzarte Rolo
> Cassandra Consultant
>
> Pythian - Love your data
>
> rolo@pythian | Twitter: @cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
> www.pythian.com
>
> On Wed, Dec 16, 2015 at 12:38 AM, Dani Traphagen <
> dani.traphagen@datastax.com> wrote:
> You'll be the first Carlos.
>
> [Inline image 1]
>
> Had any rain lately? Curious how this went, if so.
>
> On Thu, Nov 12, 2015 at 4:36 AM, Jack Krupansky <jack.krupansky@gmail.com> wrote:
> I just did a Twitter search on scylladb and did not see any tweets about
> actual use, so far.
>
>
> -- Jack Krupansky
>
> On Wed, Nov 11, 2015 at 10:54 AM, Carlos Alonso <info@mrcalonso.com> wrote:
> Any update about this?
>
> @Carlos Rolo, did you tried it? Thoughts?
>
> Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>
>
> On 5 November 2015 at 14:07, Carlos Rolo <rolo@pythian.com> wrote:
> Something to do on a expected rainy weekend. Thanks for the information.
>
> Regards,
>
> Carlos Juzarte Rolo
> Cassandra Consultant
>
> Pythian - Love your data
>
> rolo@pythian | Twitter: @cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
> www.pythian.com
>
> On Thu, Nov 5, 2015 at 12:07 PM, Dani Traphagen <
> dani.traphagen@datastax.com> wrote:
> As of two days ago, they say they've got it @cjrolo.
>
> https://github.com/scylladb/scylla/wiki/RELEASE-Scylla-0.11-Beta
>
>
> On Thursday, November 5, 2015, Carlos Rolo <rolo@pythian.com> wrote:
> I will not try until multi-DC is implemented. More than an month has
> passed since I looked for it, so it could possibly be in place, if so I may
> take some time to test it.
>
> Regards,
>
> Carlos Juzarte Rolo
> Cassandra Consultant
>
> Pythian - Love your data
>
> rolo@pythian | Twitter: @cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
> www.pythian.com
>
> On Thu, Nov 5, 2015 at 9:37 AM, Jon Haddad <jo...@gmail.com>
> wrote:
> Nope, no one I know.  Let me know if you try it I'd love to hear your
> feedback.
>
> > On Nov 5, 2015, at 9:22 AM, tommaso barbugli <tb...@gmail.com> wrote:
> >
> > Hi guys,
> >
> > did anyone already try Scylladb (yet another fastest NoSQL database in
> > town) and has some thoughts/hands-on experience to share?
> >
> > Cheers,
> > Tommaso
>
> --
> Sent from mobile -- apologizes for brevity or errors.
>
> --
> DANI TRAPHAGEN
> Technical Enablement Lead | dani.traphagen@datastax.com
> https://twitter.com/dtrapezoid | https://www.linkedin.com/pub/dani-traphagen/31/93b/b85 | https://github.com/dtrapezoid
>
> --
> -Richard L. Burton III
> @rburton

RE: scylladb

Posted by Jacques-Henri Berthemet <ja...@genesys.com>.

Cassandra is not about pure performance; there are many other DBs that are much faster than Cassandra. Cassandra's strength is all about scalability: performance increases in a linear way as you add more nodes. During Cassandra Summit 2014, Apple said they have a 10k node cluster. The usual limiting factor is your disk write speed and latency; I don’t see how C++ changes anything in this regard unless you can cache all your data in memory.

I’d be curious to know how ScyllaDB performs with a 100+ node cluster with PBs of data compared to Cassandra.
--
Jacques-Henri Berthemet

From: Rakesh Kumar [mailto:rakeshkumar464@outlook.com]
Sent: Friday, 10 March 2017 09:58
To: user@cassandra.apache.org
Subject: Re: scylladb

Cassandra vs Scylla is a valid comparison because they both are compatible.  Scylla is a drop-in replacement for Cassandra.
Is Aerospike a drop-in replacement for Cassandra? If yes, and only if yes, then the comparison is valid with Scylla.

________________________________
From: Bhuvan Rawal <bh...@gmail.com>
To: user@cassandra.apache.org
Sent: Friday, March 10, 2017 11:59 AM
Subject: Re: scylladb

Agreed C++ gives an added advantage to talk to underlying hardware with better efficiency, it sound good but can a pice of code written in C++ give 1000% throughput than a Java app? Is TPC design 10X more performant than SEDA arch?

And if C/C++ is indeed that fast how can Aerospike (which is itself written in C) claim to be 10X faster than Scylla here http://www.aerospike.com/benchmarks/scylladb-initial/ ? (Combining your's and aerospike's benchmarks it appears that Aerospike is 100X performant than C* - I highly doubt that!! )

For a moment lets forget about evaluating 2 different databases, one can observe 10X performance difference between a mistuned cassandra cluster and one thats tuned as per data model - there are so many Tunables in yaml as well as table configs.

Idea is - in order to strengthen your claim, you need to provide complete system metrics (Disk, CPU, Network), the OPS increase starts to decay along with the configs used. Having plain ops per second and 99p latency is blackbox.

Regards,
Bhuvan

On Fri, Mar 10, 2017 at 12:47 PM, Avi Kivity <av...@scylladb.com> wrote:
ScyllaDB engineer here.

C++ is really an enabling technology here. It is directly responsible for a small fraction of the gain by executing faster than Java.  But it is indirectly responsible for the gain by allowing us direct control over memory and threading.  Just as an example, Scylla starts by taking over almost all of the machine's memory, and dynamically assigning it to memtables, cache, and working memory needed to handle requests in flight.  Memory is statically partitioned across cores, allowing us to exploit NUMA fully.  You can't do these things in Java.

I would say the major contributors to Scylla performance are:
 - thread-per-core design
 - replacement of the page cache with a row cache
 - careful attention to many small details, each contributing a little, but with a large overall impact
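To make the "thread-per-core" point above concrete, here is a minimal sketch of the shared-nothing idea, not Scylla's actual code. It assumes each core owns a private shard of the data and that a hash of the partition key decides which shard handles a request, so no shard ever touches another shard's memory. The names `Shard`, `Node`, and `shard_of` are made up for illustration, and the hash is a standard-library stand-in rather than murmur3. Real shards would each run on their own pinned thread; here they are plain objects so the routing is easy to follow.

```python
import hashlib


class Shard:
    """State owned by exactly one core; nothing here is shared, so no locks."""

    def __init__(self, shard_id):
        self.shard_id = shard_id
        self.memtable = {}  # private to this shard

    def write(self, key, value):
        self.memtable[key] = value

    def read(self, key):
        return self.memtable.get(key)


def shard_of(key, n_shards):
    """Map a partition key to its owning shard (stand-in hash, not murmur3)."""
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % n_shards


class Node:
    """One machine with one shard per core (hypothetical layout)."""

    def __init__(self, n_cores):
        self.shards = [Shard(i) for i in range(n_cores)]

    def owner(self, key):
        return self.shards[shard_of(key, len(self.shards))]

    def write(self, key, value):
        self.owner(key).write(key, value)

    def read(self, key):
        return self.owner(key).read(key)


node = Node(n_cores=4)
for i in range(1000):
    node.write(f"user:{i}", i)

print(node.read("user:42"))                    # 42
print([len(s.memtable) for s in node.shards])  # keys held by each shard
```

Because a key always maps to the same shard, reads and writes for that key are serialized by construction, which is the property that lets a design like this avoid locking.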

While I'm here I can say that performance is not the only goal here, it is stable and predictable performance over varying loads and during maintenance operations like repair, without any special tuning.  We measure the amount of CPU and I/O spent on foreground (user) and background (maintenance) tasks and divide them fairly.  This work is not complete but already makes operating Scylla a lot simpler.
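The idea of measuring foreground and background work and dividing resources between them can be sketched with a simple share-based scheduler. This is an illustration only, not Scylla's scheduler: the class name `ShareScheduler`, the task classes, and the 3:1 share split are all assumptions chosen for the example. Each class accumulates "virtual time" equal to the work it has consumed divided by its shares, and the class with the lowest virtual time runs next.

```python
from fractions import Fraction


class ShareScheduler:
    """Pick the next task from the class that has used the least of its share."""

    def __init__(self, shares):
        self.shares = dict(shares)
        # Virtual time = work consumed so far / shares, tracked exactly.
        self.vtime = {name: Fraction(0) for name in shares}
        self.queues = {name: [] for name in shares}

    def submit(self, name, cost):
        self.queues[name].append(cost)

    def run_next(self):
        ready = [name for name, queue in self.queues.items() if queue]
        if not ready:
            return None
        name = min(ready, key=lambda n: self.vtime[n])
        cost = self.queues[name].pop(0)
        self.vtime[name] += Fraction(cost, self.shares[name])
        return name, cost


# Hypothetical split: user requests get 3 shares, maintenance gets 1.
sched = ShareScheduler({"user": 3, "maintenance": 1})
for _ in range(400):
    sched.submit("user", 1)
    sched.submit("maintenance", 1)

spent = {"user": 0, "maintenance": 0}
for _ in range(400):
    name, cost = sched.run_next()
    spent[name] += cost

print(spent)  # work splits roughly 3:1 in favour of user requests
```

Both classes always have work queued in this run, so the background class keeps making progress but cannot crowd out user requests, which is the behaviour the paragraph above describes for repair and other maintenance.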

On 03/10/2017 01:42 AM, Kant Kodali wrote:
I dont think ScyllaDB performance is because of C++. The design decisions in scylladb are indeed different from Cassandra such as getting rid of SEDA and moving to TPC and so on.

If someone thinks it is because of C++ then just show the benchmarks that proves it is indeed the C++ which gave 10X performance boost as ScyllaDB claims instead of stating it.

On Thu, Mar 9, 2017 at 3:22 PM, Richard L. Burton III <mr...@gmail.com> wrote:
They spend an enormous amount of time focusing on performance. You can expect them to continue on with their optimization and keep crushing it.

P.S., I don't work for ScyllaDB.

On Thu, Mar 9, 2017 at 6:02 PM, Rakesh Kumar <ra...@outlook.com> wrote:
In all of their presentation they keep harping on the fact that scylladb is written in C++ and does not carry the overhead of Java.  Still the difference looks staggering.
________________________________________
From: daemeon reiydelle <da...@gmail.com>
Sent: Thursday, March 9, 2017 14:21
To: user@cassandra.apache.org
Subject: Re: scylladb

The comparison is fair, and conservative. Did substantial performance comparisons for two clients, both results returned throughputs that were faster than the published comparisons (15x as I recall). At that time the client preferred to utilize a Cass COTS solution and use a caching solution for OLA compliance.

.......

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Thu, Mar 9, 2017 at 11:04 AM, Robin Verlangen <ro...@us2.nl> wrote:
I was wondering how people feel about the comparison that's made here between Cassandra and ScyllaDB: http://www.scylladb.com/technology/ycsb-cassandra-scylla/#results-of-3-scylla-nodes-vs-30-cassandra-nodes

They are claiming a 10x improvement, is that a fair comparison or maybe a somewhat coloured view of a (micro)benchmark in a specific setup? Any pros/cons known?

Best regards,

Robin Verlangen
Chief Data Architect

Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies.

On Wed, Dec 16, 2015 at 11:52 AM, Carlos Rolo <ro...@pythian.com> wrote:
No rain at all! But I almost had it running last weekend, but stopped short of installing it. Let's see if this one is for real!

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: @cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
www.pythian.com

On Wed, Dec 16, 2015 at 12:38 AM, Dani Traphagen <da...@datastax.com> wrote:
You'll be the first Carlos.

[Inline image 1]

Had any rain lately? Curious how this went, if so.

On Thu, Nov 12, 2015 at 4:36 AM, Jack Krupansky <ja...@gmail.com> wrote:
I just did a Twitter search on scylladb and did not see any tweets about actual use, so far.

-- Jack Krupansky

On Wed, Nov 11, 2015 at 10:54 AM, Carlos Alonso <in...@mrcalonso.com> wrote:
Any update about this?

@Carlos Rolo, did you tried it? Thoughts?

Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>

On 5 November 2015 at 14:07, Carlos Rolo <ro...@pythian.com> wrote:
Something to do on a expected rainy weekend. Thanks for the information.

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: @cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
www.pythian.com

On Thu, Nov 5, 2015 at 12:07 PM, Dani Traphagen <da...@datastax.com> wrote:
As of two days ago, they say they've got it @cjrolo.

https://github.com/scylladb/scylla/wiki/RELEASE-Scylla-0.11-Beta

On Thursday, November 5, 2015, Carlos Rolo <ro...@pythian.com> wrote:
I will not try until multi-DC is implemented. More than an month has passed since I looked for it, so it could possibly be in place, if so I may take some time to test it.

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: @cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
www.pythian.com

On Thu, Nov 5, 2015 at 9:37 AM, Jon Haddad <jo...@gmail.com> wrote:
Nope, no one I know.  Let me know if you try it I'd love to hear your feedback.

> On Nov 5, 2015, at 9:22 AM, tommaso barbugli <tb...@gmail.com> wrote:
>
> Hi guys,
>
> did anyone already try Scylladb (yet another fastest NoSQL database in town) and has some thoughts/hands-on experience to share?
>
> Cheers,
> Tommaso

--
Sent from mobile -- apologizes for brevity or errors.

--
DANI TRAPHAGEN
Technical Enablement Lead | dani.traphagen@datastax.com
https://twitter.com/dtrapezoid | https://www.linkedin.com/pub/dani-traphagen/31/93b/b85 | https://github.com/dtrapezoid

--
-Richard L. Burton III
@rburton

Re: scylladb

Posted by Rakesh Kumar <ra...@outlook.com>.

Cassandra vs Scylla is a valid comparison because the two are compatible.  Scylla is a drop-in replacement for Cassandra.
Is Aerospike a drop-in replacement for Cassandra? If yes, and only if yes, then the comparison with Scylla is valid.

________________________________
From: Bhuvan Rawal <bh...@gmail.com>
To: user@cassandra.apache.org
Sent: Friday, March 10, 2017 11:59 AM
Subject: Re: scylladb

Agreed, C++ gives an added advantage in talking to the underlying hardware more efficiently. It sounds good, but can a piece of code written in C++ give 1000% of the throughput of a Java app? Is the TPC design 10X more performant than the SEDA architecture?

And if C/C++ is indeed that fast, how can Aerospike (which is itself written in C) claim to be 10X faster than Scylla here: http://www.aerospike.com/benchmarks/scylladb-initial/ ? (Combining yours and Aerospike's benchmarks, it appears that Aerospike is 100X more performant than C* - I highly doubt that!!)

For a moment, let's forget about evaluating two different databases: one can observe a 10X performance difference between a mistuned Cassandra cluster and one that's tuned for its data model - there are so many tunables in the yaml as well as in the table configs.

The idea is: in order to strengthen your claim, you need to provide complete system metrics (disk, CPU, network), the point at which the OPS increase starts to decay, and the configs used. Plain ops per second and 99p latency alone are a black box.

Regards,
Bhuvan

On Fri, Mar 10, 2017 at 12:47 PM, Avi Kivity <av...@scylladb.com> wrote:
ScyllaDB engineer here.

C++ is really an enabling technology here. It is directly responsible for a small fraction of the gain by executing faster than Java.  But it is indirectly responsible for the gain by allowing us direct control over memory and threading.  Just as an example, Scylla starts by taking over almost all of the machine's memory, and dynamically assigning it to memtables, cache, and working memory needed to handle requests in flight.  Memory is statically partitioned across cores, allowing us to exploit NUMA fully.  You can't do these things in Java.

I would say the major contributors to Scylla performance are:
 - thread-per-core design
 - replacement of the page cache with a row cache
 - careful attention to many small details, each contributing a little, but with a large overall impact

While I'm here I can say that performance is not the only goal here, it is stable and predictable performance over varying loads and during maintenance operations like repair, without any special tuning.  We measure the amount of CPU and I/O spent on foreground (user) and background (maintenance) tasks and divide them fairly.  This work is not complete but already makes operating Scylla a lot simpler.

On 03/10/2017 01:42 AM, Kant Kodali wrote:
I don't think ScyllaDB's performance is because of C++. The design decisions in ScyllaDB are indeed different from Cassandra's, such as getting rid of SEDA and moving to TPC and so on.

If someone thinks it is because of C++, then just show the benchmarks that prove it is indeed C++ that gave the 10X performance boost ScyllaDB claims, instead of just stating it.

On Thu, Mar 9, 2017 at 3:22 PM, Richard L. Burton III <mr...@gmail.com> wrote:
They spend an enormous amount of time focusing on performance. You can expect them to continue on with their optimization and keep crushing it.

P.S., I don't work for ScyllaDB.

On Thu, Mar 9, 2017 at 6:02 PM, Rakesh Kumar <ra...@outlook.com> wrote:
In all of their presentation they keep harping on the fact that scylladb is written in C++ and does not carry the overhead of Java.  Still the difference looks staggering.
________________________________________
From: daemeon reiydelle <da...@gmail.com>
Sent: Thursday, March 9, 2017 14:21
To: user@cassandra.apache.org
Subject: Re: scylladb

The comparison is fair, and conservative. Did substantial performance comparisons for two clients, both results returned throughputs that were faster than the published comparisons (15x as I recall). At that time the client preferred to utilize a Cass COTS solution and use a caching solution for OLA compliance.

.......

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Thu, Mar 9, 2017 at 11:04 AM, Robin Verlangen <ro...@us2.nl> wrote:
I was wondering how people feel about the comparison that's made here between Cassandra and ScyllaDB: http://www.scylladb.com/technology/ycsb-cassandra-scylla/#results-of-3-scylla-nodes-vs-30-cassandra-nodes

They are claiming a 10x improvement, is that a fair comparison or maybe a somewhat coloured view of a (micro)benchmark in a specific setup? Any pros/cons known?

Best regards,

Robin Verlangen
Chief Data Architect

Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies.

On Wed, Dec 16, 2015 at 11:52 AM, Carlos Rolo <ro...@pythian.com> wrote:
No rain at all! But I almost had it running last weekend, but stopped short of installing it. Let's see if this one is for real!

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: @cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
www.pythian.com


--

--
-Richard L. Burton III
@rburton

Re: scylladb

Posted by Bhuvan Rawal <bh...@gmail.com>.

Agreed, C++ gives an added advantage in talking to the underlying hardware
more efficiently. It sounds good, but can a piece of code written in C++
give 1000% of the throughput of a Java app? Is the TPC design 10X more
performant than the SEDA architecture?

And if C/C++ is indeed that fast, how can Aerospike (which is itself written
in C) claim to be 10X faster than Scylla here:
http://www.aerospike.com/benchmarks/scylladb-initial/ ? (Combining yours
and Aerospike's benchmarks, it appears that Aerospike is 100X more
performant than C* - I highly doubt that!!)
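
To make the arithmetic behind that 100X figure explicit, here is a minimal
sketch. It assumes both vendor claims are plain throughput ratios measured
on comparable workloads, which nothing in this thread establishes; the
function name combined_speedup is made up for illustration.

```python
def combined_speedup(*ratios: float) -> float:
    """Multiply chained 'A is N times faster than B' claims into one ratio."""
    result = 1.0
    for ratio in ratios:
        result *= ratio
    return result


# Figures as quoted in this thread, not independently verified.
scylla_vs_cassandra = 10.0   # ScyllaDB's published claim
aerospike_vs_scylla = 10.0   # Aerospike's published claim

implied = combined_speedup(aerospike_vs_scylla, scylla_vs_cassandra)
print(f"Implied Aerospike vs. C*: {implied:.0f}x")
print(f"Same ratio as a percentage of C* throughput: {implied * 100:.0f}%")
```

The multiplication only holds if both benchmarks measured the same thing on
the same hardware, which is exactly the missing information.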

For a moment, let's forget about evaluating two different databases: one can
observe a 10X performance difference between a mistuned Cassandra cluster
and one that's tuned for its data model - there are so many tunables in the
yaml as well as in the table configs.

The idea is: in order to strengthen your claim, you need to provide complete
system metrics (disk, CPU, network), the point at which the OPS increase
starts to decay, and the configs used. Plain ops per second and 99p latency
alone are a black box.

Regards,
Bhuvan

On Fri, Mar 10, 2017 at 12:47 PM, Avi Kivity <av...@scylladb.com> wrote:

> ScyllaDB engineer here.
>
> C++ is really an enabling technology here. It is directly responsible for
> a small fraction of the gain by executing faster than Java.  But it is
> indirectly responsible for the gain by allowing us direct control over
> memory and threading.  Just as an example, Scylla starts by taking over
> almost all of the machine's memory, and dynamically assigning it to
> memtables, cache, and working memory needed to handle requests in flight.
> Memory is statically partitioned across cores, allowing us to exploit NUMA
> fully.  You can't do these things in Java.
>
> I would say the major contributors to Scylla performance are:
>  - thread-per-core design
>  - replacement of the page cache with a row cache
>  - careful attention to many small details, each contributing a little,
> but with a large overall impact
>
> While I'm here I can say that performance is not the only goal here, it is
> stable and predictable performance over varying loads and during
> maintenance operations like repair, without any special tuning.  We measure
> the amount of CPU and I/O spent on foreground (user) and background
> (maintenance) tasks and divide them fairly.  This work is not complete but
> already makes operating Scylla a lot simpler.
>
>
> On 03/10/2017 01:42 AM, Kant Kodali wrote:
>
> I don't think ScyllaDB's performance is because of C++. The design
> decisions in ScyllaDB are indeed different from Cassandra's, such as
> getting rid of SEDA and moving to TPC and so on.
>
> If someone thinks it is because of C++, then just show the benchmarks that
> prove it is indeed C++ that gave the 10X performance boost ScyllaDB
> claims, instead of just stating it.
>
>
> On Thu, Mar 9, 2017 at 3:22 PM, Richard L. Burton III <mr...@gmail.com>
> wrote:
>
>> They spend an enormous amount of time focusing on performance. You can
>> expect them to continue on with their optimization and keep crushing it.
>>
>> P.S., I don't work for ScyllaDB.
>>
>> On Thu, Mar 9, 2017 at 6:02 PM, Rakesh Kumar <ra...@outlook.com>
>> wrote:
>>
>>> In all of their presentation they keep harping on the fact that scylladb
>>> is written in C++ and does not carry the overhead of Java.  Still the
>>> difference looks staggering.
>>> ________________________________________
>>> From: daemeon reiydelle <da...@gmail.com>
>>> Sent: Thursday, March 9, 2017 14:21
>>> To: user@cassandra.apache.org
>>> Subject: Re: scylladb
>>>
>>> The comparison is fair, and conservative. Did substantial performance
>>> comparisons for two clients, both results returned throughputs that were
>>> faster than the published comparisons (15x as I recall). At that time the
>>> client preferred to utilize a Cass COTS solution and use a caching solution
>>> for OLA compliance.
>>>
>>>
>>> .......
>>>
>>> Daemeon C.M. Reiydelle
>>> USA (+1) 415.501.0198
>>> London (+44) (0) 20 8144 9872
>>>
>>> On Thu, Mar 9, 2017 at 11:04 AM, Robin Verlangen <robin@us2.nl> wrote:
>>> I was wondering how people feel about the comparison that's made here
>>> between Cassandra and ScyllaDB:
>>> http://www.scylladb.com/technology/ycsb-cassandra-scylla/#results-of-3-scylla-nodes-vs-30-cassandra-nodes
>>>
>>> They are claiming a 10x improvement, is that a fair comparison or maybe
>>> a somewhat coloured view of a (micro)benchmark in a specific setup? Any
>>> pros/cons known?
>>>
>>> Best regards,
>>>
>>> Robin Verlangen
>>> Chief Data Architect
>>>
>>> Disclaimer: The information contained in this message and attachments is
>>> intended solely for the attention and use of the named addressee and may be
>>> confidential. If you are not the intended recipient, you are reminded that
>>> the information remains the property of the sender. You must not use,
>>> disclose, distribute, copy, print or rely on this e-mail. If you have
>>> received this message in error, please contact the sender immediately and
>>> irrevocably delete this message and any copies.
>>>
>>
>>
>> --
>> -Richard L. Burton III
>> @rburton
>>
>
>
>

Re: scylladb

Posted by Avi Kivity <av...@scylladb.com>.

ScyllaDB engineer here.

C++ is really an enabling technology here. It is directly responsible 
for a small fraction of the gain by executing faster than Java.  But it 
is indirectly responsible for the gain by allowing us direct control 
over memory and threading.  Just as an example, Scylla starts by taking 
over almost all of the machine's memory, and dynamically assigning it to 
memtables, cache, and working memory needed to handle requests in 
flight.  Memory is statically partitioned across cores, allowing us to 
exploit NUMA fully.  You can't do these things in Java.

I would say the major contributors to Scylla performance are:
  - thread-per-core design
  - replacement of the page cache with a row cache
  - careful attention to many small details, each contributing a little, 
but with a large overall impact
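
A minimal sketch of the thread-per-core idea from the first bullet, under
stated assumptions: it is plain single-threaded Python that only models
ownership (each key belongs to exactly one shard, and each shard keeps
private state), not Scylla's actual implementation. The names Shard,
ShardedStore and shard_for are hypothetical, and CRC32 stands in for
whatever partitioning function a real system would use.

```python
import zlib


class Shard:
    """State owned by exactly one core; no other shard ever touches it."""

    def __init__(self, shard_id):
        self.shard_id = shard_id
        self.memtable = {}  # private to this shard, so no locking is needed

    def put(self, key, value):
        self.memtable[key] = value

    def get(self, key):
        return self.memtable.get(key)


class ShardedStore:
    """Routes every request to the single shard that owns the key."""

    def __init__(self, n_shards):
        if n_shards < 1:
            raise ValueError("need at least one shard")
        self.shards = [Shard(i) for i in range(n_shards)]

    def shard_for(self, key):
        # A stable hash, so the same key always lands on the same shard.
        index = zlib.crc32(key.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def put(self, key, value):
        self.shard_for(key).put(key, value)

    def get(self, key):
        return self.shard_for(key).get(key)


store = ShardedStore(n_shards=4)  # in a real system: one shard per core
for i in range(8):
    store.put(f"user:{i}", f"row-{i}")

print(store.get("user:3"))
print([len(shard.memtable) for shard in store.shards])
```

In a real thread-per-core server each shard would run on its own pinned
thread and receive requests by message passing; the point of the sketch is
only that data is partitioned up front so shards never contend for it.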

While I'm here I can say that performance is not the only goal here, it 
is stable and predictable performance over varying loads and during 
maintenance operations like repair, without any special tuning.  We 
measure the amount of CPU and I/O spent on foreground (user) and 
background (maintenance) tasks and divide them fairly. This work is not 
complete but already makes operating Scylla a lot simpler.
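
One way to picture "measure and divide fairly" is a share-based scheduler
that always runs the task class with the least accumulated usage. This is
a minimal sketch under assumptions, not Scylla's scheduler: the class name
FairScheduler, the equal shares, and the per-task costs are all
hypothetical.

```python
class FairScheduler:
    """Picks the task class that has used the least time per share."""

    def __init__(self, shares):
        self.shares = dict(shares)
        self.used = {name: 0.0 for name in self.shares}

    def pick(self):
        # Lowest usage normalized by shares runs next.
        return min(self.shares, key=lambda n: self.used[n] / self.shares[n])

    def account(self, name, cost):
        # Record the measured cost (CPU or I/O time) of the finished task.
        self.used[name] += cost


sched = FairScheduler({"foreground": 1, "background": 1})

# Hypothetical costs: a background task (e.g. repair) is 4x a user request.
costs = {"foreground": 1.0, "background": 4.0}

for _ in range(10):
    name = sched.pick()
    sched.account(name, costs[name])

print(sched.used)
```

Because the decision is based on measured usage rather than task counts,
the expensive background class gets fewer turns, and the two classes never
drift apart by more than the cost of one task.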

On 03/10/2017 01:42 AM, Kant Kodali wrote:
> I don't think ScyllaDB's performance is because of C++. The design 
> decisions in ScyllaDB are indeed different from Cassandra's, such as 
> getting rid of SEDA and moving to TPC and so on.
>
> If someone thinks it is because of C++, then just show the benchmarks 
> that prove it is indeed C++ that gave the 10X performance boost 
> ScyllaDB claims, instead of just stating it.
>
>
> On Thu, Mar 9, 2017 at 3:22 PM, Richard L. Burton III 
> <mrburton@gmail.com> wrote:
>
>     They spend an enormous amount of time focusing on performance. You
>     can expect them to continue on with their optimization and keep
>     crushing it.
>
>     P.S., I don't work for ScyllaDB.
>
>     On Thu, Mar 9, 2017 at 6:02 PM, Rakesh Kumar
>     <rakeshkumar464@outlook.com> wrote:
>
>         In all of their presentation they keep harping on the fact
>         that scylladb is written in C++ and does not carry the
>         overhead of Java.  Still the difference looks staggering.
>         ________________________________________
>         From: daemeon reiydelle <daemeonr@gmail.com>
>         Sent: Thursday, March 9, 2017 14:21
>         To: user@cassandra.apache.org
>         Subject: Re: scylladb
>
>         The comparison is fair, and conservative. Did substantial
>         performance comparisons for two clients, both results returned
>         throughputs that were faster than the published comparisons
>         (15x as I recall). At that time the client preferred to
>         utilize a Cass COTS solution and use a caching solution for
>         OLA compliance.
>
>
>         .......
>
>         Daemeon C.M. Reiydelle
>         USA (+1) 415.501.0198
>         London (+44) (0) 20 8144 9872
>
>         On Thu, Mar 9, 2017 at 11:04 AM, Robin Verlangen <robin@us2.nl>
>         wrote:
>         I was wondering how people feel about the comparison that's
>         made here between Cassandra and ScyllaDB:
>         http://www.scylladb.com/technology/ycsb-cassandra-scylla/#results-of-3-scylla-nodes-vs-30-cassandra-nodes
>
>         They are claiming a 10x improvement, is that a fair comparison
>         or maybe a somewhat coloured view of a (micro)benchmark in a
>         specific setup? Any pros/cons known?
>
>         Best regards,
>
>         Robin Verlangen
>         Chief Data Architect
>
>         Disclaimer: The information contained in this message and
>         attachments is intended solely for the attention and use of
>         the named addressee and may be confidential. If you are not
>         the intended recipient, you are reminded that the information
>         remains the property of the sender. You must not use,
>         disclose, distribute, copy, print or rely on this e-mail. If
>         you have received this message in error, please contact the
>         sender immediately and irrevocably delete this message and any
>         copies.
>
>     -- 
>     -Richard L. Burton III
>     @rburton
>
>

Re: scylladb

Posted by Kant Kodali <ka...@peernova.com>.

I don't think ScyllaDB's performance is because of C++. The design decisions
in ScyllaDB are indeed different from Cassandra's, such as getting rid of
SEDA and moving to TPC and so on.

If someone thinks it is because of C++, then just show the benchmarks that
prove it is indeed C++ that gave the 10X performance boost ScyllaDB claims,
instead of just stating it.


On Thu, Mar 9, 2017 at 3:22 PM, Richard L. Burton III <mr...@gmail.com>
wrote:

> They spend an enormous amount of time focusing on performance. You can
> expect them to continue on with their optimization and keep crushing it.
>
> P.S., I don't work for ScyllaDB.
>
> On Thu, Mar 9, 2017 at 6:02 PM, Rakesh Kumar <ra...@outlook.com>
> wrote:
>
>> In all of their presentation they keep harping on the fact that scylladb
>> is written in C++ and does not carry the overhead of Java.  Still the
>> difference looks staggering.
>> ________________________________________
>> From: daemeon reiydelle <da...@gmail.com>
>> Sent: Thursday, March 9, 2017 14:21
>> To: user@cassandra.apache.org
>> Subject: Re: scylladb
>>
>> The comparison is fair, and conservative. Did substantial performance
>> comparisons for two clients, both results returned throughputs that were
>> faster than the published comparisons (15x as I recall). At that time the
>> client preferred to utilize a Cass COTS solution and use a caching solution
>> for OLA compliance.
>>
>>
>> .......
>>
>> Daemeon C.M. Reiydelle
>> USA (+1) 415.501.0198
>> London (+44) (0) 20 8144 9872
>>
>> On Thu, Mar 9, 2017 at 11:04 AM, Robin Verlangen <robin@us2.nl> wrote:
>> I was wondering how people feel about the comparison that's made here
>> between Cassandra and ScyllaDB:
>> http://www.scylladb.com/technology/ycsb-cassandra-scylla/#results-of-3-scylla-nodes-vs-30-cassandra-nodes
>>
>> They are claiming a 10x improvement, is that a fair comparison or maybe a
>> somewhat coloured view of a (micro)benchmark in a specific setup? Any
>> pros/cons known?
>>
>> Best regards,
>>
>> Robin Verlangen
>> Chief Data Architect
>>
>> Disclaimer: The information contained in this message and attachments is
>> intended solely for the attention and use of the named addressee and may be
>> confidential. If you are not the intended recipient, you are reminded that
>> the information remains the property of the sender. You must not use,
>> disclose, distribute, copy, print or rely on this e-mail. If you have
>> received this message in error, please contact the sender immediately and
>> irrevocably delete this message and any copies.
>>
>> On Wed, Dec 16, 2015 at 11:52 AM, Carlos Rolo <rolo@pythian.com> wrote:
>> No rain at all! But I almost had it running last weekend, but stopped
>> short of installing it. Let's see if this one is for real!
>>
>> Regards,
>>
>> Carlos Juzarte Rolo
>> Cassandra Consultant
>>
>> Pythian - Love your data
>>
>> rolo@pythian | Twitter: @cjrolo | Linkedin:
>> linkedin.com/in/carlosjuzarterolo <http://linkedin.com/in/carlosjuzarterolo>
>> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>> www.pythian.com
>>
>> On Wed, Dec 16, 2015 at 12:38 AM, Dani Traphagen
>> <dani.traphagen@datastax.com> wrote:
>> You'll be the first Carlos.
>>
>> [Inline image 1]
>>
>> Had any rain lately? Curious how this went, if so.
>>
>> On Thu, Nov 12, 2015 at 4:36 AM, Jack Krupansky <jack.krupansky@gmail.com>
>> wrote:
>> I just did a Twitter search on scylladb and did not see any tweets about
>> actual use, so far.
>>
>>
>> -- Jack Krupansky
>>
>> On Wed, Nov 11, 2015 at 10:54 AM, Carlos Alonso <info@mrcalonso.com>
>> wrote:
>> Any update about this?
>>
>> @Carlos Rolo, did you tried it? Thoughts?
>>
>> Carlos Alonso | Software Engineer | @calonso<https://twitter.com/calonso>
>>
>> On 5 November 2015 at 14:07, Carlos Rolo <rolo@pythian.com> wrote:
>> Something to do on a expected rainy weekend. Thanks for the information.
>>
>> Regards,
>>
>> Carlos Juzarte Rolo
>> Cassandra Consultant
>>
>> Pythian - Love your data
>>
>> rolo@pythian | Twitter: @cjrolo | Linkedin:
>> linkedin.com/in/carlosjuzarterolo <http://linkedin.com/in/carlosjuzarterolo>
>> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>> www.pythian.com
>>
>> On Thu, Nov 5, 2015 at 12:07 PM, Dani Traphagen
>> <dani.traphagen@datastax.com> wrote:
>> As of two days ago, they say they've got it @cjrolo.
>>
>> https://github.com/scylladb/scylla/wiki/RELEASE-Scylla-0.11-Beta
>>
>>
>> On Thursday, November 5, 2015, Carlos Rolo <rolo@pythian.com> wrote:
>> I will not try until multi-DC is implemented. More than an month has
>> passed since I looked for it, so it could possibly be in place, if so I may
>> take some time to test it.
>>
>> Regards,
>>
>> Carlos Juzarte Rolo
>> Cassandra Consultant
>>
>> Pythian - Love your data
>>
>> rolo@pythian | Twitter: @cjrolo | Linkedin:
>> linkedin.com/in/carlosjuzarterolo <http://linkedin.com/in/carlosjuzarterolo>
>> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>> www.pythian.com
>>
>> On Thu, Nov 5, 2015 at 9:37 AM, Jon Haddad <jo...@gmail.com>
>> wrote:
>> Nope, no one I know.  Let me know if you try it I'd love to hear your
>> feedback.
>>
>> > On Nov 5, 2015, at 9:22 AM, tommaso barbugli <tb...@gmail.com>
>> wrote:
>> >
>> > Hi guys,
>> >
>> > did anyone already try Scylladb (yet another fastest NoSQL database in
>> town) and has some thoughts/hands-on experience to share?
>> >
>> > Cheers,
>> > Tommaso
>>
>>
>>
>>
>> --
>>
>>
>>
>>
>> --
>> Sent from mobile -- apologizes for brevity or errors.
>>
>>
>>
>> --
>>
>>
>>
>>
>>
>>
>>
>> --
>> [datastax_logo.png]<http://www.datastax.com/>
>>
>> DANI TRAPHAGEN
>>
>> Technical Enablement Lead | dani.traphagen@datastax.com
>>
>> Twitter: <https://twitter.com/dtrapezoid> | LinkedIn:
>> <https://www.linkedin.com/pub/dani-traphagen/31/93b/b85> | GitHub:
>> <https://github.com/dtrapezoid>
>>
>>
>>
>> --
>>
>>
>>
>>
>>
>
>
> --
> -Richard L. Burton III
> @rburton
>

Re: scylladb

Posted by "Richard L. Burton III" <mr...@gmail.com>.

They spend an enormous amount of time focusing on performance. You can
expect them to continue on with their optimization and keep crushing it.

P.S., I don't work for ScyllaDB.

On Thu, Mar 9, 2017 at 6:02 PM, Rakesh Kumar <ra...@outlook.com>
wrote:

> In all of their presentation they keep harping on the fact that scylladb
> is written in C++ and does not carry the overhead of Java.  Still the
> difference looks staggering.
> ________________________________________
> From: daemeon reiydelle <da...@gmail.com>
> Sent: Thursday, March 9, 2017 14:21
> To: user@cassandra.apache.org
> Subject: Re: scylladb
>
> The comparison is fair, and conservative. Did substantial performance
> comparisons for two clients, both results returned throughputs that were
> faster than the published comparisons (15x as I recall). At that time the
> client preferred to utilize a Cass COTS solution and use a caching solution
> for OLA compliance.
>
>
> .......
>
> Daemeon C.M. Reiydelle
> USA (+1) 415.501.0198
> London (+44) (0) 20 8144 9872
>
> On Thu, Mar 9, 2017 at 11:04 AM, Robin Verlangen <robin@us2.nl> wrote:
> I was wondering how people feel about the comparison that's made here
> between Cassandra and ScyllaDB:
> http://www.scylladb.com/technology/ycsb-cassandra-scylla/#results-of-3-scylla-nodes-vs-30-cassandra-nodes
>
> They are claiming a 10x improvement, is that a fair comparison or maybe a
> somewhat coloured view of a (micro)benchmark in a specific setup? Any
> pros/cons known?
>
> Best regards,
>
> Robin Verlangen
> Chief Data Architect
>
> Disclaimer: The information contained in this message and attachments is
> intended solely for the attention and use of the named addressee and may be
> confidential. If you are not the intended recipient, you are reminded that
> the information remains the property of the sender. You must not use,
> disclose, distribute, copy, print or rely on this e-mail. If you have
> received this message in error, please contact the sender immediately and
> irrevocably delete this message and any copies.
>
> On Wed, Dec 16, 2015 at 11:52 AM, Carlos Rolo <rolo@pythian.com> wrote:
> No rain at all! But I almost had it running last weekend, but stopped
> short of installing it. Let's see if this one is for real!
>
> Regards,
>
> Carlos Juzarte Rolo
> Cassandra Consultant
>
> Pythian - Love your data
>
> rolo@pythian | Twitter: @cjrolo | Linkedin:
> linkedin.com/in/carlosjuzarterolo <http://linkedin.com/in/carlosjuzarterolo>
> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
> www.pythian.com
>
> On Wed, Dec 16, 2015 at 12:38 AM, Dani Traphagen
> <dani.traphagen@datastax.com> wrote:
> You'll be the first Carlos.
>
> [Inline image 1]
>
> Had any rain lately? Curious how this went, if so.
>
> On Thu, Nov 12, 2015 at 4:36 AM, Jack Krupansky <jack.krupansky@gmail.com>
> wrote:
> I just did a Twitter search on scylladb and did not see any tweets about
> actual use, so far.
>
>
> -- Jack Krupansky
>
> On Wed, Nov 11, 2015 at 10:54 AM, Carlos Alonso <info@mrcalonso.com>
> wrote:
> Any update about this?
>
> @Carlos Rolo, did you tried it? Thoughts?
>
> Carlos Alonso | Software Engineer | @calonso<https://twitter.com/calonso>
>
> On 5 November 2015 at 14:07, Carlos Rolo <rolo@pythian.com> wrote:
> Something to do on a expected rainy weekend. Thanks for the information.
>
> Regards,
>
> Carlos Juzarte Rolo
> Cassandra Consultant
>
> Pythian - Love your data
>
> rolo@pythian | Twitter: @cjrolo | Linkedin:
> linkedin.com/in/carlosjuzarterolo <http://linkedin.com/in/carlosjuzarterolo>
> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
> www.pythian.com
>
> On Thu, Nov 5, 2015 at 12:07 PM, Dani Traphagen
> <dani.traphagen@datastax.com> wrote:
> As of two days ago, they say they've got it @cjrolo.
>
> https://github.com/scylladb/scylla/wiki/RELEASE-Scylla-0.11-Beta
>
>
> On Thursday, November 5, 2015, Carlos Rolo <rolo@pythian.com> wrote:
> I will not try until multi-DC is implemented. More than an month has
> passed since I looked for it, so it could possibly be in place, if so I may
> take some time to test it.
>
> Regards,
>
> Carlos Juzarte Rolo
> Cassandra Consultant
>
> Pythian - Love your data
>
> rolo@pythian | Twitter: @cjrolo | Linkedin:
> linkedin.com/in/carlosjuzarterolo <http://linkedin.com/in/carlosjuzarterolo>
> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
> www.pythian.com
>
> On Thu, Nov 5, 2015 at 9:37 AM, Jon Haddad <jo...@gmail.com>
> wrote:
> Nope, no one I know.  Let me know if you try it I'd love to hear your
> feedback.
>
> > On Nov 5, 2015, at 9:22 AM, tommaso barbugli <tb...@gmail.com>
> wrote:
> >
> > Hi guys,
> >
> > did anyone already try Scylladb (yet another fastest NoSQL database in
> town) and has some thoughts/hands-on experience to share?
> >
> > Cheers,
> > Tommaso
>
>
>
>
> --
>
>
>
>
> --
> Sent from mobile -- apologizes for brevity or errors.
>
>
>
> --
>
>
>
>
>
>
>
> --
> [datastax_logo.png]<http://www.datastax.com/>
>
> DANI TRAPHAGEN
>
> Technical Enablement Lead | dani.traphagen@datastax.com
>
> Twitter: <https://twitter.com/dtrapezoid> | LinkedIn:
> <https://www.linkedin.com/pub/dani-traphagen/31/93b/b85> | GitHub:
> <https://github.com/dtrapezoid>
>
>
>
> --
>
>
>
>
>


-- 
-Richard L. Burton III
@rburton

Re: scylladb

Posted by Rakesh Kumar <ra...@outlook.com>.

In all of their presentations they keep harping on the fact that ScyllaDB is written in C++ and does not carry the overhead of Java.  Still, the difference looks staggering.
________________________________________
From: daemeon reiydelle <da...@gmail.com>
Sent: Thursday, March 9, 2017 14:21
To: user@cassandra.apache.org
Subject: Re: scylladb

The comparison is fair, and conservative. Did substantial performance comparisons for two clients, both results returned throughputs that were faster than the published comparisons (15x as I recall). At that time the client preferred to utilize a Cass COTS solution and use a caching solution for OLA compliance.

.......

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Thu, Mar 9, 2017 at 11:04 AM, Robin Verlangen <ro...@us2.nl>> wrote:
I was wondering how people feel about the comparison that's made here between Cassandra and ScyllaDB : http://www.scylladb.com/technology/ycsb-cassandra-scylla/#results-of-3-scylla-nodes-vs-30-cassandra-nodes

They are claiming a 10x improvement, is that a fair comparison or maybe a somewhat coloured view of a (micro)benchmark in a specific setup? Any pros/cons known?

Best regards,

Robin Verlangen
Chief Data Architect

Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies.

On Wed, Dec 16, 2015 at 11:52 AM, Carlos Rolo <ro...@pythian.com>> wrote:
No rain at all! But I almost had it running last weekend, but stopped short of installing it. Let's see if this one is for real!

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: @cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo <http://linkedin.com/in/carlosjuzarterolo>
Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
www.pythian.com

On Wed, Dec 16, 2015 at 12:38 AM, Dani Traphagen <da...@datastax.com>> wrote:
You'll be the first Carlos.

[Inline image 1]

Had any rain lately? Curious how this went, if so.

On Thu, Nov 12, 2015 at 4:36 AM, Jack Krupansky <ja...@gmail.com>> wrote:
I just did a Twitter search on scylladb and did not see any tweets about actual use, so far.

-- Jack Krupansky

On Wed, Nov 11, 2015 at 10:54 AM, Carlos Alonso <in...@mrcalonso.com>> wrote:
Any update about this?

@Carlos Rolo, did you tried it? Thoughts?

Carlos Alonso | Software Engineer | @calonso<https://twitter.com/calonso>

On 5 November 2015 at 14:07, Carlos Rolo <ro...@pythian.com>> wrote:
Something to do on a expected rainy weekend. Thanks for the information.

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: @cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo <http://linkedin.com/in/carlosjuzarterolo>
Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
www.pythian.com

On Thu, Nov 5, 2015 at 12:07 PM, Dani Traphagen <da...@datastax.com>> wrote:
As of two days ago, they say they've got it @cjrolo.

https://github.com/scylladb/scylla/wiki/RELEASE-Scylla-0.11-Beta

On Thursday, November 5, 2015, Carlos Rolo <ro...@pythian.com>> wrote:
I will not try until multi-DC is implemented. More than an month has passed since I looked for it, so it could possibly be in place, if so I may take some time to test it.

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: @cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo <http://linkedin.com/in/carlosjuzarterolo>
Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
www.pythian.com

On Thu, Nov 5, 2015 at 9:37 AM, Jon Haddad <jo...@gmail.com> wrote:
Nope, no one I know.  Let me know if you try it I'd love to hear your feedback.

> On Nov 5, 2015, at 9:22 AM, tommaso barbugli <tb...@gmail.com> wrote:
>
> Hi guys,
>
> did anyone already try Scylladb (yet another fastest NoSQL database in town) and has some thoughts/hands-on experience to share?
>
> Cheers,
> Tommaso

--

--
Sent from mobile -- apologizes for brevity or errors.

--

--
[datastax_logo.png]<http://www.datastax.com/>

DANI TRAPHAGEN

Technical Enablement Lead | dani.traphagen@datastax.com

Twitter: <https://twitter.com/dtrapezoid> | LinkedIn: <https://www.linkedin.com/pub/dani-traphagen/31/93b/b85> | GitHub: <https://github.com/dtrapezoid>

--

Re: scylladb

Posted by daemeon reiydelle <da...@gmail.com>.

The comparison is fair, and conservative. I did substantial performance
comparisons for two clients; both returned throughputs that were faster
than the published comparisons (15x, as I recall). At that time the client
preferred to use a Cassandra COTS solution and a caching solution for OLA
compliance.


*.......*



*Daemeon C.M. Reiydelle*
*USA (+1) 415.501.0198*
*London (+44) (0) 20 8144 9872*

On Thu, Mar 9, 2017 at 11:04 AM, Robin Verlangen <ro...@us2.nl> wrote:

> I was wondering how people feel about the comparison that's made here
> between Cassandra and ScyllaDB:
> http://www.scylladb.com/technology/ycsb-cassandra-scylla/#results-of-3-scylla-nodes-vs-30-cassandra-nodes
>
> They are claiming a 10x improvement, is that a fair comparison or maybe a
> somewhat coloured view of a (micro)benchmark in a specific setup? Any
> pros/cons known?
>
> Best regards,
>
> Robin Verlangen
> *Chief Data Architect*
>
> Disclaimer: The information contained in this message and attachments is
> intended solely for the attention and use of the named addressee and may be
> confidential. If you are not the intended recipient, you are reminded that
> the information remains the property of the sender. You must not use,
> disclose, distribute, copy, print or rely on this e-mail. If you have
> received this message in error, please contact the sender immediately and
> irrevocably delete this message and any copies.
>
> On Wed, Dec 16, 2015 at 11:52 AM, Carlos Rolo <ro...@pythian.com> wrote:
>
>> No rain at all! But I almost had it running last weekend, but stopped
>> short of installing it. Let's see if this one is for real!
>>
>> Regards,
>>
>> Carlos Juzarte Rolo
>> Cassandra Consultant
>>
>> Pythian - Love your data
>>
>> rolo@pythian | Twitter: @cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo
>> <http://linkedin.com/in/carlosjuzarterolo>*
>> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>> www.pythian.com
>>
>> On Wed, Dec 16, 2015 at 12:38 AM, Dani Traphagen <
>> dani.traphagen@datastax.com> wrote:
>>
>>> You'll be the first Carlos.
>>>
>>> [image: Inline image 1]
>>>
>>> Had any rain lately? Curious how this went, if so.
>>>
>>> On Thu, Nov 12, 2015 at 4:36 AM, Jack Krupansky <
>>> jack.krupansky@gmail.com> wrote:
>>>
>>>> I just did a Twitter search on scylladb and did not see any tweets
>>>> about actual use, so far.
>>>>
>>>>
>>>> -- Jack Krupansky
>>>>
>>>> On Wed, Nov 11, 2015 at 10:54 AM, Carlos Alonso <in...@mrcalonso.com>
>>>> wrote:
>>>>
>>>>> Any update about this?
>>>>>
>>>>> @Carlos Rolo, did you tried it? Thoughts?
>>>>>
>>>>> Carlos Alonso | Software Engineer | @calonso
>>>>> <https://twitter.com/calonso>
>>>>>
>>>>> On 5 November 2015 at 14:07, Carlos Rolo <ro...@pythian.com> wrote:
>>>>>
>>>>>> Something to do on a expected rainy weekend. Thanks for the
>>>>>> information.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Carlos Juzarte Rolo
>>>>>> Cassandra Consultant
>>>>>>
>>>>>> Pythian - Love your data
>>>>>>
>>>>>> rolo@pythian | Twitter: @cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo
>>>>>> <http://linkedin.com/in/carlosjuzarterolo>*
>>>>>> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>>>>>> www.pythian.com
>>>>>>
>>>>>> On Thu, Nov 5, 2015 at 12:07 PM, Dani Traphagen <
>>>>>> dani.traphagen@datastax.com> wrote:
>>>>>>
>>>>>>> As of two days ago, they say they've got it @cjrolo.
>>>>>>>
>>>>>>> https://github.com/scylladb/scylla/wiki/RELEASE-Scylla-0.11-Beta
>>>>>>>
>>>>>>>
>>>>>>> On Thursday, November 5, 2015, Carlos Rolo <ro...@pythian.com> wrote:
>>>>>>>
>>>>>>>> I will not try until multi-DC is implemented. More than an month
>>>>>>>> has passed since I looked for it, so it could possibly be in place, if so I
>>>>>>>> may take some time to test it.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Carlos Juzarte Rolo
>>>>>>>> Cassandra Consultant
>>>>>>>>
>>>>>>>> Pythian - Love your data
>>>>>>>>
>>>>>>>> rolo@pythian | Twitter: @cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo
>>>>>>>> <http://linkedin.com/in/carlosjuzarterolo>*
>>>>>>>> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>>>>>>>> www.pythian.com
>>>>>>>>
>>>>>>>> On Thu, Nov 5, 2015 at 9:37 AM, Jon Haddad <
>>>>>>>> jonathan.haddad@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Nope, no one I know.  Let me know if you try it I'd love to hear
>>>>>>>>> your feedback.
>>>>>>>>>
>>>>>>>>> > On Nov 5, 2015, at 9:22 AM, tommaso barbugli <
>>>>>>>>> tbarbugli@gmail.com> wrote:
>>>>>>>>> >
>>>>>>>>> > Hi guys,
>>>>>>>>> >
>>>>>>>>> > did anyone already try Scylladb (yet another fastest NoSQL
>>>>>>>>> database in town) and has some thoughts/hands-on experience to share?
>>>>>>>>> >
>>>>>>>>> > Cheers,
>>>>>>>>> > Tommaso
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Sent from mobile -- apologizes for brevity or errors.
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> [image: datastax_logo.png] <http://www.datastax.com/>
>>>
>>> DANI TRAPHAGEN
>>>
>>> Technical Enablement Lead | dani.traphagen@datastax.com
>>>
>>> [image: twitter.png] <https://twitter.com/dtrapezoid> [image:
>>> linkedin.png] <https://www.linkedin.com/pub/dani-traphagen/31/93b/b85>
>>> <https://github.com/dtrapezoid>
>>>
>>>
>>>
>>
>> --
>>
>>
>>
>>
>

Re: scylladb

Posted by Robin Verlangen <ro...@us2.nl>.

I was wondering how people feel about the comparison that's made here
between Cassandra and ScyllaDB :
http://www.scylladb.com/technology/ycsb-cassandra-scylla/#results-of-3-scylla-nodes-vs-30-cassandra-nodes


They are claiming a 10x improvement. Is that a fair comparison, or maybe a
somewhat coloured view of a (micro)benchmark in a specific setup? Are there
any known pros/cons?

Best regards,

Robin Verlangen
*Chief Data Architect*

Disclaimer: The information contained in this message and attachments is
intended solely for the attention and use of the named addressee and may be
confidential. If you are not the intended recipient, you are reminded that
the information remains the property of the sender. You must not use,
disclose, distribute, copy, print or rely on this e-mail. If you have
received this message in error, please contact the sender immediately and
irrevocably delete this message and any copies.

On Wed, Dec 16, 2015 at 11:52 AM, Carlos Rolo <ro...@pythian.com> wrote:

> No rain at all! But I almost had it running last weekend, but stopped
> short of installing it. Let's see if this one is for real!
>
> Regards,
>
> Carlos Juzarte Rolo
> Cassandra Consultant
>
> Pythian - Love your data
>
> rolo@pythian | Twitter: @cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo
> <http://linkedin.com/in/carlosjuzarterolo>*
> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
> www.pythian.com
>
> On Wed, Dec 16, 2015 at 12:38 AM, Dani Traphagen <
> dani.traphagen@datastax.com> wrote:
>
>> You'll be the first Carlos.
>>
>> [image: Inline image 1]
>>
>> Had any rain lately? Curious how this went, if so.
>>
>> On Thu, Nov 12, 2015 at 4:36 AM, Jack Krupansky <jack.krupansky@gmail.com
>> > wrote:
>>
>>> I just did a Twitter search on scylladb and did not see any tweets about
>>> actual use, so far.
>>>
>>>
>>> -- Jack Krupansky
>>>
>>> On Wed, Nov 11, 2015 at 10:54 AM, Carlos Alonso <in...@mrcalonso.com>
>>> wrote:
>>>
>>>> Any update about this?
>>>>
>>>> @Carlos Rolo, did you tried it? Thoughts?
>>>>
>>>> Carlos Alonso | Software Engineer | @calonso
>>>> <https://twitter.com/calonso>
>>>>
>>>> On 5 November 2015 at 14:07, Carlos Rolo <ro...@pythian.com> wrote:
>>>>
>>>>> Something to do on a expected rainy weekend. Thanks for the
>>>>> information.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Carlos Juzarte Rolo
>>>>> Cassandra Consultant
>>>>>
>>>>> Pythian - Love your data
>>>>>
>>>>> rolo@pythian | Twitter: @cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo
>>>>> <http://linkedin.com/in/carlosjuzarterolo>*
>>>>> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>>>>> www.pythian.com
>>>>>
>>>>> On Thu, Nov 5, 2015 at 12:07 PM, Dani Traphagen <
>>>>> dani.traphagen@datastax.com> wrote:
>>>>>
>>>>>> As of two days ago, they say they've got it @cjrolo.
>>>>>>
>>>>>> https://github.com/scylladb/scylla/wiki/RELEASE-Scylla-0.11-Beta
>>>>>>
>>>>>>
>>>>>> On Thursday, November 5, 2015, Carlos Rolo <ro...@pythian.com> wrote:
>>>>>>
>>>>>>> I will not try until multi-DC is implemented. More than an month has
>>>>>>> passed since I looked for it, so it could possibly be in place, if so I may
>>>>>>> take some time to test it.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Carlos Juzarte Rolo
>>>>>>> Cassandra Consultant
>>>>>>>
>>>>>>> Pythian - Love your data
>>>>>>>
>>>>>>> rolo@pythian | Twitter: @cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo
>>>>>>> <http://linkedin.com/in/carlosjuzarterolo>*
>>>>>>> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>>>>>>> www.pythian.com
>>>>>>>
>>>>>>> On Thu, Nov 5, 2015 at 9:37 AM, Jon Haddad <
>>>>>>> jonathan.haddad@gmail.com> wrote:
>>>>>>>
>>>>>>>> Nope, no one I know.  Let me know if you try it I'd love to hear
>>>>>>>> your feedback.
>>>>>>>>
>>>>>>>> > On Nov 5, 2015, at 9:22 AM, tommaso barbugli <tb...@gmail.com>
>>>>>>>> wrote:
>>>>>>>> >
>>>>>>>> > Hi guys,
>>>>>>>> >
>>>>>>>> > did anyone already try Scylladb (yet another fastest NoSQL
>>>>>>>> database in town) and has some thoughts/hands-on experience to share?
>>>>>>>> >
>>>>>>>> > Cheers,
>>>>>>>> > Tommaso
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Sent from mobile -- apologizes for brevity or errors.
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>> --
>> [image: datastax_logo.png] <http://www.datastax.com/>
>>
>> DANI TRAPHAGEN
>>
>> Technical Enablement Lead | dani.traphagen@datastax.com
>>
>> [image: twitter.png] <https://twitter.com/dtrapezoid> [image:
>> linkedin.png] <https://www.linkedin.com/pub/dani-traphagen/31/93b/b85>
>> <https://github.com/dtrapezoid>
>>
>>
>>
>
> --
>
>
>
>

Re: scylladb

Posted by Carlos Rolo <ro...@pythian.com>.

No rain at all! I almost had it running last weekend, but stopped short
of installing it. Let's see if this one is for real!

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: @cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo
<http://linkedin.com/in/carlosjuzarterolo>*
Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
www.pythian.com

On Wed, Dec 16, 2015 at 12:38 AM, Dani Traphagen <
dani.traphagen@datastax.com> wrote:

> You'll be the first Carlos.
>
> [image: Inline image 1]
>
> Had any rain lately? Curious how this went, if so.
>
> On Thu, Nov 12, 2015 at 4:36 AM, Jack Krupansky <ja...@gmail.com>
> wrote:
>
>> I just did a Twitter search on scylladb and did not see any tweets about
>> actual use, so far.
>>
>>
>> -- Jack Krupansky
>>
>> On Wed, Nov 11, 2015 at 10:54 AM, Carlos Alonso <in...@mrcalonso.com>
>> wrote:
>>
>>> Any update about this?
>>>
>>> @Carlos Rolo, did you tried it? Thoughts?
>>>
>>> Carlos Alonso | Software Engineer | @calonso
>>> <https://twitter.com/calonso>
>>>
>>> On 5 November 2015 at 14:07, Carlos Rolo <ro...@pythian.com> wrote:
>>>
>>>> Something to do on a expected rainy weekend. Thanks for the information.
>>>>
>>>> Regards,
>>>>
>>>> Carlos Juzarte Rolo
>>>> Cassandra Consultant
>>>>
>>>> Pythian - Love your data
>>>>
>>>> rolo@pythian | Twitter: @cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo
>>>> <http://linkedin.com/in/carlosjuzarterolo>*
>>>> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>>>> www.pythian.com
>>>>
>>>> On Thu, Nov 5, 2015 at 12:07 PM, Dani Traphagen <
>>>> dani.traphagen@datastax.com> wrote:
>>>>
>>>>> As of two days ago, they say they've got it @cjrolo.
>>>>>
>>>>> https://github.com/scylladb/scylla/wiki/RELEASE-Scylla-0.11-Beta
>>>>>
>>>>>
>>>>> On Thursday, November 5, 2015, Carlos Rolo <ro...@pythian.com> wrote:
>>>>>
>>>>>> I will not try until multi-DC is implemented. More than an month has
>>>>>> passed since I looked for it, so it could possibly be in place, if so I may
>>>>>> take some time to test it.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Carlos Juzarte Rolo
>>>>>> Cassandra Consultant
>>>>>>
>>>>>> Pythian - Love your data
>>>>>>
>>>>>> rolo@pythian | Twitter: @cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo
>>>>>> <http://linkedin.com/in/carlosjuzarterolo>*
>>>>>> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>>>>>> www.pythian.com
>>>>>>
>>>>>> On Thu, Nov 5, 2015 at 9:37 AM, Jon Haddad <jonathan.haddad@gmail.com
>>>>>> > wrote:
>>>>>>
>>>>>>> Nope, no one I know.  Let me know if you try it I'd love to hear
>>>>>>> your feedback.
>>>>>>>
>>>>>>> > On Nov 5, 2015, at 9:22 AM, tommaso barbugli <tb...@gmail.com>
>>>>>>> wrote:
>>>>>>> >
>>>>>>> > Hi guys,
>>>>>>> >
>>>>>>> > did anyone already try Scylladb (yet another fastest NoSQL
>>>>>>> database in town) and has some thoughts/hands-on experience to share?
>>>>>>> >
>>>>>>> > Cheers,
>>>>>>> > Tommaso
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Sent from mobile -- apologizes for brevity or errors.
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
>
> --
> [image: datastax_logo.png] <http://www.datastax.com/>
>
> DANI TRAPHAGEN
>
> Technical Enablement Lead | dani.traphagen@datastax.com
>
> [image: twitter.png] <https://twitter.com/dtrapezoid> [image:
> linkedin.png] <https://www.linkedin.com/pub/dani-traphagen/31/93b/b85>
> <https://github.com/dtrapezoid>
>
>
>

-- 


--