Posted to dev@couchdb.apache.org by Mike Coolin <mc...@techie.com> on 2012/03/23 01:54:02 UTC

Gathering ideas for measuring and tuning performance

Hi all,

 I have been looking at CouchDB for a couple of months now and really like what I see; in fact, I've started developing my project using CouchApp. However, there are some areas where I would like to better understand how well or badly Couch performs.

 So I am proposing to start a project that will implement some testing tools in Erlang to test several aspects of Couch; in my case I am looking for numbers for both the web server and the database.

 Now, before I get long-winded with the details I'm thinking of, let me state up front that I would like to gather your ideas on the types of tests that should be done.

 I firmly believe that many applications will have differing usage profiles, and what I want to capture is a number of common profiles. Now, you may ask, what do I mean by a profile?

 Profiles in this case refer to how the database is to be used: are we reading a lot, writing a lot, logging, highly concurrent? Are we dealing with 5 concurrent sessions or 2,000? Should we test over an hour, a day, or a week?

 Ideally this tool would provide a means of testing the various operating system and Erlang settings, assisting production admins in configuring their environments to perform optimally for them. It could also assist developers in testing code and tuning it for optimal performance and safety.

 I see a number of areas that can be tested, including:
    *  Web server performance
       *  file serving
       *  PUT/GET/DELETE/POST
       *  load testing
    *  Database performance
       *  creating databases
       *  populating databases
       *  view performance - JavaScript vs. Erlang
       *  filter performance
       *  inserts with system-generated keys vs. ideal keys
       *  bulk inserts
       *  database size
       *  varying record sizes
       *  concurrent user load testing
       *  attachment processing
       *  replication
       *  database compaction
       *  bulk insert/view response
 Ideally I'd like to record the Erlang/system setup, whether the database and test code were running on the same system or separate systems, and the result sets. In this way others would be able to evaluate CouchDB with more realistic expectations, and CouchDB will have something solid to point to in terms of its performance.

 I have not yet created the project on git, but that is my intention. As I learned last night, it's nice to toss an idea out there; it's likely someone has some interesting things I could start with.

 Please reply with any ideas/issues/objections. If you like the idea and think it would be useful, let me know; if not, let me know as well.

 So far, modeling something after nodeload was suggested; I briefly looked at it and yes, it looks nice. But I'd like to take advantage of Erlang's ability to run multiple processes to really load test the server.
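 To give a rough idea of the shape I have in mind: many concurrent workers hammering an HTTP endpoint while latencies are collected. The Erlang tool doesn't exist yet, so this is only an illustrative Python sketch; the stand-in server, worker counts, and URL are all placeholders, not CouchDB itself.

```python
# Minimal sketch of a concurrent HTTP load test: N workers each issue
# M GET requests and we collect per-request latencies. A local stand-in
# server is started so the sketch is self-contained.
import http.server
import threading
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Stand-in for a CouchDB endpoint: always answer 200 with a tiny body.
        body = b'{"ok": true}'
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep benchmark output clean
        pass

def worker(url, requests_per_worker):
    latencies = []
    for _ in range(requests_per_worker):
        start = time.perf_counter()
        with urllib.request.urlopen(url) as resp:
            resp.read()
        latencies.append(time.perf_counter() - start)
    return latencies

def load_test(url, workers=8, requests_per_worker=25):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(worker, [url] * workers,
                           [requests_per_worker] * workers)
    latencies = [lat for chunk in results for lat in chunk]
    return {
        "requests": len(latencies),
        "mean_ms": 1000 * sum(latencies) / len(latencies),
        "max_ms": 1000 * max(latencies),
    }

if __name__ == "__main__":
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    stats = load_test(f"http://127.0.0.1:{server.server_address[1]}/")
    print(stats["requests"])  # 8 workers * 25 requests = 200
    server.shutdown()
```

 In Erlang the thread pool would naturally become one lightweight process per simulated client, which is exactly the advantage over a single-process tool.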

 Looking forward to your comments.

 Thanks Mike

Re: Gathering ideas for measuring and tuning performance

Posted by Adam Lofts <ad...@gmail.com>.
> Now before I get long winded with details I'm think of, let me state up front
> that I would like to gather your ideas on the types of tests that should be
> done.
> 

Hi,

I'm new to this list but have been thinking about benchmarking a little, so I
hope I can contribute to the discussion.

CouchDB already includes a small number of simple benchmarks in test/bench.
I have submitted a pull request [1] to fix the harness so that running
./test/bench/run works. Adding some benchmarks for, e.g., view build performance
would be simple and would facilitate easy benchmarking per git commit.

I also have an unpolished patch which submits the results to a public
CouchDB instance. A simple web app could then roll the data into a nice chart
to better understand performance changes by commit / release / OS / platform.

> Profiles in this case refer to how the database is to be used, are we reading
> alot, writng alot, logging, highly concurrent, are we dealing with 5
> concurrent session to 2000? Should we do test over 1 hour, a day or a week?

Along the lines of profiles, my second idea is inspired by how the cairo
graphics project does benchmarking. Cairo records the drawing operations
from real applications like Firefox (e.g. fill circle, draw gradient, etc.)
into a log, and then a developer can play back the operations as a benchmark.

Since CouchDB speaks HTTP, we could record the requests made by an app (e.g.
from running an application's test suite) into a log and build up a collection
of real-world load simulations.
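As a rough illustration of that record/replay idea, here is a Python sketch.
The one-JSON-object-per-line log format is invented for illustration; it is
not anything CouchDB or cairo actually produces.

```python
# Sketch of record-and-replay load simulation: requests are captured as
# one JSON object per line (method, path, body), then replayed in order
# through a caller-supplied transport function.
import json

def record(log_lines, method, path, body=None):
    """Append one captured request to an in-memory log."""
    log_lines.append(json.dumps({"method": method, "path": path, "body": body}))

def replay(log_lines, send):
    """Play back every logged request through `send(method, path, body)`;
    return how many requests were sent."""
    count = 0
    for line in log_lines:
        req = json.loads(line)
        send(req["method"], req["path"], req["body"])
        count += 1
    return count

if __name__ == "__main__":
    log = []
    record(log, "PUT", "/benchdb")
    record(log, "POST", "/benchdb", {"_id": "doc1", "value": 42})
    record(log, "GET", "/benchdb/doc1")

    sent = []
    n = replay(log, lambda m, p, b: sent.append((m, p)))
    print(n)  # 3 requests replayed
```

In a real tool the `send` function would issue the HTTP request against the
server under test, and the recorder would sit between the app and CouchDB as
a logging proxy.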

Hope some of that is interesting.

Adam

[1] https://issues.apache.org/jira/browse/COUCHDB-1432

ps - I posted this with the gmane interface so hope it formats ok.



Re: Gathering ideas for measuring and tuning performance

Posted by "Eli Stevens (Gmail)" <wi...@gmail.com>.
I'd like to offer what little I've done to measure attachment speeds:

https://github.com/wickedgrey/couchdb-attachment-speed

It's in Python, but it might be useful as an example of what to test.
I've explained in more detail in this user@ mailing list thread:

http://mail-archives.apache.org/mod_mbox/couchdb-user/201112.mbox/%3CCADa34LDmNe5i8gx=mERAwz6iHdBaMD2Z2wQBarURkaisre4d1A@mail.gmail.com%3E

I'd been planning on trying to get some more attention on this issue
once 1.2.0 had been formally released, and I could do speed testing
with it.  However, it seems like it's very on topic for what you're
trying to do.

Let me know if I can help,
Eli

Re: Gathering ideas for measuring and tuning performance

Posted by Paul Davis <pa...@gmail.com>.
Looks like a good list. I'd also suggest sticking to Erlang.
You might want to check out basho_bench for one approach to load
testing with Erlang. Even if you don't use it directly, it's a good
example of how you might structure such a thing.
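For what it's worth, basho_bench structures load generators as driver
callbacks: in Erlang, roughly `new/1` to set up per-worker state and `run/4`
to execute one weighted-random operation. A Python sketch of that callback
shape follows; the operation names, weights, and key generator are invented
for illustration, not basho_bench's actual API.

```python
# Sketch of a basho_bench-style driver: each worker gets its own state
# from new(), and the harness repeatedly calls run() with an operation
# chosen by weighted random selection.
import random

def new(worker_id):
    """Set up per-worker state (loosely mirroring basho_bench's new/1)."""
    return {"id": worker_id, "ops": 0}

def run(op, keygen, state):
    """Execute one operation and return the updated state
    (loosely mirroring basho_bench's run/4)."""
    key = keygen()
    if op == "get":
        pass  # a real driver would issue GET /db/<key> here
    elif op == "put":
        pass  # a real driver would issue PUT /db/<key> here
    state["ops"] += 1
    return state

def harness(n_workers, ops_per_worker, weighted_ops):
    """Drive every worker through its operations; return total op count."""
    ops, weights = zip(*weighted_ops)
    keygen = lambda: random.randint(1, 10_000)
    total = 0
    for wid in range(n_workers):
        state = new(wid)
        for _ in range(ops_per_worker):
            state = run(random.choices(ops, weights)[0], keygen, state)
        total += state["ops"]
    return total

if __name__ == "__main__":
    print(harness(4, 50, [("get", 3), ("put", 1)]))  # 4 * 50 = 200
```

The nice property of this shape is that adding a new benchmark is just adding
a new operation to the driver, while the harness owns concurrency, timing,
and reporting.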

On Thu, Mar 22, 2012 at 7:58 PM, Jason Smith <jh...@iriscouch.com> wrote:
> Hi, Mike. This is excellent!
>
> I hate to multiply the effort, but almost every bullet-point you list
> could also be tested concurrently. In other words, concurrent
> performance is not a line-item to test, it is a whole new category of
> testing: view building with concurrent load, concurrent filtered
> replication, concurrent updates, with erlang or JS validation
> functions, etc.
>
> That is clearly harder to test, and it's fine to make it out of scope
> for now; however, **those** results are really the most useful when
> deciding whether to bet the farm.
>
> On Fri, Mar 23, 2012 at 7:54 AM, Mike Coolin <mc...@techie.com> wrote:
>> [snip]

Re: Gathering ideas for measuring and tuning performance

Posted by Jason Smith <jh...@iriscouch.com>.
Hi, Mike. This is excellent!

I hate to multiply the effort, but almost every bullet point you list
could also be tested concurrently. In other words, concurrent
performance is not a line item to test; it is a whole new category of
testing: view building with concurrent load, concurrent filtered
replication, concurrent updates with Erlang or JS validation
functions, etc.

That is clearly harder to test, and it's fine to make it out of scope
for now; however, **those** results are really the most useful when
deciding whether to bet the farm.

On Fri, Mar 23, 2012 at 7:54 AM, Mike Coolin <mc...@techie.com> wrote:
> [snip]



-- 
Iris Couch