Posted to user@couchdb.apache.org by Dave Amies <da...@gmail.com> on 2015/11/04 13:49:54 UTC

Re: CouchDB / NoSQL Benchmarking

Hi Garren,

Thanks for your kind words.

I will post each test result separately rather than
one enormous email. Unfortunately I lost the logs from the CouchDB crash.
I tried to reproduce it, but instead the benchmark completed successfully,
so good news in a way.

Dave.


On Tue, Oct 27, 2015 at 8:18 PM, Garren Smith <ga...@apache.org> wrote:

> Hi Dave,
>
> This is very cool. Do you have the results and the scripts you used to
> benchmark CouchDB?
>
> Cheers
> Garren
>
> On Thu, Oct 22, 2015 at 3:13 PM, Dave Amies <da...@gmail.com> wrote:
>
> > Hi All,
> >
> > I'm sure by now most of you will have read at least some parts of this
> > guide:
> >
> > http://guide.couchdb.org/draft/performance.html
> >
> > I was reading it the other day and noticed the "Call to Arms" section at
> > the bottom of the page. I don't know if there are already any benchmarking
> > tools out there, but I decided to try writing one. Hopefully the one I
> > have written will be useful.
> >
> > About my background: for my day job I am a performance tester, usually
> > specialising in LoadRunner, so this project was something to keep my mind
> > occupied while waiting for my test system to be rebuilt. Given this I have
> > only spent a few hours on it, so there is probably still room for
> > improvement. This email is about finding out whether there is interest and
> > whether this will be useful to the CouchDB community - so really, should I
> > continue developing this tool, or am I wasting my time?
> >
> > In designing this benchmarking utility I reflected on all the systems I
> > have tested and tried to come up with some common areas where database
> > systems suffer in performance. Then, bearing in mind the fundamental
> > differences between traditional databases and NoSQL databases (particularly
> > CouchDB), I tried to construct some common database usage scenarios.
> >
> > The 3 scenarios I came up with are:
> >
> >    1. Write heavy (each user performs 12 writes, 6 reads and 3 searches /
> >    index queries)
> >    2. Index / Query / Search heavy (each user performs 1 write, 2 reads and
> >    6 searches / index queries)
> >    3. Read Heavy (each user performs 1 write, 10 reads and 3 searches /
> >    index queries)
> >
> > I have tried out my benchmarking tool on a couple of machines so far. In
> > these tests I managed to cause CouchDB to encounter the following
> > situations:
> >
> >    1. Performance degradation due to being Disk IO bound
> >    2. Performance degradation due to being Memory bound
> >    3. Performance degradation due to being CPU bound
> >    4. CouchDB crashed
> >    5. Benchmarking completed successfully and produced a performance score
> >
> > Based on these results I believe I have created an effective tool for
> > benchmarking, so I decided the best next step was to release the tool as
> > an open source project. I created a GitHub project which can be found
> > here: https://github.com/damies13/kvbench. There you will find the readme
> > file, which describes the 3 scenarios in more detail, the benchmark
> > definition or design, and the pre-benchmark data priming. You will also
> > find the python script that is the benchmarking tool and some
> > instructions for setting up a CouchDB database for the benchmarking
> > process.
> >
> > As this is getting long I'll wrap up by noting that I deliberately did
> > not use the python couchdb libraries; instead I used the requests library
> > (plain HTTP) and the json library because I wanted to keep the code as
> > generic as possible. The intention is that this benchmarking tool should
> > be able to benchmark any key / value store, whether that be a document
> > based NoSQL database, a key / value based NoSQL database, or some other
> > REST API / engine (e.g. backed by a traditional database).
> >
> > I look forward to some feedback; hopefully I have created something
> > useful.
> >
> > Sincerely,
> >
> > Dave.
> >
>

Re: CouchDB / NoSQL Benchmarking

Posted by Alexander Shorin <kx...@gmail.com>.
On Thu, Nov 5, 2015 at 12:50 AM, Dave Amies <da...@gmail.com> wrote:
> As a side note, anyone know if HTTP2 is on the roadmap for CouchDB?

Yes, but to have HTTP2 we'll need to replace MochiWeb with something
else (most likely Cowboy) that supports it at a good level. Replacing the
HTTP engine is a big piece of work, and we are unlikely to get it done
before 3.0.

--
,,,^..^,,,

RE: CouchDB / NoSQL Benchmarking

Posted by Craig Minihan <cr...@ripcordsoftware.com>.
Dave, answers:

"the database is stored in memory if the machine has enough memory" - not to my knowledge; CouchDB's small memory footprint shows that much - a 100GB DB does not equal 100GB of RAM usage. Also, CouchDB map/reduce always runs out of process unless the views are written in Erlang.

I'm aiming for 1TB heaps; the Erlang VM will not do that - and neither will the Java or .NET VMs - this is a custom piece of software designed for high performance.

"how will you handle the situation where the db grows..." - you need a system with RAM > DB size; this is really the same constraint as DISK > DB size.

"better response times" - that is fairly easy to show, and Futon will do that: I'm seeing view builds 300 times faster than CouchDB's.

"bulk update" - AFAIK the data is stored in a tree, so insertion order isn't strictly important here; also, updates are stored in batches unless you switch that off.

Craig
 

Re: CouchDB / NoSQL Benchmarking

Posted by Dave Amies <da...@gmail.com>.
Hi Craig,

Some further thoughts:

"I'm developing an in-memory CouchDB API compatible DB"

Not wanting to put a dampener on this, but CouchDB already does this: the
database is stored in memory if the machine has enough memory and the
database is small enough to fit in the disk cache. Are you expecting a
significant improvement accessing the data from memory directly rather than
from the disk cache?

Also, how will you handle the situation where the db grows to the point of
being slightly larger than your memory? CouchDB handles this well; you just
take a performance hit until you are able to upgrade the memory of your
database server.

I guess now you have a way to prove whether or not your CouchDB API
compatible in-memory DB has better response times :)

If you are just seeking better performance than a standard CouchDB and you
can already code in Erlang, I would suggest you look into improving the way
CouchDB handles updating views. From my initial tests, even on a system that
was Disk IO bound, the read and write times are below 0.15 seconds; by
comparison, reading a view where the data had changed, even when the entire
database is in memory, costs more than 40 seconds. To me this is the area of
CouchDB where the greatest improvement is to be had.
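
For readers hitting the same slow view reads, CouchDB's stale query option
returns the last-built index without updating it first - a workaround rather
than a fix. A toy sketch of building such a request URL (the helper name is
mine, not from kvbench):

```python
# Hypothetical helper - not kvbench code. CouchDB's ?stale=ok reads the
# last-built index without waiting for it to be brought up to date.
def view_url(base, ddoc, view, stale=None):
    """Build a CouchDB view URL, optionally with ?stale=ok / update_after."""
    url = "%s/_design/%s/_view/%s" % (base.rstrip("/"), ddoc, view)
    if stale is not None:
        url += "?stale=" + stale
    return url

# e.g. fetch this with requests.get() to skip the index rebuild on read:
fast_but_stale = view_url("http://127.0.0.1:5984/kvbench", "KVB", "seconds",
                          stale="ok")
```

Of course the benchmark should keep measuring the honest (non-stale) read,
since that is the cost being discussed here.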

With the read and write times below 0.15 seconds, I also think that network
latency dominates the results, but when HTTP2 is implemented in CouchDB this
will mostly disappear.

As a side note, does anyone know if HTTP2 is on the roadmap for CouchDB?
I couldn't find any information about this for CouchDB or MochiWeb. Is the
plan to wait for MochiWeb to implement it, or to use a different Erlang web
server that supports both HTTP/1.1 and HTTP2?

Re the bulk update:

I guess this is the performance tester in me, but when you do a bulk load
all the documents are stored contiguously; when you have 50 threads all
doing single writes (as is the case here), the documents are not stored
contiguously. This is more representative of a database that has lived in
production and grown to size than of a fresh new database.
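
To illustrate the difference (a toy sketch, not kvbench code - a plain
Python list stands in for the database's storage, and each thread plays one
user doing single writes):

```python
# Toy illustration only: many writer threads interleave their documents,
# while a bulk load lands each user's documents contiguously.
import threading

def prime_with_threads(store, n_threads=4, docs_per_thread=5):
    """Each thread appends its documents one at a time, so documents from
    different threads can interleave instead of landing contiguously."""
    def worker(tid):
        for i in range(docs_per_thread):
            store.append("doc-%d-%d" % (tid, i))
    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

def prime_with_bulk(store, n_threads=4, docs_per_thread=5):
    """A bulk load writes everything in one pass, so each user's documents
    end up stored contiguously."""
    store.extend("doc-%d-%d" % (t, i)
                 for t in range(n_threads)
                 for i in range(docs_per_thread))

threaded, bulk = [], []
prime_with_threads(threaded)
prime_with_bulk(bulk)
# Same 20 documents either way; only the physical ordering can differ,
# and the threaded ordering depends on the scheduler.
```

The same distinction applies to the benchmark's data priming: the documents
are identical, only their on-disk layout differs.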

If the database rearranges / optimises the storage of documents in its
storage medium (disk / memory / other) then this would make no difference;
if it doesn't, then it makes a difference. Either way, priming the data in a
consistent way will highlight this by consistently providing better results
for systems that optimise the storage of their documents / key-value pairs.

The more I think about this, the more I think it is better to leave it this
way.

I must get to work... so that's all for now.

Dave.




RE: CouchDB / NoSQL Benchmarking

Posted by Craig Minihan <cr...@ripcordsoftware.com>.
Dave, I can't speak for other dbs, but using single requests means that network latency dominates your results rather than db performance.

I understand you want a level playing field for all dbs so you can make a like-for-like comparison, but I'd expect any other serious db to have a bulk API, so there shouldn't be much of an issue in the longer term.

You should find you are able to abstract this implementation detail out of the code when you start to support other dbs.
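
For example, a hypothetical sketch of that abstraction (the names here are
illustrative, not taken from kvbench): the benchmark asks a writer object
for the requests it should issue, and whether that means N single PUTs or
one _bulk_docs POST becomes a per-database detail:

```python
# Hypothetical sketch - illustrates the abstraction, not kvbench's actual code.
import json

class CouchWriter:
    """Builds the HTTP requests for storing documents in a CouchDB-style API."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")

    def single_requests(self, docs):
        # One (method, url, body) tuple per document: N network round trips.
        return [("PUT", "%s/%s" % (self.base_url, doc["_id"]), json.dumps(doc))
                for doc in docs]

    def bulk_request(self, docs):
        # One POST to _bulk_docs: a single round trip for the whole batch.
        return ("POST", self.base_url + "/_bulk_docs",
                json.dumps({"docs": docs}))
```

A backend without a bulk API would simply only implement single_requests.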

Craig

-----Original Message-----
From: Dave Amies [mailto:damies13@gmail.com] 
Sent: 04 November 2015 14:14
To: user@couchdb.apache.org
Subject: Re: CouchDB / NoSQL Benchmarking

Hi Craig,

Thanks for your comments; would you mind sharing your hardware details and your benchmark results?


Here are my initial thoughts:

* Convert StoreKV to use bulk POST - it is taking 10 mins to write the initial 200k docs on my system, bulk should knock that down to a few seconds

I had initially thought about doing this, but decided against it because I am trying to keep the code as generic as possible, so that with minimal modifications it can be used with other NoSQL databases. I didn't want to create the perception of giving one NoSQL database an unfair advantage. I guess what I need to know is: do all the other NoSQL databases (or at least the Key Value Pair Store and Document Store ones) have this bulk load functionality via their REST APIs?

Note this is also the reason I used the python requests (plain HTTP) library and not the python couchdb library, even though the python couchdb library would have been easier.


* Set the content type in the put/post to 'application/json' - at the mo it is blank

This is probably a good idea; I will do this. I put this benchmark tool together in some spare time I had, so I didn't expect to have it perfect first go. Besides, I'm not the world's best programmer either :) The whole point of making it open source is so people can critique and help improve the code, so thanks.


* Add a script to initialize/reset the kvbench db

Hmm, again I was trying to keep things generic, but as this is not actually part of the benchmark but of the setup steps, I guess we could do this. By the same token, I didn't see the need at the moment: deleting a database in Futon is a 2-click operation, and re-creating the database and loading the design document in Futon was not much more, so it didn't take very long to reset manually. Also, any script to do this would need to know the admin user's name and password (unless admin party is on) and would then need to deal with the complexity of prompting for the password or having it stored in the script (bad practice).

Actually this can be scripted with 3 curl commands:

curl -X DELETE http://127.0.0.1:5984/kvbench
curl -X PUT http://127.0.0.1:5984/kvbench
curl -X PUT http://127.0.0.1:5984/kvbench/_design/KVB -d '{
  "_id": "_design/KVB",
  "language": "javascript",
  "views": {
    "seconds": {
      "map": "function(doc) {\n  emit(doc.seconds.toFixed(0), doc.seconds);\n}",
      "reduce": "_count"
    },
    "summary": {
      "map": "function(doc) {\n  emit(doc.summary, 1);\n}",
      "reduce": "_count"
    }
  }
}'

Naturally you will need to use -u in these curl commands if admin party is disabled.






On Wed, Nov 4, 2015 at 11:05 PM, Craig Minihan <cr...@ripcordsoftware.com>
wrote:

> Dave, code looks very useful. I'm developing an in-memory CouchDB API 
> compatible DB (https://github.com/RipcordSoftware/AvanceDB) so being 
> able to x-ref performance with CouchDB would be very handy.
>
> I'd like to PR a few changes into the repo if you don't mind:
> * Convert StoreKV to use bulk POST - it is taking 10 mins to write the 
> initial 200k docs on my system, bulk should knock that down to a few 
> seconds
> * Set the content type in the put/post to 'application/json' - at the 
> mo it is blank
> * Add a script to initialize/reset the kvbench db
>
> I'm not a Python dev - apols in advance.
>
> Nice work!
> Craig
>
> -----Original Message-----
> From: Dave Amies [mailto:damies13@gmail.com]
> Sent: 04 November 2015 12:50
> To: user@couchdb.apache.org
> Subject: Re: CouchDB / NoSQL Benchmarking
>
> Hi Garren,
>
> Thanks for your kind words.
>
> I will post each test result separately rather than one enormous 
> email, Unfortunately I lost the logs from the Couch DB crash.
> I tried to reproduce it but instead the benchmark completed 
> successfully, so good news in a way.
>
> Dave.
>
>
> On Tue, Oct 27, 2015 at 8:18 PM, Garren Smith <ga...@apache.org> wrote:
>
> > Hi Dave,
> >
> > This is very cool. Do you have the results and the scripts you used 
> > to benchmarch CouchDB?
> >
> > Cheers
> > Garren
> >
> > On Thu, Oct 22, 2015 at 3:13 PM, Dave Amies <da...@gmail.com> wrote:
> >
> > > Hi All,
> > >
> > > I'm sure by now most of you will have read at least some parts of 
> > > this
> > > guide:
> > >
> > > http://guide.couchdb.org/draft/performance.html
> > >
> > > I was reading it the other day and noticed the "Call to Arms"
> > > section at the bottom of the page. I don't know if there are 
> > > already any
> > benchmarking
> > > tools out there, but I decided to try writing one. Hopefully the 
> > > one I
> > have
> > > written will be useful.
> > >
> > > About my background, for my day job i am a performance tester, 
> > > usually specialising in Loadrunner, so this project was something 
> > > to keep my mind occupied while waiting for my test system to be 
> > > rebuilt. Given this I
> > have
> > > only spent a few hours on it and so there is probably still room 
> > > for improvement, this email is about finding out if there is 
> > > interest or if this will be useful to the CouchDB community, so 
> > > really should I continue developing this tool, or am I wasting my time?
> > >
> > > In designing this benchmarking utility I reflected on all the 
> > > systems I have tested and tried to come up with some common areas 
> > > where database systems suffer in performance. Then bearing in mind 
> > > the fundamental differences between traditional databases and 
> > > NoSQL databases
> > (particularly
> > > CouchDB) I tried to construct some some common database usage
> scenarios.
> > >
> > > The 3 scenarios I came up with are:
> > >
> > >    1. Write heavy (each user performs 12 writes, 6 reads and 3
> searches /
> > >    index queries)
> > >    2. Index / Query / Search heavy (each user performs 1 write, 2 
> > > reads
> > and
> > >    6 searches / index queries)
> > >    3. Read Heavy (each user performs 1 writes, 10 reads and 3 
> > > searches
> /
> > >    index queries)
> > >
> > > I have tried out my benchmarking tool on a couple of machines so 
> > > far, in these tests I managed to cause CouchDB to encounter the 
> > > following
> > > situations:
> > >
> > >    1. Performance degradation due to being Disk IO bound
> > >    2. Performance degradation due to being Memory bound
> > >    3. Performance degradation due to being CPU bound
> > >    4. Couch DB crashed
> > >    5. Benchmarking completed successfully and produce a 
> > > performance score
> > >
> > > Based on these results I believe I have created an effective tool 
> > > for benchmarking, so I decided the best next step was to release 
> > > the tool as
> > an
> > > open source project, so I created a github project which can be 
> > > found
> > here:
> > > https://github.com/damies13/kvbench. Here you will the readme file 
> > > describes the 3 scenarios in more detail, the benchmark definition 
> > > or design and also the pre benchmark data priming. You will also 
> > > find here
> > the
> > > python script that is the benchmarking tool and some instructions 
> > > for setting up a couch db database for the benchmarking process.
> > >
> > > As this is getting long i'll wrap up by noting that I deliberately 
> > > did
> > not
> > > use the python couchdb libraries but instead I used the requests 
> > > library (standard http) and json library because I wanted to keep 
> > > the code as generic as possible, the intention is that this 
> > > benchmarking tool should
> > be
> > > able to be used to benchmarking any key / value store, whether 
> > > that be a document based NoSQL, and Key Value based NoSQL database 
> > > or some other
> > Rest
> > > API / engine (e.g. backed by a traditional database).
> > >
> > > I look forward to some feed back, hopefully I have created 
> > > something useful.
> > >
> > > Sincerely,
> > >
> > > Dave.
> > >
> >
>

Re: CouchDB / NoSQL Benchmarking

Posted by Dave Amies <da...@gmail.com>.
Hi Craig,

Thanks for your comments. Would you mind sharing your hardware details and
your benchmark results?


Here are my initial thoughts:

* Convert StoreKV to use bulk POST - it is taking 10 mins to write the
initial 200k docs on my system, bulk should knock that down to a few seconds

I had initially thought about doing this, but decided against it because I
am trying to keep the code as generic as possible, so that it can be used
with other NoSQL databases with minimal modifications. I didn't want to
create the perception of giving one NoSQL database an unfair advantage.
What I need to know is: do all the other NoSQL databases (or at least the
Key Value Pair Store and Document Store ones) offer this bulk-load
functionality via their REST APIs?
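To make the comparison concrete: against CouchDB the bulk write would go through the `_bulk_docs` endpoint. Below is a minimal stdlib sketch; the helper names and document shape are my own illustration, not taken from kvbench, and the network call itself is left unexercised:

```python
import json
import urllib.request


def build_bulk_payload(docs):
    # CouchDB's _bulk_docs endpoint expects a JSON body of the form
    # {"docs": [ ... ]}; a single POST then writes every document listed.
    return json.dumps({"docs": docs}).encode("utf-8")


def bulk_post(base_url, db, docs):
    # Hypothetical helper: the URL layout follows CouchDB's REST API.
    req = urllib.request.Request(
        url="%s/%s/_bulk_docs" % (base_url, db),
        data=build_bulk_payload(docs),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)  # network call, not exercised here
```

One POST per batch replaces thousands of individual PUTs, which is where the initial-load speedup would come from — assuming the other databases under test expose an equivalent endpoint.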

Note this is also the reason I used the Python requests library (generic
HTTP) and not the python-couchdb library, even though the couchdb library
would have been easier.


* Set the content type in the put/post to 'application/json' - at the mo it
is blank

This is probably a good idea; I will do this. I put this benchmark tool
together in some spare time, so I didn't expect to have it perfect on the
first go. Besides, I'm not the world's best programmer either :) The whole
point of making it open source is so people can critique and help improve
the code, so thanks.
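For what it's worth, setting the header explicitly is a one-line change with the generic HTTP machinery. A stdlib sketch with a placeholder URL and body:

```python
import urllib.request

# Placeholder URL and body; the point is the explicit Content-Type header,
# which is otherwise left unset for raw request bodies.
req = urllib.request.Request(
    url="http://127.0.0.1:5984/kvbench/doc1",
    data=b'{"value": 42}',
    method="PUT",
    headers={"Content-Type": "application/json"},
)
print(req.get_header("Content-type"))  # urllib normalizes the key's capitalization
```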


* Add a script to initialize/reset the kvbench db

Hmm, again I was trying to keep things generic, but as this is not actually
part of the benchmark, just the setup steps, I guess we could do this. By
the same token, I didn't see the need at the moment: deleting a database in
Futon is a two-click operation, and re-creating the database and loading
the design document in Futon was not much more, so it didn't take very long
to reset manually. Any script to do this would also need to know the admin
user's name and password (unless admin party is on), and would then need to
deal with the complexity of prompting for the password or having it stored
in the script (bad practice).

Actually this can be scripted with 3 curl commands:

curl -X DELETE http://127.0.0.1:5984/kvbench
curl -X PUT http://127.0.0.1:5984/kvbench
curl -X PUT http://127.0.0.1:5984/kvbench/_design/KVB -d '{
  "_id": "_design/KVB",
  "language": "javascript",
  "views": {
    "seconds": {
      "map": "function(doc) {\n  emit(doc.seconds.toFixed(0), doc.seconds);\n}",
      "reduce": "_count"
    },
    "summary": {
      "map": "function(doc) {\n  emit(doc.summary, 1);\n}",
      "reduce": "_count"
    }
  }
}'

Naturally you will need to use -u in these curl commands if admin party is
disabled.
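A middle ground is a small Python helper that composes the reset requests but leaves credential collection to the caller (e.g. via getpass.getpass() at run time), so no password is stored. A sketch, reusing the host and database name from the curl commands above; the design-document PUT is omitted for brevity:

```python
import base64
import urllib.request


def reset_requests(base_url, db, user=None, password=None):
    """Compose the DELETE + PUT requests that reset a database.

    Returns urllib Request objects so the composition can be inspected,
    then sent with urllib.request.urlopen(); credentials are optional
    and supplied by the caller rather than hard-coded.
    """
    reqs = [
        urllib.request.Request("%s/%s" % (base_url, db), method="DELETE"),
        urllib.request.Request("%s/%s" % (base_url, db), method="PUT"),
    ]
    if user is not None:
        # Equivalent of curl's -u user:password (HTTP Basic auth).
        token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
        for r in reqs:
            r.add_header("Authorization", "Basic " + token)
    return reqs


# Composition only -- no network traffic, no stored password:
reqs = reset_requests("http://127.0.0.1:5984", "kvbench")
print([r.get_method() for r in reqs])  # ['DELETE', 'PUT']
```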






On Wed, Nov 4, 2015 at 11:05 PM, Craig Minihan <cr...@ripcordsoftware.com>
wrote:

> Dave, code looks very useful. I'm developing an in-memory CouchDB API
> compatible DB (https://github.com/RipcordSoftware/AvanceDB) so being able
> to x-ref performance with CouchDB would be very handy.
>
> I'd like to PR a few changes into the repo if you don't mind:
> * Convert StoreKV to use bulk POST - it is taking 10 mins to write the
> initial 200k docs on my system, bulk should knock that down to a few seconds
> * Set the content type in the put/post to 'application/json' - at the mo
> it is blank
> * Add a script to initialize/reset the kvbench db
>
> I'm not a Python dev - apols in advance.
>
> Nice work!
> Craig
>
> -----Original Message-----
> From: Dave Amies [mailto:damies13@gmail.com]
> Sent: 04 November 2015 12:50
> To: user@couchdb.apache.org
> Subject: Re: CouchDB / NoSQL Benchmarking
>
> Hi Garren,
>
> Thanks for your kind words.
>
> I will post each test result separately rather than one enormous email.
> Unfortunately I lost the logs from the Couch DB crash.
> I tried to reproduce it but instead the benchmark completed successfully,
> so good news in a way.
>
> Dave.
>
>
> On Tue, Oct 27, 2015 at 8:18 PM, Garren Smith <ga...@apache.org> wrote:
>
> > Hi Dave,
> >
> > This is very cool. Do you have the results and the scripts you used to
> > benchmark CouchDB?
> >
> > Cheers
> > Garren
> >
> > On Thu, Oct 22, 2015 at 3:13 PM, Dave Amies <da...@gmail.com> wrote:
> >
> > > Hi All,
> > >
> > > I'm sure by now most of you will have read at least some parts of
> > > this
> > > guide:
> > >
> > > http://guide.couchdb.org/draft/performance.html
> > >
> > > I was reading it the other day and noticed the "Call to Arms"
> > > section at the bottom of the page. I don't know if there are already
> > > any
> > benchmarking
> > > tools out there, but I decided to try writing one. Hopefully the one
> > > I
> > have
> > > written will be useful.
> > >
> > > About my background, for my day job I am a performance tester,
> > > usually specialising in Loadrunner, so this project was something to
> > > keep my mind occupied while waiting for my test system to be
> > > rebuilt. Given this I
> > have
> > > only spent a few hours on it and so there is probably still room for
> > > improvement, this email is about finding out if there is interest or
> > > if this will be useful to the CouchDB community, so really should I
> > > continue developing this tool, or am I wasting my time?
> > >
> > > In designing this benchmarking utility I reflected on all the
> > > systems I have tested and tried to come up with some common areas
> > > where database systems suffer in performance. Then bearing in mind
> > > the fundamental differences between traditional databases and NoSQL
> > > databases
> > (particularly
> > > CouchDB) I tried to construct some common database usage
> scenarios.
> > >
> > > The 3 scenarios I came up with are:
> > >
> > >    1. Write heavy (each user performs 12 writes, 6 reads and 3
> searches /
> > >    index queries)
> > >    2. Index / Query / Search heavy (each user performs 1 write, 2
> > > reads
> > and
> > >    6 searches / index queries)
> > >    3. Read Heavy (each user performs 1 write, 10 reads and 3 searches
> /
> > >    index queries)
> > >
> > > I have tried out my benchmarking tool on a couple of machines so
> > > far, in these tests I managed to cause CouchDB to encounter the
> > > following
> > > situations:
> > >
> > >    1. Performance degradation due to being Disk IO bound
> > >    2. Performance degradation due to being Memory bound
> > >    3. Performance degradation due to being CPU bound
> > >    4. Couch DB crashed
> > >    5. Benchmarking completed successfully and produced a performance
> > > score
> > >
> > > Based on these results I believe I have created an effective tool
> > > for benchmarking, so I decided the best next step was to release the
> > > tool as
> > an
> > > open source project, so I created a github project which can be
> > > found
> > here:
> > > https://github.com/damies13/kvbench. Here you will find that the readme file
> > > describes the 3 scenarios in more detail, the benchmark definition
> > > or design and also the pre benchmark data priming. You will also
> > > find here
> > the
> > > python script that is the benchmarking tool and some instructions
> > > for setting up a couch db database for the benchmarking process.
> > >
> > > As this is getting long I'll wrap up by noting that I deliberately
> > > did
> > not
> > > use the python couchdb libraries but instead I used the requests
> > > library (standard http) and json library because I wanted to keep
> > > the code as generic as possible, the intention is that this
> > > benchmarking tool should
> > be
> > > able to be used to benchmark any key / value store, whether that
> > > be a document based NoSQL, a Key Value based NoSQL database or
> > > some other
> > Rest
> > > API / engine (e.g. backed by a traditional database).
> > >
> > > I look forward to some feedback; hopefully I have created something
> > > useful.
> > >
> > > Sincerely,
> > >
> > > Dave.
> > >
> >
>

RE: CouchDB / NoSQL Benchmarking

Posted by Craig Minihan <cr...@ripcordsoftware.com>.
Dave, code looks very useful. I'm developing an in-memory CouchDB API compatible DB (https://github.com/RipcordSoftware/AvanceDB) so being able to x-ref performance with CouchDB would be very handy.

I'd like to PR a few changes into the repo if you don't mind:
* Convert StoreKV to use bulk POST - it is taking 10 mins to write the initial 200k docs on my system, bulk should knock that down to a few seconds
* Set the content type in the put/post to 'application/json' - at the mo it is blank
* Add a script to initialize/reset the kvbench db

I'm not a Python dev - apols in advance.

Nice work!
Craig

-----Original Message-----
From: Dave Amies [mailto:damies13@gmail.com] 
Sent: 04 November 2015 12:50
To: user@couchdb.apache.org
Subject: Re: CouchDB / NoSQL Benchmarking

Hi Garren,

Thanks for your kind words.

> I will post each test result separately rather than one enormous email. Unfortunately I lost the logs from the Couch DB crash.
I tried to reproduce it but instead the benchmark completed successfully, so good news in a way.

Dave.


On Tue, Oct 27, 2015 at 8:18 PM, Garren Smith <ga...@apache.org> wrote:

> Hi Dave,
>
> This is very cool. Do you have the results and the scripts you used to 
> benchmark CouchDB?
>
> Cheers
> Garren
>
> On Thu, Oct 22, 2015 at 3:13 PM, Dave Amies <da...@gmail.com> wrote:
>
> > Hi All,
> >
> > I'm sure by now most of you will have read at least some parts of 
> > this
> > guide:
> >
> > http://guide.couchdb.org/draft/performance.html
> >
> > I was reading it the other day and noticed the "Call to Arms" 
> > section at the bottom of the page. I don't know if there are already 
> > any
> benchmarking
> > tools out there, but I decided to try writing one. Hopefully the one 
> > I
> have
> > written will be useful.
> >
> > About my background, for my day job I am a performance tester,
> > usually specialising in Loadrunner, so this project was something to 
> > keep my mind occupied while waiting for my test system to be 
> > rebuilt. Given this I
> have
> > only spent a few hours on it and so there is probably still room for 
> > improvement, this email is about finding out if there is interest or 
> > if this will be useful to the CouchDB community, so really should I 
> > continue developing this tool, or am I wasting my time?
> >
> > In designing this benchmarking utility I reflected on all the 
> > systems I have tested and tried to come up with some common areas 
> > where database systems suffer in performance. Then bearing in mind 
> > the fundamental differences between traditional databases and NoSQL 
> > databases
> (particularly
> > CouchDB) I tried to construct some common database usage scenarios.
> >
> > The 3 scenarios I came up with are:
> >
> >    1. Write heavy (each user performs 12 writes, 6 reads and 3 searches /
> >    index queries)
> >    2. Index / Query / Search heavy (each user performs 1 write, 2 
> > reads
> and
> >    6 searches / index queries)
> >    3. Read Heavy (each user performs 1 write, 10 reads and 3 searches /
> >    index queries)
> >
> > I have tried out my benchmarking tool on a couple of machines so 
> > far, in these tests I managed to cause CouchDB to encounter the 
> > following
> > situations:
> >
> >    1. Performance degradation due to being Disk IO bound
> >    2. Performance degradation due to being Memory bound
> >    3. Performance degradation due to being CPU bound
> >    4. Couch DB crashed
> >    5. Benchmarking completed successfully and produced a performance
> > score
> >
> > Based on these results I believe I have created an effective tool 
> > for benchmarking, so I decided the best next step was to release the 
> > tool as
> an
> > open source project, so I created a github project which can be 
> > found
> here:
> > https://github.com/damies13/kvbench. Here you will find that the readme file
> > describes the 3 scenarios in more detail, the benchmark definition 
> > or design and also the pre benchmark data priming. You will also 
> > find here
> the
> > python script that is the benchmarking tool and some instructions 
> > for setting up a couch db database for the benchmarking process.
> >
> > As this is getting long I'll wrap up by noting that I deliberately
> > did
> not
> > use the python couchdb libraries but instead I used the requests 
> > library (standard http) and json library because I wanted to keep 
> > the code as generic as possible, the intention is that this 
> > benchmarking tool should
> be
> > able to be used to benchmark any key / value store, whether that
> > be a document based NoSQL, a Key Value based NoSQL database or
> > some other
> Rest
> > API / engine (e.g. backed by a traditional database).
> >
> > I look forward to some feedback; hopefully I have created something
> > useful.
> >
> > Sincerely,
> >
> > Dave.
> >
>