Posted to user@couchdb.apache.org by Mark Hahn <ma...@hahnca.com> on 2013/01/16 21:17:06 UTC

general question about couch performance

My couchdb is seeing a typical request rate of about 100/sec when it is
maxed out, typically 10 reads per write.  This is disappointing.  I was
hoping for 3 to 5 ms per op, not 10 ms.  What performance numbers are
others seeing?

I have 35 views with only 50 to 100 entries per view.  My db is less than a
gigabyte with a few thousand active docs.

I'm running on a medium ec2 instance with ephemeral disk.  I assume I am IO
bound as the cpu is not maxing out.

How much worse would this get if the db also had to handle replication
between multiple servers?

Re: general question about couch performance

Posted by Mark Hahn <ma...@hahnca.com>.
Thanks.

> Now for IDs I thought this API [1] was for retrieving sequential UUIDs

I just read your reference [1].  I was surprised to read that what is
labelled "sequential" is only partially sequential: the prefix is random,
so it is a UUID after all, not a simple sequence.  That means using them
on multiple doc-generating servers is not a problem.  I'll have to
consider this.

My only problem now is that my app takes advantage of the utc_random
algorithm.  I use the id to keep docs sequenced in time, and I even pull
the creation time out of the ID.  In retrospect that was pretty stupid.
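
For what it's worth, this is roughly what I do, assuming the documented
utc_random layout (the first 14 hex digits of the id are microseconds
since the Unix epoch, the rest is random):

import datetime

def creation_time(doc_id):
    # utc_random ids start with 14 hex digits of microseconds since epoch
    micros = int(doc_id[:14], 16)
    return datetime.datetime.utcfromtimestamp(micros / 1e6)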

Re: general question about couch performance

Posted by Sean Copenhaver <se...@gmail.com>.
I'm going to assume that you have compacted, so CouchDB isn't crawling
through a fragmented gigabyte file to reach docs scattered all over the
place.

I'd need to know what size your documents are and how much I/O
utilization you are seeing on your server, but if you are using
non-sequential IDs, that will cause write slowdowns, as CouchDB has to
keep rewriting b-tree nodes instead of appending new ones.

Now, for IDs, I thought this API [1] was for retrieving sequential UUIDs
(it sounds like you can at least configure it that way), which helps
nullify the concern above.  Also, your data set sounds small enough that
you might not care about packing into a smaller sequential ID.
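
Something like this quick sketch is what I mean (assuming the 'requests'
library and a CouchDB at the default 127.0.0.1:5984):

import requests

# Ask the server for a batch of ids; with uuids/algorithm = sequential
# they come back mostly ordered, which keeps b-tree appends cheap.
resp = requests.get("http://127.0.0.1:5984/_uuids", params={"count": 100})
ids = resp.json()["uuids"]  # use these as _id values for new docs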

Another thing that can help with write performance is the
'delayed_commits' option [2].  It stores up documents from individual
writes and commits them at once to improve write speed.  This of course
widens the window for data loss if something really bad were to happen.
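
A minimal sketch of flipping that option at runtime through the _config
API (placeholder admin credentials; the new value is a JSON-encoded
string):

import requests

requests.put(
    "http://admin:secret@127.0.0.1:5984/_config/couchdb/delayed_commits",
    data='"true"')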

The only other CouchDB-related thing I can think of off the top of my
head is to make sure file compression is on, since that helps with both
reads and writes.  CouchDB 1.2 [3] added it and it's on by default, but
if you upgraded from an earlier version I believe you have to run a
compaction before existing data benefits.
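
And a sketch of kicking off that compaction over HTTP (database name
'mydb' and design doc name 'app' are placeholders):

import requests

base = "http://127.0.0.1:5984"
headers = {"Content-Type": "application/json"}

# Compact the database file itself...
requests.post(base + "/mydb/_compact", headers=headers)
# ...and the view index of one design doc.
requests.post(base + "/mydb/_compact/app", headers=headers)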

Maybe also try tweaking OS file system settings.  CouchDB does no caching
that I'm aware of; it relies on the file system for that.

I'll close with a disclosure: I haven't done anything CouchDB-related in
a while, and I never had a project where its performance forced me to
investigate.

[1] http://wiki.apache.org/couchdb/HttpGetUuids
[2] http://wiki.apache.org/couchdb/Configurationfile_couch.ini
[3] http://couchdb.readthedocs.org/en/latest/changelog/#id4

-- 
Sean Copenhaver

"Water is fluid, soft and yielding. But water will wear away rock, which is rigid and cannot yield. As a rule, whatever is fluid, soft and yielding will overcome whatever is rigid and hard. This is another paradox: what is soft is strong." - Lao-Tzu


Re: general question about couch performance

Posted by Mark Hahn <ma...@hahnca.com>.
thx

Re: general question about couch performance

Posted by Daniel Gonzalez <go...@gonvaled.com>.
The problem is not replication; the problem is the source of the data.
The replicators will just distribute the data that is being inserted to
the other server instances.

You cannot use that monotonic id generator if you are inserting data
from different servers or applications.  But if you are, let's say,
importing data into a single couchdb (replicated or not) from a
third-party database in one batch job, you have full control over the
IDs, so you can use that id generator.  That will improve the
performance of your database, especially with regard to space used and
view generation.


Re: general question about couch performance

Posted by Mark Hahn <ma...@hahnca.com>.
> you can only do this if you are in control of the IDs

This wouldn't work with multiple servers replicating, would it?



Re: general question about couch performance

Posted by Daniel Gonzalez <go...@gonvaled.com>.
And here you have BaseConverter:

"""
Convert numbers from base 10 integers to base X strings and back again.

Sample usage:

>>> base20 = BaseConverter('0123456789abcdefghij')
>>> base20.from_decimal(1234)
'31e'
>>> base20.to_decimal('31e')
1234
"""

class BaseConverter(object):
    decimal_digits = "0123456789"

    def __init__(self, digits):
        self.digits = digits

    def from_decimal(self, i):
        return self.convert(i, self.decimal_digits, self.digits)

    def to_decimal(self, s):
        return int(self.convert(s, self.digits, self.decimal_digits))

    def convert(number, fromdigits, todigits):
        # Based on http://code.activestate.com/recipes/111286/
        if str(number)[0] == '-':
            number = str(number)[1:]
            neg = 1
        else:
            neg = 0

        # make an integer out of the number
        x = 0
        for digit in str(number):
            x = x * len(fromdigits) + fromdigits.index(digit)

        # create the result in base 'len(todigits)'
        if x == 0:
            res = todigits[0]
        else:
            res = ""
            while x > 0:
                digit = x % len(todigits)
                res = todigits[digit] + res
                x = x // len(todigits)  # integer division (exact for big ints)
            if neg:
                res = '-' + res
        return res
    convert = staticmethod(convert)



Re: general question about couch performance

Posted by Daniel Gonzalez <go...@gonvaled.com>.
Also, in order to improve view performance, it is better if you use a short
and monotonically increasing id: this is what I am using for one of my
databases with millions of documents:

class MonotonicalID:

    def __init__(self, cnt = 0):
        self.cnt = cnt
        self.base62 = BaseConverter(
            'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz')
        # This alphabet is better for couchdb, since it sorts according
        # to the Unicode Collation Algorithm
        self.base64_couch = BaseConverter(
            '-@0123456789aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ')

    def get(self):
        res = self.base64_couch.from_decimal(self.cnt)
        self.cnt += 1
        return res

Doing this will:
- save space in the database, since the id starts small: the id is used
in lots of internal data structures in couchdb, so keeping it short
saves lots of space in a big database
- speed up certain operations, since the ids are ordered (in the couchdb
sense)

Drawback: you can only do this if you are in control of the IDs (you
know that nobody else is going to be generating IDs)
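
A usage sketch, assuming the BaseConverter class posted elsewhere in
this thread, the 'requests' library, and a placeholder database URL:

import requests

gen = MonotonicalID()
docs = [{"_id": gen.get(), "n": n} for n in range(1000)]

# Short, ordered ids let these inserts append to the id b-tree instead
# of rewriting interior nodes.
requests.post("http://127.0.0.1:5984/mydb/_bulk_docs",
              json={"docs": docs})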


Re: general question about couch performance

Posted by Mark Hahn <ma...@hahnca.com>.
Thanks for the tips.  Keep them coming.

I'm going to try everything I can.  If I find anything surprising I'll let
everyone know.



Re: general question about couch performance

Posted by Daniel Gonzalez <go...@gonvaled.com>.
Are you doing single writes or batch writes?
I managed to improve write performance by collecting the documents and
sending them in a single request.
The same applies to read accesses.
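
For example, a sketch of both directions with the 'requests' library
(URLs and ids are placeholders):

import requests

db = "http://127.0.0.1:5984/mydb"

# One _bulk_docs POST instead of N individual PUTs.
docs = [{"type": "event", "n": n} for n in range(500)]
requests.post(db + "/_bulk_docs", json={"docs": docs})

# One _all_docs POST with include_docs instead of N individual GETs.
keys = ["id-1", "id-2", "id-3"]
resp = requests.post(db + "/_all_docs",
                     params={"include_docs": "true"},
                     json={"keys": keys})
docs_back = [row["doc"] for row in resp.json()["rows"]]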


Re: general question about couch performance

Posted by Alexander Shorin <kx...@gmail.com>.
On Fri, Jan 18, 2013 at 11:08 PM, Robert Newson <rn...@apache.org> wrote:
> The main bottleneck is the view server protocol itself (its serial,
> synchronous nature), at least for views. The biggest improvement would
> be batching or at least async.

I'd note that this has changed in 1.3: the view server may receive
multiple map_doc commands before it returns any result.  This means the
view server can process documents asynchronously in cases where document
mapping is a long, heavy process.  However, the view server must still
return responses in the same order it received the documents; that could
change if the view server returned each result with the document id
attached, so CouchDB would know which document the key/value pairs
belong to.
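
To make the serial shape concrete, here is a toy sketch of a view server
loop (JSON lines over stdin/stdout; a real query server also handles
reduce and friends, and compiles the function source instead of faking
the emits):

import sys, json

funs = []
for line in sys.stdin:
    cmd = json.loads(line)
    if cmd[0] == "reset":
        funs = []
        reply = True
    elif cmd[0] == "add_fun":
        funs.append(cmd[1])  # map function source text
        reply = True
    elif cmd[0] == "map_doc":
        doc = cmd[1]
        # Pretend each registered fun emits [doc._id, null].  Replies
        # must go back in the same order the map_doc commands arrived.
        reply = [[[doc.get("_id"), None]] for _ in funs]
    else:
        reply = {"error": "unknown_command"}
    sys.stdout.write(json.dumps(reply) + "\n")
    sys.stdout.flush()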

--
,,,^..^,,,

Re: general question about couch performance

Posted by Robert Newson <rn...@apache.org>.
The main bottleneck is the view server protocol itself (its serial,
synchronous nature), at least for views. The biggest improvement would
be batching or at least async.

Anyone can attempt a port variant. With compelling numbers, it sounds
like an easy decision. :)

On 18 January 2013 13:46, Sean Copenhaver <se...@gmail.com> wrote:
> Somewhat related, are there talks about using erlang_js to embed
> SpiderMonkey as a port driver?

Re: general question about couch performance

Posted by Sean Copenhaver <se...@gmail.com>.
Somewhat related, are there talks about using erlang_js to embed
SpiderMonkey as a port driver?


On Fri, Jan 18, 2013 at 1:34 PM, Mark Hahn <ma...@hahnca.com> wrote:

> is :P :D like a  pbthhh...
>



-- 
“The limits of language are the limits of one's world. “ - Ludwig von
Wittgenstein

"Water is fluid, soft and yielding. But water will wear away rock, which is
rigid and cannot yield. As a rule, whatever is fluid, soft and yielding
will overcome whatever is rigid and hard. This is another paradox: what is
soft is strong." - Lao-Tzu

Re: general question about couch performance

Posted by Mark Hahn <ma...@hahnca.com>.
is :P :D like a  pbthhh...

Re: general question about couch performance

Posted by Robert Newson <rn...@apache.org>.
Also, validate_doc_update is an awesome feature, so there :P :D


Re: general question about couch performance

Posted by Mark Hahn <ma...@hahnca.com>.
> Do you have validate_doc_update in some design document

No, but I use an update handler for every update.  I coded the handler
so that I can send the DB an update to an internal field, even a nested
one, and it reads the doc, applies the update, and saves it.  I really
like this feature.  It also reduces conflicts because of the faster
turnaround and less HTTP traffic.  It is almost atomic.

I just found out that my medium server only has 1.7 gigs of ram.  I just
bumped up to a large with 7.5 gigs.  Even though the large server has a
little less CPU, the DB runs much better.  I assume the file system
cache has some breathing room now.

I feel guilty that I've caused all this because of a lame mistake in
server choice.  However, I have really learned a lot and appreciate all
the feedback, which I'm going to implement even though my app runs
decently now.

Re: general question about couch performance

Posted by Marco Monteiro <ma...@textovirtual.com>.
Do you have validate_doc_update in some design document, or,
worse, in multiple design documents?

In my experience, that misfeature is the single biggest cause
of performance problems. Given the CPU usage, that might
be it, even with not too many writes.
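
For anyone unsure what to look for, this is the shape of the thing (a
deliberately trivial sketch; every single write has to run every
validate_doc_update function in every design document):

import requests

ddoc = {
    "_id": "_design/checks",
    "validate_doc_update": """
function(newDoc, oldDoc, userCtx, secObj) {
    if (!newDoc.type) {
        throw({forbidden: 'every doc needs a type'});
    }
}
""",
}
requests.put("http://127.0.0.1:5984/mydb/_design/checks", json=ddoc)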




Re: general question about couch performance

Posted by Mark Hahn <ma...@hahnca.com>.
On Thu, Jan 17, 2013 at 7:52 PM, Benoit Chesneau <bc...@gmail.com> wrote:

> Which version of couchdb

1.2.1

> Authenticated requests

Just localhost requests with the admin credentials in the url.

BTW, I just found out that couch was maxing out my CPU, unlike what I
said in my initial post.  I tried it on a large ec2 instance instead of
a medium and it is noticeably faster, but I haven't measured it yet.  I
had just assumed it was IO bound.

Re: general question about couch performance

Posted by Benoit Chesneau <bc...@gmail.com>.

Which version of couchdb? Authenticated requests?

- benoit

Re: general question about couch performance

Posted by Alexander Shorin <kx...@gmail.com>.
Hi Mark!

Have you tried tweaking the httpd options?  They are defined in
local.ini:

[httpd]
; Options for the MochiWeb HTTP server.
;server_options = [{backlog, 128}, {acceptor_pool_size, 16}]
; For more socket options, consult Erlang's module 'inet' man page.
;socket_options = [{recbuf, 262144}, {sndbuf, 262144}, {nodelay, true}]

For example, disabling the Nagle algorithm via the {nodelay, true}
socket option may dramatically improve the requests-per-second value [1].
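
A sketch of setting that without editing the file, via the _config API
(placeholder credentials; the value is an Erlang term wrapped in a JSON
string, and I am not sure this particular option takes effect without a
restart):

import requests

requests.put(
    "http://admin:secret@127.0.0.1:5984/_config/httpd/socket_options",
    data='"[{recbuf, 262144}, {sndbuf, 262144}, {nodelay, true}]"')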

That covers network operations.  For disk operations there is the
delayed_commits option, but it trades data durability for some speed,
and it is actually not recommended to leave it set to true.

[1]: http://code.google.com/p/couchdb-python/issues/detail?id=193#c22
--
,,,^..^,,,

