Posted to user@couchdb.apache.org by Sean Clark Hess <se...@gmail.com> on 2009/12/30 23:08:57 UTC

Two Concerns

So, I really like couch, but there are two complaints I have read that I
would like to resolve.

First, in the "latency" section of this blog post -
http://blog.woobling.org/2009/05/why-i-dont-use-couchdb.html - the author
says that Couch has performance issues because of the json/http layer. Is he
doing something wrong? Is it only applicable if you make multiple
connections from middleware at the same time? I was thinking that EVERY
database technology needs to open a socket connection (like MySQL from PHP)
so what's the difference?

Second, the replication system seems to be hotly contested. I don't really
understand how letting my data be inconsistent solves more problems than it
creates. I would think that data inconsistency would only be acceptable for
very specific apps.

If I DO need consistency, will it be easy to replicate/scale horizontally?
 Or will it require as much or more work as a "normal" master-slave
environment?

Thanks for pushing the envelope. I love the idea, I'm just still figuring
out whether couch is suitable for generic projects (or in other words, for
projects that don't fit the use cases exactly).

~sean

Re: Two Concerns

Posted by Sam Bisbee <sb...@computervip.com>.

On Wed, Dec 30, 2009 at 04:48:21PM -0800, Roger Binns wrote:
> CouchDB has no notion of masters or slaves.  

Right, but that can easily read as "you can't do master-slave relationships
with CouchDB". You probably know that that isn't true because you said...

> Anyone can replicate with anyone else at any time.  The underlying structures
> are specifically designed for this.

This allows you to easily set up master-slave and master-master relationships.
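
As a rough sketch of what that looks like in practice: replication is
requested by POSTing a source and a target to a server's _replicate
resource.  The host names and the "blog" database name below are made up for
illustration, and the helper names are hypothetical.  The code only builds
the requests; sending them needs two live servers.

```python
import json
import urllib.request

# Assumption: two CouchDB servers, each holding a database named "blog".
SERVER_A = "http://a.example.com:5984"
SERVER_B = "http://b.example.com:5984"


def replicate_request(via_server, source, target):
    """Build a POST to via_server's _replicate resource copying source into target."""
    body = json.dumps({"source": source, "target": target}).encode("utf-8")
    return urllib.request.Request(
        via_server + "/_replicate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def master_slave():
    """One-way: A is treated as the master and its changes are pushed to B."""
    return [replicate_request(SERVER_A, SERVER_A + "/blog", SERVER_B + "/blog")]


def master_master():
    """Two-way: each server's changes are copied to the other."""
    return [
        replicate_request(SERVER_A, SERVER_A + "/blog", SERVER_B + "/blog"),
        replicate_request(SERVER_B, SERVER_B + "/blog", SERVER_A + "/blog"),
    ]
```

The only difference between the two setups is whether you also replicate in
the opposite direction; nothing on the servers marks one as the "master".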

Cheers,

-- 
Sam Bisbee

Re: Two Concerns

Posted by Roger Binns <ro...@rogerbinns.com>.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Sean Clark Hess wrote:
> the author
> says that Couch has performance issues because of the json/http layer. Is he
> doing something wrong?

It looks like the wrapper for his programming language did not support the
bulk APIs, and possibly not streaming data either.  Yes, there will be
latency problems when going over a network reading/writing items one at a
time.  This applies to anything network related (e.g. SQL servers, web, file
serving).  CouchDB provides bulk reading and writing, and streams data down
as it is found, so those problems will only arise as the result of a less
than functional access library.
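
To make the bulk point concrete, here is a minimal sketch of one request
that writes many documents (_bulk_docs) and one that reads many documents by
id (_all_docs with include_docs=true).  The server address, the "blog"
database name and the helper names are assumptions for illustration only.
The builders do no network I/O; send() needs a running server.

```python
import json
import urllib.request

# Assumption: a CouchDB server at this address with a database named "blog".
BASE_URL = "http://localhost:5984/blog"


def bulk_write_request(docs):
    """Build a single POST to _bulk_docs that carries every document."""
    body = json.dumps({"docs": docs}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def bulk_read_request(doc_ids):
    """Build a single POST to _all_docs that fetches many documents by id."""
    body = json.dumps({"keys": doc_ids}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/_all_docs?include_docs=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a prepared request and decode the JSON reply (needs a live server)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With this shape, writing 1000 documents costs one round trip instead of
1000, which is where the latency in the one-at-a-time case comes from.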

> Second, the replication system seems to be hotly contested. I don't really
> understand how letting my data be inconsistent solves more problems than it
> creates. I would think that data inconsistency would only be acceptable for
> very specific apps.

For data to be consistent across a storage system in which a group of
servers provides the data, there are only two available approaches.  One is
to severely constrain the data (e.g. it can't reference other data, ids are
generated in some way that will never clash, only new items can be created -
existing ones can't be changed or deleted).  That isn't exactly useful.

The second is that there has to be some sort of locking or serialization
system across all the servers.  For example, you can designate one server as
the master, require all writes to go to it, and have it replicate to the
others.  Or you can have some sort of distributed lock manager.  This
significantly affects performance, and requires rather elaborate design and
monitoring.

> If I DO need consistency, will it be easy to replicate/scale horizontally?

You need to be careful in exactly what you mean by consistency.  If for
example you mean that everyone always sees exactly the same view of data and
updates are transactional then you cannot use more than one CouchDB
server(*).  A multi-server solution is hard and expensive.  Oracle will be
happy to sell you one.

>  Or will it require as much or more work as a "normal" master-slave
> environment?

CouchDB has no notion of masters or slaves.  Anyone can replicate with
anyone else at any time.  The underlying structures are specifically
designed for this.

In normal use cases there are no conflicts when dealing with data 99.99% of
the time.  CouchDB optimises for this use case.  You add/modify/delete data
against the most convenient CouchDB instance as you see fit.  Then you
replicate as needed.  In a very small number of cases there will be
conflicts.  CouchDB lets you find those conflicts and address them as
needed.  (No information is lost or overwritten.)  Until you address the
conflicts it uses a heuristic of which version of the document to offer.
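
One way to see this for a single document, as a sketch: fetching it with
conflicts=true adds a _conflicts list of the losing revision ids when any
exist.  The database URL, document id and revision ids below are made up,
and the helper names are hypothetical.

```python
import urllib.parse


def conflict_check_url(base_url, doc_id):
    """URL that asks CouchDB to include a _conflicts list with the document."""
    return base_url + "/" + urllib.parse.quote(doc_id, safe="") + "?conflicts=true"


def conflicting_revisions(doc):
    """Return the revision ids that lost to the version CouchDB is offering."""
    return doc.get("_conflicts", [])


# Illustrative response body (invented values, not real server output).
sample_doc = {"_id": "post-1", "_rev": "2-aaa", "_conflicts": ["2-bbb"]}
```

Each id returned by conflicting_revisions() can then be fetched by revision,
merged by hand or by application logic, and the losing revisions deleted.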

You should also be careful in data design for replication.  For example if
you store a blog posting and its comments as a single document then you are
likely to get conflicts when comments are added/changed/deleted from
different instances.  The solution would be to have the post as one document
and each comment as a separate document.  That would be very replication
friendly and you'd only get conflicts if the same post or comment is changed
on different instances concurrently (a rare event and easy to reconcile).
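
A small sketch of that layout, with field names ("type", "post_id") and ids
chosen purely for illustration; CouchDB does not require any of them.

```python
def make_post(post_id, title, body):
    """The post is its own document and never embeds its comments."""
    return {"_id": post_id, "type": "post", "title": title, "body": body}


def make_comment(comment_id, post_id, author, text):
    """Each comment is a separate document that points back at its post."""
    return {
        "_id": comment_id,
        "type": "comment",
        "post_id": post_id,
        "author": author,
        "text": text,
    }


def comments_for(docs, post_id):
    """Pick out the comments belonging to one post from a list of documents."""
    return [
        d for d in docs
        if d.get("type") == "comment" and d.get("post_id") == post_id
    ]
```

Two instances can each add a comment to the same post without touching a
shared document, so replicating between them just copies two new documents.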

(*) The Lounge project lets you have the appearance of a single server while
talking to multiple backends.  In theory it could have all sorts of hooks to
have redundant backends, monitoring, replication triggers etc.  In other
words re-invent all the distributed locking and similar stuff you'd get in a
clustered database.

Roger
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAks79NQACgkQmOOfHg372QS0tACfdJQXm+/SdUr7Uuq31zrDve7B
xe8An1k+q92PphBqawEk8LlGXmeDxW+4
=jrv+
-----END PGP SIGNATURE-----