Posted to dev@couchdb.apache.org by Noah Slater <ns...@tumbolia.org> on 2012/05/10 19:53:58 UTC

Post-mortem

Guys,

What can we learn from this:

http://saucelabs.com/blog/index.php/2012/05/goodbye-couchdb/


Thanks,

N

Re: Post-mortem

Posted by Klaus Trainer <kl...@posteo.de>.
Didn't know about that. Sounds good!


On Sat, 2012-05-12 at 15:34 +0200, Benoit Chesneau wrote:
> On Sat, May 12, 2012 at 3:26 PM, Klaus Trainer <kl...@posteo.de> wrote:
> > I wonder if it would be worth adding a configuration where all of a
> > database's view groups are updated on a document update. Has this ever
> > been discussed?
> >
> 
> This is implemented in the view changes branch on rcouch. I need to find
> time to write tests so I can propose it here:
> 
> https://github.com/refuge/couch_core/tree/view_changes
> 
> and specifically:
> 
> https://github.com/refuge/couch_core/blob/view_changes/apps/couch_changes/src/couch_changes.erl#L126
> 
> - benoît
> 


Re: Post-mortem

Posted by Benoit Chesneau <bc...@gmail.com>.
On Sat, May 12, 2012 at 3:26 PM, Klaus Trainer <kl...@posteo.de> wrote:
> I wonder if it would be worth adding a configuration where all of a
> database's view groups are updated on a document update. Has this ever
> been discussed?
>

This is implemented in the view changes branch on rcouch. I need to find
time to write tests so I can propose it here:

https://github.com/refuge/couch_core/tree/view_changes

and specifically:

https://github.com/refuge/couch_core/blob/view_changes/apps/couch_changes/src/couch_changes.erl#L126

- benoît


Re: Post-mortem

Posted by Klaus Trainer <kl...@posteo.de>.
I wonder if it would be worth adding a configuration where all of a
database's view groups are updated on a document update. Has this ever
been discussed?

Such a configuration might have a significant impact on write performance,
but on the other hand it would not only improve view query performance,
but (more importantly) make it somewhat predictable. Writing scripts
that query views just to force index updates sucks and is something
nobody really wants to do.
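A warming script of the kind described above usually just issues a cheap query against each view, since any non-stale query asks CouchDB to bring the index up to date. A minimal sketch; the server URL, database, design-document, and view names are made up for illustration:

```python
# Sketch of an index-warming helper: querying a view with limit=0 makes
# CouchDB update its index without streaming any rows back.
# Server, database, and view names below are illustrative only.
from urllib.parse import urlencode

def warm_url(server, db, ddoc, view):
    """Build the cheap 'touch this view' query URL."""
    params = urlencode({"limit": 0})
    return f"{server}/{db}/_design/{ddoc}/_view/{view}?{params}"

# A cron job would GET each of these after every batch of writes:
urls = [warm_url("http://localhost:5984", "mydb", "stats", v)
        for v in ("by_date", "by_user")]
```

The point is that the drudgery is in having to run this at all, not in the script itself.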


To summarize:

There are three points the Sauce guys describe under the headline
"Maintenance headaches". I think all three of them are valid points.
Unlike the first and the third, the second one hasn't been worked on at
all so far: "View indexes are only updated when queried".


K


On Thu, 2012-05-10 at 18:53 +0100, Noah Slater wrote:
> Guys,
> 
> What can we learn from this:
> 
> http://saucelabs.com/blog/index.php/2012/05/goodbye-couchdb/
> 
> 
> Thanks,
> 
> N


Re: Post-mortem

Posted by Tim McNamara <pa...@timmcnamara.co.nz>.
One other good thing that might be worth mentioning is that it sounds like
the migration path away from couch was pretty easy.

On 11 May 2012 12:28, Jason Smith <jh...@iriscouch.com> wrote:
> Thanks for the tip, Noah.
>
> In addition to solving its technical and community deficiencies, we
> should all be banging on non-stop about why CouchDB is good for you:
>
> * Zero data loss (I can't believe it merits saying, but hey, there we are)
> * They built a successful product, because CouchDB cuts with the grain
> of the web
> * They built a successful business, similarly
> * Simple, transparent, confidence-instilling backups
> * It was a lot of fun
>
> ## The Story of CouchDB
>
> People don't acknowledge enough how great SQL is for a maturing
> company. Setting "product" aside, relational databases help build
> businesses. CouchDB is not a "NoSQL" database, it is a
> "domain-specific database." NoSQL could mean anything. When I have sex
> there is no SQL. It's NoSQL! (Sometimes, there is no transaction
> either.)
>
> The point is, bosses tend to ask relational questions. How many new
> signups from last month are on the west coast and referred a friend?
> How many leads converted to sales? Where is our product demanded most?
> On domain-specific DBs, the answer is a programming project. On
> MySQL, it's a query. (I'm simplifying, but you all get it.)
>
> Couch should not claim to solve or even address those problems. The
> story should be, "get your product done fast, and right. Ship it
> today, and be poised to solve tomorrow's problems tomorrow."
>
> (Note, I ignore Couch's problems in this message, not to downplay
> them, but rather to keep focus.)
>
> On Fri, May 11, 2012 at 12:53 AM, Noah Slater <ns...@tumbolia.org>
wrote:
>> Guys,
>>
>> What can we learn from this:
>>
>> http://saucelabs.com/blog/index.php/2012/05/goodbye-couchdb/
>>
>>
>> Thanks,
>>
>> N
>
>
>
> --
> Iris Couch

Re: Post-mortem

Posted by Jason Smith <jh...@iriscouch.com>.
Thanks for the tip, Noah.

In addition to solving its technical and community deficiencies, we
should all be banging on non-stop about why CouchDB is good for you:

* Zero data loss (I can't believe it merits saying, but hey, there we are)
* They built a successful product, because CouchDB cuts with the grain
of the web
* They built a successful business, similarly
* Simple, transparent, confidence-instilling backups
* It was a lot of fun

## The Story of CouchDB

People don't acknowledge enough how great SQL is for a maturing
company. Setting "product" aside, relational databases help build
businesses. CouchDB is not a "NoSQL" database, it is a
"domain-specific database." NoSQL could mean anything. When I have sex
there is no SQL. It's NoSQL! (Sometimes, there is no transaction
either.)

The point is, bosses tend to ask relational questions. How many new
signups from last month are on the west coast and referred a friend?
How many leads converted to sales? Where is our product demanded most?
On domain-specific DBs, the answer is a programming project. On
MySQL, it's a query. (I'm simplifying, but you all get it.)

Couch should not claim to solve or even address those problems. The
story should be, "get your product done fast, and right. Ship it
today, and be poised to solve tomorrow's problems tomorrow."

(Note, I ignore Couch's problems in this message, not to downplay
them, but rather to keep focus.)

On Fri, May 11, 2012 at 12:53 AM, Noah Slater <ns...@tumbolia.org> wrote:
> Guys,
>
> What can we learn from this:
>
> http://saucelabs.com/blog/index.php/2012/05/goodbye-couchdb/
>
>
> Thanks,
>
> N



-- 
Iris Couch

Re: Post-mortem

Posted by Wendall Cada <we...@83864.com>.
Writing good CouchDB queries is hard. I had all of those same issues for 
a long time. Last year I did a refactor with a much better view 
structure and all the issues I was having building view indexes went 
away. Now with 1.2.0, my queries are not only better, but view building 
is significantly faster.

Another thing I see here is if your indexing is failing because of a 
"bad" view, you didn't test well enough before pushing to production. 
One major area of improvement here is better debugging for bad views. I 
build tests that run in Firefox against test data, as well as run full 
view indexing against production copies on test hardware. It would be 
incredibly useful to have not only better tools, but something like a 
CouchDB REPL.

BigCouch merge +1

Wendall

On 05/10/2012 10:53 AM, Noah Slater wrote:
> Guys,
>
> What can we learn from this:
>
> http://saucelabs.com/blog/index.php/2012/05/goodbye-couchdb/
>
>
> Thanks,
>
> N
>


Re: Post-mortem

Posted by "Kevin R. Coombes" <ke...@gmail.com>.
This is obviously my off-the-cuff immediate reaction.  My first 
observation is that it is tremendously useful to have this kind of detailed 
real-world feedback on what is not working.  Now we just need to decide 
what to do with it.

In the posted list of availability issues, even though there are five 
bullets, there are only really two issues.  (I'm not counting the bugs 
listed as the fifth bullet. They're fixed.  Other bugs will crop up.  As 
Couch matures, one expects this issue to decrease in importance.)  In 
summary:

Issue 1:
Compaction fails silently.  I've noticed this myself, and that is 
clearly something that has to be fixed.  Failures will happen 
sometimes.  They shouldn't be silent.  Especially when that kind of 
silent failure can eat a tremendous amount of disc space.

Issue 2:
Queries fail because of slow disk performance or while reindexing.  
Reindexing can fail, or can take an extraordinarily long time.  While 
one view is being reindexed, all views *from that design document* 
fail.  (The performance problem listed in the post, I think, comes down 
to the same thing. And so do most of the maintenance problems.)

My experience with this issue is also similar, but I've added the phrase 
about design documents.  I have a couple of databases (one with 3.5M 
documents) with multiple views defined in the same design document.  I 
made the mistake of trying to develop an application on this database.  
It was really painful every time I decided that one view needed to be 
changed, and having to wait a couple of hours while all the views got 
rebuilt.  (For that purpose, I made a filtered version of the database 
with about 10K documents to use during view development.)  Although I 
haven't tested it, I'm planning to move to a structure that puts one 
view per design document to see if the other views remain usable while 
one of them rebuilds.  Since other databases remain usable, I expect 
that this will work.  It would be good to have advice somewhere on the 
Couch web site or wiki about how to organize views into documents, with 
more details about how that might affect performance.
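The one-view-per-design-document restructuring being considered here can be sketched as a small helper; the view names and map bodies below are illustrative placeholders, not from any real database:

```python
# Sketch: splitting one multi-view design document into one design
# document per view, so rebuilding one index leaves the others queryable.
# View names and map bodies are illustrative placeholders.

views = {
    "by_date": {"map": "function(doc) { emit(doc.date, null); }"},
    "by_user": {"map": "function(doc) { emit(doc.user, null); }"},
}

def split_design_docs(views):
    """One design document per view: _design/by_date holds only by_date."""
    return [
        {"_id": "_design/" + name, "language": "javascript",
         "views": {name: body}}
        for name, body in views.items()
    ]

docs = split_design_docs(views)   # each doc would then be PUT to the database
```

The trade-off noted elsewhere in this thread still applies: views grouped in one design document share a file and can deduplicate identical rows, so splitting them costs some of that.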

I don't know if it is possible to restructure the code to serve up other 
views from the same design document while one is being rebuilt.  And 
while I know about "stale=ok" or "stale=update_after", both of those are 
hard to use from web sites that access the database, since they require 
modifying the URL.  And the "update_after" version only helps the first 
user, and just pushes the burden of waiting onto the next user.  If you 
have an active site with lots of users making queries, there is still 
going to be a performance hit.

Perhaps the solution is to make it possible to configure the server (on 
a per-database level? or globally?) to *always* return stale views while 
a view is rebuilding, and just mark them as stale.  Perhaps another 
reserved word, either returning something like
     _stale : true | false
or
     _currency: "stale" | "current"
so the user or script could decide whether to use that data or wait 
until the view is rebuilt (which begins suggesting other changes so you 
can query if the view is done rebuilding, but I won't follow that 
tangent any further at the moment).
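The client-side policy such a marker would enable might look like the following sketch. Note that the `_stale` field here is the hypothetical reserved word being proposed, not something CouchDB actually returns today:

```python
# Sketch of the client-side policy enabled by a hypothetical "_stale"
# marker in view responses. This field does NOT exist in CouchDB; it is
# the reserved word proposed in this message.

def usable(response, accept_stale=True):
    """Decide whether to use a view response or wait for the rebuild."""
    if not response.get("_stale", False):
        return True       # index is current: always usable
    return accept_stale   # stale rows: the caller decides

fresh = usable({"rows": []})                         # True: index current
stale = usable({"rows": [], "_stale": True}, False)  # False: wait for rebuild
```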

I think the bottom line is that some serious attention needs to be paid 
to real-world performance issues, primarily centered around rebuilding 
views and compaction.

     -- Kevin

On 5/10/2012 12:53 PM, Noah Slater wrote:
> Guys,
>
> What can we learn from this:
>
> http://saucelabs.com/blog/index.php/2012/05/goodbye-couchdb/
>
>
> Thanks,
>
> N
>

Re: Post-mortem

Posted by Dirkjan Ochtman <di...@ochtman.nl>.
On Fri, May 11, 2012 at 7:56 AM, Benoit Chesneau <bc...@gmail.com> wrote:
> Anyway, IMO we need to improve some features like M/R (having the
> indexation use multiple cores), replication, and possibilities to
> monitor what happens in every part of couch.

+many. We have this wonderfully scalable algorithm to do the view
indexing, and then it's only used with 1 thread at a time! I've never
really understood why that is (even though I understand that the
algorithm being scalable doesn't help with serializing things on disk
and whatever else).

Also, a mode that will update all the indexes (asynchronously) on
insertion/updates would probably be something we'd use. It seems to me
Couch is better at read-heavy loads than at write-heavy loads, yet the
index-on-access seems optimized for the latter!

Cheers,

Dirkjan

Re: Post-mortem

Posted by ro...@gmail.com.
>Where I believe we've failed (or, if you prefer, not yet succeeded) is
>communicating this information to the public.

Absolutely (speaking as member of that public)

I'm starting to use Couch in production and can learn a huge amount from
threads like this ... is there a 'best practice' guide to setting up and
tuning your couchdb for production anywhere on the wiki? It would be great
if I could visit the wiki, type best practice into the search box and then
get a page of useful information like this that points me to where problems
can arise and how to mitigate them before they are an issue.

Perhaps if someone knowledgeable started the wiki page, that would encourage
others to chip in and flesh it out.

Roger

Re: Post-mortem

Posted by Nathan Vander Wilt <na...@calftrail.com>.
On May 11, 2012, at 8:32 AM, Eli Stevens (Gmail) wrote:
> On Fri, May 11, 2012 at 7:57 AM, CGS <cg...@gmail.com> wrote:
>> What I don't understand is the following:
>> 1. Those guys wanted a single front-end server which should keep up with
>> the incoming requests, correct? As far as I understood, CouchDB philosophy
>> is based on safety of the data, which was implemented as direct writes on
>> harddisk. So, having only one front-end server on which you force its hdd
>> to keep up with the high speed internet connection is just like you want
>> to force a river to flow only through a mouse hole.
> 
> From my understanding of the post, the core issue wasn't a mismatch in
> scale between desired throughput (the river) and available throughput
> (the mousehole), it was that under high enough load CouchDB stopped
> honoring certain classes of requests entirely.  That's not a "too
> slow" problem, it's a "fell over and can't get up" problem.
> 
> I think it's very important that effort is made to reproduce and
> address these issues, since without being able to put more definite
> bounds on them, *everyone* is going to wonder if their load is high
> enough to trigger the problems.  Heck, I'm wondering it, and I don't
> typically have more than a couple hundred docs per DB (but a lot of
> DBs, and hundreds of megs of attachments per DB).


Here's one with steps to reproduce, albeit requiring a public-facing server capable of routing one special "I own this site" request to a separate :
https://issues.apache.org/jira/browse/COUCHDB-1429

That one found me within ten minutes of trying out http://blitz.io (themselves a CouchDB user, IIRC) but it's just sat since I filed it. Fortunately/unfortunately the site I could take down at about ~40 concurrent requests gets about that many hits in a _month_, so I don't need to switch to MySQL ;-)

hth,
-nvw

Re: Post-mortem

Posted by CGS <cg...@gmail.com>.
Well, my understanding of that specific point was that CouchDB was
crashing while writing the docs. That can be because an error appeared
with high frequency or because the queue of documents to be written became
too long. The first is unlikely if there is no attack
(remember, in that context, CouchDB crash was registered while inserting
docs in the database, not during compaction or building a view or anything
else), while the second can easily appear when the traffic is higher than
what the harddisk can digest. That made me think pretty much of the
conditions I spoke about.

If you want to reproduce the failure, just try to add single-doc requests
at a very high rate (standard behavior of a monitored gen_server with
listener in Erlang when the queued messages go beyond the predefined
length). But that is something you cannot avoid if you write directly to
the hdd, whatever language you develop the database in (generally,
queued messages live in RAM). This is the price CouchDB has to pay for
securing the data. So, if someone uses CouchDB, he/she should be aware of
that risk, unless either that someone doesn't read the CouchDB
documentation or that someone doesn't have any knowledge about hardware. In
either case, that someone shouldn't start speaking about such a so-called
failure, don't you think? I would be embarrassed to report such a "problem"
unless I am sure it's coming from other sources than the ones I spoke
about. As far as I could notice, those guys didn't provide any detail about
what would cause that problem. Even worse, they admitted that by increasing
the speed of the harddisk, they noticed a slight improvement of the CouchDB
capabilities (of course, if you improve the hdd speed, you increase the
rate of document insertion which keeps the queue shorter unless you start
again to increase the speed of insertion). So, it's either me, or those
guys didn't understand CouchDB usage (considering they know something about
hardware). And that leads to the unfortunate conclusion that those guys
chose a product without considering whether it fitted their project
design by conception. I do not say CouchDB cannot be used at high
rate document insertion, but one needs to pay attention to how this can be
achieved. Also, the fact that they reported some bugs they discovered and
they got fast solutions from the devs proves that the devs are serious
about this product and for that they should deserve congratulations.
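The overload behaviour described above can be illustrated with a toy bounded queue standing in for the Erlang process mailbox. This is an analogy only, not CouchDB's actual write path:

```python
# Toy illustration of the failure mode described above: a writer whose
# inbox (a bounded queue, standing in for an Erlang gen_server mailbox)
# fills up when documents arrive faster than the disk drains them.
# Requests get refused instead of queueing without bound.
from queue import Queue, Full

def submit(q, doc):
    """Try to enqueue a write; refuse immediately if the queue is full."""
    try:
        q.put_nowait(doc)
        return "accepted"
    except Full:
        return "rejected"   # back-pressure: the caller must retry later

q = Queue(maxsize=2)        # pretend the disk can buffer only 2 docs
results = [submit(q, {"_id": str(i)}) for i in range(4)]
# first two accepted, the rest rejected until a consumer drains q
```

A faster disk raises the drain rate, which is exactly why the blog's authors saw improvement when they upgraded hardware.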

I do not want to offend those guys there because I know nothing about them,
but they should try to understand that coming with such "advertisement",
they need to be sure they have no blame there. I would write that kind of
article after I stopped being pissed off. But that is me and,
unfortunately, they acted too soon in my opinion.

CGS





On Fri, May 11, 2012 at 5:32 PM, Eli Stevens (Gmail)
<wi...@gmail.com>wrote:

> On Fri, May 11, 2012 at 7:57 AM, CGS <cg...@gmail.com> wrote:
> > What I don't understand is the following:
> > 1. Those guys wanted a single front-end server which should keep up with
> > the incoming requests, correct? As far as I understood, CouchDB
> philosophy
> > is based on safety of the data, which was implemented as direct writes on
> > harddisk. So, having only one front-end server on which you force its hdd
> > to keep up with the high speed internet connection is just like you
> want
> > to force a river to flow only through a mouse hole.
>
> From my understanding of the post, the core issue wasn't a mismatch in
> scale between desired throughput (the river) and available throughput
> (the mousehole), it was that under high enough load CouchDB stopped
> honoring certain classes of requests entirely.  That's not a "too
> slow" problem, it's a "fell over and can't get up" problem.
>
> I think it's very important that effort is made to reproduce and
> address these issues, since without being able to put more definite
> bounds on them, *everyone* is going to wonder if their load is high
> enough to trigger the problems.  Heck, I'm wondering it, and I don't
> typically have more than a couple hundred docs per DB (but a lot of
> DBs, and hundreds of megs of attachments per DB).
>
> Eli
>

Re: Post-mortem

Posted by "Eli Stevens (Gmail)" <wi...@gmail.com>.
On Fri, May 11, 2012 at 7:57 AM, CGS <cg...@gmail.com> wrote:
> What I don't understand is the following:
> 1. Those guys wanted a single front-end server which should keep up with
> the incoming requests, correct? As far as I understood, CouchDB philosophy
> is based on safety of the data, which was implemented as direct writes on
> harddisk. So, having only one front-end server on which you force its hdd
> to keep up with the high speed internet connection is just like you want
> to force a river to flow only through a mouse hole.

From my understanding of the post, the core issue wasn't a mismatch in
scale between desired throughput (the river) and available throughput
(the mousehole), it was that under high enough load CouchDB stopped
honoring certain classes of requests entirely.  That's not a "too
slow" problem, it's a "fell over and can't get up" problem.

I think it's very important that effort is made to reproduce and
address these issues, since without being able to put more definite
bounds on them, *everyone* is going to wonder if their load is high
enough to trigger the problems.  Heck, I'm wondering it, and I don't
typically have more than a couple hundred docs per DB (but a lot of
DBs, and hundreds of megs of attachments per DB).

Eli

Re: Post-mortem

Posted by CGS <cg...@gmail.com>.
Hi,

I do not quite understand why so much fuss about something which is clearly
not CouchDB's fault. Maybe someone is able to explain it to me (maybe I am
too slow for such a high-level subject).

What I don't understand is the following:
1. Those guys wanted a single front-end server which should keep up with
the incoming requests, correct? As far as I understood, CouchDB philosophy
is based on safety of the data, which was implemented as direct writes on
harddisk. So, having only one front-end server on which you force its hdd
to keep up with the high speed internet connection is just like you want
to force a river to flow only through a mouse hole. I have problems in
understanding how this can happen. Does anyone know?
2. How can you implement NoSQL on an SQL product? It's like carving an apple
into the shape of a banana and selling it as such. Am I missing anything here?
3. If MySQL is the future, then how come many service providers with a lot
of users moved to NoSQL?

Yes, there may be some points where CouchDB doesn't excel, but before
choosing a product, you take a look at what it offers; you don't buy the
box without looking at what's inside, I suppose. Otherwise, it seems that
person doesn't know what happened to Pandora.

So, in the end, what matters is how this product will go on, not what some
think of it. A word for the devs: keep up the good work!

Cheers,
CGS

On Fri, May 11, 2012 at 3:37 PM, Noah Slater <ns...@tumbolia.org> wrote:

> More comments here:
>
> http://news.ycombinator.com/item?id=3954596
>
> Not sure how useful they are...
>
> (Not caught up with the thread yet, sorry!)
>
> On Fri, May 11, 2012 at 2:34 PM, Dirkjan Ochtman <di...@ochtman.nl>
> wrote:
>
> > On Fri, May 11, 2012 at 3:25 PM, Robert Newson <rn...@apache.org>
> wrote:
> > > We're veering off-topic here, but there are several remaining issues.
> > > First is that the view file is at some update_seq relative to the
> > > database file. Being at update_seq N for a view means it has all the
> > > changes up to and including N, but nothing after N, so while those
> > > updates could be processed in parallel, they'd have to be applied to
> > > the view process and view file in order.
> >
> > Yeah, you'd have to serialize and buffer when writing to disk again.
> >
> > > Secondly, and more
> > > importantly, is how to handle rows that, for whatever reason, cause an
> > > exception when evaluated in the function (e.g, the common case where
> > > there's an undefined property and no guard clause). If the order is
> > > not determined, then two databases, with the same data and same view
> > > code, will have different view results for the same input.
> >
> > I'm actually not clear on what happens with erroring view functions.
> > Does it just stop processing any further document revisions?
> >
> > In any case, it seems okayish to have the document updates be in a
> > defined ordering but parallelize execution of the view function, then
> > serialize back into document order when the results come back, doing a
> > little buffering if the results aren't in order.
> >
> > Cheers,
> >
> > Dirkjan
> >
>

Re: Post-mortem

Posted by Robert Newson <rn...@apache.org>.
Oooh, you nearly tricked me into looking at HN. Bad Noah!

On 11 May 2012 14:37, Noah Slater <ns...@tumbolia.org> wrote:
> More comments here:
>
> http://news.ycombinator.com/item?id=3954596
>
> Not sure how useful they are...
>
> (Not caught up with the thread yet, sorry!)
>
> On Fri, May 11, 2012 at 2:34 PM, Dirkjan Ochtman <di...@ochtman.nl> wrote:
>
>> On Fri, May 11, 2012 at 3:25 PM, Robert Newson <rn...@apache.org> wrote:
>> > We're veering off-topic here, but there are several remaining issues.
>> > First is that the view file is at some update_seq relative to the
>> > database file. Being at update_seq N for a view means it has all the
>> > changes up to and including N, but nothing after N, so while those
>> > updates could be processed in parallel, they'd have to be applied to
>> > the view process and view file in order.
>>
>> Yeah, you'd have to serialize and buffer when writing to disk again.
>>
>> > Secondly, and more
>> > importantly, is how to handle rows that, for whatever reason, cause an
>> > exception when evaluated in the function (e.g, the common case where
>> > there's an undefined property and no guard clause). If the order is
>> > not determined, then two databases, with the same data and same view
>> > code, will have different view results for the same input.
>>
>> I'm actually not clear on what happens with erroring view functions.
>> Does it just stop processing any further document revisions?
>>
>> In any case, it seems okayish to have the document updates be in a
>> defined ordering but parallelize execution of the view function, then
>> serialize back into document order when the results come back, doing a
>> little buffering if the results aren't in order.
>>
>> Cheers,
>>
>> Dirkjan
>>

Re: Post-mortem

Posted by Noah Slater <ns...@tumbolia.org>.
More comments here:

http://news.ycombinator.com/item?id=3954596

Not sure how useful they are...

(Not caught up with the thread yet, sorry!)

On Fri, May 11, 2012 at 2:34 PM, Dirkjan Ochtman <di...@ochtman.nl> wrote:

> On Fri, May 11, 2012 at 3:25 PM, Robert Newson <rn...@apache.org> wrote:
> > We're veering off-topic here, but there are several remaining issues.
> > First is that the view file is at some update_seq relative to the
> > database file. Being at update_seq N for a view means it has all the
> > changes up to and including N, but nothing after N, so while those
> > updates could be processed in parallel, they'd have to be applied to
> > the view process and view file in order.
>
> Yeah, you'd have to serialize and buffer when writing to disk again.
>
> > Secondly, and more
> > importantly, is how to handle rows that, for whatever reason, cause an
> > exception when evaluated in the function (e.g, the common case where
> > there's an undefined property and no guard clause). If the order is
> > not determined, then two databases, with the same data and same view
> > code, will have different view results for the same input.
>
> I'm actually not clear on what happens with erroring view functions.
> Does it just stop processing any further document revisions?
>
> In any case, it seems okayish to have the document updates be in a
> defined ordering but parallelize execution of the view function, then
> serialize back into document order when the results come back, doing a
> little buffering if the results aren't in order.
>
> Cheers,
>
> Dirkjan
>

Re: Post-mortem

Posted by Dirkjan Ochtman <di...@ochtman.nl>.
On Fri, May 11, 2012 at 3:25 PM, Robert Newson <rn...@apache.org> wrote:
> We're veering off-topic here, but there are several remaining issues.
> First is that the view file is at some update_seq relative to the
> database file. Being at update_seq N for a view means it has all the
> changes up to and including N, but nothing after N, so while those
> updates could be processed in parallel, they'd have to be applied to
> the view process and view file in order.

Yeah, you'd have to serialize and buffer when writing to disk again.

> Secondly, and more
> importantly, is how to handle rows that, for whatever reason, cause an
> exception when evaluated in the function (e.g, the common case where
> there's an undefined property and no guard clause). If the order is
> not determined, then two database, with the same data and same view
> code, will have different view results for the same input.

I'm actually not clear on what happens with erroring view functions.
Does it just stop processing any further document revisions?

In any case, it seems okayish to have the document updates be in a
defined ordering but parallelize execution of the view function, then
serialize back into document order when the results come back, doing a
little buffering if the results aren't in order.
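That parallel-map, ordered-write-back idea can be sketched in a few lines; the view function and documents below are stand-ins, not CouchDB internals:

```python
# Sketch of the idea above: run the (pure, CPU-bound) view function over
# updated documents in parallel, then apply the emitted rows back in
# update_seq order. view_fn is a stand-in for a real map function.
from concurrent.futures import ThreadPoolExecutor

def view_fn(doc):
    """Stand-in for a map function: emit (key, value) rows for one doc."""
    return [(doc["key"], doc["seq"])]

def index_batch(docs):
    """Parallel map, ordered write-back: executor.map yields results in
    input order, so rows reach the 'view file' in update_seq order even
    if the workers finish out of order."""
    with ThreadPoolExecutor(max_workers=4) as ex:
        ordered_rows = list(ex.map(view_fn, docs))
    return [row for rows in ordered_rows for row in rows]

docs = [{"seq": s, "key": f"k{s}"} for s in range(1, 6)]
rows = index_batch(docs)   # rows come back in seq order 1..5
```

The buffering mentioned above is exactly what `ex.map`'s order-preserving iteration does here: a fast worker's result waits until all earlier sequence numbers have been consumed.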

Cheers,

Dirkjan

Re: Post-mortem

Posted by Robert Newson <rn...@apache.org>.
We're veering off-topic here, but there are several remaining issues.
First is that the view file is at some update_seq relative to the
database file. Being at update_seq N for a view means it has all the
changes up to and including N, but nothing after N, so while those
updates could be processed in parallel, they'd have to be applied to
the view process and view file in order. Secondly, and more
importantly, is how to handle rows that, for whatever reason, cause an
exception when evaluated in the function (e.g., the common case where
there's an undefined property and no guard clause). If the order is
not determined, then two databases, with the same data and same view
code, will have different view results for the same input.

I'm not saying it's insoluble, only that it's not as simple as it
might appear at first (or second) glance.

B.

On 11 May 2012 14:04, Dirkjan Ochtman <di...@ochtman.nl> wrote:
> On Fri, May 11, 2012 at 2:44 PM, Robert Newson <rn...@apache.org> wrote:
>> Fundamentally, the issue is that updating a view is processing an
>> incoming, ordered list of changes, there's not much parallelism to be
>> had there.
>
> Why is that? I don't see how the list of changes is ordered. ISTM that
> updated documents may be passed to the view indexer in any order,
> which is why M/R works. If that's true and computing keys and values
> is CPU-bound, parallelizing running the view function on the updated
> documents shouldn't be that hard.
>
> Cheers,
>
> Dirkjan

Re: Post-mortem

Posted by Dirkjan Ochtman <di...@ochtman.nl>.
On Fri, May 11, 2012 at 2:44 PM, Robert Newson <rn...@apache.org> wrote:
> Fundamentally, the issue is that updating a view is processing an
> incoming, ordered list of changes, there's not much parallelism to be
> had there.

Why is that? I don't see how the list of changes is ordered. ISTM that
updated documents may be passed to the view indexer in any order,
which is why M/R works. If that's true and computing keys and values
is CPU-bound, parallelizing running the view function on the updated
documents shouldn't be that hard.

Cheers,

Dirkjan

Re: Post-mortem

Posted by Robert Newson <rn...@apache.org>.
Fundamentally, the issue is that updating a view is processing an
incoming, ordered list of changes; there's not much parallelism to be
had there. The reason sharding works is that you then have two smaller
lists which can be processed in parallel (but each still processed
serially). And, yes, my thought was that BigCouch will allow multiple
shards of the database on a single node, allowing parallel view builds
within that node.

Some care must be taken if we pursue optimizing a single view build.
In the context of a production system, with more than one database and
more than one view, all the cores get plenty of exercise. If every
task could run on all cores then I think you might hit other issues.

B.

On 11 May 2012 13:35, Dirkjan Ochtman <di...@ochtman.nl> wrote:
> On Fri, May 11, 2012 at 2:29 PM, Robert Newson <rn...@apache.org> wrote:
>> Making a single view indexing process faster keeps coming up. For one
>> thing, it's not that easy, otherwise it would have been done by now.
>> For another, this problem vanishes when you shard (and the BigCouch
>> merge will bring this to CouchDB).
>
> What does sharding mean in this context? Running CouchDB/BigCouch on
> multiple servers, or just running multiple processes on a single box?
> If the latter, why can't we run multiple threads/Erlang processes
> within a single shard/OS process? If the former, that's kind of silly,
> in the sense that building indexes (at least for me) is CPU-bound but
> leaves many of the cores in my server idle.
>
> Cheers,
>
> Dirkjan

Re: Post-mortem

Posted by Dirkjan Ochtman <di...@ochtman.nl>.
On Fri, May 11, 2012 at 2:29 PM, Robert Newson <rn...@apache.org> wrote:
> Making a single view indexing process faster keeps coming up. For one
> thing, it's not that easy, otherwise it would have been done by now.
> For another, this problem vanishes when you shard (and the BigCouch
> merge will bring this to CouchDB).

What does sharding mean in this context? Running CouchDB/BigCouch on
multiple servers, or just running multiple processes on a single box?
If the latter, why can't we run multiple threads/Erlang processes
within a single shard/OS process? If the former, that's kind of silly,
in the sense that building indexes (at least for me) is CPU-bound but
leaves many of the cores in my server idle.

Cheers,

Dirkjan

Re: Post-mortem

Posted by Robert Newson <rn...@apache.org>.
I'll pick up a few points here: "I don't know if it is possible to
restructure the code to serve up other views from the same design
document while one is being rebuilt."

It's possible, but it's not happening. It's deliberate that all views
in a design document are built together; they are written to the same
file (which is why they all get rebuilt if you change one, and why
they're all unavailable while that happens). Grouping them gives a
performance boost, and identical rows emitted from different views can
be deduplicated.
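
A toy illustration of why building every view of a design document in
one pass allows that deduplication (Python, purely hypothetical
storage layout, not CouchDB's on-disk format):

```python
def build_design_doc_views(views, docs):
    # Build every view of a design document in a single pass over the
    # documents, storing identical emitted rows only once.
    rows = {}                          # (key, value) -> shared row id
    view_rows = {name: [] for name in views}
    for doc in docs:
        for name, map_fn in views.items():
            for kv in map_fn(doc):
                # Identical rows from different views share one entry.
                row_id = rows.setdefault(kv, len(rows))
                view_rows[name].append(row_id)
    return rows, view_rows
```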

Where I believe we've failed (or, if you prefer, not yet succeeded) is
in communicating this information to the public. Additionally, the
system we've built to allow views to be built in production without
interfering with existing queries (the main trick being that the view
file is named for the MD5 of the source code) is not widely known. I
see many people complaining about issues for which we have always had
a solution.
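
The MD5 naming trick can be sketched like this (illustrative Python;
the exact signature CouchDB hashes differs, so treat the details as
assumptions):

```python
import hashlib
import json

def view_file_name(design_doc):
    # Name the index file after a hash of the view source code. Editing
    # any view changes the hash, so a new file is built in the
    # background while the old file keeps serving the old code.
    source = json.dumps(design_doc.get("views", {}), sort_keys=True)
    return hashlib.md5(source.encode("utf-8")).hexdigest() + ".view"
```

A nice side effect of hashing the source rather than the name: two
design documents with identical view code map to the same file, so
renaming a design document does not trigger a rebuild.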

Making a single view indexing process faster keeps coming up. For one
thing, it's not that easy, otherwise it would have been done by now.
For another, the problem vanishes when you shard (and the BigCouch
merge will bring this to CouchDB), and again when you have many
databases and many views. It feels a lot like the delayed_commits
argument: it's enabled by default because a very poor benchmark
(serial insertions) otherwise produces bad numbers, since we fsync so
often. That's an unrealistic, even demeaning, test for a database.
Likewise, building a single view, from scratch, for millions of
documents: yes, that's slow; no, it's not very realistic; and yes, the
equivalent CREATE INDEX would take a while too.

Finally, if your couchdb problem can find a sane MySQL solution, I'd
argue that couchdb wasn't the right solution in the first place.
That's not our fault.

B.

On 11 May 2012 08:31, Jason Smith <jh...@apache.org> wrote:
> On Fri, May 11, 2012 at 12:56 PM, Benoit Chesneau <bc...@gmail.com> wrote:
>> On Thu, May 10, 2012 at 7:53 PM, Noah Slater <ns...@tumbolia.org> wrote:
>>> Guys,
>>>
>>> What can we learn from this:
>>>
>>> http://saucelabs.com/blog/index.php/2012/05/goodbye-couchdb/
>>>
>>>
>>> Thanks,
>>>
>>> N
>>
>> A side note: I'm not even sure they made the right choice in picking MySQL.
>
> Well they *do* have a history of smart, prudent decision-making. So I
> put faith in them. (Personally I am dying for a reason to get into
> Drizzle, but CouchDB solves my current problems.)
>
>> For the rest we should market couchdb on what it really does.
>
> Enthusiastic +1
>
>> Some are dreaming about big data and big central clusters, but not
>> everyone needs that. Most companies have less than 2 GB of usable data.
>
> I wouldn't call it "blah blah" but rather that CouchDB needs a gentle
> on-ramp. (Is that an Americanism?) A slip road. Some obvious killer
> app to go from zero to sixty (if I may press the metaphor).

Re: Post-mortem

Posted by Jason Smith <jh...@apache.org>.
On Fri, May 11, 2012 at 12:56 PM, Benoit Chesneau <bc...@gmail.com> wrote:
> On Thu, May 10, 2012 at 7:53 PM, Noah Slater <ns...@tumbolia.org> wrote:
>> Guys,
>>
>> What can we learn from this:
>>
>> http://saucelabs.com/blog/index.php/2012/05/goodbye-couchdb/
>>
>>
>> Thanks,
>>
>> N
>
> A side note: I'm not even sure they made the right choice in picking MySQL.

Well they *do* have a history of smart, prudent decision-making. So I
put faith in them. (Personally I am dying for a reason to get into
Drizzle, but CouchDB solves my current problems.)

> For the rest we should market couchdb on what it really does.

Enthusiastic +1

> Some are dreaming about big data and big central clusters, but not
> everyone needs that. Most companies have less than 2 GB of usable data.

I wouldn't call it "blah blah" but rather that CouchDB needs a gentle
on-ramp. (Is that an Americanism?) A slip road. Some obvious killer
app to go from zero to sixty (if I may press the metaphor).

Re: Post-mortem

Posted by Benoit Chesneau <bc...@gmail.com>.
On Thu, May 10, 2012 at 7:53 PM, Noah Slater <ns...@tumbolia.org> wrote:
> Guys,
>
> What can we learn from this:
>
> http://saucelabs.com/blog/index.php/2012/05/goodbye-couchdb/
>
>
> Thanks,
>
> N

A side note: I'm not even sure they made the right choice in picking MySQL.

Anyway, IMO we need to improve some features like M/R (running the
indexing on multiple cores) and replication, and add ways to monitor
what happens in every part of couch. Then comes BigCouch, which could
have solved their case. But the way BigCouch offers to scale over
multiple nodes should IMO be an *option* among CouchDB's scaling
possibilities.

For the rest, we should market CouchDB on what it really does. Some
are dreaming about big data and big central clusters, but not everyone
needs that. Most companies have less than 2 GB of usable data.

CouchDB by itself today is really good at managing "mobile" datasets,
or rather datasets that need to be moved and shared often by people.
It's also good for the kind of cluster that doesn't need a quorum at
all. This is the point we should market: Cluster Of Unreliable
Commodity Hardware.

- benoît