Posted to user@couchdb.apache.org by Chris Stockton <ch...@gmail.com> on 2011/09/12 20:41:14 UTC

CouchDB Crash report db_not_found when attempting to replicate databases

It seems that I am randomly getting crash errors as our replicator
runs. All this replicator does is make sure that every database on the
master server replicates to our failover, by checking replication
status.

Details:
  - I see the error below in the logs, anywhere from 0 to 30 at a time.
  - It seems that a database might start replicating okay and then stop.
  - These errors [1] are on the failover pulling from the master
  - No errors are displayed on the master server
  - The database named in the URL in the db_not_found portion of the
error is always reachable with curl from the failover machine, which
makes the error strange; somehow the replicator thinks it can't find
the database
  - The master seems healthy at all times, all databases are available,
no errors in the log

[1] --
  [Mon, 12 Sep 2011 18:34:14 GMT] [error] [<0.22466.5305>]
{error_report,<0.30.0>,
                          {<0.22466.5305>,crash_report,
                           [[{initial_call,{couch_rep,init,['Argument__1']}},
                             {pid,<0.22466.5305>},
                             {registered_name,[]},
                             {error_info,
                              {exit,
                               {db_not_found,
                                <<"http://user:pass@server:5984/db_10944/">>},
                               [{gen_server,init_it,6},
                                {proc_lib,init_p_do_apply,3}]}},
                             {ancestors,
                              [couch_rep_sup,couch_primary_services,
                               couch_server_sup,<0.31.0>]},
                             {messages,[]},
                             {links,[<0.81.0>]},
                             {dictionary,[]},
                             {trap_exit,true},
                             {status,running},
                             {heap_size,2584},
                             {stack_size,24},
                             {reductions,794}],
                            []]}}
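
For reference, a minimal sketch (Python, using the requests library) of the
kind of per-database pull replication described above. It uses CouchDB's
standard _replicate endpoint; the host names, credentials, and option
choices are placeholders, not details from this report.

# Ask the failover's local CouchDB to pull every database from the master
# via the standard _replicate endpoint. Hosts and credentials are placeholders.
import requests

MASTER = "http://user:pass@master:5984"    # remote source (credentials inline)
FAILOVER = "http://failover:5984"          # local CouchDB on the failover

for db in requests.get(f"{MASTER}/_all_dbs").json():
    if db.startswith("_"):
        continue  # skip system databases such as _users
    body = {
        "source": f"{MASTER}/{db}",   # pull from the master
        "target": db,                 # into the local database of the same name
        "create_target": True,        # create the local database if missing
        "continuous": True,           # omit this for a single one-shot pass
    }
    requests.post(f"{FAILOVER}/_replicate", json=body).raise_for_status()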

Re: CouchDB Crash report db_not_found when attempting to replicate databases

Posted by Chris Stockton <ch...@gmail.com>.
Hello,

On Tue, Sep 13, 2011 at 12:30 PM, Max Ogden <ma...@maxogden.com> wrote:
> Hi Chris,
>
> after installing https://github.com/joyent/node and http://npmjs.org/ you
> can simply do
>
> npm install replicate
>
> and then
>
> replicate http://sourcecouch/db http://destinationcouch/db
>
> it will simply return a 'success' message when it completes. There isn't any
> progress monitoring output yet. There also isn't support
> for continuous replication.
>
> or alternatively you could write custom node.js code for finer-grained
> behavior.
>
> cheers,
>

Thank you for this effort; I really am glad to see someone spending time
trying to solve some of the replication problems. Our current system
interacts heavily with the existing couch API, so implementing the
changes needed to fit your design would be a bit of an architectural
shift for us, one that would need testing cycles and such, which may not
be practical since we are really aiming for stability. However, reading
a bit about node.js and seeing what you have done does make me curious
whether it would be a good place to start on a cheaper server-wide
replication. For now I may just need to wait until the couchdb
developers finish the replication improvements, which will hopefully
solve some of our growth problems.

Kind Regards,

-Chris

Re: CouchDB Crash report db_not_found when attempting to replicate databases

Posted by Max Ogden <ma...@maxogden.com>.
Hi Chris,

after installing https://github.com/joyent/node and http://npmjs.org/ you
can simply do

npm install replicate

and then

replicate http://sourcecouch/db http://destinationcouch/db

it will simply return a 'success' message when it completes. There isn't any
progress monitoring output yet. There also isn't support
for continuous replication.

or alternatively you could write custom node.js code for finer-grained
behavior.

cheers,

max

On Tue, Sep 13, 2011 at 12:19 PM, Chris Stockton
<ch...@gmail.com>wrote:

> Hello,
>
> On Tue, Sep 13, 2011 at 11:44 AM, Max Ogden <ma...@maxogden.com> wrote:
> > Hi Chris,
> >
> > From what I understand the current state of the replicator (as of 1.1) is
> > that for certain types of collections of documents it can be somewhat
> > fragile. In the case of the node.js package repository, http://npmjs.org
> ,
> > there are many relatively large (~100MB) documents that would sometimes
> > throw errors or timeout during replication and crash the replicator, at
> > which point the replicator would restart and attempt to pick up where it
> > left off. I am not an expert in the internals of the replicator but
> > apparently the cumulative time required for the replicator to repeatedly
> > crash and then subsequently relocate itself in _changes feed in the case
> of
> > replicating the node package manager was making the built in couch
> > replicator unusable for the task.
> >
>
> First of all I thank you for your response, I appreciate your time. We
> have had a rocky road with replication as well, everything from system
> limits to single document/view/reduce errors causing processes to
> spawn wildly crippling machines. We have slowly worked through them by
> upping system limits and erlang VM limits.
>
> I feel like the absolute root cause of our problem is that we scale
> via many smaller databases instead of a single large one. We are at
> about 4200 databases right now and its painful to netstat -nap|grep
> beam|wc -l and see 4200 active tcp connections. I have brought up
> suggestions and comments in the past about server wide replication,
> with some simple filtering function so a small pool of tcp connections
> and processes could be used, greatly improving our scaling pattern of
> many, small databases. I would be able to allocate time to try to
> contribute some kinda patch to do this, but I simply do not know
> erlang and it is very far from the languages I know (c, java, php,
> anything close to these.. erlang is a entirely different world)
>
> I have thought about changing our replication processes to only do
> single pass non-continuous replication, currently they manage and
> reconcile dropped replication tasks by monitoring status, using the
> continuous =true flag, but I may need to drop that at the cost of
> possible data loss if we get a crash in between passes.
>
> > Two solutions exist that I know of. There is a new replicator in trunk
> (not
> > to be confused with the _replicator db from 1.1 -- it is still using the
> old
> > replicator algorithms) and there is also a more reliable replicator
> written
> > in node.js https://github.com/mikeal/replicate that was was written
> > specifically to replicate the node package repository between hosting
> > providers.
> >
>
> Is there any documentation on this? Although I have heard good things
> I am not familiar with node.js, I am interested in any alternatives
> that better fit our use cases. At the end of the day stability, data
> consistency and reliability for our customers for me is the biggest
> concern, right now we don't have that and it's what I'm aiming for, no
> more 2AM noc phone calls is the goal! :- )
>
> > Additionally it may be useful if you could describe the 'fingerprint' of
> > your documents a bit. How many documents are in the failing databases?
> are
> > the documents large or small? do they have many attachments? how large is
> > your _changes feed?
> >
>
> The failing databases do not share a common signature, some are very
> small, maybe 10 total documents, some may have more then 10 thousand.
> Some have had no changes for a very long time, some are recent. The
> failures shared no common ground based off my observations.
>
> Additional info:
>  - We have around 4200 databases
>  - The typical document is under 2kb, they are basically "table"
> rows, simple key/value pairs
>  - The changes feed is pretty small on most databases experiencing issues
>  - We compact databases which had changes each night
>  - A small percent, like 10% has attachments, they seem to not be
> related to our issues
>
> I am going to look into some of the alternative replicators you have
> given me, feel free to give any specific suggestions based on the
> above info.
>
> Thanks,
>
> -Chris
>

Re: CouchDB Crash report db_not_found when attempting to replicate databases

Posted by Chris Stockton <ch...@gmail.com>.
Hello,

On Tue, Sep 13, 2011 at 11:44 AM, Max Ogden <ma...@maxogden.com> wrote:
> Hi Chris,
>
> From what I understand the current state of the replicator (as of 1.1) is
> that for certain types of collections of documents it can be somewhat
> fragile. In the case of the node.js package repository, http://npmjs.org,
> there are many relatively large (~100MB) documents that would sometimes
> throw errors or timeout during replication and crash the replicator, at
> which point the replicator would restart and attempt to pick up where it
> left off. I am not an expert in the internals of the replicator but
> apparently the cumulative time required for the replicator to repeatedly
> crash and then subsequently relocate itself in _changes feed in the case of
> replicating the node package manager was making the built in couch
> replicator unusable for the task.
>

First of all, thank you for your response; I appreciate your time. We
have had a rocky road with replication as well, everything from system
limits to single document/view/reduce errors causing processes to
spawn wildly and cripple machines. We have slowly worked through them
by raising system limits and Erlang VM limits.

I feel like the absolute root cause of our problem is that we scale
via many smaller databases instead of a single large one. We are at
about 4200 databases right now, and it's painful to run netstat -nap|grep
beam|wc -l and see 4200 active tcp connections. I have brought up
suggestions and comments in the past about server-wide replication,
with some simple filtering function, so that a small pool of tcp connections
and processes could be used, greatly improving our scaling pattern of
many small databases. I would be able to allocate time to try to
contribute some kind of patch to do this, but I simply do not know
Erlang and it is very far from the languages I know (C, Java, PHP,
anything close to these.. Erlang is an entirely different world).

I have thought about changing our replication processes to do only
single-pass, non-continuous replication. Currently they manage and
reconcile dropped replication tasks by monitoring status, using the
continuous=true flag, but I may need to drop that at the cost of
possible data loss if we get a crash in between passes.
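
A rough sketch (Python, requests) of the reconcile-by-monitoring pattern
described in the previous paragraph, polling _active_tasks and restarting
a continuous replication if none appears to be running. The exact shape of
_active_tasks entries varies between CouchDB versions, so the matching
below is illustrative only; hosts and credentials are placeholders.

import requests

MASTER = "http://user:pass@master:5984"
FAILOVER = "http://failover:5984"

def replication_running(db_name):
    """Return True if any active task appears to replicate db_name.

    Field names in _active_tasks differ across CouchDB versions, so this
    only does a loose, illustrative match on the task contents.
    """
    tasks = requests.get(f"{FAILOVER}/_active_tasks").json()
    return any(
        "replication" in str(task.get("type", "")).lower() and db_name in str(task)
        for task in tasks
    )

def ensure_replication(db_name):
    """Restart the continuous pull replication for db_name if it dropped."""
    if not replication_running(db_name):
        requests.post(
            f"{FAILOVER}/_replicate",
            json={"source": f"{MASTER}/{db_name}", "target": db_name,
                  "continuous": True},
        ).raise_for_status()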

> Two solutions exist that I know of. There is a new replicator in trunk (not
> to be confused with the _replicator db from 1.1 -- it is still using the old
> replicator algorithms) and there is also a more reliable replicator written
> in node.js https://github.com/mikeal/replicate that was was written
> specifically to replicate the node package repository between hosting
> providers.
>

Is there any documentation on this? Although I have heard good things,
I am not familiar with node.js, and I am interested in any alternatives
that better fit our use cases. At the end of the day, stability, data
consistency and reliability for our customers are my biggest concerns;
right now we don't have that, and it's what I'm aiming for. No more 2AM
NOC phone calls is the goal! :- )

> Additionally it may be useful if you could describe the 'fingerprint' of
> your documents a bit. How many documents are in the failing databases? are
> the documents large or small? do they have many attachments? how large is
> your _changes feed?
>

The failing databases do not share a common signature: some are very
small, maybe 10 total documents, and some have more than 10 thousand.
Some have had no changes for a very long time, some are recent. The
failures share no common ground based on my observations.

Additional info:
  - We have around 4200 databases
  - The typical document is under 2 KB; they are basically "table"
rows, simple key/value pairs
  - The changes feed is pretty small on most databases experiencing issues
  - Each night we compact the databases that had changes
  - A small percentage, around 10%, have attachments; they do not seem
to be related to our issues

I am going to look into some of the alternative replicators you have
given me, feel free to give any specific suggestions based on the
above info.

Thanks,

-Chris

Re: CouchDB Crash report db_not_found when attempting to replicate databases

Posted by Mikeal Rogers <mi...@gmail.com>.
HAHA! I already forgot that we did this.

-Mikeal

On Sep 14, 2011, at 12:51 PM, Randall Leeds wrote:

> On Wed, Sep 14, 2011 at 12:19, Adam Kocoloski <ad...@gmail.com>wrote:
> 
>> There's a multipart API which allows for a single PUT request containing
>> the document body as JSON and all its attachments in their raw form.
>> Documentation is pretty thin at the moment, and unfortunately I think it
>> doesn't quite allow for a pipe(). Would be really nice if it did, though.
>> 
> 
> It does. We figured it out together a couple weeks ago and that's when this
> code came into being.
> Requesting a _specific_ revision with ?revs=true will give you a
> multipart/related response suitable for passing straight into a
> ?new_edits=false&rev= PUT.
> See https://github.com/mikeal/replicate/blob/master/main.js#L49
> 
> 
>> 
>> On Wednesday, September 14, 2011 at 1:16 PM, Mikeal Rogers wrote:
>> 
>>> npm is mostly attachments and I haven't seen any issues so far.
>>> 
>>> I wish there was a better way to replicate attachments atomically for a
>> single revision but if there is, I don't know about it.
>>> 
>>> It's probably a huge JSON operation and it sucks, but I don't have to
>> parse it in node.js, I just pipe() the body right along.
>>> 
>>> -Mikeal
>>> 
>>> On Sep 14, 2011, at 8:42 AM, Adam Kocoloski wrote:
>>> 
>>>> Hi Mikeal, I just took a quick peek at your code. It looks like you
>> handle attachments by inlining all of them into the JSON representation of
>> the document. Does that ever cause problems when dealing with the ~100 MB
>> attachments in the npm repo?
>>>> 
>>>> I've certainly seen my fair share of problems with attachment
>> replication in CouchDB 1.0.x. I have a sneaking suspicion that there are
>> latent bugs related to incorrect determinations of Content-Length under
>> various compression scenarios.
>>>> 
>>>> Adam
>>>> 
>>>> On Tuesday, September 13, 2011 at 5:08 PM, Mikeal Rogers wrote:
>>>> 
>>>>> My replicator is fairly young so I think calling it "reliable" might
>> be a little misleading.
>>>>> 
>>>>> It does less, I don't ever attempt to cache the high watermark (last
>> seq written) and start over from there. If the process crashes just start
>> over from scratch. This can lead to a delay after restart but I find that
>> it's much simpler and more reliable on failure.
>>>>> 
>>>>> It's also simpler because it doesn't have to content with being an
>> http client and a client of the internal couchdb erlang API. It just proxies
>> requests from one couch to another.
>>>>> 
>>>>> While I'm sure there are bugs that I haven't found yet in it, I can
>> say that it replicates the npm repository quite well and I'm using it in
>> production.
>>>>> 
>>>>> -Mikeal
>>>>> 
>>>>> On Sep 13, 2011, at 11:44 AM, Max Ogden wrote:
>>>>> 
>>>>>> Hi Chris,
>>>>>> 
>>>>>> From what I understand the current state of the replicator (as of
>> 1.1) is
>>>>>> that for certain types of collections of documents it can be
>> somewhat
>>>>>> fragile. In the case of the node.js package repository,
>> http://npmjs.org,
>>>>>> there are many relatively large (~100MB) documents that would
>> sometimes
>>>>>> throw errors or timeout during replication and crash the
>> replicator, at
>>>>>> which point the replicator would restart and attempt to pick up
>> where it
>>>>>> left off. I am not an expert in the internals of the replicator but
>>>>>> apparently the cumulative time required for the replicator to
>> repeatedly
>>>>>> crash and then subsequently relocate itself in _changes feed in the
>> case of
>>>>>> replicating the node package manager was making the built in couch
>>>>>> replicator unusable for the task.
>>>>>> 
>>>>>> Two solutions exist that I know of. There is a new replicator in
>> trunk (not
>>>>>> to be confused with the _replicator db from 1.1 -- it is still
>> using the old
>>>>>> replicator algorithms) and there is also a more reliable replicator
>> written
>>>>>> in node.js https://github.com/mikeal/replicate that was was
>> written
>>>>>> specifically to replicate the node package repository between
>> hosting
>>>>>> providers.
>>>>>> 
>>>>>> Additionally it may be useful if you could describe the
>> 'fingerprint' of
>>>>>> your documents a bit. How many documents are in the failing
>> databases? are
>>>>>> the documents large or small? do they have many attachments? how
>> large is
>>>>>> your _changes feed?
>>>>>> 
>>>>>> Cheers,
>>>>>> 
>>>>>> Max
>>>>>> 
>>>>>> On Tue, Sep 13, 2011 at 11:22 AM, Chris Stockton
>>>>>> <chrisstocktonaz@gmail.com (mailto:chrisstocktonaz@gmail.com
>> )>wrote:
>>>>>> 
>>>>>>> Hello,
>>>>>>> 
>>>>>>> We now have about 150 dbs that are refusing to replicate with
>> random
>>>>>>> crashes, which provide really zero debug information. The error
>> is db
>>>>>>> not found, but I know its available. Does anyone know how can I
>>>>>>> trouble shoot this? Do we just have to many databases replicating
>> for
>>>>>>> couchdb to handle? 4000 is a small number for the massive
>> hardware
>>>>>>> these are running on.
>>>>>>> 
>>>>>>> -Chris
>> 
>> 
>> 


Re: CouchDB Crash report db_not_found when attempting to replicate databases

Posted by Filipe David Manana <fd...@apache.org>.
On Fri, Sep 30, 2011 at 12:32 AM, Benoit Chesneau wrote:
> what would be the cons? We should do it imo.

None that I can think of. Given the impossibility of GETting a document
with an empty ID, it seems sane to skip it and log an error message
rather than have the replication always crash, making it impossible
to replicate the database.

Patch, http://friendpaste.com/6yfuXjy6LBUWZXcs10aV3y

Example error message:

http://friendpaste.com/22n9Fqh4oe89Szn5Zt0Omg

>
> benoit
>
>
>
>
>>
>> On Thu, Sep 29, 2011 at 4:39 AM, Jason Smith <jh...@iriscouch.com> wrote:
>>> Do you mean the NPM registry? The raw couch is any of:
>>>
>>> * http://isaacs.iriscouch.com
>>> * http://isaacs.iriscouch.com:5984
>>> * https://isaacs.iriscouch.com
>>> * https://isaacs.iriscouch.com:6984
>>>
>>> Note, the official host is isaacs.iriscouch.net but over HTTPS it will
>>> only serve the SSL certificate for registry.npmjs.org. So you can use
>>> those above or disable cert verification. (Accessing
>
>>> registry.npmjs.org will vhost you to the couch app.)
>>>
>>> P.S. All Iris Couch accounts also have short domains, to help with
>>> testing and development.
>>>
>>> * http://isaacs.ic.ht
>>> * http://isaacs.ic.tl
>>>
>>> .ic.ht and .ic.tl are QWERTY-friendly (alternating right-hand,
>>> left-hand, like Unix commands). They were the shortest available
>>> domains with a "c" in them, to stand for "Couch". That is why I chose
>>> the name "Iris Couch" in the first place.
>>>
>>> On Thu, Sep 29, 2011 at 11:44 AM, Max Ogden <ma...@maxogden.com> wrote:
>>>> http://twitter.com/#!/maxogden/status/113702093512122368
>>>>
>>>> On Wed, Sep 28, 2011 at 9:37 PM, Filipe David Manana <
> fdmanana@apache.org>wrote:
>>>>
>>>>> Mikeal, or someone else, can you provide the url of that npm database?
>>>>> I would like to do some replication tests with it and report back here.
>>>>>
>>>>> thanks
>>>>>
>>>>> On Sat, Sep 17, 2011 at 8:01 AM, Adam Kocoloski
>>>>> <ad...@gmail.com> wrote:
>>>>> > On Wednesday, September 14, 2011 at 3:51 PM, Randall Leeds wrote:
>>>>> >> On Wed, Sep 14, 2011 at 12:19, Adam Kocoloski <
> adam.kocoloski@gmail.com(mailto:
>>>>> adam.kocoloski@gmail.com)>wrote:
>>>>> >>
>>>>> >> > There's a multipart API which allows for a single PUT request
>>>>> containing
>>>>> >> > the document body as JSON and all its attachments in their raw
> form.
>>>>> >> > Documentation is pretty thin at the moment, and unfortunately I
> think
>>>>> it
>>>>> >> > doesn't quite allow for a pipe(). Would be really nice if it did,
>>>>> though.
>>>>> >>
>>>>> >> It does. We figured it out together a couple weeks ago and that's
> when
>>>>> this
>>>>> >> code came into being.
>>>>> >> Requesting a _specific_ revision with ?revs=true will give you a
>>>>> >> multipart/related response suitable for passing straight into a
>>>>> >> ?new_edits=false&rev= PUT.
>>>>> >> See https://github.com/mikeal/replicate/blob/master/main.js#L49
>>>>> >>
>>>>> > Hah! That's what I get for spending too much time in the world of
> 1.0.x.
>>>>> Thanks for the correction Randall. Best,
>>>>> >
>>>>> > Adam
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Filipe David Manana,
>>>>>
>>>>> "Reasonable men adapt themselves to the world.
>>>>>  Unreasonable men adapt the world to themselves.
>>>>>  That's why all progress depends on unreasonable men."
>>>>>
>>>>
>>>
>>><
>



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."

Re: CouchDB Crash report db_not_found when attempting to replicate databases

Posted by Benoit Chesneau <bc...@gmail.com>.
On Friday, September 30, 2011, Filipe David Manana <fd...@apache.org>
wrote:
> Max, Jason, thanks for the reply.
> I tried replicating it, and indeed it doesn't work - the replicator
> crashes. The reason for this is that there's a document with an empty
> ID.
>
> From the _changes stream:
>
> (.....)
>
{"seq":3518,"id":"","changes":[{"rev":"1-2f11e026763c10730d8b19ba5dce7565"}]},
> (.....)
>
>
> The possibility of adding a document with an empty ID was a bug in
> previous Couch releases that was fixed some time ago. Basically the
> replicator, or any other client, can't access a document with the
> empty ID.
>
> Looking at Mikeal's node.js replicator, it's skipping the document if
> its ID is empty:
>
> https://github.com/mikeal/replicate/blob/master/main.js#L122
>
> This seems like a sane solution and something I'm considering adding
> to the replicators, skipping the document and logging an error message
> to inform users.

what would be the cons? We should do it imo.

benoit




>
> On Thu, Sep 29, 2011 at 4:39 AM, Jason Smith <jh...@iriscouch.com> wrote:
>> Do you mean the NPM registry? The raw couch is any of:
>>
>> * http://isaacs.iriscouch.com
>> * http://isaacs.iriscouch.com:5984
>> * https://isaacs.iriscouch.com
>> * https://isaacs.iriscouch.com:6984
>>
>> Note, the official host is isaacs.iriscouch.net but over HTTPS it will
>> only serve the SSL certificate for registry.npmjs.org. So you can use
>> those above or disable cert verification. (Accessing

>> registry.npmjs.org will vhost you to the couch app.)
>>
>> P.S. All Iris Couch accounts also have short domains, to help with
>> testing and development.
>>
>> * http://isaacs.ic.ht
>> * http://isaacs.ic.tl
>>
>> .ic.ht and .ic.tl are QWERTY-friendly (alternating right-hand,
>> left-hand, like Unix commands). They were the shortest available
>> domains with a "c" in them, to stand for "Couch". That is why I chose
>> the name "Iris Couch" in the first place.
>>
>> On Thu, Sep 29, 2011 at 11:44 AM, Max Ogden <ma...@maxogden.com> wrote:
>>> http://twitter.com/#!/maxogden/status/113702093512122368
>>>
>>> On Wed, Sep 28, 2011 at 9:37 PM, Filipe David Manana <
fdmanana@apache.org>wrote:
>>>
>>>> Mikeal, or someone else, can you provide the url of that npm database?
>>>> I would like to do some replication tests with it and report back here.
>>>>
>>>> thanks
>>>>
>>>> On Sat, Sep 17, 2011 at 8:01 AM, Adam Kocoloski
>>>> <ad...@gmail.com> wrote:
>>>> > On Wednesday, September 14, 2011 at 3:51 PM, Randall Leeds wrote:
>>>> >> On Wed, Sep 14, 2011 at 12:19, Adam Kocoloski <
adam.kocoloski@gmail.com(mailto:
>>>> adam.kocoloski@gmail.com)>wrote:
>>>> >>
>>>> >> > There's a multipart API which allows for a single PUT request
>>>> containing
>>>> >> > the document body as JSON and all its attachments in their raw
form.
>>>> >> > Documentation is pretty thin at the moment, and unfortunately I
think
>>>> it
>>>> >> > doesn't quite allow for a pipe(). Would be really nice if it did,
>>>> though.
>>>> >>
>>>> >> It does. We figured it out together a couple weeks ago and that's
when
>>>> this
>>>> >> code came into being.
>>>> >> Requesting a _specific_ revision with ?revs=true will give you a
>>>> >> multipart/related response suitable for passing straight into a
>>>> >> ?new_edits=false&rev= PUT.
>>>> >> See https://github.com/mikeal/replicate/blob/master/main.js#L49
>>>> >>
>>>> > Hah! That's what I get for spending too much time in the world of
1.0.x.
>>>> Thanks for the correction Randall. Best,
>>>> >
>>>> > Adam
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Filipe David Manana,
>>>>
>>>> "Reasonable men adapt themselves to the world.
>>>>  Unreasonable men adapt the world to themselves.
>>>>  That's why all progress depends on unreasonable men."
>>>>
>>>
>>
>><

Re: CouchDB Crash report db_not_found when attempting to replicate databases

Posted by Filipe David Manana <fd...@apache.org>.
Max, Jason, thanks for the reply.
I tried replicating it, and indeed it doesn't work - the replicator
crashes. The reason for this is that there's a document with an empty
ID.

From the _changes stream:

(.....)
{"seq":3518,"id":"","changes":[{"rev":"1-2f11e026763c10730d8b19ba5dce7565"}]},
(.....)


The possibility of adding a document with an empty ID was a bug in
previous Couch releases that was fixed some time ago. Basically the
replicator, or any other client, can't access a document with an
empty ID.

Looking at Mikeal's node.js replicator, it's skipping the document if
its ID is empty:

https://github.com/mikeal/replicate/blob/master/main.js#L122

This seems like a sane solution and something I'm considering adding
to the replicators: skipping the document and logging an error message
to inform users.
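
As an illustration of that idea, a small client-side sketch (Python,
requests) that walks _changes and skips rows with an empty ID. It is not
the replicator patch itself, and the source URL is a placeholder.

import requests

SOURCE = "http://user:pass@server:5984/db_10944"  # placeholder source database

rows = requests.get(f"{SOURCE}/_changes", params={"since": 0}).json()["results"]
for row in rows:
    if row["id"] == "":
        # Such a document cannot be fetched, so log it and move on instead
        # of letting the whole replication crash.
        print(f"skipping document with empty ID at seq {row['seq']}")
        continue
    # ... fetch row["id"] at the listed revisions and write it to the target ...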

On Thu, Sep 29, 2011 at 4:39 AM, Jason Smith <jh...@iriscouch.com> wrote:
> Do you mean the NPM registry? The raw couch is any of:
>
> * http://isaacs.iriscouch.com
> * http://isaacs.iriscouch.com:5984
> * https://isaacs.iriscouch.com
> * https://isaacs.iriscouch.com:6984
>
> Note, the official host is isaacs.iriscouch.net but over HTTPS it will
> only serve the SSL certificate for registry.npmjs.org. So you can use
> those above or disable cert verification. (Accessing
> registry.npmjs.org will vhost you to the couch app.)
>
> P.S. All Iris Couch accounts also have short domains, to help with
> testing and development.
>
> * http://isaacs.ic.ht
> * http://isaacs.ic.tl
>
> .ic.ht and .ic.tl are QWERTY-friendly (alternating right-hand,
> left-hand, like Unix commands). They were the shortest available
> domains with a "c" in them, to stand for "Couch". That is why I chose
> the name "Iris Couch" in the first place.
>
> On Thu, Sep 29, 2011 at 11:44 AM, Max Ogden <ma...@maxogden.com> wrote:
>> http://twitter.com/#!/maxogden/status/113702093512122368
>>
>> On Wed, Sep 28, 2011 at 9:37 PM, Filipe David Manana <fd...@apache.org>wrote:
>>
>>> Mikeal, or someone else, can you provide the url of that npm database?
>>> I would like to do some replication tests with it and report back here.
>>>
>>> thanks
>>>
>>> On Sat, Sep 17, 2011 at 8:01 AM, Adam Kocoloski
>>> <ad...@gmail.com> wrote:
>>> > On Wednesday, September 14, 2011 at 3:51 PM, Randall Leeds wrote:
>>> >> On Wed, Sep 14, 2011 at 12:19, Adam Kocoloski <adam.kocoloski@gmail.com(mailto:
>>> adam.kocoloski@gmail.com)>wrote:
>>> >>
>>> >> > There's a multipart API which allows for a single PUT request
>>> containing
>>> >> > the document body as JSON and all its attachments in their raw form.
>>> >> > Documentation is pretty thin at the moment, and unfortunately I think
>>> it
>>> >> > doesn't quite allow for a pipe(). Would be really nice if it did,
>>> though.
>>> >>
>>> >> It does. We figured it out together a couple weeks ago and that's when
>>> this
>>> >> code came into being.
>>> >> Requesting a _specific_ revision with ?revs=true will give you a
>>> >> multipart/related response suitable for passing straight into a
>>> >> ?new_edits=false&rev= PUT.
>>> >> See https://github.com/mikeal/replicate/blob/master/main.js#L49
>>> >>
>>> > Hah! That's what I get for spending too much time in the world of 1.0.x.
>>> Thanks for the correction Randall. Best,
>>> >
>>> > Adam
>>> >
>>>
>>>
>>>
>>> --
>>> Filipe David Manana,
>>>
>>> "Reasonable men adapt themselves to the world.
>>>  Unreasonable men adapt the world to themselves.
>>>  That's why all progress depends on unreasonable men."
>>>
>>
>
>
>
> --
> Iris Couch
>



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."

Re: CouchDB Crash report db_not_found when attempting to replicate databases

Posted by Jason Smith <jh...@iriscouch.com>.
Do you mean the NPM registry? The raw couch is any of:

* http://isaacs.iriscouch.com
* http://isaacs.iriscouch.com:5984
* https://isaacs.iriscouch.com
* https://isaacs.iriscouch.com:6984

Note, the official host is isaacs.iriscouch.net but over HTTPS it will
only serve the SSL certificate for registry.npmjs.org. So you can use
those above or disable cert verification. (Accessing
registry.npmjs.org will vhost you to the couch app.)

P.S. All Iris Couch accounts also have short domains, to help with
testing and development.

* http://isaacs.ic.ht
* http://isaacs.ic.tl

.ic.ht and .ic.tl are QWERTY-friendly (alternating right-hand,
left-hand, like Unix commands). They were the shortest available
domains with a "c" in them, to stand for "Couch". That is why I chose
the name "Iris Couch" in the first place.

On Thu, Sep 29, 2011 at 11:44 AM, Max Ogden <ma...@maxogden.com> wrote:
> http://twitter.com/#!/maxogden/status/113702093512122368
>
> On Wed, Sep 28, 2011 at 9:37 PM, Filipe David Manana <fd...@apache.org>wrote:
>
>> Mikeal, or someone else, can you provide the url of that npm database?
>> I would like to do some replication tests with it and report back here.
>>
>> thanks
>>
>> On Sat, Sep 17, 2011 at 8:01 AM, Adam Kocoloski
>> <ad...@gmail.com> wrote:
>> > On Wednesday, September 14, 2011 at 3:51 PM, Randall Leeds wrote:
>> >> On Wed, Sep 14, 2011 at 12:19, Adam Kocoloski <adam.kocoloski@gmail.com(mailto:
>> adam.kocoloski@gmail.com)>wrote:
>> >>
>> >> > There's a multipart API which allows for a single PUT request
>> containing
>> >> > the document body as JSON and all its attachments in their raw form.
>> >> > Documentation is pretty thin at the moment, and unfortunately I think
>> it
>> >> > doesn't quite allow for a pipe(). Would be really nice if it did,
>> though.
>> >>
>> >> It does. We figured it out together a couple weeks ago and that's when
>> this
>> >> code came into being.
>> >> Requesting a _specific_ revision with ?revs=true will give you a
>> >> multipart/related response suitable for passing straight into a
>> >> ?new_edits=false&rev= PUT.
>> >> See https://github.com/mikeal/replicate/blob/master/main.js#L49
>> >>
>> > Hah! That's what I get for spending too much time in the world of 1.0.x.
>> Thanks for the correction Randall. Best,
>> >
>> > Adam
>> >
>>
>>
>>
>> --
>> Filipe David Manana,
>>
>> "Reasonable men adapt themselves to the world.
>>  Unreasonable men adapt the world to themselves.
>>  That's why all progress depends on unreasonable men."
>>
>



-- 
Iris Couch

Re: CouchDB Crash report db_not_found when attempting to replicate databases

Posted by Max Ogden <ma...@maxogden.com>.
http://twitter.com/#!/maxogden/status/113702093512122368

On Wed, Sep 28, 2011 at 9:37 PM, Filipe David Manana <fd...@apache.org>wrote:

> Mikeal, or someone else, can you provide the url of that npm database?
> I would like to do some replication tests with it and report back here.
>
> thanks
>
> On Sat, Sep 17, 2011 at 8:01 AM, Adam Kocoloski
> <ad...@gmail.com> wrote:
> > On Wednesday, September 14, 2011 at 3:51 PM, Randall Leeds wrote:
> >> On Wed, Sep 14, 2011 at 12:19, Adam Kocoloski <adam.kocoloski@gmail.com(mailto:
> adam.kocoloski@gmail.com)>wrote:
> >>
> >> > There's a multipart API which allows for a single PUT request
> containing
> >> > the document body as JSON and all its attachments in their raw form.
> >> > Documentation is pretty thin at the moment, and unfortunately I think
> it
> >> > doesn't quite allow for a pipe(). Would be really nice if it did,
> though.
> >>
> >> It does. We figured it out together a couple weeks ago and that's when
> this
> >> code came into being.
> >> Requesting a _specific_ revision with ?revs=true will give you a
> >> multipart/related response suitable for passing straight into a
> >> ?new_edits=false&rev= PUT.
> >> See https://github.com/mikeal/replicate/blob/master/main.js#L49
> >>
> > Hah! That's what I get for spending too much time in the world of 1.0.x.
> Thanks for the correction Randall. Best,
> >
> > Adam
> >
>
>
>
> --
> Filipe David Manana,
>
> "Reasonable men adapt themselves to the world.
>  Unreasonable men adapt the world to themselves.
>  That's why all progress depends on unreasonable men."
>

Re: CouchDB Crash report db_not_found when attempting to replicate databases

Posted by Filipe David Manana <fd...@apache.org>.
Mikeal, or someone else, can you provide the url of that npm database?
I would like to do some replication tests with it and report back here.

thanks

On Sat, Sep 17, 2011 at 8:01 AM, Adam Kocoloski
<ad...@gmail.com> wrote:
> On Wednesday, September 14, 2011 at 3:51 PM, Randall Leeds wrote:
>> On Wed, Sep 14, 2011 at 12:19, Adam Kocoloski <adam.kocoloski@gmail.com (mailto:adam.kocoloski@gmail.com)>wrote:
>>
>> > There's a multipart API which allows for a single PUT request containing
>> > the document body as JSON and all its attachments in their raw form.
>> > Documentation is pretty thin at the moment, and unfortunately I think it
>> > doesn't quite allow for a pipe(). Would be really nice if it did, though.
>>
>> It does. We figured it out together a couple weeks ago and that's when this
>> code came into being.
>> Requesting a _specific_ revision with ?revs=true will give you a
>> multipart/related response suitable for passing straight into a
>> ?new_edits=false&rev= PUT.
>> See https://github.com/mikeal/replicate/blob/master/main.js#L49
>>
> Hah! That's what I get for spending too much time in the world of 1.0.x. Thanks for the correction Randall. Best,
>
> Adam
>



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."

Re: CouchDB Crash report db_not_found when attempting to replicate databases

Posted by Adam Kocoloski <ad...@gmail.com>.
On Wednesday, September 14, 2011 at 3:51 PM, Randall Leeds wrote:
> On Wed, Sep 14, 2011 at 12:19, Adam Kocoloski <adam.kocoloski@gmail.com (mailto:adam.kocoloski@gmail.com)>wrote:
> 
> > There's a multipart API which allows for a single PUT request containing
> > the document body as JSON and all its attachments in their raw form.
> > Documentation is pretty thin at the moment, and unfortunately I think it
> > doesn't quite allow for a pipe(). Would be really nice if it did, though.
> 
> It does. We figured it out together a couple weeks ago and that's when this
> code came into being.
> Requesting a _specific_ revision with ?revs=true will give you a
> multipart/related response suitable for passing straight into a
> ?new_edits=false&rev= PUT.
> See https://github.com/mikeal/replicate/blob/master/main.js#L49
> 
Hah! That's what I get for spending too much time in the world of 1.0.x. Thanks for the correction Randall. Best,

Adam 

Re: CouchDB Crash report db_not_found when attempting to replicate databases

Posted by Randall Leeds <ra...@gmail.com>.
On Wed, Sep 14, 2011 at 12:19, Adam Kocoloski <ad...@gmail.com>wrote:

> There's a multipart API which allows for a single PUT request containing
> the document body as JSON and all its attachments in their raw form.
> Documentation is pretty thin at the moment, and unfortunately I think it
> doesn't quite allow for a pipe(). Would be really nice if it did, though.
>

It does. We figured it out together a couple weeks ago and that's when this
code came into being.
Requesting a _specific_ revision with ?revs=true will give you a
multipart/related response suitable for passing straight into a
?new_edits=false&rev= PUT.
See https://github.com/mikeal/replicate/blob/master/main.js#L49
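
A rough sketch of that round trip in Python (requests): fetch one specific
revision as multipart/related and stream it straight into the target. The
Accept header and the attachments=true parameter are assumptions about what
the source expects and may vary by CouchDB version; hosts, database names,
document ID, and revision are placeholders.

import requests

SOURCE = "http://sourcecouch:5984/db"
TARGET = "http://destinationcouch:5984/db"
DOC_ID = "some_doc"
REV = "1-abc123"  # placeholder revision

# GET one specific revision; ask for the multipart/related form.
get = requests.get(
    f"{SOURCE}/{DOC_ID}",
    params={"rev": REV, "revs": "true", "attachments": "true"},
    headers={"Accept": "multipart/related"},
    stream=True,  # do not buffer large attachments in memory
)
get.raise_for_status()

# PUT the body unchanged into the target with new_edits=false.
put = requests.put(
    f"{TARGET}/{DOC_ID}",
    params={"new_edits": "false", "rev": REV},
    data=get.raw,  # pipe the multipart body straight through
    headers={"Content-Type": get.headers["Content-Type"]},
)
put.raise_for_status()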


>
> On Wednesday, September 14, 2011 at 1:16 PM, Mikeal Rogers wrote:
>
> > npm is mostly attachments and I haven't seen any issues so far.
> >
> > I wish there was a better way to replicate attachments atomically for a
> single revision but if there is, I don't know about it.
> >
> > It's probably a huge JSON operation and it sucks, but I don't have to
> parse it in node.js, I just pipe() the body right along.
> >
> > -Mikeal
> >
> > On Sep 14, 2011, at 8:42 AM, Adam Kocoloski wrote:
> >
> > > Hi Mikeal, I just took a quick peek at your code. It looks like you
> handle attachments by inlining all of them into the JSON representation of
> the document. Does that ever cause problems when dealing with the ~100 MB
> attachments in the npm repo?
> > >
> > > I've certainly seen my fair share of problems with attachment
> replication in CouchDB 1.0.x. I have a sneaking suspicion that there are
> latent bugs related to incorrect determinations of Content-Length under
> various compression scenarios.
> > >
> > > Adam
> > >
> > > On Tuesday, September 13, 2011 at 5:08 PM, Mikeal Rogers wrote:
> > >
> > > > My replicator is fairly young so I think calling it "reliable" might
> be a little misleading.
> > > >
> > > > It does less, I don't ever attempt to cache the high watermark (last
> seq written) and start over from there. If the process crashes just start
> over from scratch. This can lead to a delay after restart but I find that
> it's much simpler and more reliable on failure.
> > > >
> > > > It's also simpler because it doesn't have to content with being an
> http client and a client of the internal couchdb erlang API. It just proxies
> requests from one couch to another.
> > > >
> > > > While I'm sure there are bugs that I haven't found yet in it, I can
> say that it replicates the npm repository quite well and I'm using it in
> production.
> > > >
> > > > -Mikeal
> > > >
> > > > On Sep 13, 2011, at 11:44 AM, Max Ogden wrote:
> > > >
> > > > > Hi Chris,
> > > > >
> > > > > From what I understand the current state of the replicator (as of
> 1.1) is
> > > > > that for certain types of collections of documents it can be
> somewhat
> > > > > fragile. In the case of the node.js package repository,
> http://npmjs.org,
> > > > > there are many relatively large (~100MB) documents that would
> sometimes
> > > > > throw errors or timeout during replication and crash the
> replicator, at
> > > > > which point the replicator would restart and attempt to pick up
> where it
> > > > > left off. I am not an expert in the internals of the replicator but
> > > > > apparently the cumulative time required for the replicator to
> repeatedly
> > > > > crash and then subsequently relocate itself in _changes feed in the
> case of
> > > > > replicating the node package manager was making the built in couch
> > > > > replicator unusable for the task.
> > > > >
> > > > > Two solutions exist that I know of. There is a new replicator in
> trunk (not
> > > > > to be confused with the _replicator db from 1.1 -- it is still
> using the old
> > > > > replicator algorithms) and there is also a more reliable replicator
> written
> > > > > in node.js https://github.com/mikeal/replicate that was was
> written
> > > > > specifically to replicate the node package repository between
> hosting
> > > > > providers.
> > > > >
> > > > > Additionally it may be useful if you could describe the
> 'fingerprint' of
> > > > > your documents a bit. How many documents are in the failing
> databases? are
> > > > > the documents large or small? do they have many attachments? how
> large is
> > > > > your _changes feed?
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Max
> > > > >
> > > > > On Tue, Sep 13, 2011 at 11:22 AM, Chris Stockton
> > > > > <chrisstocktonaz@gmail.com (mailto:chrisstocktonaz@gmail.com
> )>wrote:
> > > > >
> > > > > > Hello,
> > > > > >
> > > > > > We now have about 150 dbs that are refusing to replicate with
> random
> > > > > > crashes, which provide really zero debug information. The error
> is db
> > > > > > not found, but I know its available. Does anyone know how can I
> > > > > > trouble shoot this? Do we just have to many databases replicating
> for
> > > > > > couchdb to handle? 4000 is a small number for the massive
> hardware
> > > > > > these are running on.
> > > > > >
> > > > > > -Chris
>
>
>

Re: CouchDB Crash report db_not_found when attempting to replicate databases

Posted by Randall Leeds <ra...@gmail.com>.
On Wed, Sep 14, 2011 at 13:55, Jens Alfke <je...@couchbase.com> wrote:

>
> On Sep 14, 2011, at 12:19 PM, Adam Kocoloski wrote:
>
> There's a multipart API which allows for a single PUT request containing
> the document body as JSON and all its attachments in their raw form.
> Documentation is pretty thin at the moment, and unfortunately I think it
> doesn't quite allow for a pipe(). Would be really nice if it did, though.
>
> Any tips on how to invoke this? It would be very useful in the framework
> I’m working on.
> I just checked <http://wiki.apache.org/couchdb/HTTP_Document_API> and
> there’s nothing about it.
>

Check out the lines in Mikeal's replicator that I linked to. It should give
you some hints.
I think it's a PUT with Content-Type: multipart/related, and then you just
send the body followed by the attachments. I think you need to have an
_attachments object in the body and, for each attachment, specify
content_type, length, and follows: true.
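
To make that concrete, a small sketch of the JSON part such a PUT might
carry, with every name and value a placeholder; follows: true is the flag
Randall describes, signalling that the raw attachment bytes arrive as a
later MIME part instead of as inline base64.

import json

doc = {
    "_id": "some_doc",
    "_rev": "1-abc123",              # placeholder revision
    "type": "example",
    "_attachments": {
        "payload.bin": {
            "content_type": "application/octet-stream",
            "length": 12345,         # byte length of the raw part that follows
            "follows": True,         # raw bytes come as a separate MIME part
        },
    },
}

print(json.dumps(doc, indent=2))
# The raw bytes of payload.bin would then follow as the next MIME part,
# in the same order the attachments are listed in _attachments.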


>
> —Jens
>

Re: CouchDB Crash report db_not_found when attempting to replicate databases

Posted by Jens Alfke <je...@couchbase.com>.
On Sep 14, 2011, at 12:19 PM, Adam Kocoloski wrote:

There's a multipart API which allows for a single PUT request containing the document body as JSON and all its attachments in their raw form. Documentation is pretty thin at the moment, and unfortunately I think it doesn't quite allow for a pipe(). Would be really nice if it did, though.

Any tips on how to invoke this? It would be very useful in the framework I’m working on.
I just checked <http://wiki.apache.org/couchdb/HTTP_Document_API> and there’s nothing about it.

—Jens

Re: CouchDB Crash report db_not_found when attempting to replicate databases

Posted by Mikeal Rogers <mi...@gmail.com>.
Yeah, what I need is a GET that will return the document, with attachments, in that format.

-Mikeal

On Sep 14, 2011, at 12:19 PM, Adam Kocoloski wrote:

> There's a multipart API which allows for a single PUT request containing the document body as JSON and all its attachments in their raw form. Documentation is pretty thin at the moment, and unfortunately I think it doesn't quite allow for a pipe(). Would be really nice if it did, though.
> 
> On Wednesday, September 14, 2011 at 1:16 PM, Mikeal Rogers wrote:
> 
>> npm is mostly attachments and I haven't seen any issues so far.
>> 
>> I wish there was a better way to replicate attachments atomically for a single revision but if there is, I don't know about it.
>> 
>> It's probably a huge JSON operation and it sucks, but I don't have to parse it in node.js, I just pipe() the body right along.
>> 
>> -Mikeal
>> 
>> On Sep 14, 2011, at 8:42 AM, Adam Kocoloski wrote:
>> 
>>> Hi Mikeal, I just took a quick peek at your code. It looks like you handle attachments by inlining all of them into the JSON representation of the document. Does that ever cause problems when dealing with the ~100 MB attachments in the npm repo?
>>> 
>>> I've certainly seen my fair share of problems with attachment replication in CouchDB 1.0.x. I have a sneaking suspicion that there are latent bugs related to incorrect determinations of Content-Length under various compression scenarios.
>>> 
>>> Adam
>>> 
>>> On Tuesday, September 13, 2011 at 5:08 PM, Mikeal Rogers wrote:
>>> 
>>>> My replicator is fairly young so I think calling it "reliable" might be a little misleading.
>>>> 
>>>> It does less, I don't ever attempt to cache the high watermark (last seq written) and start over from there. If the process crashes just start over from scratch. This can lead to a delay after restart but I find that it's much simpler and more reliable on failure.
>>>> 
>>>> It's also simpler because it doesn't have to content with being an http client and a client of the internal couchdb erlang API. It just proxies requests from one couch to another.
>>>> 
>>>> While I'm sure there are bugs that I haven't found yet in it, I can say that it replicates the npm repository quite well and I'm using it in production.
>>>> 
>>>> -Mikeal
>>>> 
>>>> On Sep 13, 2011, at 11:44 AM, Max Ogden wrote:
>>>> 
>>>>> Hi Chris,
>>>>> 
>>>>> From what I understand the current state of the replicator (as of 1.1) is
>>>>> that for certain types of collections of documents it can be somewhat
>>>>> fragile. In the case of the node.js package repository, http://npmjs.org,
>>>>> there are many relatively large (~100MB) documents that would sometimes
>>>>> throw errors or timeout during replication and crash the replicator, at
>>>>> which point the replicator would restart and attempt to pick up where it
>>>>> left off. I am not an expert in the internals of the replicator but
>>>>> apparently the cumulative time required for the replicator to repeatedly
>>>>> crash and then subsequently relocate itself in _changes feed in the case of
>>>>> replicating the node package manager was making the built in couch
>>>>> replicator unusable for the task.
>>>>> 
>>>>> Two solutions exist that I know of. There is a new replicator in trunk (not
>>>>> to be confused with the _replicator db from 1.1 -- it is still using the old
>>>>> replicator algorithms) and there is also a more reliable replicator written
>>>>> in node.js https://github.com/mikeal/replicate that was was written
>>>>> specifically to replicate the node package repository between hosting
>>>>> providers.
>>>>> 
>>>>> Additionally it may be useful if you could describe the 'fingerprint' of
>>>>> your documents a bit. How many documents are in the failing databases? are
>>>>> the documents large or small? do they have many attachments? how large is
>>>>> your _changes feed?
>>>>> 
>>>>> Cheers,
>>>>> 
>>>>> Max
>>>>> 
>>>>> On Tue, Sep 13, 2011 at 11:22 AM, Chris Stockton
>>>>> <chrisstocktonaz@gmail.com (mailto:chrisstocktonaz@gmail.com)>wrote:
>>>>> 
>>>>>> Hello,
>>>>>> 
>>>>>> We now have about 150 dbs that are refusing to replicate with random
>>>>>> crashes, which provide really zero debug information. The error is db
>>>>>> not found, but I know its available. Does anyone know how can I
>>>>>> trouble shoot this? Do we just have to many databases replicating for
>>>>>> couchdb to handle? 4000 is a small number for the massive hardware
>>>>>> these are running on.
>>>>>> 
>>>>>> -Chris
> 
> 


Re: CouchDB Crash report db_not_found when attempting to replicate databases

Posted by Adam Kocoloski <ad...@gmail.com>.
There's a multipart API which allows for a single PUT request containing the document body as JSON and all its attachments in their raw form. Documentation is pretty thin at the moment, and unfortunately I think it doesn't quite allow for a pipe(). Would be really nice if it did, though.

On Wednesday, September 14, 2011 at 1:16 PM, Mikeal Rogers wrote:

> npm is mostly attachments and I haven't seen any issues so far.
> 
> I wish there was a better way to replicate attachments atomically for a single revision but if there is, I don't know about it.
> 
> It's probably a huge JSON operation and it sucks, but I don't have to parse it in node.js, I just pipe() the body right along.
> 
> -Mikeal
> 
> On Sep 14, 2011, at 8:42 AM, Adam Kocoloski wrote:
> 
> > Hi Mikeal, I just took a quick peek at your code. It looks like you handle attachments by inlining all of them into the JSON representation of the document. Does that ever cause problems when dealing with the ~100 MB attachments in the npm repo?
> > 
> > I've certainly seen my fair share of problems with attachment replication in CouchDB 1.0.x. I have a sneaking suspicion that there are latent bugs related to incorrect determinations of Content-Length under various compression scenarios.
> > 
> > Adam
> > 
> > On Tuesday, September 13, 2011 at 5:08 PM, Mikeal Rogers wrote:
> > 
> > > My replicator is fairly young so I think calling it "reliable" might be a little misleading.
> > > 
> > > It does less, I don't ever attempt to cache the high watermark (last seq written) and start over from there. If the process crashes just start over from scratch. This can lead to a delay after restart but I find that it's much simpler and more reliable on failure.
> > > 
> > > It's also simpler because it doesn't have to content with being an http client and a client of the internal couchdb erlang API. It just proxies requests from one couch to another.
> > > 
> > > While I'm sure there are bugs that I haven't found yet in it, I can say that it replicates the npm repository quite well and I'm using it in production.
> > > 
> > > -Mikeal
> > > 
> > > On Sep 13, 2011, at 11:44 AM, Max Ogden wrote:
> > > 
> > > > Hi Chris,
> > > > 
> > > > From what I understand the current state of the replicator (as of 1.1) is
> > > > that for certain types of collections of documents it can be somewhat
> > > > fragile. In the case of the node.js package repository, http://npmjs.org,
> > > > there are many relatively large (~100MB) documents that would sometimes
> > > > throw errors or timeout during replication and crash the replicator, at
> > > > which point the replicator would restart and attempt to pick up where it
> > > > left off. I am not an expert in the internals of the replicator but
> > > > apparently the cumulative time required for the replicator to repeatedly
> > > > crash and then subsequently relocate itself in _changes feed in the case of
> > > > replicating the node package manager was making the built in couch
> > > > replicator unusable for the task.
> > > > 
> > > > Two solutions exist that I know of. There is a new replicator in trunk (not
> > > > to be confused with the _replicator db from 1.1 -- it is still using the old
> > > > replicator algorithms) and there is also a more reliable replicator written
> > > > in node.js https://github.com/mikeal/replicate that was was written
> > > > specifically to replicate the node package repository between hosting
> > > > providers.
> > > > 
> > > > Additionally it may be useful if you could describe the 'fingerprint' of
> > > > your documents a bit. How many documents are in the failing databases? are
> > > > the documents large or small? do they have many attachments? how large is
> > > > your _changes feed?
> > > > 
> > > > Cheers,
> > > > 
> > > > Max
> > > > 
> > > > On Tue, Sep 13, 2011 at 11:22 AM, Chris Stockton
> > > > <chrisstocktonaz@gmail.com (mailto:chrisstocktonaz@gmail.com)>wrote:
> > > > 
> > > > > Hello,
> > > > > 
> > > > > We now have about 150 dbs that are refusing to replicate with random
> > > > > crashes, which provide really zero debug information. The error is db
> > > > > not found, but I know its available. Does anyone know how can I
> > > > > trouble shoot this? Do we just have to many databases replicating for
> > > > > couchdb to handle? 4000 is a small number for the massive hardware
> > > > > these are running on.
> > > > > 
> > > > > -Chris



Re: CouchDB Crash report db_not_found when attempting to replicate databases

Posted by Mikeal Rogers <mi...@gmail.com>.
npm is mostly attachments and I haven't seen any issues so far.

I wish there was a better way to replicate attachments atomically for a single revision but if there is, I don't know about it.

It's probably a huge JSON operation and it sucks, but I don't have to parse it in node.js, I just pipe() the body right along.

-Mikeal

On Sep 14, 2011, at 8:42 AM, Adam Kocoloski wrote:

> Hi Mikeal, I just took a quick peek at your code. It looks like you handle attachments by inlining all of them into the JSON representation of the document. Does that ever cause problems when dealing with the ~100 MB attachments in the npm repo?
> 
> I've certainly seen my fair share of problems with attachment replication in CouchDB 1.0.x. I have a sneaking suspicion that there are latent bugs related to incorrect determinations of Content-Length under various compression scenarios.
> 
> Adam
> 
> On Tuesday, September 13, 2011 at 5:08 PM, Mikeal Rogers wrote:
> 
>> My replicator is fairly young so I think calling it "reliable" might be a little misleading.
>> 
>> It does less, I don't ever attempt to cache the high watermark (last seq written) and start over from there. If the process crashes just start over from scratch. This can lead to a delay after restart but I find that it's much simpler and more reliable on failure.
>> 
>> It's also simpler because it doesn't have to content with being an http client and a client of the internal couchdb erlang API. It just proxies requests from one couch to another.
>> 
>> While I'm sure there are bugs that I haven't found yet in it, I can say that it replicates the npm repository quite well and I'm using it in production.
>> 
>> -Mikeal
>> 
>> On Sep 13, 2011, at 11:44 AM, Max Ogden wrote:
>> 
>>> Hi Chris,
>>> 
>>> From what I understand the current state of the replicator (as of 1.1) is
>>> that for certain types of collections of documents it can be somewhat
>>> fragile. In the case of the node.js package repository, http://npmjs.org,
>>> there are many relatively large (~100MB) documents that would sometimes
>>> throw errors or timeout during replication and crash the replicator, at
>>> which point the replicator would restart and attempt to pick up where it
>>> left off. I am not an expert in the internals of the replicator but
>>> apparently the cumulative time required for the replicator to repeatedly
>>> crash and then subsequently relocate itself in _changes feed in the case of
>>> replicating the node package manager was making the built in couch
>>> replicator unusable for the task.
>>> 
>>> Two solutions exist that I know of. There is a new replicator in trunk (not
>>> to be confused with the _replicator db from 1.1 -- it is still using the old
>>> replicator algorithms) and there is also a more reliable replicator written
>>> in node.js https://github.com/mikeal/replicate that was was written
>>> specifically to replicate the node package repository between hosting
>>> providers.
>>> 
>>> Additionally it may be useful if you could describe the 'fingerprint' of
>>> your documents a bit. How many documents are in the failing databases? are
>>> the documents large or small? do they have many attachments? how large is
>>> your _changes feed?
>>> 
>>> Cheers,
>>> 
>>> Max
>>> 
>>> On Tue, Sep 13, 2011 at 11:22 AM, Chris Stockton
>>> <chrisstocktonaz@gmail.com (mailto:chrisstocktonaz@gmail.com)>wrote:
>>> 
>>>> Hello,
>>>> 
>>>> We now have about 150 dbs that are refusing to replicate with random
>>>> crashes, which provide really zero debug information. The error is db
>>>> not found, but I know its available. Does anyone know how can I
>>>> trouble shoot this? Do we just have to many databases replicating for
>>>> couchdb to handle? 4000 is a small number for the massive hardware
>>>> these are running on.
>>>> 
>>>> -Chris
> 
> 


Re: CouchDB Crash report db_not_found when attempting to replicate databases

Posted by Adam Kocoloski <ad...@gmail.com>.
Hi Mikeal, I just took a quick peek at your code. It looks like you handle attachments by inlining all of them into the JSON representation of the document. Does that ever cause problems when dealing with the ~100 MB attachments in the npm repo?

I've certainly seen my fair share of problems with attachment replication in CouchDB 1.0.x. I have a sneaking suspicion that there are latent bugs related to incorrect determinations of Content-Length under various compression scenarios.

Adam

On Tuesday, September 13, 2011 at 5:08 PM, Mikeal Rogers wrote:

> My replicator is fairly young so I think calling it "reliable" might be a little misleading.
> 
> It does less: I never attempt to cache the high watermark (last seq written) and resume from there. If the process crashes, it just starts over from scratch. This can lead to a delay after a restart, but I find it's much simpler and more reliable on failure.
> 
> It's also simpler because it doesn't have to contend with being both an HTTP client and a client of the internal CouchDB Erlang API. It just proxies requests from one couch to another.
> 
> While I'm sure there are bugs that I haven't found yet in it, I can say that it replicates the npm repository quite well and I'm using it in production.
> 
> -Mikeal
> 
> On Sep 13, 2011, at 11:44 AM, Max Ogden wrote:
> 
> > Hi Chris,
> > 
> > From what I understand the current state of the replicator (as of 1.1) is
> > that for certain types of collections of documents it can be somewhat
> > fragile. In the case of the node.js package repository, http://npmjs.org,
> > there are many relatively large (~100MB) documents that would sometimes
> > throw errors or time out during replication and crash the replicator, at
> > which point the replicator would restart and attempt to pick up where it
> > left off. I am not an expert in the internals of the replicator but
> > apparently the cumulative time required for the replicator to repeatedly
> > crash and then relocate itself in the _changes feed in the case of
> > replicating the node package manager was making the built-in couch
> > replicator unusable for the task.
> > 
> > Two solutions exist that I know of. There is a new replicator in trunk (not
> > to be confused with the _replicator db from 1.1 -- it is still using the old
> > replicator algorithms) and there is also a more reliable replicator written
> > in node.js, https://github.com/mikeal/replicate, that was written
> > specifically to replicate the node package repository between hosting
> > providers.
> > 
> > Additionally, it may be useful if you could describe the 'fingerprint' of
> > your documents a bit. How many documents are in the failing databases? Are
> > the documents large or small? Do they have many attachments? How large is
> > your _changes feed?
> > 
> > Cheers,
> > 
> > Max
> > 
> > On Tue, Sep 13, 2011 at 11:22 AM, Chris Stockton
> > <chrisstocktonaz@gmail.com (mailto:chrisstocktonaz@gmail.com)>wrote:
> > 
> > > Hello,
> > > 
> > > We now have about 150 databases that are refusing to replicate, crashing
> > > at random with essentially zero debug information. The error is
> > > db_not_found, but I know the database is available. Does anyone know how
> > > I can troubleshoot this? Do we just have too many databases replicating
> > > for CouchDB to handle? 4000 is a small number for the massive hardware
> > > these are running on.
> > > 
> > > -Chris



Re: CouchDB Crash report db_not_found when attempting to replicate databases

Posted by Mikeal Rogers <mi...@gmail.com>.
My replicator is fairly young so I think calling it "reliable" might be a little misleading.

It does less: I never attempt to cache the high watermark (last seq written) and resume from there. If the process crashes, it just starts over from scratch. This can lead to a delay after a restart, but I find it's much simpler and more reliable on failure.

It's also simpler because it doesn't have to contend with being both an HTTP client and a client of the internal CouchDB Erlang API. It just proxies requests from one couch to another.
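
To make the shape of that concrete, something like the following captures the
proxying idea -- this is only a rough sketch, not the actual mikeal/replicate
code; the hosts and database name are placeholders, and conflicts, deletions,
concurrency limits and error handling are all ignored:

  var http = require('http');

  var source = { host: 'sourcecouch', port: 5984, db: 'somedb' };  // placeholder
  var target = { host: 'targetcouch', port: 5984, db: 'somedb' };  // placeholder

  // Read the whole _changes feed from seq 0 -- no checkpoint is kept, so a
  // crash simply means starting over from the beginning.
  http.get({ host: source.host, port: source.port,
             path: '/' + source.db + '/_changes?since=0' }, function (res) {
    var body = '';
    res.on('data', function (c) { body += c; });
    res.on('end', function () {
      JSON.parse(body).results.forEach(function (change) { copy(change.id); });
    });
  });

  // GET each changed doc with its revision history and inlined attachments,
  // then hand the same JSON to the target via _bulk_docs with new_edits:false
  // so the revision tree is preserved.
  function copy(id) {
    http.get({ host: source.host, port: source.port,
               path: '/' + source.db + '/' + encodeURIComponent(id) +
                     '?revs=true&attachments=true' }, function (res) {
      var body = '';
      res.on('data', function (c) { body += c; });
      res.on('end', function () {
        var payload = JSON.stringify({ new_edits: false,
                                       docs: [JSON.parse(body)] });
        var req = http.request({ host: target.host, port: target.port,
                                 method: 'POST',
                                 path: '/' + target.db + '/_bulk_docs',
                                 headers: {
                                   'Content-Type': 'application/json',
                                   'Content-Length': Buffer.byteLength(payload)
                                 } }, function (res) { res.resume(); });
        req.end(payload);
      });
    });
  }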

While I'm sure there are bugs that I haven't found yet in it, I can say that it replicates the npm repository quite well and I'm using it in production.

-Mikeal

On Sep 13, 2011, at 11:44 AM, Max Ogden wrote:

> Hi Chris,
> 
> From what I understand the current state of the replicator (as of 1.1) is
> that for certain types of collections of documents it can be somewhat
> fragile. In the case of the node.js package repository, http://npmjs.org,
> there are many relatively large (~100MB) documents that would sometimes
> throw errors or time out during replication and crash the replicator, at
> which point the replicator would restart and attempt to pick up where it
> left off. I am not an expert in the internals of the replicator but
> apparently the cumulative time required for the replicator to repeatedly
> crash and then relocate itself in the _changes feed in the case of
> replicating the node package manager was making the built-in couch
> replicator unusable for the task.
> 
> Two solutions exist that I know of. There is a new replicator in trunk (not
> to be confused with the _replicator db from 1.1 -- it is still using the old
> replicator algorithms) and there is also a more reliable replicator written
> in node.js, https://github.com/mikeal/replicate, that was written
> specifically to replicate the node package repository between hosting
> providers.
> 
> Additionally, it may be useful if you could describe the 'fingerprint' of
> your documents a bit. How many documents are in the failing databases? Are
> the documents large or small? Do they have many attachments? How large is
> your _changes feed?
> 
> Cheers,
> 
> Max
> 
> On Tue, Sep 13, 2011 at 11:22 AM, Chris Stockton
> <ch...@gmail.com>wrote:
> 
>> Hello,
>> 
>> We now have about 150 databases that are refusing to replicate, crashing
>> at random with essentially zero debug information. The error is
>> db_not_found, but I know the database is available. Does anyone know how
>> I can troubleshoot this? Do we just have too many databases replicating
>> for CouchDB to handle? 4000 is a small number for the massive hardware
>> these are running on.
>> 
>> -Chris
>> 


Re: CouchDB Crash report db_not_found when attempting to replicate databases

Posted by Max Ogden <ma...@maxogden.com>.
Hi Chris,

From what I understand the current state of the replicator (as of 1.1) is
that for certain types of collections of documents it can be somewhat
fragile. In the case of the node.js package repository, http://npmjs.org,
there are many relatively large (~100MB) documents that would sometimes
throw errors or time out during replication and crash the replicator, at
which point the replicator would restart and attempt to pick up where it
left off. I am not an expert in the internals of the replicator but
apparently the cumulative time required for the replicator to repeatedly
crash and then relocate itself in the _changes feed in the case of
replicating the node package manager was making the built-in couch
replicator unusable for the task.

Two solutions exist that I know of. There is a new replicator in trunk (not
to be confused with the _replicator db from 1.1 -- it is still using the old
replicator algorithms) and there is also a more reliable replicator written
in node.js, https://github.com/mikeal/replicate, that was written
specifically to replicate the node package repository between hosting
providers.
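
For completeness, the 1.1 _replicator database mentioned above is driven by
ordinary JSON documents, so a pull replication can be declared once and will
be restarted when the server restarts. A hedged example -- the hosts,
credentials and database name are placeholders, and writing to _replicator
normally requires admin rights:

  var http = require('http');

  var repDoc = JSON.stringify({
    source: 'http://user:pass@master:5984/somedb',
    target: 'somedb',
    continuous: true
  });

  var req = http.request({
    host: 'failover', port: 5984, method: 'PUT',
    path: '/_replicator/somedb_pull',
    headers: { 'Content-Type': 'application/json',
               'Content-Length': Buffer.byteLength(repDoc) }
  }, function (res) {
    res.resume();
    console.log('_replicator doc created:', res.statusCode);
  });
  req.end(repDoc);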

Additionally, it may be useful if you could describe the 'fingerprint' of
your documents a bit. How many documents are in the failing databases? Are
the documents large or small? Do they have many attachments? How large is
your _changes feed?
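
One quick way to answer those sizing questions is the database info document,
which reports the document count, on-disk size and the current update_seq
(the sequence number the _changes feed runs up to). The host and database
name below are placeholders:

  var http = require('http');

  http.get({ host: 'localhost', port: 5984, path: '/somedb' }, function (res) {
    var body = '';
    res.on('data', function (c) { body += c; });
    res.on('end', function () {
      var info = JSON.parse(body);
      // doc_count, update_seq and disk_size are all part of the standard
      // database info response.
      console.log(info.doc_count, info.update_seq, info.disk_size);
    });
  });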

Cheers,

Max

On Tue, Sep 13, 2011 at 11:22 AM, Chris Stockton
<ch...@gmail.com>wrote:

> Hello,
>
> We now have about 150 databases that are refusing to replicate, crashing
> at random with essentially zero debug information. The error is
> db_not_found, but I know the database is available. Does anyone know how
> I can troubleshoot this? Do we just have too many databases replicating
> for CouchDB to handle? 4000 is a small number for the massive hardware
> these are running on.
>
> -Chris
>

Re: CouchDB Crash report db_not_found when attempting to replicate databases

Posted by Chris Stockton <ch...@gmail.com>.
Hello,

We now have about 150 databases that are refusing to replicate, crashing
at random with essentially zero debug information. The error is
db_not_found, but I know the database is available. Does anyone know how
I can troubleshoot this? Do we just have too many databases replicating
for CouchDB to handle? 4000 is a small number for the massive hardware
these are running on.

-Chris
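
For what it's worth, when one process is responsible for thousands of
databases it can help to cap how many replications are in flight at once
rather than posting them all in a single burst; a failure to open the source
URL at that moment can surface as the same db_not_found exit even though the
database exists. A rough sketch of a throttled pull pass -- the hosts,
credentials and concurrency cap are placeholders, and error handling is
omitted:

  var http = require('http');

  var failover = { host: 'failover', port: 5984 };   // placeholder
  var masterUrl = 'http://user:pass@master:5984/';   // placeholder
  var MAX_IN_FLIGHT = 10;                            // arbitrary cap
  var queue = [], inFlight = 0;

  // List every database on the failover, then trigger one-shot pull
  // replications a few at a time instead of all of them at once.
  http.get({ host: failover.host, port: failover.port, path: '/_all_dbs' },
    function (res) {
      var body = '';
      res.on('data', function (c) { body += c; });
      res.on('end', function () {
        queue = JSON.parse(body).filter(function (db) { return db[0] !== '_'; });
        for (var i = 0; i < MAX_IN_FLIGHT; i++) next();
      });
    });

  function next() {
    if (!queue.length || inFlight >= MAX_IN_FLIGHT) return;
    var db = queue.shift();
    inFlight++;
    var payload = JSON.stringify({ source: masterUrl + db, target: db });
    // POST /_replicate without "continuous" blocks until the replication
    // finishes, so each completion frees a slot for the next database.
    var req = http.request({ host: failover.host, port: failover.port,
                             method: 'POST', path: '/_replicate',
                             headers: {
                               'Content-Type': 'application/json',
                               'Content-Length': Buffer.byteLength(payload)
                             } },
      function (res) {
        res.on('data', function () {});  // drain the response
        res.on('end', function () { inFlight--; next(); });
      });
    req.end(payload);
  }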