Posted to user@couchdb.apache.org by Chris Stockton <ch...@gmail.com> on 2010/08/03 02:49:20 UTC

CouchDB Replication Failure - odd db_not_found errors

I am having a strange issue that I am stuck on and unsure how to debug
further. I have 3 machines; we will call them host001, host002 and
host003. To keep this from getting confusing, I will put the key
points in a short bullet list.

  - host002 MAY push to host003
  - host003 MAY NOT pull from host002
    - This is a problem on ALL databases, not just a single one.
  - host003 MAY curl the url just fine, i.e.:
    curl http://user:pass@host002:5984/db_3294/
    {"db_name": ...SNIP... disk_format_version":5}

  - host002: all data lives here
  - host003: has a clean, empty copy of the database, created manually
  - host003: the replication calls are continuous, with create_target = true
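For reference, the pull replication described above can be sketched as the request host003 would send to its own POST /_replicate endpoint. This is a minimal sketch assuming the CouchDB 1.0-era replication API with its source, target, continuous and create_target fields. The helper name build_pull_replication and the local base URL http://localhost:5984 are hypothetical, and the credentials are the placeholders from the post. The request is only built here, not sent.

```python
import json
import urllib.request


def build_pull_replication(source_url: str, target_db: str,
                           local_base: str = "http://localhost:5984") -> urllib.request.Request:
    """Build (but do not send) a continuous pull replication request.

    Assumes the CouchDB 1.0-era POST /_replicate endpoint; the function
    name and the default local_base are hypothetical.
    """
    body = {
        "source": source_url,   # remote database on host002
        "target": target_db,    # local database name on host003
        "continuous": True,     # keep replicating as new changes arrive
        "create_target": True,  # create the local database if it is missing
    }
    return urllib.request.Request(
        url=f"{local_base}/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_pull_replication("http://user:pass@host002:5984/db_3294/", "db_3294")
# urllib.request.urlopen(req) would actually submit it (run on host003).
```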


BELOW ARE THE ERRORS FROM THE LOGS:
----------------------------------------------------

[debug] [<0.133.0>] DB at http://user:pass@host002:5984/db_3294/ could
not be found because {error,req_timedout}
[error] [<0.133.0>] {error_report,<0.33.0>,
    {<0.133.0>,crash_report,
     [[{pid,<0.133.0>},
       {registered_name,[]},
       {error_info,
           {exit,
               {db_not_found,
                   <<"http://user:pass@host002:5984/db_3294/">>},
               [{gen_server,init_it,6},{proc_lib,init_p_do_apply,3}]}},
       {initial_call,{couch_rep,init,['Argument__1']}},
       {ancestors,
           [couch_rep_sup,couch_primary_services,couch_server_sup,<0.34.0>]},
       {messages,[]},
       {links,[<0.83.0>]},
       {dictionary,[]},
       {trap_exit,true},
       {status,running},
       {heap_size,1597},
       {stack_size,23},
       {reductions,670}],
      []]}}

=CRASH REPORT==== 2-Aug-2010::17:40:09 ===
  crasher:
    pid: <0.133.0>
    registered_name: []
    exception exit: {db_not_found,<<"http://user:pass@host002:5984/db_3294/">>}
      in function  gen_server:init_it/6
    initial call: couch_rep:init/1
    ancestors: [couch_rep_sup,couch_primary_services,couch_server_sup,
                  <0.34.0>]
    messages: []
    links: [<0.83.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 1597
    stack_size: 23
    reductions: 670
  neighbours:
[info] [<0.98.0>] 172.16.38.131 - - 'POST' /_replicate 404
[debug] [<0.98.0>] httpd 404 error response:
 {"error":"db_not_found","reason":"could not open http://user:pass@host002:5984/db_3294/"}
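The debug line shows that the underlying cause was {error,req_timedout}, yet the HTTP response was a 404 db_not_found, so a "missing database" error here may really be a timeout against the remote source. Below is a minimal sketch of one possible client-side response, assuming you control the code that calls POST /_replicate: retry with exponential backoff, but only when the 404 names a remote http(s) URL. The functions should_retry and replicate_with_retry and the post callback are hypothetical, not part of CouchDB.

```python
import time
from typing import Callable, Tuple


def should_retry(status: int, body: dict) -> bool:
    """Treat a 404 db_not_found for a *remote* source as possibly transient.

    A db_not_found whose reason is a plain local name is not retried.
    """
    reason = str(body.get("reason", ""))
    return (status == 404
            and body.get("error") == "db_not_found"
            and ("http://" in reason or "https://" in reason))


def replicate_with_retry(post: Callable[[], Tuple[int, dict]],
                         attempts: int = 5,
                         base_delay: float = 1.0,
                         sleep: Callable[[float], None] = time.sleep) -> Tuple[int, dict]:
    """Call `post` (a hypothetical wrapper around POST /_replicate) up to
    `attempts` times, backing off 1s, 2s, 4s, ... while the failure looks
    like a remote timeout reported as db_not_found."""
    status, body = post()
    for attempt in range(1, attempts):
        if not should_retry(status, body):
            break
        sleep(base_delay * 2 ** (attempt - 1))
        status, body = post()
    return status, body
```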

Re: CouchDB Replication Failure - odd db_not_found errors

Posted by Chris Stockton <ch...@gmail.com>.

Hello,

On Wed, Aug 4, 2010 at 12:27 PM, Randall Leeds <ra...@gmail.com> wrote:
> It may be that you just had too many replications spun up at once for
> your configuration settings.
> Have you read through
> http://wiki.apache.org/couchdb/Performance#Resource_Limits ?
>

Yes, we tweaked those limits, and then some, to meet our replication
concurrency requirements. We are perfectly capable of running 5000 or
more replication jobs; we just need to be careful about how fast we
spin them up, at least in my observation.

-Chris
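One way to act on "be careful how fast we spin them up" is to start jobs in fixed-size batches with a pause in between. This is a hypothetical sketch, not the replicator Chris describes: ramp_up and the start_job callback are illustrative names, and the batch size and pause are values to tune, not recommendations.

```python
import time
from typing import Callable, Iterable


def ramp_up(db_names: Iterable[str],
            start_job: Callable[[str], None],
            batch_size: int = 50,
            pause_s: float = 5.0,
            sleep: Callable[[float], None] = time.sleep) -> int:
    """Start one replication job per database in batches of `batch_size`,
    sleeping `pause_s` seconds before every batch except the first.

    `start_job` is a hypothetical callback that submits one replication.
    Returns the number of batches submitted.
    """
    names = list(db_names)
    batches = 0
    for i in range(0, len(names), batch_size):
        if i:
            sleep(pause_s)  # give the server time to settle between batches
        for db in names[i:i + batch_size]:
            start_job(db)
        batches += 1
    return batches
```

With roughly 4,000 databases and a batch size of 50, this submits 80 batches with 79 pauses in between, instead of thousands of requests at once.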

Re: CouchDB Replication Failure - odd db_not_found errors

Posted by Randall Leeds <ra...@gmail.com>.

It may be that you just had too many replications spun up at once for
your configuration settings.
Have you read through
http://wiki.apache.org/couchdb/Performance#Resource_Limits ?
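One setting in this area is max_dbs_open in the [couchdb] section, which caps how many databases CouchDB keeps open at once; with thousands of continuously replicated databases on one node it is a likely candidate to raise. Below is a minimal sketch, assuming the CouchDB 1.x configuration API (PUT /_config/section/key with a JSON-encoded string body, which needs admin credentials). The helper names and the 25% headroom rule are hypothetical. OS-level limits such as open file descriptors still have to be raised separately.

```python
import json
import urllib.request


def config_put(base: str, section: str, key: str, value: str) -> urllib.request.Request:
    """Build (but do not send) PUT {base}/_config/{section}/{key}.

    Assumes the CouchDB 1.x config API, where the body is the new value
    encoded as a JSON string, so 5000 is sent as "5000" including quotes.
    """
    return urllib.request.Request(
        url=f"{base}/_config/{section}/{key}",
        data=json.dumps(value).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def suggested_max_dbs_open(replicated_dbs: int, headroom: float = 1.25) -> int:
    # Hypothetical rule of thumb: allow every replicated db to be open at
    # once, plus some headroom for other databases on the node.
    return int(replicated_dbs * headroom)


req = config_put("http://host003:5984", "couchdb", "max_dbs_open",
                 str(suggested_max_dbs_open(4000)))
# Sending it would need admin credentials, e.g. via an Authorization header.
```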

On Wed, Aug 4, 2010 at 10:38, Chris Stockton <ch...@gmail.com> wrote:
> Hello,
>
> On Mon, Aug 2, 2010 at 5:49 PM, Chris Stockton
> <ch...@gmail.com> wrote:
>> I am having a strange issue that I am stuck and unsure how to further
>
> For anyone who gets similar errors: I was unable to find the root
> cause. It seemed to happen when I fired up our replicator, which
> spins up about 4,000 replication jobs. My only thought is that it
> was overloaded. I ended up deleting all the dbs, reinstalling
> CouchDB, deleting all the lib files, etc.: basically a fresh CouchDB
> build. I then replicated the databases one by one. Before I went home
> last night I spun up the replicator, which basically does status
> checks and makes sure continuous replication is running on all
> machines. Despite hours of errors, they dwindled until all databases
> were replicating correctly.
>
> Moral of the story: if replication isn't working with similar errors,
> the fix might be to wipe all databases, then single-pass replicate (or
> rsync) all databases before firing up continuous replication.
>
> -Chris
>

Re: CouchDB Replication Failure - odd db_not_found errors

Posted by Chris Stockton <ch...@gmail.com>.

Hello,

On Mon, Aug 2, 2010 at 5:49 PM, Chris Stockton
<ch...@gmail.com> wrote:
> I am having a strange issue that I am stuck and unsure how to further

For anyone who gets similar errors: I was unable to find the root
cause. It seemed to happen when I fired up our replicator, which
spins up about 4,000 replication jobs. My only thought is that it
was overloaded. I ended up deleting all the dbs, reinstalling
CouchDB, deleting all the lib files, etc.: basically a fresh CouchDB
build. I then replicated the databases one by one. Before I went home
last night I spun up the replicator, which basically does status
checks and makes sure continuous replication is running on all
machines. Despite hours of errors, they dwindled until all databases
were replicating correctly.

Moral of the story: if replication isn't working with similar errors,
the fix might be to wipe all databases, then single-pass replicate (or
rsync) all databases before firing up continuous replication.

-Chris
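The order suggested above (one single pass per database first, continuous replication afterwards) can be expressed as a list of POST /_replicate bodies. This is a hypothetical sketch: two_phase_plan is an illustrative name, and the source base URL reuses the placeholders from the thread. It assumes, as in CouchDB 1.x, that a one-shot replication request returns only once that pass has finished, so sending the first half one request at a time replicates the databases one by one. The rsync alternative is not covered.

```python
from typing import Dict, List


def two_phase_plan(source_base: str, db_names: List[str]) -> List[Dict[str, object]]:
    """Return POST /_replicate bodies to send, in order, on the target host:
    first a one-shot pass for every database, then the continuous ones."""
    one_shot = [
        # Single pass; no "continuous" key, so the replication stops when done.
        {"source": f"{source_base}/{db}/", "target": db, "create_target": True}
        for db in db_names
    ]
    continuous = [
        # Started only after every one-shot pass has been submitted.
        {"source": f"{source_base}/{db}/", "target": db, "continuous": True}
        for db in db_names
    ]
    return one_shot + continuous
```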