Posted to user@couchdb.apache.org by Alan Malta <al...@gmail.com> on 2022/02/26 14:28:59 UTC

Understanding replication documents

Hi everyone,

After many years of delay in migrating to (almost) the latest CouchDB
version, I have started working with CouchDB 3.1.2.

My tests with replication to/from the same node (localhost) have been
successful. But now that I am trying multiple push/pull replications with a
remote host, they end up in a "failed" state.

I just learned about the "_scheduler/jobs" API (and I am likely missing
some crucial knowledge here). When I compare its output against the
documents in the "_replicator" database, I see an inconsistent definition
of either the source or the target database.
For instance, "_scheduler/jobs" gives me the following output for one
of the replications:

{
  "database": "_replicator",
  "doc_id": "87463eb82b3e1dcd7a3178276800026e",
  "id": null,
  "source": "http://admin:*****@localhost:5984/my_db_name/",
  "target": null,
  "state": "failed",
  "error_count": 1,
  "info": {"error": "{error,undef}"},
  "start_time": "2022-02-26T13:43:42Z",
  "last_updated": "2022-02-26T13:43:42Z"
}

while the "_replicator" db lists this document as:

{
  "id": "87463eb82b3e1dcd7a3178276800026e",
  "key": "87463eb82b3e1dcd7a3178276800026e",
  "value": {"rev": "2-590d4eadf029c21303ce77116d2f3f92"},
  "doc": {
    "_id": "87463eb82b3e1dcd7a3178276800026e",
    "_rev": "2-590d4eadf029c21303ce77116d2f3f92",
    "source": "http://admin:*****@localhost:5984/my_db_name",
    "target": "https://alanblah.blah.blah/couchdb/wmstats",
    "continuous": true,
    "filter": "WMStatsAgent/repfilter",
    "owner": "admin",
    "_replication_state": "failed",
    "_replication_state_time": "2022-02-26T13:43:42Z",
    "_replication_state_reason": "{error,undef}"
  }
}

In short, the "target" field is null in the "jobs" output.
Is that because the replication failed somehow?
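To put the two views of a single replication side by side, CouchDB's scheduling replicator (2.1 and later) also has a per-document endpoint, `GET /_scheduler/docs/{replicator_db}/{docid}`, which reports the scheduler's state, error count and last error for one replication document, including documents that never turned into a running job. The following is a minimal sketch, not an official tool: `COUCH_URL` and `DOC_ID` are hypothetical environment variables, the helper names are made up for illustration, and nothing is fetched unless both variables are set.

```shell
#!/bin/sh
# Sketch: build (and optionally fetch) two views of one replication document.
# COUCH_URL and DOC_ID are hypothetical environment variables, e.g.
#   COUCH_URL=http://admin:password@localhost:5984
#   DOC_ID=87463eb82b3e1dcd7a3178276800026e
set -eu

# Scheduler state for one document: GET /_scheduler/docs/{replicator_db}/{docid}
# ${1%/} drops one trailing slash from the base URL.
scheduler_doc_url() {
  printf '%s/_scheduler/docs/%s/%s\n' "${1%/}" "$2" "$3"
}

# The document as stored in the replicator database: GET /{db}/{docid}
replicator_doc_url() {
  printf '%s/%s/%s\n' "${1%/}" "$2" "$3"
}

# Only contact a server when both variables are set, so the helpers above
# can also be used offline.
if [ -n "${COUCH_URL:-}" ] && [ -n "${DOC_ID:-}" ]; then
  curl -sS "$(scheduler_doc_url "$COUCH_URL" _replicator "$DOC_ID")"
  echo
  curl -sS "$(replicator_doc_url "$COUCH_URL" _replicator "$DOC_ID")"
  echo
fi
```

Running it with both variables set prints the scheduler's record for the document followed by the stored document, which makes a missing or null field on the scheduler side easy to spot.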

In case it helps, this is the only error I see in the CouchDB log for that
replication, on the node that triggered it:

[error] 2022-02-26T13:43:42.016495Z couchdb@127.0.0.1 <0.534.0> -------- Error processing replication doc `87463eb82b3e1dcd7a3178276800026e` from `shards/00000000-7fffffff/_replicator.1645882154`: {error,undef}

I also wonder whether the replication protocol is compatible across
different CouchDB releases. In my case, the target is still on the very old
version 1.6.1, while the source is on 3.1.2.

Thank you very much for any help that you can provide.
Best,
Alan.

Re: Understanding replication documents

Posted by Nick Vatamaniuc <va...@gmail.com>.
I am glad you found and solved the issue, Alan.

Cheers,
-Nick


On Wed, Mar 16, 2022 at 11:41 AM Alan Malta <al...@gmail.com> wrote:

Re: Understanding replication documents

Posted by Alan Malta <al...@gmail.com>.
Hi Nick, all,

After further investigation, I found that the problem was actually on my
side. I had to port a CouchDB patch from 1.6 to 3.1, and I overlooked the
name of the module used to read a configuration parameter (it is now
called "config" instead of "couch_config").

After fixing that and rebuilding CouchDB, I can now replicate data in both
directions.
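Erlang raises `undef` when code calls a function in a module that does not exist or cannot be loaded, which is consistent with a leftover `couch_config:` call on 3.x, where configuration is read through the `config` module instead. A rough check after porting a patch is to search the patched sources for remaining calls into the 1.x module. This is a minimal sketch using only grep: `find_legacy_calls` and `SRC_DIR` are hypothetical names, and a plain text search will not catch calls whose module name is built at runtime (for example via `apply/3`).

```shell
#!/bin/sh
# Sketch: list calls into the CouchDB 1.x couch_config module that survived a
# port to 3.x. find_legacy_calls and SRC_DIR are hypothetical names.
set -eu

find_legacy_calls() {
  # -r: recurse into the directory, -n: print line numbers, -E: extended regex.
  # Matches "couch_config:get(" and similar, but not "config:get(".
  # grep exits 1 when nothing matches, so "|| true" keeps set -e happy.
  grep -rnE '(^|[^a-z_])couch_config:[a-z_]+\(' "$1" || true
}

if [ -n "${SRC_DIR:-}" ]; then
  find_legacy_calls "$SRC_DIR"
fi
```

An empty result only means no direct calls remain; it does not prove the port is otherwise complete.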

This did lead me to a new authorization error, though, with an undefined
user making GET requests for data. That likely deserves its own thread.

Thanks,
Alan.

On Mon, Feb 28, 2022 at 4:11 PM Alan Malta <al...@gmail.com> wrote:


Re: Understanding replication documents

Posted by Alan Malta <al...@gmail.com>.
Hi Nick,

Thank you for the follow-up investigation and questions.

I am in the process of rebuilding my software stack and will try to
replicate data using the very same CouchDB 3.1.2 version + Erlang 22.

> ... If you get a chance to find and run a remsh script check the output of:
Regarding the Erlang commands you suggested running, this is the output on
CouchDB 3.1 + Erlang 22:
Eshell V10.7.2.11  (abort with ^G)
1> crypto:info_lib().
[{<<"OpenSSL">>,268443839,
  <<"OpenSSL 1.0.2k-fips  26 Jan 2017">>}]
2> ssl:versions().
[{ssl_app,"9.2"},
 {supported,['tlsv1.2']},
 {supported_dtls,['dtlsv1.2']},
 {available,['tlsv1.3','tlsv1.2','tlsv1.1',tlsv1,sslv3]},
 {available_dtls,['dtlsv1.2',dtlsv1]}]

while the "old", fully functional CouchDB 1.6.1 + Erlang 16 setup gives me
this:
Eshell V5.10.4  (abort with ^G)
3> ssl:versions().
[{ssl_app,"5.3.3"},
 {supported,['tlsv1.2','tlsv1.1',tlsv1,sslv3]},
 {available,['tlsv1.2','tlsv1.1',tlsv1,sslv3]}]
4> crypto:info_lib().
[{<<"OpenSSL">>,268443839,
  <<"OpenSSL 1.0.2k-fips  26 Jan 2017">>}]

>  * Another potential issue is that the curl script quotes the parameters with a single quote in:

This shouldn't be a problem. I originally ran those commands with the
values written out, without the environment variables, and only switched
to the variables when I copied my notes over to the gist.

> The logs don't show any stack traces besides the one you indicated in the initial email? Anything with a module name and a line number

There is absolutely nothing around that single error line. Changing the log
level to debug doesn't help either.

Thank you for these suggestions. I should be back later with some news on
3.1.2 <--> 3.1.2 bidirectional replication.

Thanks,
Alan.

On Sun, Feb 27, 2022 at 4:40 PM Nick Vatamaniuc <va...@gmail.com> wrote:


Re: Understanding replication documents

Posted by Nick Vatamaniuc <va...@gmail.com>.
Thanks for the script, Alan.

I tried setting up a basic replication between localhost endpoints
on Erlang 22 with the 3.2.1 release, and that seems to work:
https://gist.github.com/nickva/5a89198c62fdd3ec97693c87833d5738

Looking at the differences between our setups, I noticed a few things:

 * I haven't tried TLS on the endpoints; I wonder if that's the cause.
Would you be able to try it locally (or via a VPN) without TLS?
Sometimes an Erlang release can be installed or built without crypto
support, and that only shows up at runtime, when something tries to
use that functionality. If you get a chance, find and run the remsh
script and check the output of:

crypto:info_lib().
[{<<"OpenSSL">>,269488239,
  <<"OpenSSL 1.1.1f  31 Mar 2020">>}]

ssl:versions().
[{ssl_app,"9.2"},
 {supported,['tlsv1.2']},
 {supported_dtls,['dtlsv1.2']},
 {available,['tlsv1.3','tlsv1.2','tlsv1.1',tlsv1,sslv3]},
 {available_dtls,['dtlsv1.2',dtlsv1]}]

That would indicate you have the crypto application installed and
linked to the openssl library.

 * Another potential issue is that the curl script wraps the
parameters in single quotes:

curl -X POST http://$USERPASS@localhost:5984/_replicator \
  -d '{"source":"http://$USERPASS@localhost:5984/workqueue_inbox/", "target":"https://$REMOTEHOST/couchdb/workqueue/", "continuous":true}' \
  -H "Content-Type: application/json"

That would make the target the literal
`https://$REMOTEHOST/couchdb/workqueue/` string without substituting
$REMOTEHOST with its value. That's probably not the reason here, but
I thought I'd mention it just in case.
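A minimal sketch of the quoting issue, assuming bash: inside single quotes the shell expands neither `$REMOTEHOST` nor `$USERPASS`, so the JSON body carries the literal variable names. One common fix is to close the single-quoted string around each variable so it is expanded in double quotes. The credential and host values below are placeholders, not values from the thread.

```shell
#!/usr/bin/env bash
# Placeholder values for illustration only.
USERPASS="admin:secret"
REMOTEHOST="remote.example.org"

# Single-quoted: no expansion, the body contains the literal text "$REMOTEHOST".
single_quoted_body() {
  printf '%s\n' '{"target":"https://$REMOTEHOST/couchdb/workqueue/"}'
}

# End the single-quoted string, splice in a double-quoted variable, then re-open it.
expanded_body() {
  printf '%s\n' '{"source":"http://'"$USERPASS"'@localhost:5984/workqueue_inbox/","target":"https://'"$REMOTEHOST"'/couchdb/workqueue/","continuous":true}'
}

single_quoted_body
expanded_body
```

The expanded body can then be handed to curl as `-d "$(expanded_body)"`, which keeps the JSON in one argument while still substituting the variables.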

 * The target URL `https://$REMOTEHOST/couchdb/workqueue`: I could see
the db/URL parser being confused by the URL path there, as the path
$REMOTEHOST/couchdb/workqueue could be split up as a database
path=$REMOTEHOST/couchdb, with workqueue being the document, whereas
in this case workqueue is actually the database. Would you be able to
test a setup where the URL path looks like http://domain.name.ext/dbname
for an endpoint?
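To make the ambiguity concrete, here is a sketch of two hypothetical ways a parser could split the path of such a URL. This is not CouchDB's actual parsing code; it only illustrates why a path prefix in front of the database name can be read two ways. The host name is a placeholder.

```shell
#!/usr/bin/env bash
url="https://remote.example.org/couchdb/workqueue"   # placeholder host

# Strip the scheme and host, keeping only the path ("couchdb/workqueue").
path="${url#*://}"
path="${path#*/}"

# Reading 1 (intended): the last segment is the database, the rest is a prefix.
db_last="${path##*/}"       # workqueue
prefix="${path%/*}"         # couchdb

# Reading 2 (the suspected confusion): the first segment is the database
# and the remainder is treated as a document id.
db_first="${path%%/*}"      # couchdb
doc_id="${path#*/}"         # workqueue

echo "prefix=$prefix db=$db_last"
echo "db=$db_first doc=$doc_id"
```

Testing with a URL of the form `http://domain.name.ext/dbname`, as suggested, removes the prefix so both readings agree.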

Do the logs show any stack traces besides the one you included in the
initial email? Perhaps anything with a module name and a line number?

Thanks,
-Nick

On Sat, Feb 26, 2022 at 4:24 PM Alan Malta <al...@gmail.com> wrote:
>
> Hi Nick,
>
> Thank you for your prompt response.
>
> Yes, I confirm that CouchDB 3.1.2 is running with Erlang 22, and that the
> user and password only contain basic chars a-z.
>
> I wiped out my whole setup, started from scratch, and managed to reproduce
> this replication issue with the following set of commands:
> https://gist.github.com/amaltaro/67bd133c519300fb82dd0cad372cf1a0
>
> While reproducing it, I defined only a one-way replication. However, my
> previous setup had it bi-directional, and both replications were in a
> failed state. I also added some extra checks and information to the gist
> above, in case it turns out to be helpful.
>
> I haven't yet tried to replicate data between two instances running the
> same version. The reason is that, during this migration, I believe it will
> be impossible to move all my services to the new CouchDB version at the
> same time, so there should be a period of time (around a month) where I
> will need to keep this hybrid setup.
>
> Thank you again!
> Alan.
>

Re: Understanding replication documents

Posted by Alan Malta <al...@gmail.com>.
Hi Nick,

Thank you for your prompt response.

Yes, I confirm that CouchDB 3.1.2 is running with Erlang 22, and that the
user and password only contain basic chars a-z.

I wiped out my whole setup, started from scratch, and managed to reproduce
this replication issue with the following set of commands:
https://gist.github.com/amaltaro/67bd133c519300fb82dd0cad372cf1a0

While reproducing it, I defined only a one-way replication. However, my
previous setup had it bi-directional, and both replications were in a
failed state. I also added some extra checks and information to the gist
above, in case it turns out to be helpful.

I haven't yet tried to replicate data between two instances running the
same version. The reason is that, during this migration, I believe it will
be impossible to move all my services to the new CouchDB version at the
same time, so there should be a period of time (around a month) where I
will need to keep this hybrid setup.

Thank you again!
Alan.

On Sat, Feb 26, 2022 at 12:26 PM Nick Vatamaniuc <va...@gmail.com> wrote:

> Hi Alan,
>
> Thanks for reaching out.
>
> It looks like CouchDB failed to parse the replication document and
> couldn't turn it into a proper replication job.
>
> The 'undef' error could suggest running on an unsupported version of
> Erlang. It's a generic "this function doesn't exist" error in Erlang.
> Are you running on at least Erlang 20?
>
> Does the target URL have any unusual characters in it, or something
> that might cause parsing errors (for example, ':' or '@' characters)?
>
> Would it be possible to share an example script that fails? Ideally, a
> set of curl commands that create the dbs and then the replication job,
> using parameters similar to the ones you had.
>
> Cheers,
> -Nick
>

Re: Understanding replication documents

Posted by Nick Vatamaniuc <va...@gmail.com>.
Hi Alan,

Thanks for reaching out.

It looks like CouchDB failed to parse the replication document and
couldn't turn it into a proper replication job.
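When a replication document cannot be turned into a job, CouchDB 2.1 and later also expose the scheduler's per-document view at `/_scheduler/docs/{replicator_db}/{docid}`, which reports the document's state along with any error information. A minimal sketch, assuming a local node on port 5984; the credentials are placeholders and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Placeholder credentials and host; adjust for your node.
USERPASS="admin:secret"
COUCH="http://${USERPASS}@localhost:5984"

# Hypothetical helper: build the scheduler URL for one doc in _replicator.
scheduler_doc_url() {
  local doc_id="$1"
  printf '%s/_scheduler/docs/_replicator/%s\n' "$COUCH" "$doc_id"
}

# Hypothetical helper: fetch the scheduler's view of that document.
show_replication_state() {
  curl -s "$(scheduler_doc_url "$1")"
}
```

For the document from the original report, that would be `show_replication_state 87463eb82b3e1dcd7a3178276800026e`.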

The 'undef' error could suggest running on an unsupported version of
Erlang. It's a generic "this function doesn't exist" error in Erlang.
Are you running on at least Erlang 20?

Does the target URL have any unusual characters in it, or something
that might cause parsing errors (for example, ':' or '@' characters)?

Would it be possible to share an example script that fails? Ideally, a
set of curl commands that create the dbs and then the replication job,
using parameters similar to the ones you had.
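A repro script of the kind requested might look like the sketch below. It is hypothetical: the credentials, remote host, and database names are placeholders, and it only creates the local source database before posting a continuous push replication document to `_replicator`. Building the body with `printf` fills in the values, so no variable ends up quoted literally.

```shell
#!/usr/bin/env bash
# Hypothetical repro sketch; all names and credentials are placeholders.
USERPASS="admin:secret"
REMOTEHOST="remote.example.org"
LOCAL="http://${USERPASS}@localhost:5984"

# JSON body for a continuous push replication from a local db to a remote URL.
replication_doc() {
  local src_db="$1" tgt_url="$2"
  printf '{"source":"%s/%s","target":"%s","continuous":true}\n' "$LOCAL" "$src_db" "$tgt_url"
}

repro() {
  # Create the local source database (PUT /{db}).
  curl -s -X PUT "$LOCAL/workqueue_inbox"
  # Create the replication job by adding a document to _replicator.
  curl -s -X POST "$LOCAL/_replicator" \
    -H "Content-Type: application/json" \
    -d "$(replication_doc workqueue_inbox "https://${REMOTEHOST}/couchdb/workqueue")"
}
```

Running `repro` against a test node should leave a document in `_replicator` whose state can then be compared with the `_scheduler/jobs` output.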

Cheers,
-Nick

On Sat, Feb 26, 2022 at 9:29 AM Alan Malta <al...@gmail.com> wrote:
>
> Hi everyone,
>
> After many years of delay in migrating to (almost) the latest CouchDB
> version, I have started working with CouchDB 3.1.2.
>
> My tests with replication to/from the same node/localhost have been
> successful. But now that I am trying multiple push/pull replications with a
> remote host, they get into a "failed" state.
>
> I just learned about the "_scheduler/jobs" API - and I am likely missing
> some crucial knowledge here - and when I compare it against the documents
> in the "_replicator" database, I see an inconsistent definition for either
> the source or the target database.
> For instance, the "_scheduler/jobs" gives me the following output for one
> of the replications:
>
> {"database":"_replicator","doc_id":"87463eb82b3e1dcd7a3178276800026e","id":null,"source":"http://admin:*****@localhost:5984/my_db_name/","target":null,"state":"failed","error_count":1,"info":{"error":"{error,undef}"},"start_time":"2022-02-26T13:43:42Z","last_updated":"2022-02-26T13:43:42Z"},
> while the "_replicator" db lists this document as:
>
> {"id":"87463eb82b3e1dcd7a3178276800026e","key":"87463eb82b3e1dcd7a3178276800026e","value":{"rev":"2-590d4eadf029c21303ce77116d2f3f92"},"doc":{"_id":"87463eb82b3e1dcd7a3178276800026e","_rev":"2-590d4eadf029c21303ce77116d2f3f92","source":"http://admin:*****@localhost:5984/my_db_name","target":"https://alanblah.blah.blah/couchdb/wmstats","continuous":true,"filter":"WMStatsAgent/repfilter","owner":"admin","_replication_state":"failed","_replication_state_time":"2022-02-26T13:43:42Z","_replication_state_reason":"{error,undef}"}},
> In short, the "target" parameter is null in the "jobs" output.
> Is it because the replication failed somehow?
>
> In case it helps, this is the only error I see in the CouchDB log regarding
> that replication, on the node that triggered it:
>
> [error] 2022-02-26T13:43:42.016495Z couchdb@127.0.0.1 <0.534.0> --------
> Error processing replication doc `87463eb82b3e1dcd7a3178276800026e` from
> `shards/00000000-7fffffff/_replicator.1645882154`: {error,undef}
>
> I also wonder whether the replication protocol is compatible across
> different CouchDB releases. In my case, the target is still on the super
> old version 1.6.1, while the source is on 3.1.2.
>
> Thank you very much for any help that you can provide.
> Best,
> Alan.