Posted to dev@couchdb.apache.org by Stefan Kral <st...@emlix.com> on 2023/02/28 09:19:37 UTC

Question about file descriptor swapping

Hi,

I'm experimenting with a CouchDB setup on an SMB mount point. I know this
is not supported, but I ran into a (maybe simple) problem I don't
understand. Maybe one of you can easily give a hint (that would be
amazing).

Given the following patch (I need to close/reopen the file descriptors
after renaming) for the function
https://github.com/apache/couchdb/blob/1.6.x/src/couchdb/couch_db_updater.erl#L176

>   1 --- a/src/couchdb/couch_db_updater.erl
>   2 +++ b/src/couchdb/couch_db_updater.erl
>   3 @@ -202,8 +202,18 @@ handle_call({compact_done, CompactFilepath}, _From, #db{filepath=Path}=Db) ->
>   4          RootDir = couch_config:get("couchdb", "database_dir", "."),
>   5          couch_file:delete(RootDir, Filepath),
>   6          ok = file:rename(CompactFilepath, Filepath),
>   7 +
>   8 +        ok = couch_file:close(NewDb#db.updater_fd),
>   9 +        ok = couch_file:close(NewDb#db.fd),
>  10 +        {ok, SwappedFd} = couch_file:open(Filepath),
>  11 +        SwappedReaderFd = open_reader_fd(Filepath, Db#db.options),
>  12 +        SwappedDb = NewDb2#db{
>  13 +            fd = SwappedReaderFd,
>  14 +            updater_fd = SwappedFd
>  15 +        },
>  16 +        unlink(SwappedFd),
>  17          close_db(Db),
>  18 -        NewDb3 = refresh_validate_doc_funs(NewDb2),
>  19 +        NewDb3 = refresh_validate_doc_funs(SwappedDb),
>  20          ok = gen_server:call(Db#db.main_pid, {db_updated, NewDb3}, infinity),
>  21          couch_db_update_notifier:notify({compacted, NewDb3#db.name}),
>  22          ?LOG_INFO("Compaction for db \"~s\" completed.", [Db#db.name]),

Then the gen_server:call() on line 20 never returns.

Is there a major issue with this approach or just a minor mistake in my
implementation?


Thank you for having a look,
Stefan
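
For reference, the close-and-reopen sequence from the patch can be tried in isolation, outside CouchDB. The following is a minimal sketch that uses only the OTP file module rather than couch_file; the module name fd_swap_sketch, the file names and the Dir argument are assumptions made for illustration. It does not explain why the gen_server:call blocks; it only exercises a rename followed by closing the old handle and reopening under the final name.

```erlang
-module(fd_swap_sketch).
-export([run/1]).

%% Dir must be an existing, writable directory (an assumption of this sketch).
run(Dir) ->
    Compact = filename:join(Dir, "demo.couch.compact"),
    Target = filename:join(Dir, "demo.couch"),
    ok = file:write_file(Compact, <<"header">>),
    %% Handle opened before the rename, similar to what the compactor holds.
    {ok, OldFd} = file:open(Compact, [read, raw, binary]),
    ok = file:rename(Compact, Target),
    %% Close the pre-rename handle, then reopen under the final name.
    ok = file:close(OldFd),
    {ok, NewFd} = file:open(Target, [read, raw, binary]),
    {ok, Data} = file:pread(NewFd, 0, 6),
    ok = file:close(NewFd),
    ok = file:delete(Target),
    Data.
```

If this sequence succeeds on the target mount, the file system level is probably not what blocks the call, and the process wiring around the new handles would be the next place to look.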

Re: Question about file descriptor swapping

Posted by Ronny Berndt <ro...@apache.org>.
Hi Stefan,

We had a discussion on Slack [1] (found by Jan at [2]) about the atomicity of "rename". Could it
be a similar problem here (Linux/qemu/fs stack)?

In [2], they could work around their problem by waiting some time after renaming.

@Stefan, maybe you could try waiting some time after renaming/closing the db?

Cheers,
-Ronny

[1] https://couchdb.slack.com/archives/C01TBE2J197/p1678355980122119
[2] https://toot.cat/@zkat/109973167110793372
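
The "wait some time after renaming" idea could be sketched as a bounded retry around the reopen step. This is only an illustration built on the OTP file and timer modules; the module name reopen_retry_sketch, the function open_with_retry/3 and the retry parameters are hypothetical and not part of CouchDB.

```erlang
-module(reopen_retry_sketch).
-export([open_with_retry/3]).

%% Try to open Path up to Attempts times, sleeping DelayMs between tries.
open_with_retry(_Path, 0, _DelayMs) ->
    {error, retries_exhausted};
open_with_retry(Path, Attempts, DelayMs) when Attempts > 0 ->
    case file:open(Path, [read, raw, binary]) of
        {ok, Fd} ->
            {ok, Fd};
        {error, _Reason} ->
            %% Give the file system some time to settle after the rename.
            timer:sleep(DelayMs),
            open_with_retry(Path, Attempts - 1, DelayMs)
    end.
```

A call such as open_with_retry(Filepath, 5, 200) right after file:rename/2 would show whether a short delay changes the behaviour.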


Re: Question about file descriptor swapping

Posted by Stefan Kral <st...@emlix.com>.
Hi Jan,

here you go: https://github.com/emlix/couchdb-yocto

the mentioned patch is here
https://github.com/emlix/couchdb-yocto/blob/main/meta-couchdb/recipes-core/couchdb/files/0001-swap-fds.patch

When you run the compaction test (see the README for how to get there)
/usr/lib/test-couchdb/test-compaction.sh

you will find this as the last line of the log (/var/log/couchdb/couch.log):
[debug] [<0.173.0>] before gen_server:call

Thanks,
Stefan

-- 
Visit us at Embedded World 2023
14 to 16 March 2023 | Messe Nürnberg
You can find us in Hall 4, Stand 336

Dipl.-Ing. Stefan Kral, emlix GmbH, http://www.emlix.com
Phone +49 30 275911-00, Fax -33
Panoramastraße 1, 10178 Berlin, Germany
Registered office: Göttingen, Göttingen Local Court (Amtsgericht) HR B 3160
Management: Heike Jordan, Dr. Uwe Kracke
VAT ID: DE 205 198 055

emlix - smart embedded open source

Re: Question about file descriptor swapping

Posted by Jan Lehnardt <ja...@apache.org>.
Hi Stefan,

Thanks for the additional info. I’m happy to try a yocto build here.

Best
Jan
—


Re: Question about file descriptor swapping

Posted by Stefan Kral <st...@emlix.com>.
Hi,

I can give you some background context: our CouchDB instance is running
on an embedded device (with a minimal attack surface, so we have no pressure
to mitigate CVEs). CouchDB was chosen because of its append-only writes
and power-fail-safe properties (and because of the easily scriptable
curl/JSON interface).

Currently there is a production system running on an SMB1 share (mounted
on a Linux host) which works well (at least for our use cases). SMB1 is
no longer the default on the Windows remote side, and SMB2/3 has an
issue with opening a renamed but not yet closed file descriptor. The
question is whether we can solve this issue with minimal changes.

> 1. How did you verify that the gen_server:call/3 call never returns?
> 2. Do you get any pertinent lines (especially crashes) in your
>    couch.log?

by adding:

> +        ?LOG_DEBUG("before gen_server:call", []),
>          ok = gen_server:call(Db#db.main_pid, {db_updated, NewDb3}, infinity),
> +        ?LOG_DEBUG("after gen_server:call", []),

the log gives:

> [Thu, 02 Mar 2023 10:36:24 GMT] [debug] [<0.391.0>] Compaction process spawned for db "asdf"
> [Thu, 02 Mar 2023 10:36:24 GMT] [debug] [<0.84.0>] New task status for <0.391.0>: [{changes_done,1},
>                                                    {database,<<"asdf">>},
>                                                    {progress,100},
>                                                    {started_on,1677753384},
>                                                    {total_changes,1},
>                                                    {type,database_compaction},
>                                                    {updated_on,1677753384}]
> [Thu, 02 Mar 2023 10:36:24 GMT] [debug] [<0.366.0>] CouchDB swapping files .../asdf.couch and .../asdf.couch.compact.
> [Thu, 02 Mar 2023 10:36:24 GMT] [debug] [<0.366.0>] before gen_server:call

then nothing for a long time...

refreshing the db in the Futon web GUI gives no response

and the log continues with:

> [Thu, 02 Mar 2023 11:02:54 GMT] [error] [<0.144.0>] ** Generic server couch_compaction_daemon terminating
> ** Last message in was {'EXIT',<0.145.0>,
>                            {timeout,
>                                {gen_server,call,[couch_server,get_server]}}}
> ** When Server state == {state,<0.145.0>}
> ** Reason for termination ==
> ** {compaction_loop_died,
>        {timeout,{gen_server,call,[couch_server,get_server]}}}
> 
> [Thu, 02 Mar 2023 11:02:54 GMT] [error] [<0.144.0>] {error_report,<0.31.0>,
>                      {<0.144.0>,crash_report,
>                       [[{initial_call,
>                          {couch_compaction_daemon,init,['Argument__1']}},
>                         {pid,<0.144.0>},
>                         {registered_name,couch_compaction_daemon},
>                         {error_info,
>                          {exit,
>                           {compaction_loop_died,
>                            {timeout,
>                             {gen_server,call,[couch_server,get_server]}}},
>                           [{gen_server,terminate,7,
>                             [{file,"gen_server.erl"},{line,804}]},
>                            {proc_lib,init_p_do_apply,3,
>                             [{file,"proc_lib.erl"},{line,237}]}]}},
...
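
One way to read the crash report above, offered as an assumption and not as a diagnosis: a timeout exit from gen_server:call against couch_server is what a caller sees when the called process does not reply in time, for example because it is still busy with an earlier request. The sketch below models that situation in Python. SerialServer and second_caller_times_out are hypothetical names made up for illustration; none of this is CouchDB code.

```python
import queue
import threading


class SerialServer:
    """Toy stand-in for a server process that handles one request at a time."""

    def __init__(self):
        self._inbox = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def _loop(self):
        while True:
            handler, reply = self._inbox.get()
            reply.put(handler())  # the next request waits until this returns

    def call(self, handler, timeout=None):
        """Send a request and wait for the reply, like a synchronous call."""
        reply = queue.Queue(maxsize=1)
        self._inbox.put((handler, reply))
        try:
            return reply.get(timeout=timeout)
        except queue.Empty:
            raise TimeoutError("no reply within timeout") from None


def second_caller_times_out():
    """Return True if a caller times out while the server is stuck."""
    server = SerialServer()
    never = threading.Event()  # never set, so waiting on it blocks forever
    # First request: its handler never returns, so the loop stays stuck in it.
    server._inbox.put((never.wait, queue.Queue(maxsize=1)))
    try:
        server.call(lambda: "pong", timeout=0.2)
    except TimeoutError:
        return True
    return False
```

The second caller did nothing wrong here; it only reports that the server was unresponsive. Under that assumption, the process to inspect would be the one being called, not the one that logged the timeout.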


> 3. Can you share your environment where you get to compile 1.6.1
>    successfully, so we can try and reproduce this?

I could prepare a Yocto setup for you to build a toolchain and packages
for a QEMU/Docker image, if you are familiar with that build system...

> 4. Could it be that your SMB implementation doesn’t allow for opening
> and closing files in this quick succession (with or without a rename
> in the mix)?

For testing, it doesn't need to run on an SMB share; the timeout issue
occurs with the given fd-swap patch on a default (Linux) setup.

And a strace log does not show any underlying FS issues.


Best,
Stefan

On 28.02.23 at 16:47, Jan Lehnardt wrote:
> first off, CouchDB 1.6.1 is no longer supported by this project AND it
> has a long list of CVEs[1] against it. You REALLY should be operating
> on a newer version.
> 
> Secondly, just to understand your motivation: you think closing and
> opening the fds after the file:rename/2 call will make things work
> for your SMB operation?
> 
> If yes, the only thing I could spot that is substantially different is
> that the NewFd position is advanced implicitly by the underlying
> file:pread/3 in [2] and your SwappedFd doesn’t get the same treatment,
> but I don’t know why that should block the gen server call, as that only
> does some refcounting updates[3]. While this includes stopping the
> gen_server[4], I don’t see how the Pid this operates on should be any
> different under your patch.
> 
> So:
> 
> 1. How did you verify that the gen_server:call/3 call never returns?
> 2. Do you get any pertinent lines (especially crashes) in your couch.log?
> 3. Can you share your environment where you get to compile 1.6.1
>    successfully, so we can try and reproduce this?
> 4. Could it be that your SMB implementation doesn’t allow for opening and
>    closing files in this quick succession (with or without a rename in
>    the mix)?
> 
> 
> [1]: https://docs.couchdb.org/en/stable/cve/index.html
> [2]: https://github.com/apache/couchdb/blob/1.6.x/src/couchdb/couch_db_updater.erl#L179
> [3]: https://github.com/apache/couchdb/blob/1.6.x/src/couchdb/couch_db.erl#L1122-L1130
> [4]: https://github.com/apache/couchdb/blob/1.6.x/src/couchdb/couch_ref_counter.erl#L84
> 
> 
> Best
> Jan
> — 
> Professional Support for Apache CouchDB:
> https://neighbourhood.ie/couchdb-support/
> 
> 24/7 Observation for your CouchDB Instances:
> https://opservatory.app
> 
> 

Re: Question about file descriptor swapping

Posted by Jan Lehnardt <ja...@apache.org>.
Hi Stefan,

first off, CouchDB 1.6.1 is no longer supported by this project AND it
has a long list of CVEs[1] against it. You REALLY should be operating
on a newer version.

Secondly, just to understand your motivation: you think closing and
opening the fds after the file:rename/2 call will make things work
for your SMB operation?
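
For background on why the stock code gets away without reopening: on a local POSIX filesystem, a descriptor opened before a rename keeps referring to the same file afterwards. The Python sketch below checks exactly that and nothing more. The function name and file names are made up for illustration, it needs a POSIX system (os.pread is not available on Windows), and it makes no claim about how an SMB mount behaves.

```python
import os
import tempfile


def fd_survives_rename():
    """Return True if a descriptor opened before os.rename still works after it."""
    with tempfile.TemporaryDirectory() as root:
        compact_path = os.path.join(root, "asdf.couch.compact")
        final_path = os.path.join(root, "asdf.couch")
        with open(compact_path, "wb") as f:
            f.write(b"header")

        # Open the file under its old name, then rename it underneath the fd.
        fd = os.open(compact_path, os.O_RDWR | os.O_APPEND)
        try:
            os.rename(compact_path, final_path)

            # Reads and writes through the old descriptor still hit the file.
            still_readable = os.pread(fd, 6, 0) == b"header"
            os.write(fd, b"+data")

            # The descriptor and the new path refer to the same inode.
            same_inode = os.fstat(fd).st_ino == os.stat(final_path).st_ino
        finally:
            os.close(fd)

        with open(final_path, "rb") as f:
            content_ok = f.read() == b"header+data"

        return still_readable and same_inode and content_ok
```

If the same check fails on the SMB mount, that would support the motivation for the close/reopen patch; if it passes, the reopen may not be needed at all.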

If yes, the only thing I could spot that is substantially different is
that the NewFd position is advanced implicitly by the underlying
file:pread/3 in [2] and your SwappedFd doesn’t get the same treatment,
but I don’t know why that should block the gen server call, as that only
does some refcounting updates[3]. While this includes stopping the
gen_server[4], I don’t see how the Pid this operates on should be any
different under your patch.

So:

1. How did you verify that the gen_server:call/3 call never returns?
2. Do you get any pertinent lines (especially crashes) in your couch.log?
3. Can you share your environment where you get to compile 1.6.1
   successfully, so we can try and reproduce this?
4. Could it be that your SMB implementation doesn’t allow for opening and
   closing files in this quick succession (with or without a rename in
   the mix)?


[1]: https://docs.couchdb.org/en/stable/cve/index.html
[2]: https://github.com/apache/couchdb/blob/1.6.x/src/couchdb/couch_db_updater.erl#L179
[3]: https://github.com/apache/couchdb/blob/1.6.x/src/couchdb/couch_db.erl#L1122-L1130
[4]: https://github.com/apache/couchdb/blob/1.6.x/src/couchdb/couch_ref_counter.erl#L84


Best
Jan
— 
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/

24/7 Observation for your CouchDB Instances:
https://opservatory.app

