Posted to user@couchdb.apache.org by Kiril Stankov <ki...@open-net.biz> on 2016/11/27 08:46:00 UTC

Couch stopped working (service was still alive)

Hi,

I adore Couch, but one thing that has always puzzled me is the log file syntax.
I understand it is readable for Erlang people but, not being one of
them, could someone please help me decipher the lines below?

Couch stopped answering queries yesterday, after 6 months of flawless 
operation.
There was nothing unusual on the OS, disk has plenty of space, etc. 
Ubuntu 14. Couch 1.6.

Restarting it fixed the problem.

But I am still unsure whether data corruption may have occurred and
what the root cause of the problem was.

Thanks!

[Fri, 25 Nov 2016 14:53:01 GMT] [error] [<0.2510.0>] {error_report,<0.78.0>,
                       {<0.2510.0>,crash_report,
[[{initial_call,{disksup,init,['Argument__1']}},
                          {pid,<0.2510.0>},
                          {registered_name,disksup},
                          {error_info,
                           {exit,
                            {badarg,
[{erlang,port_close,[#Port<0.3152>],[]},
                              {disksup,terminate,2,
                               [{file,"disksup.erl"},{line,164}]},
                              {gen_server,terminate,6,
                               [{file,"gen_server.erl"},{line,719}]},
                              {proc_lib,init_p_do_apply,3,
                               [{file,"proc_lib.erl"},{line,239}]}]},
                            [{gen_server,terminate,6,
                              [{file,"gen_server.erl"},{line,722}]},
                             {proc_lib,init_p_do_apply,3,
                              [{file,"proc_lib.erl"},{line,239}]}]}},
                          {ancestors,[os_mon_sup,<0.79.0>]},
                          {messages,[]},
                          {links,[<0.80.0>]},
                          {dictionary,[]},
                          {trap_exit,true},
                          {status,running},
                          {heap_size,610},
                          {stack_size,27},
                          {reductions,350}],
                         []]}}

[Fri, 25 Nov 2016 14:53:01 GMT] [error] [<0.80.0>] {error_report,<0.78.0>,
                        {<0.80.0>,supervisor_report,
                         [{supervisor,{local,os_mon_sup}},
                          {errorContext,child_terminated},
                          {reason,
                              {badarg,
[{erlang,port_close,[#Port<0.3157>],[]},
                                   {disksup,terminate,2,
[{file,"disksup.erl"},{line,164}]},
                                   {gen_server,terminate,6,
[{file,"gen_server.erl"},{line,719}]},
                                   {proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,239}]}]}},
                          {offender,
                              [{pid,<0.2514.0>},
                               {name,disksup},
                               {mfargs,{disksup,start_link,[]}},
                               {restart_type,permanent},
                               {shutdown,2000},
                               {child_type,worker}]}]}}


[Fri, 25 Nov 2016 14:53:01 GMT] [error] [<0.2514.0>] ** Generic server disksup terminating
** Last message in was timeout
** When Server state == {state,100,60000,{unix,linux},[],#Port<0.3157>}
** Reason for termination ==
** {badarg,[{erlang,port_close,[#Port<0.3157>],[]},
             {disksup,terminate,2,[{file,"disksup.erl"},{line,164}]},
{gen_server,terminate,6,[{file,"gen_server.erl"},{line,719}]},
{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}

-- 
------------------------------------------------------------------------
*With best regards,*
Kiril Stankov,
CEO


            This Email disclaimer
            <http://open-net.biz/emailsignature.html> is integral part
            of this message.


Re: Couch stopped working (service was still alive)

Posted by Kiril Stankov <ki...@open-net.biz>.
Hi,
thanks for the answer.
Yes, that's the Erlang version.
No, there is nothing else in the log. This error appears several times
(in fact on every access to Couch: the process was alive, but not
answering).
There was nothing special in the calls to Couch before that, nor on the OS
side: just regular, not very busy use.
We have had much higher usage peaks, and nothing like this happened for months.

So, another question: can an automatic restart of Couch safely be
treated as a solution in this case? I could have a script try to get a
test doc and, if that fails, run "service couchdb restart".
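
A minimal sketch of the kind of watchdog I have in mind, in Python. The
test document URL, the retry count and the function names are placeholders
I made up for illustration; only "service couchdb restart" is the actual
command I would run. It restarts Couch only after several consecutive
failed probes, so a single slow response does not trigger a restart.

```python
import json
import subprocess
import urllib.error
import urllib.request

# Hypothetical values: point the URL at a small document that always exists.
TEST_DOC_URL = "http://127.0.0.1:5984/healthcheck/test_doc"
RESTART_CMD = ["service", "couchdb", "restart"]


def couch_is_healthy(url=TEST_DOC_URL, timeout=5.0, fetch=urllib.request.urlopen):
    """Return True if the test document can be fetched and parsed as JSON."""
    try:
        with fetch(url, timeout=timeout) as resp:
            body = resp.read()
        doc = json.loads(body.decode("utf-8"))
        return isinstance(doc, dict) and "_id" in doc
    except (urllib.error.URLError, OSError, ValueError):
        # Covers refused connections, timeouts, HTTP errors and invalid JSON.
        return False


def check_and_restart(is_healthy=couch_is_healthy, restart=None, retries=2):
    """Restart CouchDB only if `retries` consecutive probes fail.

    Returns True if a restart was triggered, False otherwise.
    """
    if restart is None:
        restart = lambda: subprocess.call(RESTART_CMD)
    for _ in range(retries):
        if is_healthy():
            return False
    restart()
    return True
```

Calling check_and_restart() from cron every minute or so would be the
intended use; the probe and the restart action are passed in as arguments
so the logic can be tried out without touching a live server.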

Thanks.
------------------------------------------------------------------------
*With best regards,*
Kiril Stankov

On 11/28/2016 1:29 AM, Robert Samuel Newson wrote:
> Looks like a bug in a core Erlang/OTP application (disksup) rather than CouchDB itself. What version of Erlang are you using?
>
> Ubuntu 14.04 bundles R16B03, so I'll assume that for now. In that release, disksup's terminate callback is:
>
> terminate(_Reason, State) ->
>      clear_alarms(),
>      case State#state.port of
>          not_used ->
>              ok;
>          Port ->
>              port_close(Port)
>      end,
>      ok.
>
> Which is to say, the reason for that process to terminate is a bug in the terminate callback function, and that makes no sense to me. I suspect, but don't know, that this is just wonky reporting. Something else must have caused termination, and the bug in terminate/2 is being reported erroneously as that reason. Perhaps there's more in your log that would explain this.
>
> On your question of data corruption, that won't happen for any kind of process crash in CouchDB. The worst scenario would be any un-fsynced data being lost and all databases rolling back to the last fsynced state. For this reason, we recommend setting the delayed_commits config setting to false. This defaults to true in versions earlier than 2.0, and defaults to false from 2.0 onward.
>
> HTH
> B.
>
>


Re: Couch stopped working (service was still alive)

Posted by Robert Samuel Newson <rn...@apache.org>.
Looks like a bug in a core Erlang/OTP application (disksup) rather than CouchDB itself. What version of Erlang are you using?

Ubuntu 14.04 bundles R16B03, so I'll assume that for now. In that release, disksup's terminate callback is:

terminate(_Reason, State) ->
    clear_alarms(),
    case State#state.port of
        not_used ->
            ok;
        Port ->
            port_close(Port)
    end,
    ok.

Which is to say, the reason for that process to terminate is a bug in the terminate callback function, and that makes no sense to me. I suspect, but don't know, that this is just wonky reporting. Something else must have caused termination, and the bug in terminate/2 is being reported erroneously as that reason. Perhaps there's more in your log that would explain this.

On your question of data corruption, that won't happen for any kind of process crash in CouchDB. The worst scenario would be any un-fsynced data being lost and all databases rolling back to the last fsynced state. For this reason, we recommend setting the delayed_commits config setting to false. This defaults to true in versions earlier than 2.0, and defaults to false from 2.0 onward.
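
As a rough sketch of how the setting could be changed on a running 1.x node, the snippet below builds the HTTP request for the _config API. It assumes the option lives in the [couchdb] config section and that the node listens on 127.0.0.1:5984; the function name is made up for illustration, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def delayed_commits_request(base_url="http://127.0.0.1:5984", enabled=False):
    """Build (but do not send) the PUT request that sets delayed_commits.

    Assumes a CouchDB 1.x node exposing /_config and the option living
    in the [couchdb] section.
    """
    url = base_url.rstrip("/") + "/_config/couchdb/delayed_commits"
    # The _config API takes the new value as a JSON string, e.g. "false".
    body = json.dumps("true" if enabled else "false").encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
```

Passing the result to urllib.request.urlopen() would apply the change (admin credentials are needed if the server is not in admin-party mode). Setting delayed_commits = false under [couchdb] in local.ini and restarting should have the same effect.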

HTH
B.

