Posted to user@couchdb.apache.org by Paolo Negri <pa...@wooga.net> on 2011/10/14 10:41:28 UTC

timeout hitting a database url after launching compaction

Dear list,

We have a script that does the following (strictly sequentially)

1) update 300K docs in a db
2) launch compaction of the db
3) poll http://127.0.0.1:5984/database every 30 seconds to find out
when compaction has completed
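
A minimal Python sketch of steps 2 and 3, for readers who want to reproduce the setup. It is not the script used here. It assumes the standard CouchDB HTTP API (POST to /database/_compact to start compaction, and the compact_running field in the GET /database response). The names http and compact_and_wait and the injectable request/sleep parameters are illustrative only.

```python
import json
import time
import urllib.request

BASE_URL = "http://127.0.0.1:5984/database"  # database URL from the steps above


def http(method, url):
    """Send a request with a JSON content type and return the decoded JSON body."""
    req = urllib.request.Request(
        url,
        data=b"" if method == "POST" else None,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def compact_and_wait(base_url=BASE_URL, interval=30, request=http, sleep=time.sleep):
    """Trigger compaction, then poll the database info until it finishes.

    Returns the number of polls that were needed.
    """
    request("POST", base_url + "/_compact")   # step 2: launch compaction
    polls = 0
    while True:
        sleep(interval)                       # wait before every poll, including the first
        polls += 1
        info = request("GET", base_url)       # step 3: read the database info
        if not info.get("compact_running", False):
            return polls
```

Note that this sketch sleeps before the first poll, so the first GET is not sent immediately after compaction is triggered.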

Last night we got a timeout error during step 3. We think this might
be because the first poll (GET http://127.0.0.1:5984/database) is
done right after triggering compaction.

I thought the dev team might be interested in knowing that this is happening.

There's no activity on the couchdb instance other than what is
described in this email.

ERROR unexpectd response checking compaction db: {ok,"500",
  [{"Server","CouchDB/1.3.0a-74613f5-git (Erlang OTP/R14B04)"},
   {"Date","Fri, 14 Oct 2011 01:46:37 GMT"},
   {"Content-Type","text/plain; charset=utf-8"},
   {"Content-Length","350"},
   {"Cache-Control","must-revalidate"}],

<<"{\"error\":\"{timeout,{gen_server,call,[<0.21934.9>,{open_ref_count,<0.4090.13>}]}}\",\"reason\":\"{gen_server,call,\\n
  [couch_server,\\n     {open,<<\\\"backup\\\">>,\\n
[{user_ctx,\\n              {user_ctx,null,\\n
[<<\\\"_admin\\\">>],\\n                  <<\\\"{couch_httpd_auth,
default_authentication_handler}\\\">>}}]},\\n     infinity]}\"}\n">>}


Thanks,

Paolo

Re: timeout hitting a database url after launching compaction

Posted by CGS <cg...@gmail.com>.

I agree with you. Maybe you should report this to the developers and see
whether they can do something about it or what they suggest. Some of them
do follow this list, but they seem to be busy with their own dedicated
list: over the last few days it was occupied with the vote on 1.1.1, and
now it seems to be in a somewhat calmer period.

Cheers,
CGS

On 10/26/2011 11:12 AM, Paolo Negri wrote:
> On Wed, Oct 26, 2011 at 10:48 AM, CGS<cg...@gmail.com>  wrote:
>> Just for my curiosity, is this behavior sporadic or it is continuous for the
>> full period of compaction?
> It's sporadic, we poll the db status every 60 seconds, and we observe
> this happening only occasionally my guess is that it depends by the io
> load at the exact point in time of the call (io load varies much
> during compaction).
>
>>
>>
>> On 10/26/2011 10:18 AM, Paolo Negri wrote:
>>> I just wanted to add some more information about this behavior, the
>>> problem happens not just after triggering compaction but can happen at
>>> any point while compaction is in progress, tonight we got the same
>>> error 20 minutes after launching compaction.
>>>
>>> Thanks,
>>>
>>> Paolo
>>>
>>
>
>

Re: timeout hitting a database url after launching compaction

Posted by Paolo Negri <pa...@wooga.net>.

On Wed, Oct 26, 2011 at 10:48 AM, CGS <cg...@gmail.com> wrote:
> Just for my curiosity, is this behavior sporadic or it is continuous for the
> full period of compaction?

It's sporadic. We poll the db status every 60 seconds and observe this
only occasionally. My guess is that it depends on the I/O load at the
exact point in time of the call (I/O load varies a lot during
compaction).
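
Since the error is sporadic, a polling script can treat a 5xx answer as "try again" rather than as a failure. The following is a minimal Python sketch of that idea. The function name get_db_info, the retry count and the delay are assumptions chosen for illustration, not values that anyone in this thread has tested.

```python
import json
import time
import urllib.error
import urllib.request


def get_db_info(url, retries=3, delay=5.0,
                opener=urllib.request.urlopen, sleep=time.sleep):
    """GET the database info, retrying when the server answers with a 5xx status."""
    for attempt in range(retries + 1):
        try:
            with opener(url) as resp:
                return json.loads(resp.read().decode("utf-8"))
        except urllib.error.HTTPError as err:
            # Only server-side errors are retried; 4xx errors are re-raised at once,
            # and so is the last 5xx once the retries are used up.
            if err.code < 500 or attempt == retries:
                raise
            sleep(delay)
```

With the defaults, a single transient 500 costs one extra request five seconds later, while a persistent failure still surfaces after the fourth attempt.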

>
>
>
> On 10/26/2011 10:18 AM, Paolo Negri wrote:
>>
>> I just wanted to add some more information about this behavior, the
>> problem happens not just after triggering compaction but can happen at
>> any point while compaction is in progress, tonight we got the same
>> error 20 minutes after launching compaction.
>>
>> Thanks,
>>
>> Paolo
>>
>
>

-- 
Engineering
http://www.wooga.com | phone +49-30-8962 5058  | fax +49-30-8964 9064

wooga GmbH | Saarbruecker Str. 38 | 10405 Berlin | Germany
Sitz der Gesellschaft: Berlin; HRB 117846 B
Registergericht Berlin-Charlottenburg
Geschaeftsfuehrung: Jens Begemann, Philipp Moeser

Re: timeout hitting a database url after launching compaction

Posted by CGS <cg...@gmail.com>.

Just out of curiosity, is this behavior sporadic, or is it continuous for 
the full period of compaction?



On 10/26/2011 10:18 AM, Paolo Negri wrote:
> I just wanted to add some more information about this behavior, the
> problem happens not just after triggering compaction but can happen at
> any point while compaction is in progress, tonight we got the same
> error 20 minutes after launching compaction.
>
> Thanks,
>
> Paolo
>

Re: timeout hitting a database url after launching compaction

Posted by Paolo Negri <pa...@wooga.net>.

I just wanted to add some more information about this behavior. The
problem happens not just right after triggering compaction but can happen
at any point while compaction is in progress: tonight we got the same
error 20 minutes after launching compaction.

Thanks,

Paolo

On Mon, Oct 17, 2011 at 2:44 PM, Paolo Negri <pa...@wooga.net> wrote:
> On Mon, Oct 17, 2011 at 2:30 PM, Robert Newson <rn...@apache.org> wrote:
>> Do you have the full stacktrace from couch.log?
>
> I pasted it here https://gist.github.com/1292529
>
>>
>> On 17 October 2011 13:04, Paolo Negri <pa...@wooga.net> wrote:
>>> On Mon, Oct 17, 2011 at 1:57 PM, Robert Newson <rn...@apache.org> wrote:
>>>> Compaction is an online process, there should be no expectation of 500
>>>> responses before, during, or after compaction.
>>>>
>>>> In this case, it seems the couch_server process is blocked for more
>>>> than five seconds performing I/O and the gen_server:call from
>>>> couch_server:open times out. This timeout has been increased to
>>>> infinity since 1.0.0.
>>>>
>>>> What version are you running?
>>>
>>> I compiled master from github here are the details
>>>
>>> "CouchDB/1.3.0a-74613f5-git (Erlang OTP/R14B04)"},
>>>
>>> The reason to use master is that we wanted to benefit from the
>>> ejson/snappy adoption so I guess I could actually also use the 1.2
>>> branch
>>>
>>> Paolo
>>>
>>>>
>>>> B.
>>>>
>>>> On 17 October 2011 12:05, Martin Hewitt <ma...@thenoi.se> wrote:
>>>>> I disagree, it makes sense as the 5xx error code range is for responses where the server can't fulfil a well-formed, valid client request.
>>>>>
>>>>> Your GET is well-formed, but the server can't process it as it's working on the previous action, so a 500 is perfectly valid. Perhaps a 503 would be more accurate, but the 5xx prefix is certainly correct.
>>>>>
>>>>> Martin
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On 17 Oct 2011, at 09:29, Paolo Negri <pa...@wooga.net> wrote:
>>>>>
>>>>>> I agree on the fact that what happens is pretty clear to explain, I
>>>>>> still thought it would be useful for the developers to know since
>>>>>> offering a 500 status code for a known system condition is probably
>>>>>> something that can be improved.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Paolo
>>>>>>
>>>>>> On Mon, Oct 17, 2011 at 10:24 AM, CGS <cg...@gmail.com> wrote:
>>>>>>> I am not developer, but it's quite logic, I may say. Once you started the
>>>>>>> compaction, your CouchDB is not responsive while the database is preparing
>>>>>>> for compaction. Triggering immediately GET, the web instance responds with
>>>>>>> status code 500 (internal server error, meaning unresponsive server in this
>>>>>>> case). So, nothing unusual in my opinion.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> CGS
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 10/17/2011 09:57 AM, Paolo Negri wrote:
>>>>>>>>
>>>>>>>> IO activity is not monitored, there's only one db on the couchdb
>>>>>>>> instance and the described job is the only activity executed on this
>>>>>>>> machine.
>>>>>>>> Delaying the first request on the database url by 30 seconds did
>>>>>>>> actually prevent the problem from happening again.
>>>>>>>> So the issue seems to happen specifically at the moment right after
>>>>>>>> compaction is started.
>>>>>>>> The database is about 7GB big once compressed, the server is hosted on
>>>>>>>> ec2 with the database directory placed on his own dedicated ephemeral
>>>>>>>> storage.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Paolo
>>>>>>>>
>>>>>>>> On Fri, Oct 14, 2011 at 9:05 PM, Paul Davis<pa...@gmail.com>
>>>>>>>>  wrote:
>>>>>>>>>
>>>>>>>>> Do you monitor IO activity or system responsiveness when you're doing
>>>>>>>>> this. I've seen some compactions wallop a system when it switches over
>>>>>>>>> due to removing large old files and such. It doesn't sound like this
>>>>>>>>> is big enough for that case but it might be something worth checking.
>>>>>>>>>
>>>>>>>>> On Fri, Oct 14, 2011 at 3:41 AM, Paolo Negri<pa...@wooga.net>
>>>>>>>>>  wrote:
>>>>>>>>>>
>>>>>>>>>> Dear list,
>>>>>>>>>>
>>>>>>>>>> We have a script that does the following (strictly sequentially)
>>>>>>>>>>
>>>>>>>>>> 1) update 300K docs in a db
>>>>>>>>>> 2) launch compaction of the db
>>>>>>>>>> 3) poll at a 30 sec frequency http://127.0.0.1:5984/database to know
>>>>>>>>>> when compaction completed
>>>>>>>>>>
>>>>>>>>>> Last night we got a timeout error during 3, we think that this might
>>>>>>>>>> be because the first polling (GET  http://127.0.0.1:5984/database) is
>>>>>>>>>> done right after triggering compaction
>>>>>>>>>>
>>>>>>>>>> I thought the dev team might be interested in knowing that this is
>>>>>>>>>> happening
>>>>>>>>>>
>>>>>>>>>> There's no other activity on the couchdb instance other than what
>>>>>>>>>> described in this email.
>>>>>>>>>>
>>>>>>>>>> ERROR unexpectd response checking compaction db: {ok,"500",
>>>>>>>>>>                                                 [{"Server",
>>>>>>>>>>
>>>>>>>>>> "CouchDB/1.3.0a-74613f5-git (Erlang OTP/R14B04)"},
>>>>>>>>>>                                                  {"Date",
>>>>>>>>>>                                                   "Fri, 14 Oct 2011
>>>>>>>>>> 01:46:37 GMT"},
>>>>>>>>>>                                                  {"Content-Type",
>>>>>>>>>>                                                   "text/plain;
>>>>>>>>>> charset=utf-8"},
>>>>>>>>>>
>>>>>>>>>>  {"Content-Length","350"},
>>>>>>>>>>                                                  {"Cache-Control",
>>>>>>>>>>                                                   "must-revalidate"}],
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> <<"{\"error\":\"{timeout,{gen_server,call,[<0.21934.9>,{open_ref_count,<0.4090.13>}]}}\",\"reason\":\"{gen_server,call,\\n
>>>>>>>>>>   [couch_server,\\n     {open,<<\\\"backup\\\">>,\\n
>>>>>>>>>> [{user_ctx,\\n              {user_ctx,null,\\n
>>>>>>>>>> [<<\\\"_admin\\\">>],\\n<<\\\"{couch_httpd_auth,
>>>>>>>>>> default_authentication_handler}\\\">>}}]},\\n     infinity]}\"}\n">>}
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Paolo

Re: timeout hitting a database url after launching compaction

Posted by Paolo Negri <pa...@wooga.net>.

On Mon, Oct 17, 2011 at 2:30 PM, Robert Newson <rn...@apache.org> wrote:
> Do you have the full stacktrace from couch.log?

I pasted it here: https://gist.github.com/1292529

>
> On 17 October 2011 13:04, Paolo Negri <pa...@wooga.net> wrote:
>> On Mon, Oct 17, 2011 at 1:57 PM, Robert Newson <rn...@apache.org> wrote:
>>> Compaction is an online process, there should be no expectation of 500
>>> responses before, during, or after compaction.
>>>
>>> In this case, it seems the couch_server process is blocked for more
>>> than five seconds performing I/O and the gen_server:call from
>>> couch_server:open times out. This timeout has been increased to
>>> infinity since 1.0.0.
>>>
>>> What version are you running?
>>
>> I compiled master from github here are the details
>>
>> "CouchDB/1.3.0a-74613f5-git (Erlang OTP/R14B04)"},
>>
>> The reason to use master is that we wanted to benefit from the
>> ejson/snappy adoption so I guess I could actually also use the 1.2
>> branch
>>
>> Paolo
>>
>>>
>>> B.
>>>
>>> On 17 October 2011 12:05, Martin Hewitt <ma...@thenoi.se> wrote:
>>>> I disagree, it makes sense as the 5xx error code range is for responses where the server can't fulfil a well-formed, valid client request.
>>>>
>>>> Your GET is well-formed, but the server can't process it as it's working on the previous action, so a 500 is perfectly valid. Perhaps a 503 would be more accurate, but the 5xx prefix is certainly correct.
>>>>
>>>> Martin
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On 17 Oct 2011, at 09:29, Paolo Negri <pa...@wooga.net> wrote:
>>>>
>>>>> I agree on the fact that what happens is pretty clear to explain, I
>>>>> still thought it would be useful for the developers to know since
>>>>> offering a 500 status code for a known system condition is probably
>>>>> something that can be improved.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Paolo
>>>>>
>>>>> On Mon, Oct 17, 2011 at 10:24 AM, CGS <cg...@gmail.com> wrote:
>>>>>> I am not developer, but it's quite logic, I may say. Once you started the
>>>>>> compaction, your CouchDB is not responsive while the database is preparing
>>>>>> for compaction. Triggering immediately GET, the web instance responds with
>>>>>> status code 500 (internal server error, meaning unresponsive server in this
>>>>>> case). So, nothing unusual in my opinion.
>>>>>>
>>>>>> Cheers,
>>>>>> CGS
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 10/17/2011 09:57 AM, Paolo Negri wrote:
>>>>>>>
>>>>>>> IO activity is not monitored, there's only one db on the couchdb
>>>>>>> instance and the described job is the only activity executed on this
>>>>>>> machine.
>>>>>>> Delaying the first request on the database url by 30 seconds did
>>>>>>> actually prevent the problem from happening again.
>>>>>>> So the issue seems to happen specifically at the moment right after
>>>>>>> compaction is started.
>>>>>>> The database is about 7GB big once compressed, the server is hosted on
>>>>>>> ec2 with the database directory placed on his own dedicated ephemeral
>>>>>>> storage.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Paolo
>>>>>>>
>>>>>>> On Fri, Oct 14, 2011 at 9:05 PM, Paul Davis<pa...@gmail.com>
>>>>>>>  wrote:
>>>>>>>>
>>>>>>>> Do you monitor IO activity or system responsiveness when you're doing
>>>>>>>> this. I've seen some compactions wallop a system when it switches over
>>>>>>>> due to removing large old files and such. It doesn't sound like this
>>>>>>>> is big enough for that case but it might be something worth checking.
>>>>>>>>
>>>>>>>> On Fri, Oct 14, 2011 at 3:41 AM, Paolo Negri<pa...@wooga.net>
>>>>>>>>  wrote:
>>>>>>>>>
>>>>>>>>> Dear list,
>>>>>>>>>
>>>>>>>>> We have a script that does the following (strictly sequentially)
>>>>>>>>>
>>>>>>>>> 1) update 300K docs in a db
>>>>>>>>> 2) launch compaction of the db
>>>>>>>>> 3) poll at a 30 sec frequency http://127.0.0.1:5984/database to know
>>>>>>>>> when compaction completed
>>>>>>>>>
>>>>>>>>> Last night we got a timeout error during 3, we think that this might
>>>>>>>>> be because the first polling (GET  http://127.0.0.1:5984/database) is
>>>>>>>>> done right after triggering compaction
>>>>>>>>>
>>>>>>>>> I thought the dev team might be interested in knowing that this is
>>>>>>>>> happening
>>>>>>>>>
>>>>>>>>> There's no other activity on the couchdb instance other than what
>>>>>>>>> described in this email.
>>>>>>>>>
>>>>>>>>> ERROR unexpectd response checking compaction db: {ok,"500",
>>>>>>>>>                                                 [{"Server",
>>>>>>>>>
>>>>>>>>> "CouchDB/1.3.0a-74613f5-git (Erlang OTP/R14B04)"},
>>>>>>>>>                                                  {"Date",
>>>>>>>>>                                                   "Fri, 14 Oct 2011
>>>>>>>>> 01:46:37 GMT"},
>>>>>>>>>                                                  {"Content-Type",
>>>>>>>>>                                                   "text/plain;
>>>>>>>>> charset=utf-8"},
>>>>>>>>>
>>>>>>>>>  {"Content-Length","350"},
>>>>>>>>>                                                  {"Cache-Control",
>>>>>>>>>                                                   "must-revalidate"}],
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> <<"{\"error\":\"{timeout,{gen_server,call,[<0.21934.9>,{open_ref_count,<0.4090.13>}]}}\",\"reason\":\"{gen_server,call,\\n
>>>>>>>>>   [couch_server,\\n     {open,<<\\\"backup\\\">>,\\n
>>>>>>>>> [{user_ctx,\\n              {user_ctx,null,\\n
>>>>>>>>> [<<\\\"_admin\\\">>],\\n<<\\\"{couch_httpd_auth,
>>>>>>>>> default_authentication_handler}\\\">>}}]},\\n     infinity]}\"}\n">>}
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Paolo
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>>
>>
>




Re: timeout hitting a database url after launching compaction

Posted by Robert Newson <rn...@apache.org>.

Do you have the full stacktrace from couch.log?

On 17 October 2011 13:04, Paolo Negri <pa...@wooga.net> wrote:
> On Mon, Oct 17, 2011 at 1:57 PM, Robert Newson <rn...@apache.org> wrote:
>> Compaction is an online process, there should be no expectation of 500
>> responses before, during, or after compaction.
>>
>> In this case, it seems the couch_server process is blocked for more
>> than five seconds performing I/O and the gen_server:call from
>> couch_server:open times out. This timeout has been increased to
>> infinity since 1.0.0.
>>
>> What version are you running?
>
> I compiled master from github here are the details
>
> "CouchDB/1.3.0a-74613f5-git (Erlang OTP/R14B04)"},
>
> The reason to use master is that we wanted to benefit from the
> ejson/snappy adoption so I guess I could actually also use the 1.2
> branch
>
> Paolo
>
>>
>> B.
>>
>> On 17 October 2011 12:05, Martin Hewitt <ma...@thenoi.se> wrote:
>>> I disagree, it makes sense as the 5xx error code range is for responses where the server can't fulfil a well-formed, valid client request.
>>>
>>> Your GET is well-formed, but the server can't process it as it's working on the previous action, so a 500 is perfectly valid. Perhaps a 503 would be more accurate, but the 5xx prefix is certainly correct.
>>>
>>> Martin
>>>
>>> Sent from my iPhone
>>>
>>> On 17 Oct 2011, at 09:29, Paolo Negri <pa...@wooga.net> wrote:
>>>
>>>> I agree on the fact that what happens is pretty clear to explain, I
>>>> still thought it would be useful for the developers to know since
>>>> offering a 500 status code for a known system condition is probably
>>>> something that can be improved.
>>>>
>>>> Thanks,
>>>>
>>>> Paolo
>>>>
>>>> On Mon, Oct 17, 2011 at 10:24 AM, CGS <cg...@gmail.com> wrote:
>>>>> I am not developer, but it's quite logic, I may say. Once you started the
>>>>> compaction, your CouchDB is not responsive while the database is preparing
>>>>> for compaction. Triggering immediately GET, the web instance responds with
>>>>> status code 500 (internal server error, meaning unresponsive server in this
>>>>> case). So, nothing unusual in my opinion.
>>>>>
>>>>> Cheers,
>>>>> CGS
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 10/17/2011 09:57 AM, Paolo Negri wrote:
>>>>>>
>>>>>> IO activity is not monitored, there's only one db on the couchdb
>>>>>> instance and the described job is the only activity executed on this
>>>>>> machine.
>>>>>> Delaying the first request on the database url by 30 seconds did
>>>>>> actually prevent the problem from happening again.
>>>>>> So the issue seems to happen specifically at the moment right after
>>>>>> compaction is started.
>>>>>> The database is about 7GB big once compressed, the server is hosted on
>>>>>> ec2 with the database directory placed on his own dedicated ephemeral
>>>>>> storage.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Paolo
>>>>>>
>>>>>> On Fri, Oct 14, 2011 at 9:05 PM, Paul Davis<pa...@gmail.com>
>>>>>>  wrote:
>>>>>>>
>>>>>>> Do you monitor IO activity or system responsiveness when you're doing
>>>>>>> this. I've seen some compactions wallop a system when it switches over
>>>>>>> due to removing large old files and such. It doesn't sound like this
>>>>>>> is big enough for that case but it might be something worth checking.
>>>>>>>
>>>>>>> On Fri, Oct 14, 2011 at 3:41 AM, Paolo Negri<pa...@wooga.net>
>>>>>>>  wrote:
>>>>>>>>
>>>>>>>> Dear list,
>>>>>>>>
>>>>>>>> We have a script that does the following (strictly sequentially)
>>>>>>>>
>>>>>>>> 1) update 300K docs in a db
>>>>>>>> 2) launch compaction of the db
>>>>>>>> 3) poll at a 30 sec frequency http://127.0.0.1:5984/database to know
>>>>>>>> when compaction completed
>>>>>>>>
>>>>>>>> Last night we got a timeout error during 3, we think that this might
>>>>>>>> be because the first polling (GET  http://127.0.0.1:5984/database) is
>>>>>>>> done right after triggering compaction
>>>>>>>>
>>>>>>>> I thought the dev team might be interested in knowing that this is
>>>>>>>> happening
>>>>>>>>
>>>>>>>> There's no other activity on the couchdb instance other than what
>>>>>>>> described in this email.
>>>>>>>>
>>>>>>>> ERROR unexpectd response checking compaction db: {ok,"500",
>>>>>>>>                                                 [{"Server",
>>>>>>>>
>>>>>>>> "CouchDB/1.3.0a-74613f5-git (Erlang OTP/R14B04)"},
>>>>>>>>                                                  {"Date",
>>>>>>>>                                                   "Fri, 14 Oct 2011
>>>>>>>> 01:46:37 GMT"},
>>>>>>>>                                                  {"Content-Type",
>>>>>>>>                                                   "text/plain;
>>>>>>>> charset=utf-8"},
>>>>>>>>
>>>>>>>>  {"Content-Length","350"},
>>>>>>>>                                                  {"Cache-Control",
>>>>>>>>                                                   "must-revalidate"}],
>>>>>>>>
>>>>>>>>
>>>>>>>> <<"{\"error\":\"{timeout,{gen_server,call,[<0.21934.9>,{open_ref_count,<0.4090.13>}]}}\",\"reason\":\"{gen_server,call,\\n
>>>>>>>>   [couch_server,\\n     {open,<<\\\"backup\\\">>,\\n
>>>>>>>> [{user_ctx,\\n              {user_ctx,null,\\n
>>>>>>>> [<<\\\"_admin\\\">>],\\n<<\\\"{couch_httpd_auth,
>>>>>>>> default_authentication_handler}\\\">>}}]},\\n     infinity]}\"}\n">>}
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Paolo
>>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>
>
>
>

Re: timeout hitting a database url after launching compaction

Posted by Paolo Negri <pa...@wooga.net>.

On Mon, Oct 17, 2011 at 1:57 PM, Robert Newson <rn...@apache.org> wrote:
> Compaction is an online process, there should be no expectation of 500
> responses before, during, or after compaction.
>
> In this case, it seems the couch_server process is blocked for more
> than five seconds performing I/O and the gen_server:call from
> couch_server:open times out. This timeout has been increased to
> infinity since 1.0.0.
>
> What version are you running?

I compiled master from GitHub; here are the details:

"CouchDB/1.3.0a-74613f5-git (Erlang OTP/R14B04)"

We use master because we wanted to benefit from the
ejson/snappy adoption, so I guess I could also use the 1.2
branch.

Paolo

>
> B.
>
> On 17 October 2011 12:05, Martin Hewitt <ma...@thenoi.se> wrote:
>> I disagree, it makes sense as the 5xx error code range is for responses where the server can't fulfil a well-formed, valid client request.
>>
>> Your GET is well-formed, but the server can't process it as it's working on the previous action, so a 500 is perfectly valid. Perhaps a 503 would be more accurate, but the 5xx prefix is certainly correct.
>>
>> Martin
>>
>> Sent from my iPhone
>>
>> On 17 Oct 2011, at 09:29, Paolo Negri <pa...@wooga.net> wrote:
>>
>>> I agree on the fact that what happens is pretty clear to explain, I
>>> still thought it would be useful for the developers to know since
>>> offering a 500 status code for a known system condition is probably
>>> something that can be improved.
>>>
>>> Thanks,
>>>
>>> Paolo
>>>
>>> On Mon, Oct 17, 2011 at 10:24 AM, CGS <cg...@gmail.com> wrote:
>>>> I am not developer, but it's quite logic, I may say. Once you started the
>>>> compaction, your CouchDB is not responsive while the database is preparing
>>>> for compaction. Triggering immediately GET, the web instance responds with
>>>> status code 500 (internal server error, meaning unresponsive server in this
>>>> case). So, nothing unusual in my opinion.
>>>>
>>>> Cheers,
>>>> CGS
>>>>
>>>>
>>>>
>>>>
>>>> On 10/17/2011 09:57 AM, Paolo Negri wrote:
>>>>>
>>>>> IO activity is not monitored, there's only one db on the couchdb
>>>>> instance and the described job is the only activity executed on this
>>>>> machine.
>>>>> Delaying the first request on the database url by 30 seconds did
>>>>> actually prevent the problem from happening again.
>>>>> So the issue seems to happen specifically at the moment right after
>>>>> compaction is started.
>>>>> The database is about 7GB big once compressed, the server is hosted on
>>>>> ec2 with the database directory placed on his own dedicated ephemeral
>>>>> storage.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Paolo
>>>>>
>>>>> On Fri, Oct 14, 2011 at 9:05 PM, Paul Davis<pa...@gmail.com>
>>>>>  wrote:
>>>>>>
>>>>>> Do you monitor IO activity or system responsiveness when you're doing
>>>>>> this. I've seen some compactions wallop a system when it switches over
>>>>>> due to removing large old files and such. It doesn't sound like this
>>>>>> is big enough for that case but it might be something worth checking.
>>>>>>
>>>>>> On Fri, Oct 14, 2011 at 3:41 AM, Paolo Negri<pa...@wooga.net>
>>>>>>  wrote:
>>>>>>>
>>>>>>> Dear list,
>>>>>>>
>>>>>>> We have a script that does the following (strictly sequentially)
>>>>>>>
>>>>>>> 1) update 300K docs in a db
>>>>>>> 2) launch compaction of the db
>>>>>>> 3) poll at a 30 sec frequency http://127.0.0.1:5984/database to know
>>>>>>> when compaction completed
>>>>>>>
>>>>>>> Last night we got a timeout error during 3, we think that this might
>>>>>>> be because the first polling (GET  http://127.0.0.1:5984/database) is
>>>>>>> done right after triggering compaction
>>>>>>>
>>>>>>> I thought the dev team might be interested in knowing that this is
>>>>>>> happening
>>>>>>>
>>>>>>> There's no other activity on the couchdb instance other than what
>>>>>>> described in this email.
>>>>>>>
>>>>>>> ERROR unexpectd response checking compaction db: {ok,"500",
>>>>>>>                                                 [{"Server",
>>>>>>>
>>>>>>> "CouchDB/1.3.0a-74613f5-git (Erlang OTP/R14B04)"},
>>>>>>>                                                  {"Date",
>>>>>>>                                                   "Fri, 14 Oct 2011
>>>>>>> 01:46:37 GMT"},
>>>>>>>                                                  {"Content-Type",
>>>>>>>                                                   "text/plain;
>>>>>>> charset=utf-8"},
>>>>>>>
>>>>>>>  {"Content-Length","350"},
>>>>>>>                                                  {"Cache-Control",
>>>>>>>                                                   "must-revalidate"}],
>>>>>>>
>>>>>>>
>>>>>>> <<"{\"error\":\"{timeout,{gen_server,call,[<0.21934.9>,{open_ref_count,<0.4090.13>}]}}\",\"reason\":\"{gen_server,call,\\n
>>>>>>>   [couch_server,\\n     {open,<<\\\"backup\\\">>,\\n
>>>>>>> [{user_ctx,\\n              {user_ctx,null,\\n
>>>>>>> [<<\\\"_admin\\\">>],\\n<<\\\"{couch_httpd_auth,
>>>>>>> default_authentication_handler}\\\">>}}]},\\n     infinity]}\"}\n">>}
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Paolo
>>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>




Re: timeout hitting a database url after launching compaction

Posted by Robert Newson <rn...@apache.org>.

Compaction is an online process; there should be no expectation of 500
responses before, during, or after compaction.

In this case, it seems the couch_server process is blocked for more
than five seconds performing I/O and the gen_server:call from
couch_server:open times out. This timeout has been increased to
infinity since 1.0.0.
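
To illustrate the mechanism: a gen_server:call without an explicit timeout gives up after 5000 ms if the called process has not replied. The sketch below is a Python analogy of a synchronous call to a server loop that is busy, not CouchDB code. The names busy_server, call and demo are invented for the example, and passing timeout=None plays the role of infinity.

```python
import queue
import threading
import time


def busy_server(requests, busy_for):
    """Serve one request at a time; each takes `busy_for` seconds (simulated I/O)."""
    while True:
        reply_to = requests.get()
        if reply_to is None:          # shutdown signal
            return
        time.sleep(busy_for)          # the server loop is blocked here
        reply_to.put("ok")


def call(requests, timeout):
    """Send a request and wait at most `timeout` seconds for the reply.

    timeout=None waits forever, like a call with an infinity timeout.
    """
    reply_to = queue.Queue(maxsize=1)
    requests.put(reply_to)
    try:
        return reply_to.get(timeout=timeout)
    except queue.Empty:
        return "timeout"


def demo(busy_for, timeout):
    """Run one call against a server that is busy for `busy_for` seconds."""
    requests = queue.Queue()
    worker = threading.Thread(target=busy_server, args=(requests, busy_for), daemon=True)
    worker.start()
    result = call(requests, timeout)
    requests.put(None)                # ask the server loop to stop
    return result
```

A caller with a finite timeout fails as soon as the server is busy for longer than that timeout, even though the server would eventually have answered.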

What version are you running?

B.

On 17 October 2011 12:05, Martin Hewitt <ma...@thenoi.se> wrote:
> I disagree, it makes sense as the 5xx error code range is for responses where the server can't fulfil a well-formed, valid client request.
>
> Your GET is well-formed, but the server can't process it as it's working on the previous action, so a 500 is perfectly valid. Perhaps a 503 would be more accurate, but the 5xx prefix is certainly correct.
>
> Martin
>
> Sent from my iPhone
>

Re: timeout hitting a database url after launching compaction

Posted by Martin Hewitt <ma...@thenoi.se>.

I disagree, it makes sense as the 5xx error code range is for responses where the server can't fulfil a well-formed, valid client request. 

Your GET is well-formed, but the server can't process it as it's working on the previous action, so a 500 is perfectly valid. Perhaps a 503 would be more accurate, but the 5xx prefix is certainly correct. 

Martin
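
A client-side reading of the 5xx argument above, as a minimal sketch: treat any 5xx as a possibly transient server-side condition and retry with a growing delay, and treat everything else that is not a success as final. The names `classify_status` and `get_with_retry`, the attempt count, and the delays are illustrative assumptions, not anything CouchDB prescribes.

```python
import time


def classify_status(status):
    """Map an HTTP status code to a coarse client action."""
    if 200 <= status < 300:
        return "ok"
    if 500 <= status < 600:
        return "retry"  # server-side condition, possibly transient
    return "fail"       # e.g. 4xx: repeating the same request will not help


def get_with_retry(request, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call request() until it stops returning a 5xx status or attempts run out.

    `request` is any zero-argument callable returning an HTTP status code
    (hypothetical; wrap whatever HTTP client the script already uses).
    """
    status = None
    for attempt in range(attempts):
        status = request()
        if classify_status(status) != "retry":
            return status
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    return status
```

Whether the server answers 500 or 503, a client written this way behaves the same, which is probably the practical upshot of the "5xx prefix" point.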


On 17 Oct 2011, at 09:29, Paolo Negri <pa...@wooga.net> wrote:

> I agree that what happens is easy to explain, but I
> still thought it would be useful for the developers to know, since
> returning a 500 status code for a known system condition is probably
> something that can be improved.
> 
> Thanks,
> 
> Paolo
> 

Re: timeout hitting a database url after launching compaction

Posted by CGS <cg...@gmail.com>.

I don't suppose that's the developers' choice. It is more the W3C's
choice, made when they defined the standard codes. In your case, the code is
returned by cURL (or whatever application you use to issue the
GET). I am not sure the developers can do anything about
that, because the database file on disk is replaced with the new
database file, in which the history is no longer kept (if I understood
the compaction process correctly). The only thing the developers may be able
to do is to offer you a 404 code (not found) during that time. But
I suppose that's more alarming than 500. :)

Nevertheless, that's their choice and it's good that you provided this 
piece of information for the rest of us to know about it and to know how 
to handle these situations.

Cheers,
CGS



On 10/17/2011 10:29 AM, Paolo Negri wrote:
> I agree that what happens is easy to explain, but I
> still thought it would be useful for the developers to know, since
> returning a 500 status code for a known system condition is probably
> something that can be improved.
>
> Thanks,
>
> Paolo
>

Re: timeout hitting a database url after launching compaction

Posted by Paolo Negri <pa...@wooga.net>.

I agree that what happens is easy to explain, but I
still thought it would be useful for the developers to know, since
returning a 500 status code for a known system condition is probably
something that can be improved.

Thanks,

Paolo

On Mon, Oct 17, 2011 at 10:24 AM, CGS <cg...@gmail.com> wrote:
> I am not a developer, but it seems quite logical to me. Once you start the
> compaction, your CouchDB is not responsive while the database is preparing
> for compaction. If you trigger a GET immediately, the web instance responds with
> status code 500 (internal server error, meaning an unresponsive server in this
> case). So, nothing unusual in my opinion.
>
> Cheers,
> CGS
>
>
>
>



-- 
Engineering
http://www.wooga.com | phone +49-30-8962 5058  | fax +49-30-8964 9064

wooga GmbH | Saarbruecker Str. 38 | 10405 Berlin | Germany
Sitz der Gesellschaft: Berlin; HRB 117846 B
Registergericht Berlin-Charlottenburg
Geschaeftsfuehrung: Jens Begemann, Philipp Moeser

Re: timeout hitting a database url after launching compaction

Posted by CGS <cg...@gmail.com>.

I am not a developer, but it seems quite logical to me. Once you start
the compaction, your CouchDB is not responsive while the database is
preparing for compaction. If you trigger a GET immediately, the web instance
responds with status code 500 (internal server error, meaning an
unresponsive server in this case). So, nothing unusual in my opinion.

Cheers,
CGS




On 10/17/2011 09:57 AM, Paolo Negri wrote:
> IO activity is not monitored. There is only one db on the couchdb
> instance, and the described job is the only activity executed on this
> machine.
> Delaying the first request on the database URL by 30 seconds did
> actually prevent the problem from happening again,
> so the issue seems to happen specifically right after
> compaction is started.
> The database is about 7 GB once compressed; the server is hosted on
> EC2 with the database directory placed on its own dedicated ephemeral
> storage.
>
> Thanks,
>
> Paolo
>

Re: timeout hitting a database url after launching compaction

Posted by Paolo Negri <pa...@wooga.net>.

IO activity is not monitored. There is only one db on the couchdb
instance, and the described job is the only activity executed on this
machine.
Delaying the first request on the database URL by 30 seconds did
actually prevent the problem from happening again,
so the issue seems to happen specifically right after
compaction is started.
The database is about 7 GB once compressed; the server is hosted on
EC2 with the database directory placed on its own dedicated ephemeral
storage.

Thanks,

Paolo
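
For anyone hitting the same thing, the workaround described above (wait before the first poll, then poll every 30 seconds) could be sketched roughly as follows. This is only an illustration under stated assumptions: it relies on the `compact_running` field that CouchDB reports in the database info document, and the names `fetch_db_info` and `wait_for_compaction`, the poll limit, and the decision to tolerate any 5xx are choices made for the sketch, not part of the original script. Connection-level failures (for example a socket timeout) are not handled here.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_db_info(db_url, timeout=10.0):
    """GET the database URL; return (status, parsed JSON body or None)."""
    try:
        with urllib.request.urlopen(db_url, timeout=timeout) as resp:
            return resp.status, json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; only the status is needed here.
        return err.code, None


def wait_for_compaction(db_url, fetch=fetch_db_info, initial_delay=30.0,
                        interval=30.0, max_polls=120, sleep=time.sleep):
    """Poll until compact_running is false; tolerate transient 5xx answers."""
    sleep(initial_delay)  # give CouchDB time to get the compaction going
    for _ in range(max_polls):
        status, info = fetch(db_url)
        if status == 200 and info is not None and not info.get("compact_running", False):
            return True  # compaction is no longer reported as running
        if status != 200 and not 500 <= status < 600:
            raise RuntimeError("unexpected status %d from %s" % (status, db_url))
        sleep(interval)  # still compacting, or a 5xx we choose to ride out
    return False  # gave up after max_polls
```

Called as `wait_for_compaction("http://127.0.0.1:5984/database")`, it returns True once compaction is no longer reported and False if the poll limit is reached first.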

On Fri, Oct 14, 2011 at 9:05 PM, Paul Davis <pa...@gmail.com> wrote:
> Do you monitor IO activity or system responsiveness when you're doing
> this? I've seen some compactions wallop a system when it switches over
> due to removing large old files and such. It doesn't sound like this
> is big enough for that case but it might be something worth checking.
>




Re: timeout hitting a database url after launching compaction

Posted by Paul Davis <pa...@gmail.com>.

Do you monitor IO activity or system responsiveness when you're doing
this? I've seen some compactions wallop a system when it switches over
due to removing large old files and such. It doesn't sound like this
is big enough for that case but it might be something worth checking.

On Fri, Oct 14, 2011 at 3:41 AM, Paolo Negri <pa...@wooga.net> wrote:
> Dear list,
>
> We have a script that does the following (strictly sequentially):
>
> 1) update 300K docs in a db
> 2) launch compaction of the db
> 3) poll http://127.0.0.1:5984/database every 30 seconds to know
> when compaction has completed
>
> Last night we got a timeout error during step 3. We think this might
> be because the first poll (GET http://127.0.0.1:5984/database) is
> done right after triggering compaction.
>
> I thought the dev team might be interested in knowing that this is happening.
>
> There's no other activity on the couchdb instance other than what is
> described in this email.
>
> ERROR unexpectd response checking compaction db: {ok,"500",
>   [{"Server","CouchDB/1.3.0a-74613f5-git (Erlang OTP/R14B04)"},
>    {"Date","Fri, 14 Oct 2011 01:46:37 GMT"},
>    {"Content-Type","text/plain; charset=utf-8"},
>    {"Content-Length","350"},
>    {"Cache-Control","must-revalidate"}],
>
> <<"{\"error\":\"{timeout,{gen_server,call,[<0.21934.9>,{open_ref_count,<0.4090.13>}]}}\",\"reason\":\"{gen_server,call,\\n
>   [couch_server,\\n     {open,<<\\\"backup\\\">>,\\n
> [{user_ctx,\\n              {user_ctx,null,\\n
> [<<\\\"_admin\\\">>],\\n                  <<\\\"{couch_httpd_auth,
> default_authentication_handler}\\\">>}}]},\\n     infinity]}\"}\n">>}
>
>
> Thanks,
>
> Paolo
>
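
The 500 body quoted above is JSON with `error` and `reason` fields, and the `error` string starts with an Erlang `{timeout,...}` tuple. A polling script could use that to tell this particular timeout apart from other 500s. The following is a rough sketch only: the function name `is_gen_server_timeout` is made up for illustration, and the exact shape of these error strings is internal to CouchDB and may well differ between versions.

```python
import json


def is_gen_server_timeout(body):
    """Return True if a response body looks like the timeout error quoted above."""
    try:
        doc = json.loads(body)
    except ValueError:
        return False  # not JSON at all
    if not isinstance(doc, dict):
        return False
    error = doc.get("error", "")
    # The quoted body carries the Erlang term as a plain string.
    return isinstance(error, str) and error.startswith("{timeout,")
```

A script might retry quietly when this returns True and log loudly for any other 500.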