Posted to dev@couchdb.apache.org by "Randall Leeds (JIRA)" <ji...@apache.org> on 2010/06/03 02:31:55 UTC
[jira] Commented: (COUCHDB-761) Timeouts in couch_log are masked, crashes callers
[ https://issues.apache.org/jira/browse/COUCHDB-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874885#action_12874885 ]
Randall Leeds commented on COUCHDB-761:
---------------------------------------
I've had this in production for about a week now on a bunch of servers, and it seems to fix the timeout problem mentioned above. I'd appreciate it if some brave soul would apply this patch and give their server a beating to be sure everything looks okay. Some of the servers running it here are under crazy load and nothing seems broken, but I like second opinions, especially on things that could have broad performance implications under stress.
> Timeouts in couch_log are masked, crashes callers
> -------------------------------------------------
>
> Key: COUCHDB-761
> URL: https://issues.apache.org/jira/browse/COUCHDB-761
> Project: CouchDB
> Issue Type: Bug
> Components: Database Core
> Affects Versions: 0.10.1, 0.10.2, 0.11
> Reporter: Randall Leeds
> Priority: Blocker
> Fix For: 0.10.3, 0.11.1, 1.0
>
> Attachments: improved-sync-logging-v2.patch, improved-sync-logging.patch
>
>
> Several users have reported seeing crash reports stemming from a function_clause match on handle_info in various gen_servers. The offending message looks like {#Ref<>, <integer>}.
> After months of banter and sleuthing, I determined that the likely cause was a late reply to a gen_server:call that timed out, with the #Ref being the tag on the response. After it came up again today in IRC, kocolosk quickly discovered that the problem appears to be in couch_log.erl.
> The logging macros (?LOG_*) call couch_log/*_on, which calls get_level_integer/0. When this call times out, the timeout is eaten and a late reply arrives at the calling process later, triggering the crash.
> Suggestions on how to fix this are welcome. Ideas so far are async logging or an infinite timeout.
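
The failure mode described above can be illustrated with a small toy model. This is a minimal sketch in Python, not CouchDB code and not Erlang: Process, SlowLogServer, call, get_level_masking_timeout, handle_info and drain are all hypothetical names made up for the illustration, and the "timeout" is modeled simply as the reply not being in the mailbox when the caller looks for it. It only shows how swallowing the timeout lets a tagged reply turn up later as a stray message that no handler clause matches.

```python
from collections import deque
from itertools import count


class CallTimeout(Exception):
    """Models the exit raised when a synchronous call gets no reply in time."""


class FunctionClauseError(Exception):
    """Models a handle_info crash: no clause matches the message."""


class Process:
    """Toy process: a mailbox plus a source of unique reply tags."""

    def __init__(self):
        self.mailbox = deque()
        self._refs = count(1)

    def make_ref(self):
        return next(self._refs)


class SlowLogServer:
    """Hypothetical overloaded logger: it replies only once flush() runs."""

    def __init__(self, level):
        self.level = level
        self.pending = []  # (caller, ref) pairs still waiting for a reply

    def request(self, caller, ref):
        self.pending.append((caller, ref))

    def flush(self):
        # Deliver every outstanding reply, tagged with the caller's ref.
        for caller, ref in self.pending:
            caller.mailbox.append((ref, self.level))
        self.pending.clear()


def call(caller, server):
    """Synchronous call: send a tagged request, then look for the tagged reply."""
    ref = caller.make_ref()
    server.request(caller, ref)
    for msg in list(caller.mailbox):
        if isinstance(msg, tuple) and msg[0] == ref:
            caller.mailbox.remove(msg)
            return msg[1]
    # No reply yet: in this model that counts as the call timing out.
    raise CallTimeout(ref)


def get_level_masking_timeout(caller, server, default=0):
    """Swallows the timeout, so the caller never learns a reply is still due."""
    try:
        return call(caller, server)
    except CallTimeout:
        return default


def handle_info(msg):
    """The caller's handler for stray messages; it only knows about "tick"."""
    if msg == "tick":
        return "ok"
    raise FunctionClauseError(msg)


def drain(proc):
    """Feed every queued message to handle_info, as a server loop would."""
    while proc.mailbox:
        handle_info(proc.mailbox.popleft())


if __name__ == "__main__":
    caller, logger = Process(), SlowLogServer(level=2)
    print(get_level_masking_timeout(caller, logger))  # timeout is eaten here
    logger.flush()  # the reply arrives late
    try:
        drain(caller)
    except FunctionClauseError as err:
        print("crash on stray message:", err.args[0])
```

In this model, the two ideas floated above map to either never raising CallTimeout (the caller blocks until the reply is delivered, so nothing is left over) or never requesting a reply in the first place (async logging).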