Posted to notifications@accumulo.apache.org by "Dave Marion (JIRA)" <ji...@apache.org> on 2015/12/23 17:01:46 UTC

[jira] [Commented] (ACCUMULO-4090) BatchWriter close not cleaning up all resources

    [ https://issues.apache.org/jira/browse/ACCUMULO-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069780#comment-15069780 ] 

Dave Marion commented on ACCUMULO-4090:
---------------------------------------

Looking at a heap dump, I consistently see two objects in the queue of the jtimer object: a FailedMutations object and an anonymous TimerTask. I believe the following should be done:

 1. TSBW.close() should call FailedMutations.cancel().
 2. The TSBW constructor should keep a reference to the TimerTask it adds to jtimer, so that TSBW.close() can call cancel() on that task.
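
The two steps above can be sketched roughly as follows. This is not the Accumulo source: SketchBatchWriter, JTIMER, failedMutations and periodicTask are hypothetical stand-ins, and the sketch assumes that FailedMutations is itself a java.util.TimerTask (it sits in the timer queue and has a cancel() method). The call to Timer.purge() is an addition that goes beyond the two steps; without it, cancelled tasks stay in the timer queue until the timer thread next reaches them.

```java
import java.util.Timer;
import java.util.TimerTask;

/** Hypothetical stand-in for TabletServerBatchWriter; not Accumulo code. */
final class SketchBatchWriter implements AutoCloseable {
    // Shared daemon timer, playing the role of jtimer (assumed to be shared).
    static final Timer JTIMER = new Timer("sketch-jtimer", true);

    // References are kept to BOTH tasks so that close() can cancel them.
    final TimerTask failedMutations; // stands in for the FailedMutations object
    final TimerTask periodicTask;    // stands in for the anonymous timer task

    SketchBatchWriter(long periodMillis) {
        failedMutations = new TimerTask() {
            @Override public void run() { /* placeholder: retry failed mutations */ }
        };
        periodicTask = new TimerTask() {
            @Override public void run() { /* placeholder: periodic work */ }
        };
        JTIMER.schedule(failedMutations, periodMillis, periodMillis);
        JTIMER.schedule(periodicTask, periodMillis, periodMillis);
    }

    @Override
    public void close() {
        // Step 1: cancel the FailedMutations task.
        failedMutations.cancel();
        // Step 2: cancel the task scheduled in the constructor.
        periodicTask.cancel();
        // Extra step (assumption): drop cancelled tasks from the queue now.
        JTIMER.purge();
    }
}
```

Anonymous inner classes hold a reference to their enclosing instance, so a task left in a shared timer queue may be enough to keep its writer reachable after close().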

Looking at the TabletServerBatchWriter objects in the heap dump, I see that the closed field is always false. I wonder if the root cause is that this field is not marked volatile (the flushing field may have the same issue).
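
For reference, a minimal sketch of what marking such a flag volatile guarantees. It does not confirm that visibility is the root cause here. ClosableFlagSketch and startWatcher() are hypothetical names, not Accumulo code.

```java
/** Hypothetical illustration of a volatile closed flag; not Accumulo code. */
final class ClosableFlagSketch {
    // volatile: a write by one thread is guaranteed to become visible to
    // subsequent reads of the field by other threads.
    private volatile boolean closed = false;

    void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }

    /** Starts a daemon thread that loops until it observes closed == true. */
    Thread startWatcher() {
        Thread t = new Thread(() -> {
            while (!closed) {
                Thread.yield();
            }
        }, "sketch-watcher");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

Without volatile (or other synchronization), the Java memory model does not guarantee that the watcher thread ever sees the write made by close().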

> BatchWriter close not cleaning up all resources
> -----------------------------------------------
>
>                 Key: ACCUMULO-4090
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4090
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 1.7.0
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>
> I'm debugging an issue with a long-running ingestor, similar to the TraceServer.
> After realizing that BatchWriter close needs to be called when a MutationsRejectedException occurs (see ACCUMULO-4088), a close was added, and the client became more stable.
> However, after a day or so, the client became sluggish. When inspecting a heap dump, many TabletServerBatchWriter objects were still referenced. This server should only have two BatchWriter instances at any one time, and this server had >100.
> Still debugging.
> The error that initiates the issue is a SessionID not found, presumably because the session timed out.  This is the cause of the MutationsRejectedException seen by the client.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)