Posted to dev@uima.apache.org by "Lou DeGenaro (JIRA)" <de...@uima.apache.org> on 2014/02/27 21:54:19 UTC
[jira] [Commented] (UIMA-3657) DUCC Orchestrator (OR) improved synchronization tracking

    [ https://issues.apache.org/jira/browse/UIMA-3657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915034#comment-13915034 ] 

Lou DeGenaro commented on UIMA-3657:
------------------------------------

Sample "blocked" report:

27 Feb 2014 15:18:56,098  INFO OR.TrackSync - blocked     N/A target: DuccWorkMap requester: OrchestratorCommonArea.getCheckpointable time: 16001 blocking: OrchestratorComponent.getState

The above or.log entry says that OrchestratorComponent.getState is blocked by OrchestratorCommonArea.getCheckpointable, which has held the DuccWorkMap synchronization lock for about 16 seconds (time: 16001 milliseconds).
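
For scanning an or.log for such entries, the fields can be pulled out with a regular expression. The following is only a sketch: the class name BlockedReport and its fields are hypothetical (they are not part of DUCC), and the pattern is inferred from the single sample line above, so other layouts may not match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of DUCC: parses one "blocked" line from or.log. */
class BlockedReport {
    // Pattern inferred from the sample line only; the token after "blocked"
    // (N/A in the sample) is skipped without being interpreted.
    private static final Pattern BLOCKED = Pattern.compile(
        "blocked\\s+\\S+\\s+target: (\\S+) requester: (\\S+) time: (\\d+) blocking: (\\S+)");

    final String target;     // the synchronized object, e.g. DuccWorkMap
    final String holder;     // "requester:" field, the current lock holder
    final long heldMillis;   // "time:" field, assumed to be milliseconds
    final String blocked;    // "blocking:" field, the caller that is waiting

    private BlockedReport(String target, String holder, long heldMillis, String blocked) {
        this.target = target;
        this.holder = holder;
        this.heldMillis = heldMillis;
        this.blocked = blocked;
    }

    /** Returns the parsed report, or null if the line is not a "blocked" entry. */
    static BlockedReport parse(String line) {
        Matcher m = BLOCKED.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new BlockedReport(m.group(1), m.group(2), Long.parseLong(m.group(3)), m.group(4));
    }

    /** One-line summary in the same spirit as the explanation above. */
    String describe() {
        return blocked + " is blocked by " + holder + ", which has held "
            + target + " for " + heldMillis + " ms";
    }
}
```

Calling BlockedReport.parse on each log line and skipping null results would give a list of long-held locks, which can then be sorted by heldMillis.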

=====

Sample "overtime" report:

27 Feb 2014 15:18:56,098  INFO OR.TrackSync - overtime     N/A target: DuccWorkMap requester: OrchestratorCommonArea.getCheckpointable wait: 1 held: 21877
27 Feb 2014 15:18:56,098  INFO OR.TrackSync - report     N/A target: DuccWorkMap requester: OrchestratorComponent.getState  pending: 1

The above or.log entries say that OrchestratorCommonArea.getCheckpointable waited only 1 millisecond to obtain the synchronization lock for DuccWorkMap, then held the lock for nearly 22 seconds and, in doing so, blocked 1 instance of OrchestratorComponent.getState from getting the lock.
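
To illustrate the bookkeeping behind these reports, here is a minimal sketch of a tracker that produces "blocked", "overtime" and "report" messages. It is not the actual OR.TrackSync code: the class SyncTracker, its method names, the injectable clock and the in-memory message list (standing in for or.log) are all assumptions made for illustration. The 10 second threshold from the issue description is passed in as a parameter.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical sketch of synchronization tracking; not the DUCC implementation. */
class SyncTracker {
    private final String target;            // name of the tracked lock, e.g. "DuccWorkMap"
    private final long thresholdMillis;     // report only when held longer than this
    private final LongSupplier clock;       // e.g. System::currentTimeMillis
    private final Map<String, Integer> pending = new LinkedHashMap<>(); // waiters per requester
    private final List<String> messages = new ArrayList<>();            // stand-in for or.log
    private String holder;                  // current lock holder, or null
    private long acquiredAt;                // clock value when the holder got the lock
    private long holderWait;                // how long the holder waited for the lock

    SyncTracker(String target, long thresholdMillis, LongSupplier clock) {
        this.target = target;
        this.thresholdMillis = thresholdMillis;
        this.clock = clock;
    }

    /** Call just before entering the synchronized block; returns the request time. */
    synchronized long request(String requester) {
        long now = clock.getAsLong();
        if (holder != null && now - acquiredAt > thresholdMillis) {
            messages.add("blocked target: " + target + " requester: " + holder
                + " time: " + (now - acquiredAt) + " blocking: " + requester);
        }
        pending.merge(requester, 1, Integer::sum);
        return now;
    }

    /** Call as the first statement inside the synchronized block. */
    synchronized void acquired(String requester, long requestedAt) {
        long now = clock.getAsLong();
        // Decrement the waiter count; returning null removes the entry.
        pending.computeIfPresent(requester, (k, v) -> v > 1 ? v - 1 : null);
        holder = requester;
        holderWait = now - requestedAt;
        acquiredAt = now;
    }

    /** Call as the last statement inside the synchronized block (ideally in finally). */
    synchronized void released() {
        long held = clock.getAsLong() - acquiredAt;
        if (held > thresholdMillis) {
            messages.add("overtime target: " + target + " requester: " + holder
                + " wait: " + holderWait + " held: " + held);
            for (Map.Entry<String, Integer> e : pending.entrySet()) {
                messages.add("report target: " + target + " requester: " + e.getKey()
                    + " pending: " + e.getValue());
            }
        }
        holder = null;
    }

    /** Snapshot of the messages recorded so far. */
    synchronized List<String> messages() {
        return new ArrayList<>(messages);
    }
}
```

In this sketch a caller would invoke request() before synchronized (workMap) { ... }, acquired() on entry and released() in a finally clause on exit. The tracker uses its own monitor only for brief bookkeeping, so it should not itself become a bottleneck on the tracked lock.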


> DUCC Orchestrator (OR) improved synchronization tracking
> --------------------------------------------------------
>
>                 Key: UIMA-3657
>                 URL: https://issues.apache.org/jira/browse/UIMA-3657
>             Project: UIMA
>          Issue Type: Improvement
>          Components: DUCC
>    Affects Versions: 1.0-Ducc
>            Reporter: Lou DeGenaro
>            Assignee: Lou DeGenaro
>
> The orchestrator currently records to its log some limited and incomplete information about synchronization.  This improvement:
> 1. Instruments all WorkMap synchronizations in the OR
> 2. Accounts for time blocked and time held
> 3. Records all new requests for synchronization when current holder exceeds 10 seconds
> 4. Records all pending requests when current holder releases having held synchronization for > 10 seconds
> This is to address the situation, for example, where OR is running albeit slowly.  Newly added log messages will hopefully shed light on where the bottlenecks may be.
> One theory is that a normally fast resource, such as the filesystem, becomes very slow and bogs down OR while it's trying to write its checkpoint dataset.  In this case, we'd expect to see the synchronization lock held for a long time by the OR's checkpoint module.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)