Posted to dev@uima.apache.org by "Lou DeGenaro (JIRA)" <de...@uima.apache.org> on 2015/11/02 20:25:27 UTC

[jira] [Commented] (UIMA-4684) DUCC daemons log-to-file should never give up

    [ https://issues.apache.org/jira/browse/UIMA-4684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14985835#comment-14985835 ] 

Lou DeGenaro commented on UIMA-4684:
------------------------------------

Shown during fix testing, here's an RM log file snippet where the log directory is over quota.  Notice the gap between 14:13:37 and 14:14:57; the RM should be logging every 10 seconds.  During this interval the file system quota was exceeded.  (Note also that the last entry before the gap was cut off mid-write.)

<<<<<>>>>>
02 Nov 2015 14:13:27,908  INFO RM.Scheduler- N/A schedule  ------------------------------------------------
02 Nov 2015 14:13:27,908  INFO RM.JobManagerConverter- N/A createState  Schedule sent to Orchestrator
02 Nov 2015 14:13:27,909  INFO RM.JobManagerConverter- N/A createState
Reservation 2 15GB
        Existing[1]: bluejws67-1.1^0
        Additions[0]:
        Removals[0]:

02 Nov 2015 14:13:27,917  INFO RM.ResourceManagerComponent- N/A runScheduler  -------- 2 ------- Scheduling loop returns  --------------------
02 Nov 2015 14:13:28,457  INFO RM.ResourceManagerComponent- N/A NodeStability  Initial node stability reached: scheduler started.
02 Nov 2015 14:13:37,903  INFO RM.ResourceManagerComponent- N/A onJobManagerStateUpdate  -------> OR state arrives
02 Nov 2015 14:13:37,903  INFO RM.ResourceManagerComponent- N/A runScheduler  -------- 3 ------- Entering scheduling loop --------------------
02 Nov 2015 14:13:37,903  INFO RM.Scheduler- N/A nodeArrives  Total arrivals: 13
02 Nov 2015 14:13:37,904  INFO RM.NodePool- N/A reset  Nodepool: --default-- Maxorder set to 2
02 Nov 2015 14:13:37,904  INFO RM.Scheduler- N/A schedule  Scheduling 0  new jobs.  Existing jobs: 1
02 Nov 2015 14:13:37,904  INFO RM.Scheduler- N/A schedule  Run scheduler 0 with top-level nodepool --default--
02 Nov 2015 14:13:37,904  INFO RM.RmJob- 2 getPrjCap  System Cannot predict cap: init_wait false || time_per_item 0.0
02 Nov 2015 14:13:37,904  INFO RM.RmJob- 2 initJobCap  System O 1 Base cap: 1 Expected future cap: 2147483647 potential cap 1 actual cap 1
02 Nov 2015 14:13:37,904  INFO RM.NodepoolScheduler- N/A schedule  Machine occupancy before schedule
02 Nov 2015 14:13:37,905  INFO RM.NodePool- N/A queryMachines  ================================== Query Machines Nodepool: --default-- =========================
02 Nov 2015 14:13:37,906  INFO RM.NodePool- N/A queryMachines
                 Name  Blacklisted Order Active Shares Unused Shares Memory (MB) Jobs
-------------------- ------------ ----- ------------- ------------- ----------- ------ ...
         bluejws67-4        false     2             0             2       30720 <none>[2]
         bluejws67-3        false     2             0             2       30720 <none>[2]
         bluejws67-1        false     1             1             0       15360 2
         bluejws67-2        false     1             0             1       15360 <none>[1]

02 Nov 2015 14:13:37,906  INFO RM.NodePool- N/A queryMachines  ================================== End Query Machines Nodepool: --default-- ======================
02 Nov 2015 14:13:37,906  INFO RM.NodePool- N/A reset  Nodepool: --d02 Nov 2015 14:14:57,862  INFO RM.ResourceManagerComponent- N/A runScheduler  -------- 11 ------- Entering scheduling loop --------------------
02 Nov 2015 14:14:57,863  INFO RM.Scheduler- N/A nodeArrives  Total arrivals: 45
02 Nov 2015 14:14:57,863  INFO RM.NodePool- N/A reset  Nodepool: --default-- Maxorder set to 2
02 Nov 2015 14:14:57,863  INFO RM.Scheduler- N/A schedule  Scheduling 0  new jobs.  Existing jobs: 1
<<<<<>>>>>

Here is the corresponding RM console.  Notice that the console was still being written to during the time the file system quota was exceeded; only logging to file was lost.

<<<<<>>>>>
02 Nov 2015 14:14:07,903  INFO RM.ResourceManagerComponent - J[N/A] T[48] runScheduler  -------- 6 ------- Scheduling loop returns  --------------------
02 Nov 2015 14:14:17,848  INFO RM.ResourceManagerEventListener - J[N/A] T[28] onOrchestratorStateUpdateEvent  Event arrives
02 Nov 2015 14:14:17,885  INFO RM.ResourceManagerComponent - J[N/A] T[28] onJobManagerStateUpdate  -------> OR state arrives
java.io.IOException: Disk quota exceeded
        at java.io.FileOutputStream.write(FileOutputStream.java:329)
        at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:233)
       ...
Unable to log due to logging exception.
02 Nov 2015 14:14:17,891  INFO RM.ResourceManagerComponent - J[N/A] T[48] runScheduler  -------- 7 ------- Entering scheduling loop --------------------
02 Nov 2015 14:14:17,892  INFO RM.Scheduler - J[N/A] T[48] nodeArrives  Total arrivals: 29
<<<<<>>>>>

> DUCC daemons log-to-file should never give up
> ---------------------------------------------
>
>                 Key: UIMA-4684
>                 URL: https://issues.apache.org/jira/browse/UIMA-4684
>             Project: UIMA
>          Issue Type: Bug
>          Components: DUCC
>            Reporter: Lou DeGenaro
>            Assignee: Lou DeGenaro
>             Fix For: 2.1.0-Ducc
>
>
> Problem: When the common logging code fails to log to file, for example due to a quota violation, it sets a flag and never attempts logging again.  The only way to resume logging is to recycle the daemon.
> Resolution: The logger should always attempt to log to file... never give up hope!
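The resolution described above amounts to retrying the file write on every log call instead of latching a permanent "disabled" flag after the first failure. Here is a minimal sketch of that pattern; the class and method names are illustrative only, not DUCC's actual logger code:

```java
import java.io.IOException;
import java.io.Writer;

// Sketch of a "never give up" file logger: a write failure (e.g. disk
// quota exceeded) is reported once, but every subsequent log call still
// attempts the write, so logging resumes as soon as the condition clears.
public class ResilientFileLogger {
    private final Writer out;       // underlying file writer; may fail on quota
    private boolean warned = false; // throttles the "unable to log" notice

    public ResilientFileLogger(Writer out) {
        this.out = out;
    }

    public void log(String msg) {
        try {
            out.write(msg + System.lineSeparator());
            out.flush();
            warned = false;  // write succeeded: re-arm the warning
        } catch (IOException e) {
            // Old behavior: set a permanent "give up" flag here.
            // New behavior: warn once on the console, keep trying later.
            if (!warned) {
                System.err.println("Unable to log due to logging exception: " + e);
                warned = true;
            }
        }
    }
}
```

With this shape, a transient quota violation produces a single console warning (as seen in the console snippet above), and file logging picks up again automatically once space is freed.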



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)