Posted to dev@uima.apache.org by "Lou DeGenaro (JIRA)" <de...@uima.apache.org> on 2015/11/02 20:25:27 UTC
[jira] [Commented] (UIMA-4684) DUCC daemons log-to-file should never give up
[ https://issues.apache.org/jira/browse/UIMA-4684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14985835#comment-14985835 ]
Lou DeGenaro commented on UIMA-4684:
------------------------------------
Shown during fix testing, here's an RM log file snippet captured while the log directory was over quota. Notice the gap between 14:13:37 and 14:14:57: the RM should be logging every 10 seconds, but during this interval the file system quota was exceeded.
<<<<<>>>>>
02 Nov 2015 14:13:27,908 INFO RM.Scheduler- N/A schedule ------------------------------------------------
02 Nov 2015 14:13:27,908 INFO RM.JobManagerConverter- N/A createState Schedule sent to Orchestrator
02 Nov 2015 14:13:27,909 INFO RM.JobManagerConverter- N/A createState
Reservation 2 15GB
Existing[1]: bluejws67-1.1^0
Additions[0]:
Removals[0]:
02 Nov 2015 14:13:27,917 INFO RM.ResourceManagerComponent- N/A runScheduler -------- 2 ------- Scheduling loop returns --------------------
02 Nov 2015 14:13:28,457 INFO RM.ResourceManagerComponent- N/A NodeStability Initial node stability reached: scheduler started.
02 Nov 2015 14:13:37,903 INFO RM.ResourceManagerComponent- N/A onJobManagerStateUpdate -------> OR state arrives
02 Nov 2015 14:13:37,903 INFO RM.ResourceManagerComponent- N/A runScheduler -------- 3 ------- Entering scheduling loop --------------------
02 Nov 2015 14:13:37,903 INFO RM.Scheduler- N/A nodeArrives Total arrivals: 13
02 Nov 2015 14:13:37,904 INFO RM.NodePool- N/A reset Nodepool: --default-- Maxorder set to 2
02 Nov 2015 14:13:37,904 INFO RM.Scheduler- N/A schedule Scheduling 0 new jobs. Existing jobs: 1
02 Nov 2015 14:13:37,904 INFO RM.Scheduler- N/A schedule Run scheduler 0 with top-level nodepool --default--
02 Nov 2015 14:13:37,904 INFO RM.RmJob- 2 getPrjCap System Cannot predict cap: init_wait false || time_per_item 0.0
02 Nov 2015 14:13:37,904 INFO RM.RmJob- 2 initJobCap System O 1 Base cap: 1 Expected future cap: 2147483647 potential cap 1 actual cap 1
02 Nov 2015 14:13:37,904 INFO RM.NodepoolScheduler- N/A schedule Machine occupancy before schedule
02 Nov 2015 14:13:37,905 INFO RM.NodePool- N/A queryMachines ================================== Query Machines Nodepool: --default-- =========================
02 Nov 2015 14:13:37,906 INFO RM.NodePool- N/A queryMachines
Name                 Blacklisted  Order  Active Shares  Unused Shares  Memory (MB)  Jobs
-------------------- -----------  -----  -------------  -------------  -----------  ----
bluejws67-4          false        2      0              2              30720        <none>[2]
bluejws67-3          false        2      0              2              30720        <none>[2]
bluejws67-1          false        1      1              0              15360        2
bluejws67-2          false        1      0              1              15360        <none>[1]
02 Nov 2015 14:13:37,906 INFO RM.NodePool- N/A queryMachines ================================== End Query Machines Nodepool: --default-- ======================
02 Nov 2015 14:13:37,906 INFO RM.NodePool- N/A reset Nodepool: --d02 Nov 2015 14:14:57,862 INFO RM.ResourceManagerComponent- N/A runScheduler -------- 11 ------- Entering scheduling loop --------------------
02 Nov 2015 14:14:57,863 INFO RM.Scheduler- N/A nodeArrives Total arrivals: 45
02 Nov 2015 14:14:57,863 INFO RM.NodePool- N/A reset Nodepool: --default-- Maxorder set to 2
02 Nov 2015 14:14:57,863 INFO RM.Scheduler- N/A schedule Scheduling 0 new jobs. Existing jobs: 1
<<<<<>>>>>
Here is the corresponding RM console. Notice that the console was still being written to while the file system quota was exceeded.
<<<<<>>>>>
02 Nov 2015 14:14:07,903 INFO RM.ResourceManagerComponent - J[N/A] T[48] runScheduler -------- 6 ------- Scheduling loop returns --------------------
02 Nov 2015 14:14:17,848 INFO RM.ResourceManagerEventListener - J[N/A] T[28] onOrchestratorStateUpdateEvent Event arrives
02 Nov 2015 14:14:17,885 INFO RM.ResourceManagerComponent - J[N/A] T[28] onJobManagerStateUpdate -------> OR state arrives
java.io.IOException: Disk quota exceeded
at java.io.FileOutputStream.write(FileOutputStream.java:329)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:233)
...
Unable to log due to logging exception.
02 Nov 2015 14:14:17,891 INFO RM.ResourceManagerComponent - J[N/A] T[48] runScheduler -------- 7 ------- Entering scheduling loop --------------------
02 Nov 2015 14:14:17,892 INFO RM.Scheduler - J[N/A] T[48] nodeArrives Total arrivals: 29
<<<<<>>>>>
> DUCC daemons log-to-file should never give up
> ---------------------------------------------
>
> Key: UIMA-4684
> URL: https://issues.apache.org/jira/browse/UIMA-4684
> Project: UIMA
> Issue Type: Bug
> Components: DUCC
> Reporter: Lou DeGenaro
> Assignee: Lou DeGenaro
> Fix For: 2.1.0-Ducc
>
>
> Problem: When the common logging code fails to log to file, for example due to a quota violation, it sets a flag to never try logging again. The only way to resume logging is to recycle the daemon.
> Resolution: The logger should always attempt to log to file. Never give up hope!
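To illustrate the resolution, here is a minimal sketch (not the actual DUCC logger code; the class name and structure are hypothetical) of a file-logging helper that attempts the write on every call, rather than setting a permanent "give up" flag after the first IOException:

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

// Hypothetical sketch: a logger that never permanently disables file output.
public class ResilientFileLogger {
    private final String path;

    public ResilientFileLogger(String path) {
        this.path = path;
    }

    // Attempt the file write on every call. On failure (e.g. "Disk quota
    // exceeded"), report to the console and simply return -- the next call
    // will try the file again, so logging resumes once the quota is cleared.
    public void log(String message) {
        try (PrintWriter out = new PrintWriter(new FileWriter(path, true))) {
            out.println(message);
        } catch (IOException e) {
            System.err.println("Unable to log due to logging exception: " + e.getMessage());
        }
    }
}
```

With this approach a daemon whose directory temporarily exceeds quota loses only the messages written during the outage; once space is available, file logging picks up again without recycling the daemon.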
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)