Posted to dev@uima.apache.org by "Eddie Epstein (JIRA)" <de...@uima.apache.org> on 2017/10/20 15:27:00 UTC

[jira] [Commented] (UIMA-5605) DUCC scheduler ArrayIndexOutOfBoundsException

    [ https://issues.apache.org/jira/browse/UIMA-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212784#comment-16212784 ] 

Eddie Epstein commented on UIMA-5605:
-------------------------------------

This bug is exposed in situations where there is active work on a node and then "usable" memory shrinks. Usable memory is defined as machine memory size minus RAM used by "system" processes. System processes are all those owned by UID < ducc.agent.rogue.process.sys.uid.max, which has a default value of 500. The data above suggests that usable memory had shrunk by 30GB on machine xxx.xxx.xx.
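
The usable-memory definition above can be sketched roughly as follows. This is an illustration only, not DUCC's agent code: the class, the Proc type, and the method names are hypothetical, and the floor-division quantization into shares is an assumption about what "[quantized]" means here.

```java
import java.util.List;

// Hypothetical sketch; none of these names are DUCC's actual classes.
final class UsableMemorySketch {
    // Default of ducc.agent.rogue.process.sys.uid.max, per the comment above.
    static final int DEFAULT_SYS_UID_MAX = 500;

    // One running process: owning UID and resident memory in KB.
    static final class Proc {
        final int uid;
        final long rssKb;

        Proc(int uid, long rssKb) {
            this.uid = uid;
            this.rssKb = rssKb;
        }
    }

    // Usable memory = machine memory minus RAM held by "system" processes,
    // where a system process is one owned by a UID below sysUidMax.
    static long usableKb(long machineKb, List<Proc> procs, int sysUidMax) {
        long systemKb = 0;
        for (Proc p : procs) {
            if (p.uid < sysUidMax) {
                systemKb += p.rssKb;
            }
        }
        return Math.max(0L, machineKb - systemKb);
    }

    // Assumption: usable memory is quantized into whole shares by floor division.
    static int shares(long usableKb, long quantumKb) {
        return (int) (usableKb / quantumKb);
    }
}
```

In this sketch, a root-owned process (UID 0) that grows by 30GB lowers the result of usableKb, and with it the share count, without any user process changing.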

The motivation for this design is to guarantee user processes the requested amount of RAM. DUCC currently does not stop running processes when usable memory shrinks, but it will not deploy new processes that would not fit in the remaining free RAM. The exception here occurred while looking to preempt existing processes to make room for a new, higher-priority process.

The problem can be reproduced by running a preemptable job process on a node, having user root grab enough RAM to reduce the [quantized] usable memory of the node, and then trying to allocate a non-preemptable process on the node.
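
A minimal sketch of the failure class, not the actual NodepoolScheduler code: it assumes a counting array indexed by a machine's free-share count, sized from a share count smaller than the number of shares that can be freed (for example, 246 freed shares after usable memory has shrunk). The class and method names are hypothetical.

```java
// Hypothetical sketch; not the actual NodepoolScheduler code.
final class FreeShareHistogramSketch {
    // Slot i counts machines that currently have i free shares.
    // Returns false instead of throwing when the index is outside the array,
    // which is where an unguarded vmach[free]++ would raise
    // ArrayIndexOutOfBoundsException.
    static boolean recordFree(int[] vmach, int free) {
        if (free < 0 || free >= vmach.length) {
            return false; // index does not fit the array; skip the update
        }
        vmach[free]++;
        return true;
    }
}
```

A bounds check like this only hides the symptom in the trace bookkeeping. Whether the array size or the freed-share count is the value that should change is not settled by this sketch.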


> DUCC scheduler ArrayIndexOutOfBoundsException
> ---------------------------------------------
>
>                 Key: UIMA-5605
>                 URL: https://issues.apache.org/jira/browse/UIMA-5605
>             Project: UIMA
>          Issue Type: Bug
>          Components: DUCC
>    Affects Versions: 2.2.1-Ducc
>            Reporter: Jörg W
>             Fix For: future-DUCC
>
>
> The scheduler stops scheduling and ducc-mon indicates an unresponsive ResourceManager.
> rm.log (Trace):
> 05 Okt 2017 11:29:13,336 TRACE RM.NodepoolScheduler- N/A detectFragmentation  Freed shares 246 on machine xxx.xxx.xx
> 05 Okt 2017 11:29:13,336 TRACE RM.NodepoolScheduler- N/A detectFragmentation  Update v before: NP[ --default-- ] v:   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 0   0   0   0   0   0   0  
> 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   
> 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
> 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 
> 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 
> 05 Okt 2017 11:29:13,336 FATAL RM.ResourceManagerComponent- N/A runScheduler 
> java.lang.ArrayIndexOutOfBoundsException
> An ArrayIndexOutOfBoundsException can occur in the NodepoolScheduler class at line 2422:
> vmach_j[free]++;  
> Quickfix: comment it out. It seems to be used only for logging (trace).
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)