Posted to user@uima.apache.org by Simon Hafner <re...@gmail.com> on 2014/11/12 11:51:26 UTC

DUCC stuck at WaitingForResources on an Amazon Linux

I've set up DUCC according to
https://cwiki.apache.org/confluence/display/UIMA/DUCC
but when I submit the sample job with

    ducc_install/bin/ducc_submit -f ducc_install/examples/simple/1.job

the job is stuck at WaitingForResources. Here is the agent log:

12 Nov 2014 10:37:30,175  INFO Agent.LinuxNodeMetricsProcessor - process     N/A ... Agent Collecting User Processes
12 Nov 2014 10:37:30,176  INFO Agent.NodeAgent - copyAllUserReservations     N/A +++++++++++ Copying User Reservations - List Size:0
12 Nov 2014 10:37:30,176  INFO Agent.LinuxNodeMetricsProcessor - call     N/A ********** User Process Map Size After copyAllUserReservations:0
12 Nov 2014 10:37:30,176  INFO Agent.LinuxNodeMetricsProcessor - call     N/A ********** User Process Map Size After copyAllUserRougeProcesses:0
12 Nov 2014 10:37:30,182  INFO Agent.LinuxNodeMetricsProcessor - call     N/A
12 Nov 2014 10:37:30,182  INFO Agent.LinuxNodeMetricsProcessor - call     N/A ******************************************************************************
12 Nov 2014 10:37:30,182  INFO Agent.LinuxNodeMetricsProcessor - process     N/A ... Agent ip-172-31-7-237.us-west-2.compute.internal Posting Memory:4050676 Memory Free:4013752 Swap Total:0 Swap Free:0 Low Swap Threshold Defined in ducc.properties:0
12 Nov 2014 10:37:33,303  INFO Agent.AgentEventListener - reportIncomingStateForThisNode     N/A Received OR Sequence:699 Thread ID:13
12 Nov 2014 10:37:33,303  INFO Agent.AgentEventListener - reportIncomingStateForThisNode     N/A
JD--> JobId:6 ProcessId:0 PID:8168 Status:Running Resource State:Allocated isDeallocated:false
12 Nov 2014 10:37:33,303  INFO Agent.NodeAgent - setReservations     N/A +++++++++++ Copied User Reservations - List Size:0
12 Nov 2014 10:37:33,405  INFO Agent.AgentConfiguration$$EnhancerByCGLIB$$cc49880b - getSwapUsage-     N/A PID:8168 Swap Usage:0
12 Nov 2014 10:37:33,913  INFO Agent.AgentConfiguration$$EnhancerByCGLIB$$cc49880b - collectProcessCurrentCPU     N/A 0.0 == CPUTIME:0.0
12 Nov 2014 10:37:33,913  INFO Agent.AgentConfiguration$$EnhancerByCGLIB$$cc49880b - process     N/A ----------- PID:8168 Major Faults:0 Process Swap Usage:0 Max Swap Usage Allowed:-108574720 Time to Collect Swap Usage:0

I'm using a t2.medium instance (2 CPUs, ~4GB RAM) and the stock Amazon
Linux (it looks CentOS-based).

Maven is not in the repos, so I installed it with this script:

#!/bin/bash
# Install Apache Maven 3.2.3 under /usr/local/maven (run as root).
# Stop at the first failing command instead of continuing with a bad download.
set -e

TEMPORARY_DIRECTORY="$(mktemp -d)"
DOWNLOAD_TO="$TEMPORARY_DIRECTORY/maven.tgz"

echo "Downloading Maven to: $DOWNLOAD_TO"

# The backslash keeps the URL on the same logical line as the wget command.
wget -O "$DOWNLOAD_TO" \
    http://www.eng.lsu.edu/mirrors/apache/maven/maven-3/3.2.3/binaries/apache-maven-3.2.3-bin.tar.gz

echo 'Extracting Maven'
tar xzf "$DOWNLOAD_TO" -C "$TEMPORARY_DIRECTORY"
rm "$DOWNLOAD_TO"

echo 'Configuring Environment'

mv "$TEMPORARY_DIRECTORY"/apache-maven-* /usr/local/maven
# Single quotes keep ${M2_HOME} and ${PATH} literal so they expand at login time.
printf '%s\n' 'export M2_HOME=/usr/local/maven' 'export PATH=${M2_HOME}/bin:${PATH}' > /etc/profile.d/maven.sh
source /etc/profile.d/maven.sh

echo "The maven version: $(mvn -version) has been installed."
echo -e '\n\n!! Note you must relogin to get mvn in your path !!'
echo 'Removing the temporary directory...'
rm -r "$TEMPORARY_DIRECTORY"
echo 'Your Maven Installation is Complete.'

Re: DUCC stuck at WaitingForResources on an Amazon Linux

Posted by Eddie Epstein <ea...@gmail.com>.
On Fri, Nov 14, 2014 at 8:11 PM, Simon Hafner <re...@gmail.com> wrote:

> So to run effectively, I would need more memory, because the job wants
> two shares? ... Yes. With a larger node it works. What would be a
> reasonable memory size for a ducc node?

Really depends on the application code. Quoting from the DUCC overview at
http://uima.apache.org/doc-uimaducc-whatitam.html

  "DUCC is particularly well suited to run large memory Java analytics in
   multiple threads in order to fully utilize multicore machines."

Our experience to date has been with machines with 16-256GB of RAM and
4-32 CPU cores. Smaller machines, 8GB or less, have only been used for
development of DUCC itself, with dummy analytics that use little memory
and CPU.

Re: DUCC stuck at WaitingForResources on an Amazon Linux

Posted by Simon Hafner <re...@gmail.com>.
So to run effectively, I would need more memory, because the job wants
two shares? ... Yes. With a larger node it works. What would be a
reasonable memory size for a ducc node?

Re: DUCC stuck at WaitingForResources on an Amazon Linux

Posted by Lou DeGenaro <lo...@gmail.com>.
Simon,

Congratulations!  You found a bug in DUCC's Web Server.  It was incorrectly
rounding up when reporting the number of shares for a machine.  This issue
is addressed by Jira 4104 <https://issues.apache.org/jira/browse/UIMA-4104>.

Lou.
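
A minimal sketch of the rounding difference, assuming one share is 1 GB (1024 MB). That share size is inferred from the RM log in this thread (3955 MB reported, order 3), not taken from the DUCC source, and the variable names are made up for illustration. Rounding up gives the 4 shares reported from the Machines page; rounding down gives the 3 the RM schedules with.

```shell
#!/bin/bash
# Assumption: 1 share = 1024 MB, inferred from the RM log (3955 MB, order 3).
mem_mb=3955
share_mb=1024

# Integer division rounds down: whole shares that really fit in memory.
shares_floor=$(( mem_mb / share_mb ))

# Adding (share_mb - 1) before dividing rounds up instead.
shares_ceil=$(( (mem_mb + share_mb - 1) / share_mb ))

echo "rounded down: $shares_floor shares"   # prints: rounded down: 3 shares
echo "rounded up:   $shares_ceil shares"    # prints: rounded up:   4 shares
```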

Re: DUCC stuck at WaitingForResources on an Amazon Linux

Posted by Jim Challenger <ch...@gmail.com>.
Simon,
     It looks like the problem is the amount of RAM on your machine.
It's going to be hard to get any meaningful work running on less than
about 8G.

     Here's what to do to get the test job to run on your 4G machine:
     1.  In the resources folder, edit ducc.properties and change this:
               ducc.jd.host.memory.size=2GB
          to this:
               ducc.jd.host.memory.size=1GB

          This is the amount of RAM that DUCC reserves for itself to
manage its "head" processes.

     2.  In the examples/simple folder, edit 1.job and change this:
              process_memory_size            2
          to this:
              process_memory_size            1

          This is the amount of memory in GB that the sample 1.job is 
requesting.

      3.  Stop ducc and restart it so the ducc processes reset the 
jd.host.memory size from the new ducc.properties.

      4.  Rerun 1.job and all should be well.
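
Steps 1 and 2 could be scripted roughly as below. This is only a sketch: it edits scratch copies so it can run without a DUCC install, and it assumes GNU sed (for -i). On a real install the targets would be ducc.properties in the resources folder and 1.job in examples/simple; the demo_dir name is hypothetical.

```shell
#!/bin/bash
set -euo pipefail

# Scratch copies holding only the two settings discussed above.
demo_dir="$(mktemp -d)"
printf 'ducc.jd.host.memory.size=2GB\n' > "$demo_dir/ducc.properties"
printf 'process_memory_size            2\n' > "$demo_dir/1.job"

# Step 1: reserve 1GB instead of 2GB for DUCC's "head" processes.
sed -i 's/^ducc\.jd\.host\.memory\.size=.*/ducc.jd.host.memory.size=1GB/' \
    "$demo_dir/ducc.properties"

# Step 2: on the process_memory_size line, replace the trailing number with 1.
sed -i -E '/^process_memory_size[[:space:]]/ s/[0-9]+$/1/' "$demo_dir/1.job"

cat "$demo_dir/ducc.properties" "$demo_dir/1.job"
```

Steps 3 and 4 (restarting DUCC and resubmitting the job) still have to be done by hand afterwards.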

       Here are the gory details from the RM log, if you're interested.  
In the RM log, I see these lines.

13 Nov 2014 22:04:14,909  INFO RM.NodePool - queryMachines     N/A
                 Name Order Active Shares Unused Shares Memory (MB) Jobs
-------------------- ----- ------------- ------------- ----------- ------ ...
.us-west-2.compute.internal     3             2             1        3955 7 [1]

This says you have 3G of **usable-by-ducc** RAM, of which 2G are used by 
the reservation/job "7", and that you have 1GB free.  The reason you 
have only 3GB **usable** is that usually the hardware/opsys will reserve 
a small part of the installed RAM for itself, so the reported RAM is a 
tad smaller.  To avoid overcommitting the system, we use the reported 
value, not the installed value.  Most or all of the jobs here will 
easily overwhelm even the largest machines if we don't do this.
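
The 3955 MB in the table appears to line up with the "Posting Memory:4050676" figure in the agent log from the first message, if that figure is in KB. The unit is an assumption, made only because the arithmetic matches. A quick check, using a shortened copy of that log line:

```shell
#!/bin/bash
# Shortened agent log line from the first message in this thread.
agent_line='Agent ip-172-31-7-237.us-west-2.compute.internal Posting Memory:4050676 Memory Free:4013752 Swap Total:0 Swap Free:0'

# Pull out the number after "Posting Memory:".
mem_kb=$(sed -E 's/.*Posting Memory:([0-9]+).*/\1/' <<<"$agent_line")

# Assumption: the value is in KB, so integer division by 1024 gives whole MB.
mem_mb=$(( mem_kb / 1024 ))

echo "$mem_kb KB is $mem_mb MB"   # prints: 4050676 KB is 3955 MB
```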

Next,  these lines show the actual schedule the RM is trying to build.
Dormant:
            ID                        JobName       User      Class Shares Order QShares NTh Memory nQuest Ques Rem InitWait Max P/Nst
   J_________8                     Test_job_1       ducc     normal      0     2       0   2      2     15       15     true         8

Reserved:
            ID                        JobName       User      Class Shares Order QShares NTh Memory nQuest Ques Rem InitWait Max P/Nst
   R_________7                     Job_Driver     System  JobDriver      1     2       2   0      2      0        0        0         1

This confirms that the DUCC reservation "7" occupies 2G, and that job 
"8" is requesting 2G but is "dormant", i.e. waiting for resources.  
Since there is only 3G available on this machine, job 8 will wait.
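
The arithmetic behind that conclusion can be sketched as a simple fit check. This is not the RM's actual algorithm, only the capacity test it implies; can_schedule is a hypothetical helper, and the "after" case assumes the Job Driver reservation drops to 1G once ducc.jd.host.memory.size is set to 1GB.

```shell
#!/bin/bash
# Succeeds when a request of $3 GB fits next to $2 GB already in use on a
# node with $1 GB usable. Hypothetical helper, for illustration only.
can_schedule() {
    local total_gb=$1 used_gb=$2 wanted_gb=$3
    (( used_gb + wanted_gb <= total_gb ))
}

# Before the change: 3G usable, reservation 7 holds 2G, job 8 wants 2G.
if can_schedule 3 2 2; then echo "before: fits"; else echo "before: waits"; fi

# After the change (assumed): reservation holds 1G, job wants 1G.
if can_schedule 3 1 1; then echo "after: fits"; else echo "after: waits"; fi
```

With the numbers from the log this prints "before: waits" and "after: fits".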

Best,
Jim






Re: DUCC stuck at WaitingForResources on an Amazon Linux

Posted by Simon Hafner <re...@gmail.com>.
I can't find anything at first glance. Maybe the memory?

13 Nov 2014 22:04:14,907  INFO RM.ResourceManagerComponent -
onJobManagerStateUpdate     N/A -------> OR state arrives
13 Nov 2014 22:04:14,908  INFO RM.JobManagerConverter - jobUpdate
 8 tot: 15 WaitingForResources -> WaitingForResources compl: 0 err: 0
rem: 15 mean: NaN
13 Nov 2014 22:04:14,908  INFO RM.ResourceManagerComponent -
runScheduler     N/A -------- 30 ------- Entering scheduling loop
--------------------
13 Nov 2014 22:04:14,908  INFO RM.Scheduler - nodeArrives     N/A
Total arrivals: 9
13 Nov 2014 22:04:14,908  INFO RM.Scheduler - schedule     N/A
Scheduling 0  new jobs.  Existing jobs: 2
13 Nov 2014 22:04:14,908  INFO RM.Scheduler - schedule     N/A Run
scheduler 0 with top-level nodepool --default--
13 Nov 2014 22:04:14,908  INFO RM.RmJob - getPrjCap       8 ducc
Cannot predict cap: init_wait true || time_per_item NaN
13 Nov 2014 22:04:14,908  INFO RM.RmJob - initJobCap       8 ducc O 2
Base cap: 8 Expected future cap: 2147483647 potential cap 8 actual cap
1
13 Nov 2014 22:04:14,909  INFO RM.NodepoolScheduler - schedule     N/A
Machine occupancy before schedule
13 Nov 2014 22:04:14,909  INFO RM.NodePool - queryMachines     N/A
================================== Query Machines Nodepool:
--default-- =========================
13 Nov 2014 22:04:14,909  INFO RM.NodePool - queryMachines     N/A
                 Name Order Active Shares Unused Shares Memory (MB) Jobs
-------------------- ----- ------------- ------------- ----------- ------ ...
.us-west-2.compute.internal     3             2             1        3955 7 [1]

13 Nov 2014 22:04:14,909  INFO RM.NodePool - queryMachines     N/A
================================== End Query Machines Nodepool:
--default-- ======================
13 Nov 2014 22:04:14,909  INFO RM.NodepoolScheduler -
howMuchFixedShare     N/A No jobs to schedule in class  fixed
13 Nov 2014 22:04:14,909  INFO RM.NodepoolScheduler -
howMuchFixedShare     N/A No jobs to schedule in class  debug
13 Nov 2014 22:04:14,909  INFO RM.NodepoolScheduler -
howMuchFixedShare     N/A Scheduling jobs in class: JobDriver  7
13 Nov 2014 22:04:14,909  INFO RM.NodepoolScheduler -
howMuchFixedShare       7 [stable] requested 1 assigned 1 processes,
2 QS
13 Nov 2014 22:04:14,909  INFO RM.NodepoolScheduler - howMuchFairShare
    N/A Schedule class urgent
13 Nov 2014 22:04:14,909  INFO RM.NodepoolScheduler - howMuchFairShare
    N/A No jobs to schedule in class  urgent
13 Nov 2014 22:04:14,909  INFO RM.NodepoolScheduler - howMuchFairShare
    N/A Schedule class high
13 Nov 2014 22:04:14,910  INFO RM.NodepoolScheduler - howMuchFairShare
    N/A No jobs to schedule in class  high
13 Nov 2014 22:04:14,910  INFO RM.NodepoolScheduler - howMuchFairShare
    N/A Schedule class standalone
13 Nov 2014 22:04:14,910  INFO RM.NodepoolScheduler - howMuchFairShare
    N/A No jobs to schedule in class  standalone
13 Nov 2014 22:04:14,910  INFO RM.NodepoolScheduler - howMuchFairShare
    N/A Schedule class weekly
13 Nov 2014 22:04:14,910  INFO RM.NodepoolScheduler - howMuchFairShare
    N/A No jobs to schedule in class  weekly
13 Nov 2014 22:04:14,910  INFO RM.NodepoolScheduler - howMuchFairShare
    N/A Schedule class normal
13 Nov 2014 22:04:14,910  INFO RM.NodepoolScheduler - howMuchFairShare
      8 Scheduling job in class  normal : J_________8
   Test_job_1       ducc     normal      0     2       0   2      2
 15       15     true         8
13 Nov 2014 22:04:14,910  INFO RM.NodepoolScheduler - howMuchFairShare
    N/A Schedule class low
13 Nov 2014 22:04:14,910  INFO RM.NodepoolScheduler - howMuchFairShare
    N/A No jobs to schedule in class  low
13 Nov 2014 22:04:14,910  INFO RM.NodepoolScheduler - howMuchFairShare
    N/A Schedule class background
13 Nov 2014 22:04:14,910  INFO RM.NodepoolScheduler - howMuchFairShare
    N/A No jobs to schedule in class  background
13 Nov 2014 22:04:14,924 DEBUG RM.NodepoolScheduler - countClassShares
    N/A Counting for nodepool --default--
13 Nov 2014 22:04:14,924  INFO RM.NodepoolScheduler -
apportion_qshares     N/A countClassShares RmCounter Start
13 Nov 2014 22:04:14,924  INFO RM.NodepoolScheduler -
apportion_qshares     N/A countClassShares maxorder =  3
13 Nov 2014 22:04:14,924  INFO RM.NodepoolScheduler -
apportion_qshares     N/A countClassShares entity_names =  normal
13 Nov 2014 22:04:14,924  INFO RM.NodepoolScheduler -
apportion_qshares     N/A countClassShares weights      =  100
13 Nov 2014 22:04:14,924  INFO RM.NodepoolScheduler -
apportion_qshares     N/A countClassShares wantedby.normal =    1   0
 1   0
13 Nov 2014 22:04:14,924  INFO RM.NodepoolScheduler -
apportion_qshares     N/A countClassShares vmachines =   0   1   0   0
13 Nov 2014 22:04:14,925  INFO RM.NodepoolScheduler -
apportion_qshares     N/A countClassShares RmCounter End
13 Nov 2014 22:04:14,925 DEBUG RM.NodepoolScheduler -
apportion_qshares     N/A countClassShares Final apportionment:
13 Nov 2014 22:04:14,925 DEBUG RM.NodepoolScheduler -
apportion_qshares     N/A countClassShares       normal gbo  0   0   0
  0
13 Nov 2014 22:04:14,925 DEBUG RM.NodepoolScheduler -
apportion_qshares     N/A countClassShares vshares   0   1   0   0
13 Nov 2014 22:04:14,925 DEBUG RM.NodepoolScheduler -
apportion_qshares     N/A countClassShares nshares   0   1   0   0
13 Nov 2014 22:04:14,925  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A Schedule class urgent
13 Nov 2014 22:04:14,925  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A No jobs to schedule in class  urgent
13 Nov 2014 22:04:14,925  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A Schedule class high
13 Nov 2014 22:04:14,925  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A No jobs to schedule in class  high
13 Nov 2014 22:04:14,926  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A Schedule class standalone
13 Nov 2014 22:04:14,926  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A No jobs to schedule in class  standalone
13 Nov 2014 22:04:14,926  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A Schedule class weekly
13 Nov 2014 22:04:14,926  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A No jobs to schedule in class  weekly
13 Nov 2014 22:04:14,926  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A Schedule class normal
13 Nov 2014 22:04:14,926  INFO RM.NodepoolScheduler - whatOfFairShare
     8 Scheduling job in class  normal : 0 shares given, order 2
13 Nov 2014 22:04:14,926  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A Schedule class low
13 Nov 2014 22:04:14,926  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A No jobs to schedule in class  low
13 Nov 2014 22:04:14,926  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A Schedule class background
13 Nov 2014 22:04:14,926  INFO RM.NodepoolScheduler - whatOfFairShare
   N/A No jobs to schedule in class  background
13 Nov 2014 22:04:14,926  INFO RM.NodepoolScheduler -
traverseNodepoolsForExpansion     N/A --- stop_here_dx 8
13 Nov 2014 22:04:14,926  INFO RM.NodePool - doExpansion     N/A NP:
--default-- Expansions in this order: 8:notfound
13 Nov 2014 22:04:14,927  INFO RM.NodepoolScheduler - doEvictions
N/A  --default-- Counted Current Needed Order
13 Nov 2014 22:04:14,927  INFO RM.NodepoolScheduler - doEvictions
 8  --default--       0       0      0     2
13 Nov 2014 22:04:14,927  INFO RM.NodepoolScheduler - doEvictions
N/A  --default-- Counted Current Needed Order
13 Nov 2014 22:04:14,927  INFO RM.NodepoolScheduler - doEvictions
 7  --default--       1       1      0     2
13 Nov 2014 22:04:14,927 DEBUG RM.NodepoolScheduler - doEvictions
N/A --default-- NeededByOrder before any eviction: [0, 0, 0, 0]
13 Nov 2014 22:04:14,927 DEBUG RM.NodepoolScheduler -
detectFragmentation     N/A vMachines:   0   1   0   0
13 Nov 2014 22:04:14,927 DEBUG RM.NodepoolScheduler -
detectFragmentation     N/A Nodepools:--default--
13 Nov 2014 22:04:14,927  INFO RM.NodepoolScheduler -
detectFragmentation     N/A     Nodepool       User PureFS  NSh
Counted Needed  O Class: normal
13 Nov 2014 22:04:14,927  INFO RM.NodepoolScheduler -
detectFragmentation       8  --default--       ducc      0    0
0      0  2
13 Nov 2014 22:04:14,927  INFO RM.NodepoolScheduler -
detectFragmentation     N/A     Nodepool       User PureFS  NSh
Counted Needed  O Class: JobDriver
13 Nov 2014 22:04:14,928  INFO RM.NodepoolScheduler -
insureFullEviction     N/A No needy jobs, defragmentation bypassed.
13 Nov 2014 22:04:14,934  INFO RM.Scheduler - schedule     N/A
--------------- Scheduler returns ---------------
13 Nov 2014 22:04:14,934  INFO RM.Scheduler - schedule     N/A
 Expanded:
<none>

Shrunken:
   <none>

Stable:
   <none>

Dormant:
            ID                        JobName       User      Class Shares Order QShares NTh Memory nQuest Ques Rem InitWait Max P/Nst
   J_________8                     Test_job_1       ducc     normal      0     2       0   2      2     15       15     true         8

Reserved:
            ID                        JobName       User      Class Shares Order QShares NTh Memory nQuest Ques Rem InitWait Max P/Nst
   R_________7                     Job_Driver     System  JobDriver      1     2       2   0      2      0        0        0         1


13 Nov 2014 22:04:14,934  INFO RM.Scheduler - schedule     N/A
------------------------------------------------
13 Nov 2014 22:04:14,934  INFO RM.JobManagerConverter - createState
 N/A Schedule sent to Orchestrator
13 Nov 2014 22:04:14,934  INFO RM.JobManagerConverter - createState     N/A
Reservation 7
Existing[1]: .us-west-2.compute.internal.1^0
Additions[0]:
Removals[0]:
Job 8
Existing[0]:
Additions[0]:
Removals[0]:

13 Nov 2014 22:04:14,946  INFO RM.ResourceManagerComponent -
runScheduler     N/A -------- 30 ------- Scheduling loop returns
--------------------

Re: DUCC stuck at WaitingForResources on an Amazon Linux

Posted by Eddie Epstein <ea...@gmail.com>.
Simon,

The DUCC resource manager logs into rm.log. Did you look there for reasons
the resources are not being allocated?

Eddie
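
One way to narrow rm.log down to a single job is to grep for the scheduler's per-job share decision. The sketch below runs against a small sample built from log lines quoted elsewhere in this thread, with wrapped lines rejoined and whitespace approximated. The shares_given helper is hypothetical, and the real log's location depends on the install.

```shell
#!/bin/bash
# Sample built from RM log lines in this thread (spacing is approximate).
sample_log="$(mktemp)"
cat > "$sample_log" <<'EOF'
13 Nov 2014 22:04:14,908  INFO RM.JobManagerConverter - jobUpdate       8 tot: 15 WaitingForResources -> WaitingForResources compl: 0 err: 0 rem: 15 mean: NaN
13 Nov 2014 22:04:14,909  INFO RM.NodepoolScheduler - howMuchFixedShare       7 [stable] requested 1 assigned 1 processes, 2 QS
13 Nov 2014 22:04:14,926  INFO RM.NodepoolScheduler - whatOfFairShare       8 Scheduling job in class  normal : 0 shares given, order 2
EOF

# Print the "N shares given" lines that the scheduler logged for one job id.
shares_given() {
    local log_file=$1 job_id=$2
    grep -E "whatOfFairShare +${job_id} " "$log_file"
}

shares_given "$sample_log" 8
```

For job 8 this prints the single "0 shares given, order 2" line, which is the scheduler saying the job got nothing in that pass.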

Re: DUCC stuck at WaitingForResources on an Amazon Linux

Posted by Simon Hafner <re...@gmail.com>.
4 shares total, 2 in use.

2014-11-12 5:06 GMT-06:00 Lou DeGenaro <lo...@gmail.com>:
> Try looking at your DUCC web server.  On the System -> Machines page,
> do you see any shares not in use?
>
> Lou.

Re: DUCC stuck at WaitingForResources on an Amazon Linux

Posted by Lou DeGenaro <lo...@gmail.com>.
Try looking at your DUCC's web server.  On the System -> Machines page
do you see any shares not inuse?

Lou.
