Posted to user@uima.apache.org by Simon Hafner <re...@gmail.com> on 2014/11/12 11:51:26 UTC
DUCC stuck at WaitingForResources on an Amazon Linux
I've set up DUCC according to
https://cwiki.apache.org/confluence/display/UIMA/DUCC
When I submit the sample job with
ducc_install/bin/ducc_submit -f ducc_install/examples/simple/1.job
the job is stuck at WaitingForResources. The agent log shows:
12 Nov 2014 10:37:30,175 INFO Agent.LinuxNodeMetricsProcessor -
process N/A ... Agent Collecting User Processes
12 Nov 2014 10:37:30,176 INFO Agent.NodeAgent -
copyAllUserReservations N/A +++++++++++ Copying User Reservations
- List Size:0
12 Nov 2014 10:37:30,176 INFO Agent.LinuxNodeMetricsProcessor - call
N/A ********** User Process Map Size After
copyAllUserReservations:0
12 Nov 2014 10:37:30,176 INFO Agent.LinuxNodeMetricsProcessor - call
N/A ********** User Process Map Size After
copyAllUserRougeProcesses:0
12 Nov 2014 10:37:30,182 INFO Agent.LinuxNodeMetricsProcessor - call N/A
12 Nov 2014 10:37:30,182 INFO Agent.LinuxNodeMetricsProcessor - call
N/A ******************************************************************************
12 Nov 2014 10:37:30,182 INFO Agent.LinuxNodeMetricsProcessor -
process N/A ... Agent ip-172-31-7-237.us-west-2.compute.internal
Posting Memory:4050676 Memory Free:4013752 Swap Total:0 Swap Free:0
Low Swap Threshold Defined in ducc.properties:0
12 Nov 2014 10:37:33,303 INFO Agent.AgentEventListener -
reportIncomingStateForThisNode N/A Received OR Sequence:699 Thread
ID:13
12 Nov 2014 10:37:33,303 INFO Agent.AgentEventListener -
reportIncomingStateForThisNode N/A
JD--> JobId:6 ProcessId:0 PID:8168 Status:Running Resource
State:Allocated isDeallocated:false
12 Nov 2014 10:37:33,303 INFO Agent.NodeAgent - setReservations
N/A +++++++++++ Copied User Reservations - List Size:0
12 Nov 2014 10:37:33,405 INFO
Agent.AgentConfiguration$$EnhancerByCGLIB$$cc49880b - getSwapUsage-
N/A PID:8168 Swap Usage:0
12 Nov 2014 10:37:33,913 INFO
Agent.AgentConfiguration$$EnhancerByCGLIB$$cc49880b -
collectProcessCurrentCPU N/A 0.0 == CPUTIME:0.0
12 Nov 2014 10:37:33,913 INFO
Agent.AgentConfiguration$$EnhancerByCGLIB$$cc49880b - process N/A
----------- PID:8168 Major Faults:0 Process Swap Usage:0 Max Swap
Usage Allowed:-108574720 Time to Collect Swap Usage:0
I'm using a t2.medium instance (2 CPU, ~ 4GB RAM) and the stock Amazon
Linux (looks CentOS-based).
To install Maven (not in the repos), I used this script:
#!/bin/bash
# Downloads Maven 3.2.3, installs it under /usr/local/maven and adds it to PATH.
TEMPORARY_DIRECTORY="$(mktemp -d)"
DOWNLOAD_TO="$TEMPORARY_DIRECTORY/maven.tgz"
echo 'Downloading Maven to: ' "$DOWNLOAD_TO"
# The URL must be on the same command line as wget, hence the backslash.
wget -O "$DOWNLOAD_TO" \
  http://www.eng.lsu.edu/mirrors/apache/maven/maven-3/3.2.3/binaries/apache-maven-3.2.3-bin.tar.gz
echo 'Extracting Maven'
tar xzf "$DOWNLOAD_TO" -C "$TEMPORARY_DIRECTORY"
rm "$DOWNLOAD_TO"
echo 'Configuring Environment'
mv "$TEMPORARY_DIRECTORY"/apache-maven-* /usr/local/maven
# Single quotes keep ${M2_HOME} and ${PATH} literal so they expand at login time.
echo -e 'export M2_HOME=/usr/local/maven\nexport PATH=${M2_HOME}/bin:${PATH}' > /etc/profile.d/maven.sh
source /etc/profile.d/maven.sh
echo 'The maven version: ' $(mvn -version) ' has been installed.'
echo -e '\n\n!! Note you must relogin to get mvn in your path !!'
echo 'Removing the temporary directory...'
rm -r "$TEMPORARY_DIRECTORY"
echo 'Your Maven Installation is Complete.'
Re: DUCC stuck at WaitingForResources on an Amazon Linux
Posted by Eddie Epstein <ea...@gmail.com>.
On Fri, Nov 14, 2014 at 8:11 PM, Simon Hafner <re...@gmail.com> wrote:
> So to run effectively, I would need more memory, because the job wants
> two shares? ... Yes. With a larger node it works. What would be a
> reasonable memory size for a ducc node?
Really depends on the application code. Quoting from the DUCC overview at
http://uima.apache.org/doc-uimaducc-whatitam.html
"DUCC is particularly well suited to run large memory Java analytics in
multiple threads in order to fully utilize multicore machines."
Our experience to date has been with machines 16-256GB and 4-32 CPU cores.
Smaller machines, 8GB or less, have only been used for development of DUCC
itself, with dummy analytics that use little memory and CPU.
Re: DUCC stuck at WaitingForResources on an Amazon Linux
Posted by Simon Hafner <re...@gmail.com>.
So to run effectively, I would need more memory, because the job wants
two shares? ... Yes. With a larger node it works. What would be a
reasonable memory size for a ducc node?
2014-11-14 9:38 GMT-06:00 Lou DeGenaro <lo...@gmail.com>:
> Simon,
>
> Congratulations! You found a bug in DUCC's Web Server. It was incorrectly
> rounding up when reporting the number of shares for a machine. This issue
> is addressed by Jira 4104 <https://issues.apache.org/jira/browse/UIMA-4104>.
>
> Lou.
Re: DUCC stuck at WaitingForResources on an Amazon Linux
Posted by Lou DeGenaro <lo...@gmail.com>.
Simon,
Congratulations! You found a bug in DUCC's Web Server. It was incorrectly
rounding up when reporting the number of shares for a machine. This issue
is addressed by Jira 4104 <https://issues.apache.org/jira/browse/UIMA-4104>.
Lou.
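The discrepancy can be illustrated with integer arithmetic. This is a minimal sketch and not DUCC code: the function names are hypothetical, and the 1024 MB share size is an assumption inferred from the 3955 MB and 3 shares that the RM log in this thread reports.

```shell
#!/bin/sh
# Assumption: one share is 1024 MB (not stated explicitly in the thread).
SHARE_MB=1024

# Whole shares only: integer division rounds down.
shares_rounded_down() {
  echo $(( $1 / SHARE_MB ))
}

# Rounding up counts a partial share as a full one.
shares_rounded_up() {
  echo $(( ($1 + SHARE_MB - 1) / SHARE_MB ))
}

echo "rounded down: $(shares_rounded_down 3955) shares"   # rounded down: 3 shares
echo "rounded up:   $(shares_rounded_up 3955) shares"     # rounded up:   4 shares
```

Rounding up yields the 4 shares shown on the Machines page, while rounding down yields the 3 shares that the RM schedules against.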
Re: DUCC stuck at WaitingForResources on an Amazon Linux
Posted by Jim Challenger <ch...@gmail.com>.
Simon,
It looks like the problem is the amount of RAM on your machine.
It's going to be hard to get any meaningful work running on less than about 8G.
Here's what to do to get the test job to run on your 4G machine:
1. In the resources folder, edit ducc.properties and change this:
ducc.jd.host.memory.size=2GB
to this:
ducc.jd.host.memory.size=1GB
This is the amount of RAM that DUCC reserves for itself to
manage its "head" processes.
2. In the examples/simple folder, edit 1.job and change this:
process_memory_size 2
to this:
process_memory_size 1
This is the amount of memory in GB that the sample 1.job is
requesting.
3. Stop DUCC and restart it so that the DUCC processes pick up the new
ducc.jd.host.memory.size value from ducc.properties.
4. Rerun 1.job and all should be well.
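Steps 1 and 2 can also be scripted. The following is only a sketch: the function name is hypothetical, the two file paths in the usage comment are assumptions about the install layout, and the patterns assume the settings appear exactly as quoted above (2GB and 2). Each file is backed up with a .bak suffix before it is changed.

```shell
#!/bin/sh
# Hypothetical helper: lower the Job Driver reservation and the sample
# job's memory request from 2 GB to 1 GB.
# Usage (paths are assumptions, adjust to your installation):
#   shrink_ducc_memory ducc_install/resources/ducc.properties \
#                      ducc_install/examples/simple/1.job
shrink_ducc_memory() {
  props_file=$1
  job_file=$2

  # Step 1: ducc.jd.host.memory.size=2GB -> 1GB
  sed -i.bak -E \
    's/^ducc\.jd\.host\.memory\.size[[:space:]]*=[[:space:]]*2GB[[:space:]]*$/ducc.jd.host.memory.size=1GB/' \
    "$props_file"

  # Step 2: process_memory_size 2 -> 1
  sed -i.bak -E \
    's/^process_memory_size[[:space:]]+2[[:space:]]*$/process_memory_size 1/' \
    "$job_file"
}
```

DUCC still has to be restarted afterwards (step 3) before the job is resubmitted.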
Here are the gory details from the RM log, if you're interested.
In the RM log, I see these lines.
13 Nov 2014 22:04:14,909 INFO RM.NodePool - queryMachines N/A
Name                         Order  Active Shares  Unused Shares  Memory (MB)  Jobs
---------------------------  -----  -------------  -------------  -----------  ------ ...
.us-west-2.compute.internal      3              2              1         3955  7 [1]
This says you have 3G of **usable-by-ducc** RAM, of which 2G are used by
the reservation/job "7", and that you have 1GB free. The reason you
have only 3GB **usable** is that usually the hardware/opsys will reserve
a small part of the installed RAM for itself, so the reported RAM is a
tad smaller. To avoid overcommitting the system, we use the reported
value, not the installed value. Most or all of the jobs here will
easily overwhelm even the largest machines if we don't do this.
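As a rough check, the numbers from the two logs in this thread line up. The sketch below is illustrative only: it assumes the agent's "Posting Memory" value is in KB and that one share is 1024 MB, neither of which is stated explicitly, and the function names are hypothetical.

```shell
#!/bin/sh
# Assumption: the agent reports memory in KB.
kb_to_mb() {
  echo $(( $1 / 1024 ))
}

# Assumption: one share is 1024 MB; partial shares are dropped.
usable_shares() {
  echo $(( $1 / 1024 ))
}

reported_mb=$(kb_to_mb 4050676)          # agent log: Posting Memory:4050676
echo "reported memory: ${reported_mb} MB"                    # reported memory: 3955 MB
echo "usable shares:   $(usable_shares "$reported_mb")"      # usable shares:   3
```

The 3955 MB matches the Memory (MB) column above, and the 3 shares match the Order column.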
Next, these lines show the actual schedule the RM is trying to build.
Dormant:
ID           JobName     User    Class      Shares  Order  QShares  NTh  Memory  nQuest  Ques Rem  InitWait  Max P/Nst
J_________8  Test_job_1  ducc    normal          0      2        0    2       2      15        15  true              8
Reserved:
ID           JobName     User    Class      Shares  Order  QShares  NTh  Memory  nQuest  Ques Rem  InitWait  Max P/Nst
R_________7  Job_Driver  System  JobDriver       1      2        2    0       2       0         0  0                 1
This confirms that the DUCC reservation "7" occupies 2G, and that job
"8" is requesting 2G but is "dormant", i.e. waiting for resources.
Since only 1G of the 3G usable on this machine is free, job 8 will wait.
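The decision can be sketched as a simple comparison of free shares against requested shares. This is not the RM's actual algorithm, only the arithmetic behind this case; the function name is hypothetical, and the second call assumes that the 1GB settings from steps 1 and 2 each map to one share.

```shell
#!/bin/sh
# Succeeds when the request fits into the shares that are still free.
fits() {
  total=$1
  in_use=$2
  requested=$3
  [ $(( total - in_use )) -ge "$requested" ]
}

# Before the change: 3 shares, Job Driver holds 2, job 8 asks for 2.
if fits 3 2 2; then echo "before: job can run"; else echo "before: job waits"; fi
# prints "before: job waits"

# After the change (assumed): Job Driver holds 1, job asks for 1.
if fits 3 1 1; then echo "after: job can run"; else echo "after: job waits"; fi
# prints "after: job can run"
```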
Best,
Jim
Re: DUCC stuck at WaitingForResources on an Amazon Linux
Posted by Simon Hafner <re...@gmail.com>.
I can't find anything at first glance. Maybe the memory?
13 Nov 2014 22:04:14,907 INFO RM.ResourceManagerComponent -
onJobManagerStateUpdate N/A -------> OR state arrives
13 Nov 2014 22:04:14,908 INFO RM.JobManagerConverter - jobUpdate
8 tot: 15 WaitingForResources -> WaitingForResources compl: 0 err: 0
rem: 15 mean: NaN
13 Nov 2014 22:04:14,908 INFO RM.ResourceManagerComponent -
runScheduler N/A -------- 30 ------- Entering scheduling loop
--------------------
13 Nov 2014 22:04:14,908 INFO RM.Scheduler - nodeArrives N/A
Total arrivals: 9
13 Nov 2014 22:04:14,908 INFO RM.Scheduler - schedule N/A
Scheduling 0 new jobs. Existing jobs: 2
13 Nov 2014 22:04:14,908 INFO RM.Scheduler - schedule N/A Run
scheduler 0 with top-level nodepool --default--
13 Nov 2014 22:04:14,908 INFO RM.RmJob - getPrjCap 8 ducc
Cannot predict cap: init_wait true || time_per_item NaN
13 Nov 2014 22:04:14,908 INFO RM.RmJob - initJobCap 8 ducc O 2
Base cap: 8 Expected future cap: 2147483647 potential cap 8 actual cap
1
13 Nov 2014 22:04:14,909 INFO RM.NodepoolScheduler - schedule N/A
Machine occupancy before schedule
13 Nov 2014 22:04:14,909 INFO RM.NodePool - queryMachines N/A
================================== Query Machines Nodepool:
--default-- =========================
13 Nov 2014 22:04:14,909 INFO RM.NodePool - queryMachines N/A
Name Order Active Shares Unused Shares Memory (MB) Jobs
-------------------- ----- ------------- ------------- ----------- ------ ...
.us-west-2.compute.internal 3 2 1 3955 7 [1]
13 Nov 2014 22:04:14,909 INFO RM.NodePool - queryMachines N/A
================================== End Query Machines Nodepool:
--default-- ======================
13 Nov 2014 22:04:14,909 INFO RM.NodepoolScheduler -
howMuchFixedShare N/A No jobs to schedule in class fixed
13 Nov 2014 22:04:14,909 INFO RM.NodepoolScheduler -
howMuchFixedShare N/A No jobs to schedule in class debug
13 Nov 2014 22:04:14,909 INFO RM.NodepoolScheduler -
howMuchFixedShare N/A Scheduling jobs in class: JobDriver 7
13 Nov 2014 22:04:14,909 INFO RM.NodepoolScheduler -
howMuchFixedShare 7 [stable] requested 1 assigned 1 processes,
2 QS
13 Nov 2014 22:04:14,909 INFO RM.NodepoolScheduler - howMuchFairShare
N/A Schedule class urgent
13 Nov 2014 22:04:14,909 INFO RM.NodepoolScheduler - howMuchFairShare
N/A No jobs to schedule in class urgent
13 Nov 2014 22:04:14,909 INFO RM.NodepoolScheduler - howMuchFairShare
N/A Schedule class high
13 Nov 2014 22:04:14,910 INFO RM.NodepoolScheduler - howMuchFairShare
N/A No jobs to schedule in class high
13 Nov 2014 22:04:14,910 INFO RM.NodepoolScheduler - howMuchFairShare
N/A Schedule class standalone
13 Nov 2014 22:04:14,910 INFO RM.NodepoolScheduler - howMuchFairShare
N/A No jobs to schedule in class standalone
13 Nov 2014 22:04:14,910 INFO RM.NodepoolScheduler - howMuchFairShare
N/A Schedule class weekly
13 Nov 2014 22:04:14,910 INFO RM.NodepoolScheduler - howMuchFairShare
N/A No jobs to schedule in class weekly
13 Nov 2014 22:04:14,910 INFO RM.NodepoolScheduler - howMuchFairShare
N/A Schedule class normal
13 Nov 2014 22:04:14,910 INFO RM.NodepoolScheduler - howMuchFairShare
8 Scheduling job in class normal : J_________8
Test_job_1 ducc normal 0 2 0 2 2
15 15 true 8
13 Nov 2014 22:04:14,910 INFO RM.NodepoolScheduler - howMuchFairShare
N/A Schedule class low
13 Nov 2014 22:04:14,910 INFO RM.NodepoolScheduler - howMuchFairShare
N/A No jobs to schedule in class low
13 Nov 2014 22:04:14,910 INFO RM.NodepoolScheduler - howMuchFairShare
N/A Schedule class background
13 Nov 2014 22:04:14,910 INFO RM.NodepoolScheduler - howMuchFairShare
N/A No jobs to schedule in class background
13 Nov 2014 22:04:14,924 DEBUG RM.NodepoolScheduler - countClassShares
N/A Counting for nodepool --default--
13 Nov 2014 22:04:14,924 INFO RM.NodepoolScheduler -
apportion_qshares N/A countClassShares RmCounter Start
13 Nov 2014 22:04:14,924 INFO RM.NodepoolScheduler -
apportion_qshares N/A countClassShares maxorder = 3
13 Nov 2014 22:04:14,924 INFO RM.NodepoolScheduler -
apportion_qshares N/A countClassShares entity_names = normal
13 Nov 2014 22:04:14,924 INFO RM.NodepoolScheduler -
apportion_qshares N/A countClassShares weights = 100
13 Nov 2014 22:04:14,924 INFO RM.NodepoolScheduler -
apportion_qshares N/A countClassShares wantedby.normal = 1 0
1 0
13 Nov 2014 22:04:14,924 INFO RM.NodepoolScheduler -
apportion_qshares N/A countClassShares vmachines = 0 1 0 0
13 Nov 2014 22:04:14,925 INFO RM.NodepoolScheduler -
apportion_qshares N/A countClassShares RmCounter End
13 Nov 2014 22:04:14,925 DEBUG RM.NodepoolScheduler -
apportion_qshares N/A countClassShares Final apportionment:
13 Nov 2014 22:04:14,925 DEBUG RM.NodepoolScheduler -
apportion_qshares N/A countClassShares normal gbo 0 0 0
0
13 Nov 2014 22:04:14,925 DEBUG RM.NodepoolScheduler -
apportion_qshares N/A countClassShares vshares 0 1 0 0
13 Nov 2014 22:04:14,925 DEBUG RM.NodepoolScheduler -
apportion_qshares N/A countClassShares nshares 0 1 0 0
13 Nov 2014 22:04:14,925 INFO RM.NodepoolScheduler - whatOfFairShare
N/A Schedule class urgent
13 Nov 2014 22:04:14,925 INFO RM.NodepoolScheduler - whatOfFairShare
N/A No jobs to schedule in class urgent
13 Nov 2014 22:04:14,925 INFO RM.NodepoolScheduler - whatOfFairShare
N/A Schedule class high
13 Nov 2014 22:04:14,925 INFO RM.NodepoolScheduler - whatOfFairShare
N/A No jobs to schedule in class high
13 Nov 2014 22:04:14,926 INFO RM.NodepoolScheduler - whatOfFairShare
N/A Schedule class standalone
13 Nov 2014 22:04:14,926 INFO RM.NodepoolScheduler - whatOfFairShare
N/A No jobs to schedule in class standalone
13 Nov 2014 22:04:14,926 INFO RM.NodepoolScheduler - whatOfFairShare
N/A Schedule class weekly
13 Nov 2014 22:04:14,926 INFO RM.NodepoolScheduler - whatOfFairShare
N/A No jobs to schedule in class weekly
13 Nov 2014 22:04:14,926 INFO RM.NodepoolScheduler - whatOfFairShare
N/A Schedule class normal
13 Nov 2014 22:04:14,926 INFO RM.NodepoolScheduler - whatOfFairShare
8 Scheduling job in class normal : 0 shares given, order 2
13 Nov 2014 22:04:14,926 INFO RM.NodepoolScheduler - whatOfFairShare
N/A Schedule class low
13 Nov 2014 22:04:14,926 INFO RM.NodepoolScheduler - whatOfFairShare
N/A No jobs to schedule in class low
13 Nov 2014 22:04:14,926 INFO RM.NodepoolScheduler - whatOfFairShare
N/A Schedule class background
13 Nov 2014 22:04:14,926 INFO RM.NodepoolScheduler - whatOfFairShare
N/A No jobs to schedule in class background
13 Nov 2014 22:04:14,926 INFO RM.NodepoolScheduler -
traverseNodepoolsForExpansion N/A --- stop_here_dx 8
13 Nov 2014 22:04:14,926 INFO RM.NodePool - doExpansion N/A NP:
--default-- Expansions in this order: 8:notfound
13 Nov 2014 22:04:14,927 INFO RM.NodepoolScheduler - doEvictions
N/A --default-- Counted Current Needed Order
13 Nov 2014 22:04:14,927 INFO RM.NodepoolScheduler - doEvictions
8 --default-- 0 0 0 2
13 Nov 2014 22:04:14,927 INFO RM.NodepoolScheduler - doEvictions
N/A --default-- Counted Current Needed Order
13 Nov 2014 22:04:14,927 INFO RM.NodepoolScheduler - doEvictions
7 --default-- 1 1 0 2
13 Nov 2014 22:04:14,927 DEBUG RM.NodepoolScheduler - doEvictions
N/A --default-- NeededByOrder before any eviction: [0, 0, 0, 0]
13 Nov 2014 22:04:14,927 DEBUG RM.NodepoolScheduler -
detectFragmentation N/A vMachines: 0 1 0 0
13 Nov 2014 22:04:14,927 DEBUG RM.NodepoolScheduler -
detectFragmentation N/A Nodepools:--default--
13 Nov 2014 22:04:14,927 INFO RM.NodepoolScheduler -
detectFragmentation N/A Nodepool User PureFS NSh
Counted Needed O Class: normal
13 Nov 2014 22:04:14,927 INFO RM.NodepoolScheduler -
detectFragmentation 8 --default-- ducc 0 0
0 0 2
13 Nov 2014 22:04:14,927 INFO RM.NodepoolScheduler -
detectFragmentation N/A Nodepool User PureFS NSh
Counted Needed O Class: JobDriver
13 Nov 2014 22:04:14,928 INFO RM.NodepoolScheduler -
insureFullEviction N/A No needy jobs, defragmentation bypassed.
13 Nov 2014 22:04:14,934 INFO RM.Scheduler - schedule N/A
--------------- Scheduler returns ---------------
13 Nov 2014 22:04:14,934 INFO RM.Scheduler - schedule N/A
Expanded:
<none>
Shrunken:
<none>
Stable:
<none>
Dormant:
ID JobName User Class
Shares Order QShares NTh Memory nQuest Ques Rem InitWait Max P/Nst
J_________8 Test_job_1 ducc normal
0 2 0 2 2 15 15 true 8
Reserved:
ID JobName User Class
Shares Order QShares NTh Memory nQuest Ques Rem InitWait Max P/Nst
R_________7 Job_Driver System JobDriver
1 2 2 0 2 0 0 0 1
13 Nov 2014 22:04:14,934 INFO RM.Scheduler - schedule N/A
------------------------------------------------
13 Nov 2014 22:04:14,934 INFO RM.JobManagerConverter - createState
N/A Schedule sent to Orchestrator
13 Nov 2014 22:04:14,934 INFO RM.JobManagerConverter - createState N/A
Reservation 7
Existing[1]: .us-west-2.compute.internal.1^0
Additions[0]:
Removals[0]:
Job 8
Existing[0]:
Additions[0]:
Removals[0]:
13 Nov 2014 22:04:14,946 INFO RM.ResourceManagerComponent -
runScheduler N/A -------- 30 ------- Scheduling loop returns
--------------------
2014-11-13 12:12 GMT-06:00 Eddie Epstein <ea...@gmail.com>:
> Simon,
>
> The DUCC resource manager logs into rm.log. Did you look there for reasons
> the resources are not being allocated?
>
> Eddie
Re: DUCC stuck at WaitingForResources on an Amazon Linux
Posted by Eddie Epstein <ea...@gmail.com>.
Simon,
The DUCC resource manager logs to rm.log. Did you look there for reasons
the resources are not being allocated?
Eddie
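One way to narrow down a large rm.log is to filter for the lines that mention the waiting state or the share decision. This is a sketch under assumptions: the function name is hypothetical, the log path in the usage comment is a guess about the install layout, and the search strings are taken from the RM log excerpt posted in this thread, assuming each log entry is a single line in the file.

```shell
#!/bin/sh
# Hypothetical helper: show RM log lines about waiting jobs and share grants.
# Usage (path is an assumption):
#   rm_waiting_reasons ducc_install/logs/rm.log
rm_waiting_reasons() {
  grep -E 'WaitingForResources|shares given' "$1"
}
```

A line such as "0 shares given, order 2" for the job in question indicates that the RM found no room for it.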
On Wed, Nov 12, 2014 at 4:07 PM, Simon Hafner <re...@gmail.com> wrote:
> 4 shares total, 2 in use.
>
> 2014-11-12 5:06 GMT-06:00 Lou DeGenaro <lo...@gmail.com>:
> > Try looking at your DUCC's web server. On the System -> Machines page
> > do you see any shares not inuse?
> >
> > Lou.
Re: DUCC stuck at WaitingForResources on an Amazon Linux
Posted by Simon Hafner <re...@gmail.com>.
4 shares total, 2 in use.
2014-11-12 5:06 GMT-06:00 Lou DeGenaro <lo...@gmail.com>:
> Try looking at your DUCC's web server. On the System -> Machines page
> do you see any shares not inuse?
>
> Lou.
>
> On Wed, Nov 12, 2014 at 5:51 AM, Simon Hafner <re...@gmail.com> wrote:
>> I've set up DUCC according to
>> https://cwiki.apache.org/confluence/display/UIMA/DUCC
>>
>> ducc_install/bin/ducc_submit -f ducc_install/examples/simple/1.job
>>
>> the job is stuck at WaitingForResources.
>>
>> 12 Nov 2014 10:37:30,175 INFO Agent.LinuxNodeMetricsProcessor -
>> process N/A ... Agent Collecting User Processes
>> 12 Nov 2014 10:37:30,176 INFO Agent.NodeAgent -
>> copyAllUserReservations N/A +++++++++++ Copying User Reservations
>> - List Size:0
>> 12 Nov 2014 10:37:30,176 INFO Agent.LinuxNodeMetricsProcessor - call
>> N/A ********** User Process Map Size After
>> copyAllUserReservations:0
>> 12 Nov 2014 10:37:30,176 INFO Agent.LinuxNodeMetricsProcessor - call
>> N/A ********** User Process Map Size After
>> copyAllUserRougeProcesses:0
>> 12 Nov 2014 10:37:30,182 INFO Agent.LinuxNodeMetricsProcessor - call N/A
>> 12 Nov 2014 10:37:30,182 INFO Agent.LinuxNodeMetricsProcessor - call
>> N/A ******************************************************************************
>> 12 Nov 2014 10:37:30,182 INFO Agent.LinuxNodeMetricsProcessor -
>> process N/A ... Agent ip-172-31-7-237.us-west-2.compute.internal
>> Posting Memory:4050676 Memory Free:4013752 Swap Total:0 Swap Free:0
>> Low Swap Threshold Defined in ducc.properties:0
>> 12 Nov 2014 10:37:33,303 INFO Agent.AgentEventListener -
>> reportIncomingStateForThisNode N/A Received OR Sequence:699 Thread
>> ID:13
>> 12 Nov 2014 10:37:33,303 INFO Agent.AgentEventListener -
>> reportIncomingStateForThisNode N/A
>> JD--> JobId:6 ProcessId:0 PID:8168 Status:Running Resource
>> State:Allocated isDeallocated:false
>> 12 Nov 2014 10:37:33,303 INFO Agent.NodeAgent - setReservations
>> N/A +++++++++++ Copied User Reservations - List Size:0
>> 12 Nov 2014 10:37:33,405 INFO
>> Agent.AgentConfiguration$$EnhancerByCGLIB$$cc49880b - getSwapUsage-
>> N/A PID:8168 Swap Usage:0
>> 12 Nov 2014 10:37:33,913 INFO
>> Agent.AgentConfiguration$$EnhancerByCGLIB$$cc49880b -
>> collectProcessCurrentCPU N/A 0.0 == CPUTIME:0.0
>> 12 Nov 2014 10:37:33,913 INFO
>> Agent.AgentConfiguration$$EnhancerByCGLIB$$cc49880b - process N/A
>> ----------- PID:8168 Major Faults:0 Process Swap Usage:0 Max Swap
>> Usage Allowed:-108574720 Time to Collect Swap Usage:0
>>
>> I'm using a t2.medium instance (2 CPU, ~ 4GB RAM) and the stock Amazon
>> Linux (looks centos based).
>>
>> To install maven (not in the repos)
>>
>> #!/bin/bash
>>
>> TEMPORARY_DIRECTORY="$(mktemp -d)"
>> DOWNLOAD_TO="$TEMPORARY_DIRECTORY/maven.tgz"
>>
>> echo 'Downloading Maven to: ' "$DOWNLOAD_TO"
>>
>> wget -O "$DOWNLOAD_TO" \
>>   http://www.eng.lsu.edu/mirrors/apache/maven/maven-3/3.2.3/binaries/apache-maven-3.2.3-bin.tar.gz
>>
>> echo 'Extracting Maven'
>> tar xzf "$DOWNLOAD_TO" -C "$TEMPORARY_DIRECTORY"
>> rm "$DOWNLOAD_TO"
>>
>> echo 'Configuring Environment'
>>
>> mv "$TEMPORARY_DIRECTORY"/apache-maven-* /usr/local/maven
>> echo -e 'export M2_HOME=/usr/local/maven\nexport PATH=${M2_HOME}/bin:${PATH}' \
>>   > /etc/profile.d/maven.sh
>> source /etc/profile.d/maven.sh
>>
>> echo 'The maven version: ' `mvn -version` ' has been installed.'
>> echo -e '\n\n!! Note you must relogin to get mvn in your path !!'
>> echo 'Removing the temporary directory...'
>> rm -r "$TEMPORARY_DIRECTORY"
>> echo 'Your Maven Installation is Complete.'
Re: DUCC stuck at WaitingForResources on an Amazon Linux
Posted by Lou DeGenaro <lo...@gmail.com>.
Try looking at your DUCC's web server. On the System -> Machines page,
do you see any shares not in use?
Lou.
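
A rough command-line cross-check of the same question, as a minimal sketch only. It assumes that shares are carved out of a node's posted memory in fixed-size units, and that the "Posting Memory" figure in the agent log is in KB; neither is confirmed in this thread. The function names and the quantum_gb parameter are hypothetical, so substitute the share size your own ducc.properties configures.

```shell
#!/bin/bash

# Extract the most recent "Posting Memory" value from agent log text on stdin.
# The agent log wraps long lines, so match the field itself rather than a
# whole log line.
posted_memory_kb() {
  grep -o 'Posting Memory:[0-9]*' | tail -n 1 | cut -d: -f2
}

# Whole shares that fit in a node: floor(memory_kb / quantum in KB).
# ASSUMPTION: shares are fixed-size slices of posted memory; quantum_gb is a
# hypothetical parameter standing in for your configured share size.
shares_for() {
  local memory_kb=$1 quantum_gb=$2
  echo $(( memory_kb / (quantum_gb * 1024 * 1024) ))
}
```

For example, piping the agent log through posted_memory_kb and passing the result to shares_for together with your share size gives the number of whole shares the node could offer. A result of 0 would be consistent with a job that never leaves WaitingForResources, but the Machines page remains the authoritative view.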