Posted to dev@uima.apache.org by Selina Chu <se...@gmail.com> on 2016/03/09 20:41:17 UTC

Random processing time with DUCC

Hi

I’m fairly new to DUCC and this forum.  I was hoping someone could give me
some insight into why DUCC is behaving strangely and somewhat unstably.

What I'm trying to do is use DUCC to process a cTAKES job, currently on a
single node.  DUCC seems to take random amounts of time to process the
jobs, varying between 4.5 and 23 minutes, even though I wasn’t running
anything else that is CPU-intensive. When I run cTAKES alone, without
DUCC, the processing times are pretty consistent.

To demonstrate this strange behavior in DUCC, I submitted the exact same
job 10 times in a row (job IDs 95-104), without modifying any settings.
The durations of the jobs were: 4:41, 4:43, 12:48, 8:41, 5:24, 4:38, 7:07,
23:08, 8:08, and 20:37 (canceled by the system). The first 9 jobs
completed and the last one was canceled, but even the first 9 varied
widely in duration.
After restarting and resetting DUCC a couple of times, I submitted the
same job again (job ID 110); that job completed without a problem, though
with a long processing time.

I noticed that when a job takes a long time to finish (past 5 minutes), it
seems to be stuck in the “initializing” and “completing” states for most
of that time.

It seems like DUCC is doing something random.  I tried examining the log
files, but they are all similar except for the time spent in each state.
(I’ve also placed the related logs and job file in a repo
https://github.com/selinachu/Templogs, in case anyone is interested in
examining them.)

I’m baffled by DUCC’s seemingly random behavior and was hoping someone
could clarify it for me.

After completing a job, what does DUCC do? Does it keep something in
memory that carries over to the next job and affects the initialization
process?  Are there parameter settings that might alleviate this type of
behavior?

I would appreciate any insight.  Thanks in advance for your help.


Cheers,
Selina Chu

Re: Random processing time with DUCC

Posted by Eddie Epstein <ea...@gmail.com>.
Hi Selina,

After getting my UMLS license, the only changes needed to get ctakes.job
running were:
1. remove the line "driver_jvm_args    -Xmx4g"

2. add a typesystem import to the CR descriptor,
FilesInDirectoryCollectionReader.xml
   <typeSystemDescription>
     <imports>
       <import name="org.apache.ctakes.typesystem.types.TypeSystem" />
     </imports>
   </typeSystemDescription>

3. add my UMLS credentials to the processing pipeline JVM args:
      process_jvm_args      -Xmx5g -Dctakes.umlsuser={my-user-name} -Dctakes.umlspw={my-password}
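
Putting those changes together, a minimal sketch of the resulting job file
(only driver_jvm_args and process_jvm_args appear in this thread; the
other parameter names follow the DUCC job specification, and the
descriptor names and sizes here are illustrative):

      description            cTAKES pipeline under DUCC
      # driver_jvm_args omitted, so the JD falls back to the default Xmx400m
      driver_descriptor_CR   FilesInDirectoryCollectionReader.xml
      process_descriptor_AE  AggregatePlaintextUMLSProcessor.xml
      process_jvm_args       -Xmx5g -Dctakes.umlsuser={my-user-name} -Dctakes.umlspw={my-password}
      process_memory_size    8
      scheduling_class       normal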

It took 95 seconds to process the one document from your logfile.
The output file test.xml had ~7500 lines of text.
Memory use was JD-RSS=0.1 GB and JP-RSS=1.6 GB.

The processing breakdown for this job, for the components using >1% of
total time looked like this:

Name                            Total   %Total  Avg     Min     Max
Summary                         01:35.7 100.0   01:35.7 01:35.7 01:35.7
UmlsDictionaryLookupAnnotator   47.0    49.1    47.0    47.0    47.0
ConstituencyParserAnnotator     24.0    25.1    24.0    24.0    24.0
LVG Annotator                   11.3    11.9    11.3    11.3    11.3
PolarityCleartkAnalysisEngine   02.4    2.6     02.4    02.4    02.4
GenericCleartkAnalysisEngine    01.7    1.8     01.7    01.7    01.7
HistoryCleartkAnalysisEngine    01.6    1.8     01.6    01.6    01.6
ClearNLPDependencyParserAE      01.5    1.6     01.5    01.5    01.5
Chunker                         01.1    1.2     01.1    01.1    01.1
Write CAS to XML file           01.1    1.2     01.1    01.1    01.1
SubjectCleartkAnalysisEngine    01.0    1.1     01.0    01.0    01.0

Good to know you were running on Mac OS X. Although some of the components
were developed on a Mac, much of DUCC itself has not been tested there,
including details like memory and CPU utilization.

No problem spending a little time on this.  cTAKES is an excellent
application for taking advantage of DUCC.

Regards,
Eddie

Re: Random processing time with DUCC

Posted by Selina Chu <se...@gmail.com>.
Hi Eddie,

Thanks again for your reply.

It is strange that the node memory size showed that much.
I only have 16GB of RAM, and I’m running on a MacBook Pro under OS X.  When
I run DUCC, I try not to run any unnecessary processes or applications, so
I don’t understand how it could be reporting that much.

Given the weird node memory size, I wonder if it’s a problem with my
computer or just an initialization setting. I did use Disk Utility to check
my file system, but it isn’t showing any irregularities.

Looking back at the job details page of ducc-mon, the RSS for the job I
posted last time shows 0 for both the JD and the JP.

Thanks for taking the time to replicate the job I’m running.  I don’t want
to take up too much of your time, but thank you so much for doing so; I
much appreciate it.  cTAKES can be a bit tricky to configure, and I’ve
spent some time on it. Let me know if you come across problems.

Best,
Selina

Re: Random processing time with DUCC

Posted by Eddie Epstein <ea...@gmail.com>.
Selina,

Thanks for the log files. The agent log shows "Node Memory Total:251392 MB"
which looks like a healthy size. Is this value correct? Just in case, what
OS are you running on?

Unfortunately, the RSS sizes for the JD and JP processes are not shown in
the agent log in that version of DUCC. They should be shown on the job
details page of ducc-mon.

The agent log does confirm that cgroups are not enabled, which should
eliminate the possibility that the JD was swapping. That leaves me puzzled
about JD behavior.

The reason for having the JD send references to the input data, rather
than the data itself, is to avoid making the JD a bottleneck when
processing is scaled out. It is not yet clear that the current CR is the
cause of the erratic behavior.
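
To make the reference-passing idea concrete, here is a hedged sketch of a
hypothetical UIMA collection reader that puts only a file URI into the
CAS, leaving the actual read to the Job Processes (this is not the actual
cTAKES CR, just the pattern):

    import java.io.File;
    import java.io.IOException;
    import java.util.List;
    import org.apache.uima.cas.CAS;
    import org.apache.uima.collection.CollectionException;
    import org.apache.uima.collection.CollectionReader_ImplBase;
    import org.apache.uima.util.Progress;

    public class ReferenceOnlyReader extends CollectionReader_ImplBase {
      private List<File> files; // assumed to be populated in initialize()
      private int index = 0;

      public void getNext(CAS cas) throws IOException, CollectionException {
        File f = files.get(index++);
        // Store only a URI reference; each JP opens and reads the file itself.
        cas.setSofaDataURI("file:" + f.getAbsolutePath(), "text/plain");
      }

      public boolean hasNext() { return files != null && index < files.size(); }
      public Progress[] getProgress() { return new Progress[0]; }
      public void close() {}
    }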

I attempted to replicate your job here but got stuck on UMLS authentication
and am now waiting for the approval to use UMLS.

DUCC's rogue detection is intended for machines that are fully managed by
DUCC. The default properties include a small number of processes, like ssh
and bash, which are useful to ignore. All UIDs below a specified threshold
are also ignored. Certainly OK to customize to ignore specific users or
process names, but remember that DUCC will attempt to utilize all memory
that was not taken by system users when the agent started. Unexpected
memory use by other processes can lead to over-committing system memory.

Below are the lines to modify as desired, add to the file
site.ducc.properties, and then restart DUCC. The site file facilitates
migration to new DUCC versions.

# max UID reserved by OS. This is used to detect rogue processes and to
# report available memory on a node.
ducc.agent.node.metrics.sys.gid.max=500
# exclude the following user ids while detecting rogue processes
ducc.agent.rogue.process.user.exclusion.filter=
# exclude the following processes while detecting rogue processes
ducc.agent.rogue.process.exclusion.filter=sshd:,-bash,-sh,/bin/sh,/bin/bash,grep,ps
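
For example, a sketch of the steps (the exclusion value and paths are
illustrative, and the admin script names are assumed from a standard DUCC
install):

    # add the customized lines to site.ducc.properties
    echo "ducc.agent.rogue.process.user.exclusion.filter=selina" >> $DUCC_HOME/resources/site.ducc.properties
    # restart DUCC so the agents pick up the new settings
    $DUCC_HOME/admin/stop_ducc -a
    $DUCC_HOME/admin/start_ducc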


Regards,
Eddie

Re: Random processing time with DUCC

Posted by Selina Chu <se...@gmail.com>.
Hi Eddie,

Thanks for the pointer about not putting the analytic pipeline in the JD
driver.  It seems we had misunderstood its use. We’ll look into modifying
it so that the JD driver contains only the collection reader component;
hopefully cTAKES will let us do so.

As suggested, I restarted DUCC and ran the same job once.  The agent.log
file is quite big, so I’ve placed it in a repo, along with other related
logs, here:
https://github.com/selinachu/Templogs/tree/master/NewLogs_Mar11

I noticed that the agent log indicated many rogue processes.  Would it be
helpful to modify the settings in ducc.properties to clean up these
processes?

Thanks again for your help.

Cheers,
Selina

Re: Random processing time with DUCC

Posted by Eddie Epstein <ea...@gmail.com>.
Hi,

DUCC has some logfiles that show more details of the machine and the job,
which would allow us to answer your questions about the machine's physical
resources. They are located in $DUCC_HOME/logs; the agent log in
particular would be very helpful. The logfile name is {machine
name}.{domain}.agent.log.
Please restart DUCC so we can see the log from agent startup through
running the job one time.

As for the JD memory requirement, the JD driver should not contain any of
the analytic pipeline. Its purpose is normally to send a reference to the
input data to the Job Processes, which will read the input data, process
it, and write the results. (This is described at
http://uima.apache.org/d/uima-ducc-2.0.0/duccbook.html#x1-1600008.1 )

It should be possible for you to take *just* the collection reader
component from the cTAKES pipeline and use that for the JobDriver.
Hopefully this would need much less than Xmx400m.
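
In job-specification terms, a CR-only driver section might look like this
(a sketch; the parameter names follow the DUCC job specification and the
path is illustrative):

    # the JD runs only the collection reader, so no driver_jvm_args override
    driver_descriptor_CR            FilesInDirectoryCollectionReader.xml
    driver_descriptor_CR_overrides  InputDirectory=/path/to/input/notes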

Regards,
Eddie

Re: Random processing time with DUCC

Posted by Selina Chu <se...@gmail.com>.
Hi Eddie,

Thanks so much for taking the time to look at my issue and for your reply.

The reason I had to increase the heap size for the JD is that I'm running
cTAKES (http://ctakes.apache.org/) with DUCC.  The increased heap size is
to accommodate loading all of the cTAKES models into memory.
Before I increased it, DUCC would cancel the driver and end, and cTAKES
would return the error "java.lang.OutOfMemoryError: Java heap space”.

Would you say that this problem is mainly a limitation of my physical
memory and the other processes running on my computer, or can it be
addressed in DUCC, for example with parameter adjustments that let me use
a larger heap, or some way to pre-allocate enough memory for DUCC?

Thanks again,
Selina

Re: Random processing time with DUCC

Posted by Eddie Epstein <ea...@gmail.com>.
Hi Selina,

I suspect that the problem is due to the following job parameter:
      driver_jvm_args                -Xmx4g

This would certainly be true if cgroups have been configured for DUCC.
The default cgroup size for a JD is 450MB, so specifying an Xmx of 4GB can
cause the JVM to spill into swap space and cause erratic behavior.

Comparing a "fast" job (96) vs "slow" job (97), the time to process the
single work item was 8 sec vs 9 sec:
   09 Mar 2016 08:46:08,556  INFO JobDriverHelper - T[20] summarize
workitem  statistics  [sec]  avg=8.14 min=8.14 max=8.14 stddev=.00
vs
   09 Mar 2016 08:56:46,583  INFO JobDriverHelper - T[19] summarize
workitem  statistics  [sec]  avg=9.41 min=9.41 max=9.41 stddev=.00

The extra delays between the two jobs appear to be associated with the Job Driver.

Was there some reason you specified heap size for the JD? The default JD
heap size is Xmx400m.
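
In job-file terms, the suspect setting and the likely fix look like this
(a sketch; removing the override simply falls back to the default):

      # suspect: forces a 4 GB heap inside a ~450 MB JD cgroup
      driver_jvm_args    -Xmx4g
      # fix: delete the line (or keep Xmx within the cgroup limit)
      # so the JD runs with the default Xmx400m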

Regards,
Eddie