Posted to user@uima.apache.org by Thanh Tam Nguyen <nt...@gmail.com> on 2014/11/05 09:11:52 UTC

Re: UIMA DUCC - Multi-machine Installation

Hi Eddie,
Thanks for your email. I followed the documentation and was able to run
DUCC jobs as a user other than "ducc". But while watching the webserver, I
found only one machine running the jobs. In the System > Machines tab, all
the machine statuses are "up". What should I do to run the jobs on all
machines?


Regards,
Tam

On Fri, Oct 31, 2014 at 9:37 PM, Eddie Epstein <ea...@gmail.com> wrote:

> Hi Tam,
>
> In the install documentation,
> http://uima.apache.org/d/uima-ducc-1.0.0/installation.html, the section
> "Multi-User Installation and Verification" describes how to configure
> setuid-root for ducc_ling so that DUCC jobs are run as the submitting user
> instead of user "ducc".
>
> The setuid-root ducc_ling should be put on every DUCC node, in the same
> place, and ducc.properties updated to point at that location.
>
> Eddie
>
>
> On Fri, Oct 31, 2014 at 3:54 AM, Thanh Tam Nguyen <nt...@gmail.com>
> wrote:
>
> > Hi Eddie,
> > Could you tell me in more detail how to set up DUCC for multiuser mode?
> > FYI, I have successfully set up and run my UIMA analysis engine in
> > single-user mode. I also followed the DUCCBOOK to set up ducc_ling, but I
> > am not sure how to get it working on a cluster of machines.
> >
> > Thanks,
> > Tam
> >
> > On Thu, Oct 30, 2014 at 11:08 PM, Eddie Epstein <ea...@gmail.com>
> > wrote:
> >
> > > The $DUCC_RUNTIME tree needs to be on a shared filesystem accessible
> > > from all machines.
> > > For single-user mode, ducc_ling could be referenced from there as well.
> > > But for a multiuser setup, ducc_ling needs setuid and should be
> > > installed on the root drive.
> > >
> > > Eddie
> > >
> > On Thu, Oct 30, 2014 at 10:08 AM, James Baker <james.d.baker@gmail.com>
> > wrote:
> > >
> > > > I've been working through the installation of UIMA DUCC, and have
> > > > successfully got it set up and running on a single machine. I'd now
> > > > like to move to running it on a cluster of machines, but it isn't
> > > > clear to me from the installation guide as to whether I need to
> > > > install DUCC on each node, or whether ducc_ling is the only thing that
> > > > needs installing on the non-head nodes.
> > > >
> > > > Could anyone shed some light on the process please?
> > > >
> > > > Thanks,
> > > > James
> > > >
> > >
> >
>
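The multiuser layout described in the quoted replies (a setuid-root ducc_ling on each node's local drive, at the same path on every node, with ducc.properties pointing at it) is easy to get subtly wrong on one machine. The sketch below is our own illustration, not a DUCC tool: it reports whether a file is a regular file owned by root with the setuid bit set, and whether its filesystem is mounted `nosuid`. The `nosuid` check reflects our assumption about one common reason a shared mount is unsuitable (many shared mounts ignore setuid); the thread itself only says to install ducc_ling on the root drive. The path `/local/ducc/admin/ducc_ling` is hypothetical; use whatever location your ducc.properties names.

```python
import os
import stat

def setuid_root_problems(path):
    """Return a list of reasons `path` cannot act as a setuid-root ducc_ling.

    An empty list means the file looks correctly installed on this node.
    """
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return [f"{path} does not exist on this node"]
    problems = []
    if not stat.S_ISREG(st.st_mode):
        problems.append("not a regular file")
    if st.st_uid != 0:
        problems.append("not owned by root")
    if not st.st_mode & stat.S_ISUID:
        problems.append("setuid bit not set")
    # A filesystem mounted nosuid silently ignores the setuid bit.
    # os.ST_NOSUID is only defined on Unix-like systems, so guard it.
    nosuid = getattr(os, "ST_NOSUID", 0)
    if nosuid and os.statvfs(path).f_flag & nosuid:
        problems.append("filesystem is mounted nosuid")
    return problems

# Hypothetical path: substitute the location configured in ducc.properties
# and run this on every DUCC node, since the file must exist on each one.
DUCC_LING = "/local/ducc/admin/ducc_ling"
for reason in setuid_root_problems(DUCC_LING) or ["looks OK"]:
    print(f"{DUCC_LING}: {reason}")
```

Because the check runs locally, it has to be executed on each node (for example from whatever remote-shell mechanism you already use to start DUCC).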

Re: UIMA DUCC - Multi-machine Installation

Posted by Lou DeGenaro <lo...@gmail.com>.
1. What is your process_thread_count for the job (found in the job
specification, and also settable on the submit command)?
2. What is your ducc.rm.share.quantum (specified in ducc.properties)?

Say your quantum size allowed for 2 shares (aka "ducc job processes" or "JPs")
per node, and your thread count allowed for 8 threads per share.  In this
case, 16 work items could fit on a single node.

Say you have 4 nodes that each fit 2 shares.  By specifying a
process_thread_count of 1, you should then see 8 work items running in
parallel across the 4 nodes, presuming they are of sufficient duration.
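Lou's arithmetic can be written as a simple capacity model. The helpers below are hypothetical, not part of DUCC: they assume each node holds a fixed number of shares (JPs), each JP runs `process_thread_count` threads with one work item per thread, and that work in flight is further capped by the pending work items and by the default per-job dispatch limit of 500 that Eddie mentions elsewhere in this thread.

```python
import math

def parallel_work_items(nodes, shares_per_node, process_thread_count,
                        pending_work_items, dispatch_limit=500):
    """Upper bound on work items processed at once (simplified model)."""
    capacity = nodes * shares_per_node * process_thread_count
    return min(capacity, pending_work_items, dispatch_limit)

def job_processes_needed(pending_work_items, process_thread_count):
    """Number of JPs needed to give every pending work item its own thread."""
    return math.ceil(pending_work_items / process_thread_count)

# Lou's first example: 2 shares per node, 8 threads per share.
print(parallel_work_items(1, 2, 8, pending_work_items=100))  # 16
# Lou's second example: 4 nodes x 2 shares, process_thread_count of 1.
print(parallel_work_items(4, 2, 1, pending_work_items=100))  # 8
# A 15-work-item job at a hypothetical 8 threads per JP needs only 2 JPs.
print(job_processes_needed(15, 8))  # 2
```

In this model, a 15-work-item job running with (hypothetically) 8 threads per JP needs just two JPs, which would fit on a single node with 2 shares; that is one plausible way a small job ends up on one machine. Whether DUCC's scheduler actually places them that way depends on its configuration.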

Hope this helps.

Lou.


On Wed, Nov 5, 2014 at 8:55 PM, Thanh Tam Nguyen <nt...@gmail.com>
wrote:

> Hi Eddie,
> I've checked the webserver. Since I have been testing on a small collection
> of documents (20 documents), there were 15 work items for the job.
>
> Did you mean 500 work items per machine?
>
> Regards,
> Tam

Re: UIMA DUCC - Multi-machine Installation

Posted by Eddie Epstein <ea...@gmail.com>.
Hi Tam,

This limit is 500 per job, not per machine. It exists because the JD (Job
Driver) keeps a copy of every active "work item" CAS in memory.

This has not been a problem for our users because work items are not
individual documents. Rather, they are groups of documents (or groups of
CASes), such as all files in a directory or files containing many
documents. The CM (CAS Multiplier) running in each thread of each JP (Job
Process) then reads the data and creates a CAS for each document (or input
CAS) to send down the pipeline. This also allows the output corresponding
to a work item (e.g. many documents) to be grouped into a single output
file.
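To make the grouping idea concrete, here is a small sketch in Python. It is illustrative only: a real DUCC job does this in Java, in its Collection Reader and CAS Multiplier, and the function names and `/data/corpus/...` paths below are hypothetical. It turns a list of document paths into work items (one per directory, or fixed-size batches), then expands a work item back into one unit per document, mimicking how a CM creates one CAS per document.

```python
from collections import defaultdict
from pathlib import PurePosixPath

def work_items_by_directory(doc_paths):
    """Group document paths so that each directory becomes one work item."""
    groups = defaultdict(list)
    for path in doc_paths:
        groups[str(PurePosixPath(path).parent)].append(path)
    return dict(groups)

def work_items_by_count(doc_paths, docs_per_item):
    """Alternative grouping: fixed-size batches of documents per work item."""
    return [doc_paths[i:i + docs_per_item]
            for i in range(0, len(doc_paths), docs_per_item)]

def expand_work_item(work_item):
    """Stand-in for a CAS Multiplier: yield one unit (a 'CAS') per document."""
    for doc in work_item:
        yield {"document": doc}

# Hypothetical corpus: 3 directories of 1000 documents each.
docs = [f"/data/corpus/batch{b}/doc{d}.txt" for b in range(3) for d in range(1000)]
items = work_items_by_directory(docs)
print(len(docs), "documents ->", len(items), "work items")  # 3000 documents -> 3 work items
first = next(iter(items.values()))
print(sum(1 for _ in expand_work_item(first)))  # 1000
```

Grouping keeps the number of dispatched work items well under the 500 limit. In this simplified view, the trade-off is that fewer, larger work items also give the scheduler fewer units to spread across JPs, so the batch size should still leave enough work items to keep every thread busy.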

See the DUCC sample apps for an example of breaking a single text file into
many documents, and grouping all the output CASes for the documents in the
file into a single zipfile.

We are working on a change that will significantly increase the max number
of dispatched CASes.

Eddie



On Wed, Nov 5, 2014 at 8:55 PM, Thanh Tam Nguyen <nt...@gmail.com>
wrote:

> Hi Eddie,
> I've checked the webserver. Since I have been testing on a small collection
> of documents (20 documents), there were 15 work items for the job.
>
> Did you mean 500 work items per machine?
>
> Regards,
> Tam

Re: UIMA DUCC - Multi-machine Installation

Posted by Thanh Tam Nguyen <nt...@gmail.com>.
Hi Eddie,
I've checked the webserver. Since I have been testing on a small collection
of documents (20 documents), there were 15 work items for the job.

Did you mean 500 work items per machine?

Regards,
Tam

On Thu, Nov 6, 2014 at 1:20 AM, Eddie Epstein <ea...@gmail.com> wrote:

> Hi,
>
> There is a default limit of 500 work items dispatched at the same time. How
> many dispatched are shown for the job?
>
> Eddie

Re: UIMA DUCC - Multi-machine Installation

Posted by Eddie Epstein <ea...@gmail.com>.
Hi,

There is a default limit of 500 work items dispatched at the same time. How
many dispatched are shown for the job?

Eddie


On Wed, Nov 5, 2014 at 3:11 AM, Thanh Tam Nguyen <nt...@gmail.com>
wrote:

> Hi Eddie,
> Thanks for your email. I followed the documentation and was able to run
> DUCC jobs as a user other than "ducc". But while watching the webserver, I
> found only one machine running the jobs. In the System > Machines tab, all
> the machine statuses are "up". What should I do to run the jobs on all
> machines?
>
>
> Regards,
> Tam