Posted to dev@uima.apache.org by Richard Eckart de Castilho <re...@apache.org> on 2020/03/18 11:03:53 UTC

Any experience running UIMA DUCC on container/cloud infrastructures

Hi all,

does anybody have experience to share with running UIMA DUCC on a container-based and/or cloud-based infrastructure?

I found a third-party project which helps set up a DUCC cluster with Docker:

  https://github.com/aleksey-hariton/uima-ducc-docker

Are there more relevant resources or experiences you can share?

Is anybody running DUCC on AWS or Kubernetes or similar platforms?

Does DUCC run there "out of the box" or is customization/plumbing required? If so, how much?

Looking forward to your stories!

-- Richard

Re: Any experience running UIMA DUCC on container/cloud infrastructures

Posted by Marshall Schor <ms...@schor.com>.
I've added this info, and the link to the GitHub repo posted above, to a new
UIMA website page called ducc-cloud:

https://uima.apache.org/ducc-cloud

I hope others will augment this "good start" at helping others with deploying
UIMA-DUCC in cloud environments.

-Marshall

On 3/18/2020 10:30 AM, Eddie Epstein wrote:

Re: Any experience running UIMA DUCC on container/cloud infrastructures

Posted by Eddie Epstein <ea...@gmail.com>.
Hi Richard,

We've been running DUCC in a cloud environment for almost a year now. The
DUCC master and the glusterfs servers run on bare metal, and all of the
workstations and worker machines run on VMs. Cluster users add VMs to the
cluster as needed. A job can be started on one or more workers, and
additional VMs can then be added dynamically; the job automatically scales
out to use them. A common system image is maintained on all VMs via an
LDAP server and shared filesystem data. Users belong to groups and share
the machines allocated by members of their group.

A DUCC VM image is used to automatically connect new VMs to the DUCC master
and glusterfs. The DUCC master configuration can be updated at any time, for
example to add new groups or even to update the master software. VMs
automatically sync the DUCC software and configuration each time they start
their DUCC agent. The VM image supports three different machine types: a
graphical workstation, a CPU worker and a GPU worker. DUCC spawns work on
specified worker machine types and even on specific machines. Workstations are
optional, as DUCC requests can also be submitted from worker machines. Docker
images are supported using Podman. Podman runs rootless, so containers can
access the mounted file systems only with the user's own credentials.

To keep some level of data security, a group directory is only mounted on
the VMs created by members of that group. Individual users maintain file
permissions as desired, but because anyone who creates a VM has root access
on it, they could become any other user and access data belonging to other
group members. There is a self-service glusterfs webapp that is used to
export group data to new VMs and manage quotas.

The VM-image builder and the glusterfs webapp are not yet part of Apache DUCC.
I'm not sure about running the DUCC master and agent components in Docker
containers. Can the Kubernetes master and agent components run this way?

Regards,
Eddie