Posted to user@uima.apache.org by Simon <si...@teratext.saic.com.au> on 2012/10/15 03:34:28 UTC

Providing collection definitions to multiple CasMultipliers

Hi

I have an Aggregate Analysis Engine starting with a CasMultiplier that gets its
collection definition from the input CAS on the shared input queue.

If I wish to create many Aggregates where each CasMultiplier gets a different
collection definition from an input CAS, how can I make sure each newly created
Aggregate's CasMultiplier gets its collection definition from an input CAS, given
that all CasMultipliers hang off the same input queue? The input CAS could be
picked up by any CasMultiplier, not necessarily the one in the Aggregate I just
created.

Thanks
Simon


Re: Providing collection definitions to multiple CasMultipliers

Posted by Burn Lewis <bu...@gmail.com>.
With a single queue, all deployed AggAE instances would have to be capable
of processing any request. Perhaps you could include the directory as well
as the files to be processed in the input CAS. If you want each deployed
AggAE to process only a certain type of work, you'd have to hang them off
different queues and have your client send requests to the appropriate
queue.
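A minimal plain-Java sketch of the "different queues" option, assuming nothing about the real UIMA-AS client API: the QueueRouter class, its Request type and the queue names are hypothetical, and in-memory BlockingQueues stand in for the JMS queues a UIMA-AS deployment would actually use. What it shows is that the client, not the service, decides which queue (and therefore which group of aggregates) receives a given directory.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

class QueueRouter {
    // Hypothetical work item: the directory plus the files in it to be processed.
    static final class Request {
        final String directory;
        final List<String> files;

        Request(String directory, List<String> files) {
            this.directory = directory;
            this.files = files;
        }
    }

    // One named input queue per "type" of aggregate (in-memory stand-ins for JMS queues).
    private final Map<String, BlockingQueue<Request>> queues = new ConcurrentHashMap<>();

    BlockingQueue<Request> queue(String name) {
        return queues.computeIfAbsent(name, k -> new LinkedBlockingQueue<>());
    }

    // The client chooses the queue; services listening on other queues never see this request.
    void send(String queueName, Request request) {
        queue(queueName).add(request);
    }

    public static void main(String[] args) {
        QueueRouter router = new QueueRouter();
        router.send("dirA-queue", new Request("/data/a", List.of("x.txt", "y.txt")));
        router.send("dirB-queue", new Request("/data/b", List.of("z.txt")));
        // An aggregate listening on dirA-queue only ever receives dirA work.
        Request next = router.queue("dirA-queue").poll();
        System.out.println(next.directory + " " + next.files);  // prints /data/a [x.txt, y.txt]
    }
}
```

The trade-off is that routing knowledge moves into the client, which is exactly what Burn means by expecting the client to send requests to the appropriate queue.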

~Burn

On Tue, Oct 16, 2012 at 9:15 AM, Eddie Epstein <ea...@gmail.com> wrote:

> Each collection definition (in a CAS) will be delivered to a single
> UIMA-AS service instance containing a CM and AggAE. Each service
> will then process its given input collection, and when done return
> the input CAS to the client/driver, freeing the service to process
> another input.
>
> So what exactly is the question?
>
> > If I wish to create many Aggregates...
> Does this mean many different types of Aggregates? If so, each
> different type of service must use a different input queue.
>
> > ... how can I make sure each new aggregate
> > CasMultiplier gets a collection definition from an input CAS since all
> > CasMultipliers hang off the same input queue?
> Does this mean: how can you be sure that the work [input CASes]
> will be distributed across all available service instances? With the
> default UIMA-AS parameters they will be.
>
> Is there a different question?
>
> Eddie
>
>
> On Tue, Oct 16, 2012 at 12:26 AM, Simon <si...@teratext.saic.com.au> wrote:
> >> Eddie Epstein <ea...@...> writes:
> >>
> >> Is this a question about scaling out with UIMA-AS? Is the AggAE a
> >> service with a single CasMultiplier? Need more clarity to understand
> >> the scenario.
> >>
> >
> >
> > Hi
> > Yes, it is a question about scaling out with UIMA-AS, and yes, the AggAE
> > is a service with a single CasMultiplier.
> >
> > The CASMultiplier will get its collection definition (a directory path
> > to check for files) from a CAS on the input queue. But I would like to
> > have many instances of the AggAE deployed and checking different
> > directories for files.
> >
> > Thanks
> > Simon
> >
> >
>

Re: Providing collection definitions to multiple CasMultipliers

Posted by Eddie Epstein <ea...@gmail.com>.
On Tue, Oct 16, 2012 at 6:47 PM, Simon <si...@teratext.saic.com.au> wrote:
>
> I was hoping to deploy the same Aggregate many times, with each Aggregate
> processing files from a different directory. But I was wondering how to tell
> each Aggregate which directory to process without using config files.
>
> Each Aggregate would have the same input queue, so it seems that providing
> the directory path (in an input CAS) to a newly deployed aggregate is not
> possible, since there may already be deployed identical aggregates with
> the same input queue.
>

If each input CAS contains a pointer to a different directory, then no two
aggregates are processing the same directory.
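The single-shared-queue behaviour can be mimicked in plain Java as an analogy, not as UIMA-AS code: several identical "aggregates" (threads) compete for items on one BlockingQueue, each item is a directory path standing in for an input CAS, and each take() removes the item so no other consumer ever sees it. SharedQueueSketch and its empty-string stop marker are hypothetical; a real UIMA-AS deployment uses JMS queues and its own delivery settings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

class SharedQueueSketch {
    // Stop marker: one per worker, queued after all real work. Assumes no directory is "".
    private static final String STOP = "";

    // Runs `workers` identical consumers on ONE shared queue of directory paths and
    // returns how many times each directory was taken off the queue.
    static Map<String, Integer> run(List<String> directories, int workers) throws InterruptedException {
        BlockingQueue<String> inputQueue = new LinkedBlockingQueue<>(directories);
        for (int i = 0; i < workers; i++) inputQueue.add(STOP);

        Map<String, Integer> timesProcessed = new ConcurrentHashMap<>();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            String name = "aggregate-" + i;
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        String dir = inputQueue.take();  // removes the item, so no other worker can get it
                        if (dir.equals(STOP)) return;    // each worker consumes exactly one stop marker
                        System.out.println(name + " processing " + dir);
                        timesProcessed.merge(dir, 1, Integer::sum);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(worker);
            worker.start();
        }
        for (Thread t : threads) t.join();
        return timesProcessed;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Integer> counts = run(List.of("/data/a", "/data/b", "/data/c", "/data/d"), 3);
        // Which aggregate took which directory varies from run to run,
        // but every directory is processed exactly once.
        System.out.println(counts);
    }
}
```

Because the directory travels inside the work item, a newly deployed aggregate needs no configuration of its own: it simply starts taking whatever directories are still queued.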

Re: Providing collection definitions to multiple CasMultipliers

Posted by Burn Lewis <bu...@gmail.com>.
On Tue, Oct 16, 2012 at 6:47 PM, Simon <si...@teratext.saic.com.au> wrote:

>
>
> I was hoping to deploy the same Aggregate many times, with each Aggregate
> processing files from a different directory. But I was wondering how to tell
> each Aggregate which directory to process without using config files.
>
If each aggregate is reading from a different directory, then they're not
really the same aggregate, so they should be on different queues.

>
> Each Aggregate would have the same input queue, so it seems that providing
> the directory path (in an input CAS) to a newly deployed aggregate is not
> possible, since there may already be deployed identical aggregates with
> the same input queue.
>
I was guessing here that your work item was a directory and a bunch of
files in it, so any of the aggregates on the queue could do the work.
Having multiple aggregates on the same queue speeds things up, as each can
be working on a different piece of work at the same time. Note that each
work item (CAS) is processed by only one of the aggregates. Depending on
how your workload is balanced across directories, dedicating each aggregate
to a single directory might be less efficient than allowing all aggregates
to process any directory.
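The efficiency point can be made concrete with a little arithmetic. This sketch, using entirely hypothetical costs, compares the total elapsed time (makespan) when each of four aggregates is tied to one directory against four aggregates sharing one queue, where whichever aggregate is free first takes the next item (greedy list scheduling, a simplification of how queued work spreads across consumers). With directory A holding four items of cost 3 and B, C, D one item of cost 1 each, the dedicated layout needs 12 time units while the shared queue finishes in 4.

```java
import java.util.List;
import java.util.Map;

class LoadBalanceSketch {
    // Dedicated: each aggregate only processes its own directory,
    // so elapsed time is the busiest directory's total cost.
    static int dedicatedMakespan(Map<String, List<Integer>> itemsByDir) {
        int worst = 0;
        for (List<Integer> costs : itemsByDir.values()) {
            int sum = 0;
            for (int c : costs) sum += c;
            worst = Math.max(worst, sum);
        }
        return worst;
    }

    // Shared queue: items are taken in queue order by whichever aggregate
    // becomes free first, so the load spreads across all aggregates.
    static int sharedMakespan(List<Integer> costsInQueueOrder, int aggregates) {
        int[] busyUntil = new int[aggregates];
        for (int cost : costsInQueueOrder) {
            int free = 0;
            for (int i = 1; i < aggregates; i++) {
                if (busyUntil[i] < busyUntil[free]) free = i;
            }
            busyUntil[free] += cost;
        }
        int makespan = 0;
        for (int t : busyUntil) makespan = Math.max(makespan, t);
        return makespan;
    }

    public static void main(String[] args) {
        // Hypothetical workload: directory A is much busier than B, C and D.
        Map<String, List<Integer>> items = Map.of(
                "A", List.of(3, 3, 3, 3),
                "B", List.of(1), "C", List.of(1), "D", List.of(1));
        System.out.println(dedicatedMakespan(items));                         // 12
        System.out.println(sharedMakespan(List.of(3, 3, 3, 3, 1, 1, 1), 4));  // 4
    }
}
```

With evenly loaded directories the two layouts converge; the gap only appears when some directories carry much more work than others.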

>
> Thanks
> Simon
>

If my guesses are off the mark, please clarify with a small example of
files and directories.

~Burn

Re: Providing collection definitions to multiple CasMultipliers

Posted by Simon <si...@teratext.saic.com.au>.
> Each collection definition (in a CAS) will be delivered to a single
> UIMA-AS service instance containing a CM and AggAE. Each service
> will then process its given input collection, and when done return
> the input CAS to the client/driver, freeing the service to process
> another input.
> 
> So what exactly is the question?
> 
> > If I wish to create many Aggregates...
> Does this mean many different types of Aggregates? If so, each
> different type of service must use a different input queue.
> 
> > ... how can I make sure each new aggregate
> > CasMultiplier gets a collection definition from an input CAS since all
> > CasMultipliers hang off the same input queue?
> Does this mean: how can you be sure that the work [input CASes]
> will be distributed across all available service instances? With the
> default UIMA-AS parameters they will be.
> 
> Is there a different question?

I was hoping to deploy the same Aggregate many times, with each Aggregate
processing files from a different directory. But I was wondering how to tell
each Aggregate which directory to process without using config files.

Each Aggregate would have the same input queue, so it seems that providing
the directory path (in an input CAS) to a newly deployed aggregate is not
possible, since there may already be deployed identical aggregates with
the same input queue.

Thanks
Simon



Re: Providing collection definitions to multiple CasMultipliers

Posted by Eddie Epstein <ea...@gmail.com>.
Each collection definition (in a CAS) will be delivered to a single
UIMA-AS service instance containing a CM and AggAE. Each service
will then process its given input collection, and when done return
the input CAS to the client/driver, freeing the service to process
another input.

So what exactly is the question?

> If I wish to create many Aggregates...
Does this mean many different types of Aggregates? If so, each
different type of service must use a different input queue.

> ... how can I make sure each new aggregate
> CasMultiplier gets a collection definition from an input CAS since all
> CasMultipliers hang off the same input queue?
Does this mean: how can you be sure that the work [input CASes]
will be distributed across all available service instances? With the
default UIMA-AS parameters they will be.

Is there a different question?

Eddie


On Tue, Oct 16, 2012 at 12:26 AM, Simon <si...@teratext.saic.com.au> wrote:
>> Eddie Epstein <ea...@...> writes:
>>
>> Is this a question about scaling out with UIMA-AS? Is the AggAE a service
>> with a single CasMultiplier? Need more clarity to understand the scenario.
>>
>
>
> Hi
> Yes, it is a question about scaling out with UIMA-AS, and yes, the AggAE is
> a service with a single CasMultiplier.
>
> The CASMultiplier will get its collection definition (a directory path
> to check for files) from a CAS on the input queue. But I would like to
> have many instances of the AggAE deployed and checking different
> directories for files.
>
> Thanks
> Simon
>
>

Re: Providing collection definitions to multiple CasMultipliers

Posted by Simon <si...@teratext.saic.com.au>.
> Eddie Epstein <ea...@...> writes:
> 
> Is this a question about scaling out with UIMA-AS? Is the AggAE a service
> with a single CasMultiplier? Need more clarity to understand the scenario.
> 


Hi 
Yes, it is a question about scaling out with UIMA-AS, and yes, the AggAE is
a service with a single CasMultiplier.

The CASMultiplier will get its collection definition (a directory path
to check for files) from a CAS on the input queue. But I would like to
have many instances of the AggAE deployed and checking different
directories for files.
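A real CasMultiplier would extend UIMA's JCasMultiplier_ImplBase and implement process(), hasNext() and next(), filling a new CAS for each output. The sketch below mimics only that cycle in plain Java, without UIMA, for the scenario described here: the constructor plays the role of process() and reads the directory path taken from the input CAS, and each next() yields one file. DirectoryMultiplierSketch is a hypothetical name, and skipping subdirectories and ordering by file name are assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DirectoryMultiplierSketch implements Iterator<Path> {
    private final Iterator<Path> files;

    // Analogue of process(inputCas): the "collection definition" is the directory path
    // carried by the input CAS. Only regular files are listed, in file-name order.
    DirectoryMultiplierSketch(String directoryFromInputCas) throws IOException {
        List<Path> regularFiles;
        try (Stream<Path> entries = Files.list(Paths.get(directoryFromInputCas))) {
            regularFiles = entries.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        this.files = regularFiles.iterator();
    }

    // Analogue of hasNext(): are there more output items to produce?
    @Override
    public boolean hasNext() {
        return files.hasNext();
    }

    // Analogue of next(): one output item per file (a real CasMultiplier would fill a new CAS here).
    @Override
    public Path next() {
        return files.next();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("cm-sketch");
        Files.writeString(dir.resolve("b.txt"), "second");
        Files.writeString(dir.resolve("a.txt"), "first");
        DirectoryMultiplierSketch cm = new DirectoryMultiplierSketch(dir.toString());
        while (cm.hasNext()) {
            System.out.println(cm.next().getFileName());  // a.txt, then b.txt
        }
    }
}
```

Since the directory arrives with each input, any number of identical instances can run this logic; which directory a given instance scans depends only on which input it happens to take.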

Thanks
Simon



Re: Providing collection definitions to multiple CasMultipliers

Posted by Eddie Epstein <ea...@gmail.com>.
Is this a question about scaling out with UIMA-AS? Is the AggAE a service
with a single CasMultiplier? Need more clarity to understand the scenario.

On Sun, Oct 14, 2012 at 9:34 PM, Simon <si...@teratext.saic.com.au> wrote:
> Hi
>
> I have an Aggregate Analysis Engine starting with a CasMultiplier that gets its
> collection definition from the input CAS on the shared input queue.
>
> If I wish to create many Aggregates where each CasMultiplier gets a different
> collection definition from an input CAS, how can I make sure each newly created
> Aggregate's CasMultiplier gets its collection definition from an input CAS, given
> that all CasMultipliers hang off the same input queue? The input CAS could be
> picked up by any CasMultiplier, not necessarily the one in the Aggregate I just
> created.
>
> Thanks
> Simon
>