Posted to user@uima.apache.org by Erik Fäßler <er...@uni-jena.de> on 2018/05/15 13:21:52 UTC

Batch Checkpoints with DUCC?

And another question concerning DUCC :-)

With my CPEs I make heavy use of the batchProcessComplete() and collectionProcessComplete() methods. I need them because I do a lot of database interaction, where data has to be sent in batches because of the overhead of network communication.
How is that handled in DUCC? The documentation does not seem to talk about it; at least I could not find anything.

Hints are appreciated.

Thanks!

Erik

Re: Batch Checkpoints with DUCC?

Posted by Eddie Epstein <ea...@gmail.com>.
Hi,

Yes, exactly. DUCC jobs that specify a CM, AE, and CC use a custom flow
controller that routes the WorkItem CAS as desired. By default the route is
(CM, CC), but this can be modified by the contents of the WorkItem feature
structure ... http://uima.apache.org/d/uima-ducc-2.2.2/duccbook.html#x1-1930009.5.3

Eddie
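
To make the routing described above a bit more concrete, here is a toy Java
sketch. It is not DUCC code: WorkItemRouter and the boolean sendToAll are
hypothetical stand-ins for the custom flow controller and for whatever the
WorkItem feature structure actually carries. The real feature names are in the
duccbook section linked above. The sketch only assumes the default route
(CM, CC) stated in the reply, plus one assumed override that also sends the
work item CAS through the AE.

```java
import java.util.List;

/** Toy model of work item routing; none of these types are DUCC classes. */
class WorkItemRouter {

    /**
     * Returns the components a work item CAS would visit, in order.
     * The sendToAll flag is a hypothetical placeholder for the contents
     * of the WorkItem feature structure.
     */
    static List<String> route(boolean sendToAll) {
        if (!sendToAll) {
            // Default route as described in the reply: CM first, then CC.
            return List.of("CM", "CC");
        }
        // Assumed override: the work item CAS also passes through the AE.
        return List.of("CM", "AE", "CC");
    }
}

class RoutingDemo {
    public static void main(String[] args) {
        System.out.println("default:  " + WorkItemRouter.route(false));
        System.out.println("override: " + WorkItemRouter.route(true));
    }
}
```

The point of the sketch is only that the route is decided per work item CAS,
so a job can change it without touching the CM, AE, or CC descriptors.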


On Wed, May 16, 2018 at 2:56 AM, Erik Fäßler <er...@uni-jena.de>
wrote:

> Hey Eddie, thanks again! :-)
>
> So the idea is that the work item is the CAS that the CR sent to the CM,
> right? The work item CAS consists of a list of artifacts which are output
> by the CM, processed by the pipeline and finally cached by the CC.
> Then, I can somehow (have to read this up) have the work item CAS sent to
> the CC as the effective “batch processing complete” signal.
>
> Is that correct?

Re: Batch Checkpoints with DUCC?

Posted by Erik Fäßler <er...@uni-jena.de>.
Hey Eddie, thanks again! :-)

So the idea is that the work item is the CAS that the CR sends to the CM, right? The work item CAS describes a list of artifacts, which are output by the CM, processed by the pipeline, and finally cached by the CC.
Then I can somehow (I have to read up on this) have the work item CAS sent to the CC as the effective “batch processing complete” signal.

Is that correct?



Re: Batch Checkpoints with DUCC?

Posted by Eddie Epstein <ea...@gmail.com>.
Hi Erik,

There is a brief discussion of this in the duccbook in section 9.3 ...
https://uima.apache.org/d/uima-ducc-2.2.2/duccbook.html#x1-1880009.3

In particular, see the third option, "Flushing cached data". This assumes that
each workitem CAS represents the batch of work to be flushed.

Regards,
Eddie
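
As a rough illustration of the "Flushing cached data" idea mentioned above,
here is a minimal Java sketch. It deliberately uses no UIMA or DUCC classes:
BatchingConsumer, processDocument() and processWorkItem() are hypothetical
names standing in for a CC that caches per-document results and treats the
arrival of the work item CAS as its "batch complete" signal. The list of
flushed batches stands in for the database writes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a CAS consumer; not a UIMA or DUCC class. */
class BatchingConsumer {
    private final List<String> cache = new ArrayList<>();
    private final List<List<String>> flushed = new ArrayList<>();

    /** Called once per document CAS that belongs to the current work item. */
    void processDocument(String record) {
        cache.add(record);
    }

    /** Called when the work item CAS itself arrives: the batch is complete. */
    void processWorkItem() {
        if (cache.isEmpty()) {
            return; // nothing cached, so nothing to send
        }
        // A real consumer would issue one batched database write here.
        flushed.add(new ArrayList<>(cache));
        cache.clear();
    }

    /** Batches flushed so far, in the order they were sent. */
    List<List<String>> flushedBatches() {
        return flushed;
    }

    /** Number of records cached but not yet flushed. */
    int pending() {
        return cache.size();
    }
}

class FlushDemo {
    public static void main(String[] args) {
        BatchingConsumer cc = new BatchingConsumer();
        cc.processDocument("doc-1");
        cc.processDocument("doc-2");
        cc.processWorkItem(); // work item CAS reaches the consumer
        System.out.println(cc.flushedBatches());
    }
}
```

In this model, one work item equals one batch, which is the assumption stated
in the reply: the size of the batch is decided by how many documents each
work item covers, not by a separate batch-size setting.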
