Posted to user@uima.apache.org by Benedict Holland <be...@gmail.com> on 2017/12/22 17:26:59 UTC

Run an analysis engine after processing document collection?

Hello All,

I find myself in a strange situation. I have a collection processing engine
(CPE) working. N threads populate N CAS objects and run my pipeline. Each
CAS object gets one piece of data, say a row in a database. Each CAS is
processed entirely independently, so the threads can run concurrently. I
deliberately did not configure this pipeline as an aggregate analysis
engine, because I don't really care when each engine fires: the CPE already
maintains the order of the engines.

Now I want to add an analysis that runs over the aggregate output. For
example, having processed N texts with the CPE, I want to run a TF-IDF
analysis over the entire corpus. The TF-IDF analysis should run only after
all documents have been processed.
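
To make the shape of the problem concrete, here is a minimal sketch in
plain Java with no UIMA calls. The class TfIdfAccumulator and its methods
addDocument and finish are hypothetical names made up for illustration.
addDocument stands in for the per-CAS work that can run on any thread, and
finish stands in for the step that must wait until every document is in.
It assumes each document id is added exactly once and uses the plain
tf * ln(N / df) weighting.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not a UIMA class: it collects counts per document
// and computes TF-IDF only when finish() is called.
class TfIdfAccumulator {
    // document id -> (term -> raw count within that document)
    private final Map<String, Map<String, Integer>> termCounts = new ConcurrentHashMap<>();
    // term -> number of documents that contain the term
    private final Map<String, AtomicInteger> docFreq = new ConcurrentHashMap<>();

    // Per-document step. Assumes each docId is added exactly once.
    // Uses concurrent maps so several threads can call it at the same time.
    void addDocument(String docId, List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        termCounts.put(docId, counts);
        for (String term : counts.keySet()) {
            docFreq.computeIfAbsent(term, k -> new AtomicInteger()).incrementAndGet();
        }
    }

    // End-of-collection step. Must be called only after every document
    // has been added, because idf depends on the whole corpus.
    Map<String, Map<String, Double>> finish() {
        int n = termCounts.size();
        Map<String, Map<String, Double>> scores = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> doc : termCounts.entrySet()) {
            int total = 0;
            for (int count : doc.getValue().values()) {
                total += count;
            }
            Map<String, Double> docScores = new HashMap<>();
            for (Map.Entry<String, Integer> e : doc.getValue().entrySet()) {
                double tf = (double) e.getValue() / total;
                double idf = Math.log((double) n / docFreq.get(e.getKey()).get());
                docScores.put(e.getKey(), tf * idf);
            }
            scores.put(doc.getKey(), docScores);
        }
        return scores;
    }
}
```

What I am missing is the place in the pipeline where something like
finish() can be triggered once, after the last CAS has gone through.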

How would I go about doing this? Does this have to do with not allowing
multiple deployments?

Thanks,
~Ben