Posted to user@uima.apache.org by Eric Riebling <er...@cs.cmu.edu> on 2010/07/20 22:22:21 UTC

Suggestion for CPE stats

Although it's useful to know how many documents have been processed,
that figure is not nearly as useful as how many CHARACTERS have been
processed by a given CPE or set of components within a CPE, since
processing per document is much faster if your documents are tiny
than if they are huge.

So I think it would be a great thing to include Characters Processed
in the stats window of the Performance Report.
-- 
Eric Riebling  GHC 6713,  LTI,   SCS,  CMU
412.268.9872   http://www.cs.cmu.edu/~er1k

Re: Suggestion for CPE stats

Posted by Eddie Epstein <ea...@gmail.com>.
Hi Eric,

I'm not sure which, but one of the UIMA command line tools does report total
document size at the end of processing.

However, there are some problems with this suggestion. The UIMA framework
pretty much just moves CASes around without looking inside them. If it did
look inside, which view would it look at? What about non-text artifacts?

My answer would be to make this an application design issue. Have a CAS
consumer do the count and make it available at collection process complete.
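
A minimal sketch of that counting logic, in plain Java with no UIMA
dependency. The class name CharacterCounter and its methods are hypothetical
and not part of UIMA. In an actual CAS consumer, the per-document call would
presumably sit in the method that receives each CAS, using the text of
whichever view the application cares about, and the report would be produced
at collection process complete.

```java
/**
 * Hypothetical helper holding the totals a CAS consumer might accumulate.
 * Not a UIMA class; it only models the counting described above.
 */
final class CharacterCounter {
    private long documents = 0;
    private long characters = 0;

    /**
     * Intended to be called once per document with the text of the chosen view.
     * A null text models a non-text artifact: the document is counted,
     * but it contributes no characters.
     */
    void process(String documentText) {
        documents++;
        if (documentText != null) {
            // String.length() counts UTF-16 code units, not Unicode code points.
            characters += documentText.length();
        }
    }

    long getDocuments() {
        return documents;
    }

    long getCharacters() {
        return characters;
    }

    /** Intended to be called once, after the whole collection is processed. */
    String report() {
        return "documents=" + documents + " characters=" + characters;
    }

    public static void main(String[] args) {
        CharacterCounter counter = new CharacterCounter();
        counter.process("a tiny document");
        counter.process(null); // e.g. an artifact with no text
        counter.process("a somewhat longer document than the first one");
        System.out.println(counter.report());
    }
}
```

Because the application chooses which text to pass in, the question of which
view to count, and what to do with non-text artifacts, stays an application
decision rather than a framework one.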

Eddie
