Posted to dev@tika.apache.org by Ted Pikul <te...@gmail.com> on 2018/06/14 19:11:21 UTC

cTAKESParser loading model on each request

Hi, I posted this on the user mailing list but figured it might be a better
fit here.

I’m using the cTAKESParser with Tika as documented here:
https://wiki.apache.org/tika/cTAKESParser
(cTAKES 4.0.0, Tika 1.18, AggregatePlainTextFastUMLS processor from cTAKES)

It appears that with tika-server the cTAKES model is reloaded on every
request, adding an extra 7-10 seconds each time. Is this intended behavior?
I expected that once the cTAKES pipeline was loaded, it would simply be reset
between requests rather than reloaded.

I do see in CTAKESContentHandler.java that the last step of the endDocument
method resets the CAS, so I assumed the reuse would happen there. Maybe it
does and I've got something else wrong in my setup.
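For illustration, here is a minimal Java sketch of the behavior described above: build the expensive pipeline once, then only reset its per-document state after each request. `ExpensivePipeline` and `PipelineHolder` are hypothetical stand-ins, not cTAKES, UIMA, or Tika classes, and this is not how CTAKESParser is actually written. The shared instance is guarded with `synchronized` because it is not assumed to be thread-safe.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an expensive pipeline (think AnalysisEngine + JCas).
// Not the cTAKES or UIMA API.
final class ExpensivePipeline {
    static final AtomicInteger LOADS = new AtomicInteger();
    private final StringBuilder state = new StringBuilder();

    ExpensivePipeline() {
        // Stands in for the 7-10 s model load; here it only counts constructions.
        LOADS.incrementAndGet();
    }

    String process(String text) {
        state.append(text);
        return "annotated:" + state;
    }

    // Clears per-document state but keeps the loaded "model".
    void reset() {
        state.setLength(0);
    }
}

final class PipelineHolder {
    // Initialization-on-demand holder: the JVM initializes Lazy (and so builds
    // INSTANCE) once, on first access; class initialization is thread-safe.
    private static final class Lazy {
        static final ExpensivePipeline INSTANCE = new ExpensivePipeline();
    }

    // synchronized: one shared instance, not assumed to be thread-safe.
    static synchronized String handleRequest(String text) {
        ExpensivePipeline p = Lazy.INSTANCE;
        try {
            return p.process(text);
        } finally {
            p.reset(); // reset, don't reload, between requests
        }
    }
}
```

Calling `PipelineHolder.handleRequest` repeatedly constructs the pipeline only on the first call; later calls reuse it and start from a clean state.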

Re: cTAKESParser loading model on each request

Posted by Tim Allison <ta...@apache.org>.
Please open an issue for this on our JIRA... not reusing the model sounds like a bug.

I'm not at all familiar with cTAKES. Are the AnalysisEngine and the JCas
thread-safe? If they aren't, we might create a pool for them as we do for
SAXParsers now... maybe?
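A rough sketch of the pooling idea, assuming the engine and its CAS are not thread-safe: a fixed number of instances is created up front and handed out through a blocking queue, so each instance serves one request at a time and is never reloaded. `ResourcePool` is a hypothetical class, not an existing Tika or cTAKES API, and it is not a description of how Tika pools its SAXParsers.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical fixed-size pool for objects that are costly to create and
// not thread-safe (e.g. an engine/CAS pair). Not a Tika or cTAKES class.
final class ResourcePool<T> {
    private final BlockingQueue<T> idle;

    ResourcePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // pay the load cost once per slot, up front
        }
    }

    // Borrows an instance (blocking until one is free), runs the work, and
    // always returns the instance to the pool, even if the work throws.
    <R> R withResource(Function<T, R> work) {
        T item;
        try {
            item = idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a pooled instance", e);
        }
        try {
            return work.apply(item);
        } finally {
            idle.offer(item);
        }
    }

    int available() {
        return idle.size();
    }
}
```

The pool size trades memory (each slot would hold a full model) against the number of concurrent requests. With cTAKES, the work function would presumably also reset the CAS before the instance goes back into the pool.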
