Posted to dev@tika.apache.org by "Hudson (JIRA)" <ji...@apache.org> on 2015/06/11 02:48:00 UTC

[jira] [Commented] (TIKA-1654) Reset cTAKES CAS into CTAKESParser

    [ https://issues.apache.org/jira/browse/TIKA-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581293#comment-14581293 ] 

Hudson commented on TIKA-1654:
------------------------------

SUCCESS: Integrated in tika-trunk-jdk1.7 #744 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/744/])
TIKA-1654 Reset cTAKES CAS into CTAKESParser (Fix for TIKA-1645) (totaro: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1684801)
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/ctakes/CTAKESContentHandler.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/ctakes/CTAKESParser.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/ctakes/CTAKESUtils.java


> Reset cTAKES CAS into CTAKESParser
> ----------------------------------
>
>                 Key: TIKA-1654
>                 URL: https://issues.apache.org/jira/browse/TIKA-1654
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Giuseppe Totaro
>            Assignee: Giuseppe Totaro
>              Labels: patch
>             Fix For: 1.9
>
>         Attachments: TIKA-1654.patch
>
>
> Using [CTAKESParser from Tika Server|https://wiki.apache.org/tika/cTAKESParser], I noticed that an exception occurs when the CTAKESParser is used multiple times:
> {noformat}
> org.apache.uima.cas.CASRuntimeException: Data for Sofa feature setLocalSofaData() has already been set.
> {noformat}
> This is caused by the CAS (Common Analysis System) used by CTAKESParser. The CAS, like the AE (AnalysisEngine), is held in a static field of CTAKESParser so that it acts as a kind of singleton.
> For context, an AnalysisEngine is a cTAKES/UIMA component that analyzes unstructured information and discovers and represents its semantic content. It operates on an "analysis structure" (implemented by the CAS).
> Reusing the CAS is highly recommended, but it has to be reset before the next run. The CTAKESUtils class ({{org.apache.tika.parser.ctakes}}) provides a reset method that releases all resources held by both the AnalysisEngine and the CAS and then "destroys" them. Calling this method prevents the CASRuntimeException.
> The attached patch adds two new methods, resetCAS and resetAE, which reset (but do not destroy) the CAS and the AnalysisEngine respectively.
> By calling only resetCAS, CTAKESParser can reuse both the CAS and the AE instead of building them again for each run.
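
The reuse-then-reset pattern described in the issue can be illustrated in isolation. What follows is only a minimal sketch, not the code from the attached patch: ToyCas, process and processAndReset are hypothetical stand-ins invented for this example, and none of them belong to the UIMA or Tika APIs. The toy object refuses a second document until it has been reset, which mimics the failure reported above.

```java
// Hypothetical stand-ins for illustration only; NOT the UIMA CAS or Tika's CTAKESUtils.
class CasReuseSketch {

    /** Toy CAS: holds one document and refuses a second one until it is reset. */
    static final class ToyCas {
        private String documentText;

        void setDocumentText(String text) {
            if (documentText != null) {
                // Mimics the failure mode described in the issue.
                throw new IllegalStateException("Document data has already been set.");
            }
            documentText = text;
        }

        /** Clears the held data but keeps the object usable (reset, not destroy). */
        void reset() {
            documentText = null;
        }
    }

    /** Loads the text into the CAS and returns a token count (stand-in for the analysis). */
    static int process(ToyCas cas, String text) {
        cas.setDocumentText(text);
        return text.trim().split("\\s+").length;
    }

    /** Same as process, but always resets the CAS so the next run can reuse it. */
    static int processAndReset(ToyCas cas, String text) {
        try {
            return process(cas, text);
        } finally {
            cas.reset();
        }
    }

    public static void main(String[] args) {
        ToyCas shared = new ToyCas(); // built once, reused across runs

        // With a reset after each run, the shared instance can be used repeatedly.
        System.out.println(processAndReset(shared, "fever and cough"));
        System.out.println(processAndReset(shared, "no known allergies"));

        // Without a reset, the second run on the same instance fails.
        ToyCas neverReset = new ToyCas();
        process(neverReset, "first document");
        try {
            process(neverReset, "second document");
        } catch (IllegalStateException e) {
            System.out.println("Second run failed: " + e.getMessage());
        }
    }
}
```

Placing the reset in a finally block is a design choice of this sketch: it leaves the shared object clean even when a run throws, so one failed document does not break every later call.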



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)