Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2015/09/22 22:40:04 UTC

[jira] [Issue Comment Deleted] (TIKA-1739) cTAKESParser doesn't work in 1.11

     [ https://issues.apache.org/jira/browse/TIKA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Burch updated TIKA-1739:
-----------------------------
    Comment: was deleted

(was: We explicitly don't let you set an {{AutoDetectParser}} in the config. It's something you have to choose to use, giving it the parser(s) you want used post-detection.

In the non-cTAKES case, you get a Composite Parser that'll handle your formats (directly/explicitly/via Tika Config xml/via default Tika Config), then give that (perhaps implicitly) to {{AutoDetectParser}}. {{AutoDetectParser}} identifies the type of the document, then picks the right parser based on the type.

In the cTAKES case, you get your chosen Composite Parser again, and give that to cTAKES (possibly via Tika Config xml, e.g. in the case above). You now create an {{AutoDetectParser}} as before, and give it cTAKES. {{AutoDetectParser}} identifies the type, then gives the document *with the type* to cTAKES, as cTAKES claims all the mime types. cTAKES then uses its child Composite Parser to have the real parsing done, based on the type that {{AutoDetectParser}} supplied to it. When that's done, cTAKES decorates the output.

Or, if you know the type yourself, you give that to cTAKES, which gives it to the child Composite Parser for parsing, then decorates the result, with no {{AutoDetectParser}} needed.)
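
The delegation chain described in the comment above can be sketched as a toy model. This is only an illustration of the wrapping order, not Tika code: every name here ({{ToyParser}}, {{ToyCompositeParser}}, {{ToyDecoratingParser}}, {{ToyAutoDetectParser}}, {{ToyDemo}}) is hypothetical, the "detection" is a trivial placeholder, and the real Tika interfaces take streams, handlers and metadata rather than strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a parser; NOT the real org.apache.tika.parser.Parser.
interface ToyParser {
    boolean supports(String mimeType);
    List<String> parse(String content, String mimeType);
}

// A leaf parser that only handles plain text.
final class ToyTextParser implements ToyParser {
    public boolean supports(String mimeType) { return "text/plain".equals(mimeType); }
    public List<String> parse(String content, String mimeType) {
        List<String> out = new ArrayList<>();
        out.add("text:" + content);
        return out;
    }
}

// Picks the first child that supports the given type; returns nothing otherwise.
final class ToyCompositeParser implements ToyParser {
    private final List<ToyParser> children;
    ToyCompositeParser(List<ToyParser> children) { this.children = children; }
    public boolean supports(String mimeType) {
        for (ToyParser p : children) if (p.supports(mimeType)) return true;
        return false;
    }
    public List<String> parse(String content, String mimeType) {
        for (ToyParser p : children) {
            if (p.supports(mimeType)) return p.parse(content, mimeType);
        }
        return new ArrayList<>(); // no child claimed the type
    }
}

// Plays the cTAKES role: claims every type, delegates the real parsing to its
// child using the type it was handed, then decorates the child's output.
final class ToyDecoratingParser implements ToyParser {
    private final ToyParser child;
    ToyDecoratingParser(ToyParser child) { this.child = child; }
    public boolean supports(String mimeType) { return true; }
    public List<String> parse(String content, String mimeType) {
        List<String> out = new ArrayList<>(child.parse(content, mimeType));
        out.add("decorated");
        return out;
    }
}

// Plays the AutoDetectParser role: works out the type, then passes the
// document *with the type* to whatever parser it was given.
final class ToyAutoDetectParser {
    private final ToyParser delegate;
    ToyAutoDetectParser(ToyParser delegate) { this.delegate = delegate; }
    static String detect(String content) {
        // Placeholder detection rule, purely for the sketch.
        return content.startsWith("<") ? "text/html" : "text/plain";
    }
    List<String> parse(String content) {
        return delegate.parse(content, detect(content));
    }
}

final class ToyDemo {
    static ToyParser decorated() {
        ToyParser composite = new ToyCompositeParser(Arrays.<ToyParser>asList(new ToyTextParser()));
        return new ToyDecoratingParser(composite);
    }
    // Type unknown: auto-detect wraps the decorator, which wraps the composite.
    static List<String> withAutoDetect(String content) {
        return new ToyAutoDetectParser(decorated()).parse(content);
    }
    // Type known: call the decorator directly, no auto-detect layer needed.
    static List<String> withKnownType(String content, String mimeType) {
        return decorated().parse(content, mimeType);
    }
}
```

In this model both {{ToyDemo.withAutoDetect("hello")}} and {{ToyDemo.withKnownType("hello", "text/plain")}} go through the same child composite, and the decorating step runs last in each case. When no child claims the detected type, only the decoration comes back.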

> cTAKESParser doesn't work in 1.11
> ---------------------------------
>
>                 Key: TIKA-1739
>                 URL: https://issues.apache.org/jira/browse/TIKA-1739
>             Project: Tika
>          Issue Type: Bug
>          Components: parser, server
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 1.11
>
>         Attachments: TIKA-1739.patch
>
>
> Tika cTAKESParser integration doesn't work in 1.11. The parser is called, but blank metadata comes back:
> {noformat}
> curl -T test.txt -H "Content-Type: text/plain" http://localhost:9999/rmeta/text
> [{"Content-Type":"text/plain","X-Parsed-By":["org.apache.tika.parser.CompositeParser","org.apache.tika.parser.ctakes.CTAKESParser","org.apache.tika.parser.EmptyParser"],"X-TIKA:parse_time_millis":"20371","ctakes:schema":"coveredText:start:end:ontologyConceptArr"}
> {noformat}
> [~gagravarr] I wonder if something that happened in TIKA-1653 broke it?
> http://svn.apache.org/viewvc?view=revision&revision=1684199
> [~gostep] can you help me look here?
> I'm working on https://github.com/chrismattmann/shangridocs/tree/convert-wicket which is where I first saw this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)