Posted to dev@tika.apache.org by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2015/05/25 21:32:17 UTC

[jira] [Resolved] (TIKA-1638) Make ExternalParser actually work

     [ https://issues.apache.org/jira/browse/TIKA-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann resolved TIKA-1638.
-------------------------------------
    Resolution: Fixed

- I applied this patch in r1681640.

> Make ExternalParser actually work
> ---------------------------------
>
>                 Key: TIKA-1638
>                 URL: https://issues.apache.org/jira/browse/TIKA-1638
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 1.9
>
>         Attachments: TIKA-1638.Mattmann.052515.patch.txt
>
>
> Several issues in ExternalParser currently prevent it from functioning. They are enumerated below:
> * the class org.apache.tika.parser.external.CompositeExternalParser needs to be added to the META-INF/services/org.apache.tika.parser.Parser file
> * the ExternalParserConfigReader class incorrectly tokenizes the comma-separated error codes: the StringTokenizer it uses relies on the default delimiter set (space, tab, newline, carriage return, form feed), which doesn't include ","
> * the ExternalParserConfigReader runs a check before adding Parsers in which it simply takes the given check command String and wraps it in a String[]. This makes the check fail whenever the command contains spaces (which, per the documentation, most commands will). The command needs to be split on whitespace (e.g., .split(" ")) for the check to succeed and for ExternalParsers to actually be created and added.
> * the ExternalParser likewise needs to split its command on whitespace (as the ExternalParserConfigReader does) so that the command can be executed successfully.
> * exception handling needs to be added around the exec call that runs the external command.
> * any Threads used in, e.g., extractMetadata, sendInput, etc., need to be started and then joined, so that they finish before the method continues. As it stands, metadata is sometimes extracted and sometimes not, because the work is done by threads that aren't forced to complete before the method moves on, parses, and returns.
> I have a patch which fixes all this. Forthcoming.
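The Java sketch that follows illustrates three of the fixes listed in the issue: tokenizing comma-separated error codes with an explicit delimiter, splitting a command string into an argument array before executing it, and starting and then joining the threads that drain a process's output. It is not the code from the TIKA-1638 patch; the class and method names (ExternalParserFixesSketch, parseErrorCodes, splitCommand, startLineReader, runCommand) are hypothetical, and a plain whitespace split assumes no argument contains quoted spaces.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper class; not the code from the TIKA-1638 patch.
final class ExternalParserFixesSketch {

    // Parses error codes such as "1,2". StringTokenizer's default delimiters are
    // " \t\n\r\f", so the comma has to be passed explicitly.
    static List<Integer> parseErrorCodes(String codes) {
        List<Integer> result = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(codes, ", \t");
        while (tokenizer.hasMoreTokens()) {
            result.add(Integer.parseInt(tokenizer.nextToken()));
        }
        return result;
    }

    // Splits a command line on runs of whitespace so each argument is its own
    // array element. Assumption: no argument contains quoted spaces.
    static String[] splitCommand(String command) {
        return command.trim().split("\\s+");
    }

    // Starts a thread that copies every line of `in` into `sink`.
    // The caller must join() the returned thread before reading `sink`.
    static Thread startLineReader(InputStream in, List<String> sink) {
        Thread t = new Thread(() -> {
            try (BufferedReader br = new BufferedReader(
                    new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = br.readLine()) != null) {
                    sink.add(line);
                }
            } catch (IOException e) {
                sink.add("read error: " + e.getMessage());
            }
        });
        t.start();
        return t;
    }

    // Runs an external command, draining stdout and stderr on separate threads,
    // and joins both threads so all output is collected before returning.
    static int runCommand(String command, List<String> out, List<String> err)
            throws IOException, InterruptedException {
        Process process;
        try {
            process = new ProcessBuilder(splitCommand(command)).start();
        } catch (IOException e) {
            // For example, the executable is not installed or not on the PATH.
            throw new IOException("Unable to execute external command: " + command, e);
        }
        process.getOutputStream().close(); // this sketch sends no input on stdin
        Thread outReader = startLineReader(process.getInputStream(), out);
        Thread errReader = startLineReader(process.getErrorStream(), err);
        int exitCode = process.waitFor();
        outReader.join();
        errReader.join();
        return exitCode;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(parseErrorCodes("1,2"));                               // [1, 2]
        System.out.println(Arrays.toString(splitCommand("ffmpeg -i input.mp4"))); // [ffmpeg, -i, input.mp4]

        List<String> lines = new ArrayList<>();
        Thread reader = startLineReader(
                new ByteArrayInputStream("a\nb\n".getBytes(StandardCharsets.UTF_8)), lines);
        reader.join(); // without join(), `lines` could still be empty or partial here
        System.out.println(lines);                                                // [a, b]
    }
}
```

runCommand is not called from main because its result depends on which executables the platform provides. Draining stdout and stderr concurrently avoids a child process blocking on a full pipe, and joining both reader threads after waitFor() ensures every line has been collected before the caller uses it.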



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)