Posted to dev@tika.apache.org by "Ahmed Owian (JIRA)" <ji...@apache.org> on 2015/05/07 20:54:03 UTC

[jira] [Comment Edited] (TIKA-634) Command Line Parser for Metadata Extraction

    [ https://issues.apache.org/jira/browse/TIKA-634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533184#comment-14533184 ] 

Ahmed Owian edited comment on TIKA-634 at 5/7/15 6:54 PM:
----------------------------------------------------------

I'm creating a unit test for ExternalParser with the intent of demonstrating the concurrency issue.  I started by using {{cat}} as the external command.

First, when the command does not include the output token, the parser puts the full output from standard out into the ContentHandler, but it doesn't parse the metadata from standard out.
When the command does include the output token, cat writes an error to standard error, because the output token itself is never replaced with anything.  https://issues.apache.org/jira/browse/TIKA-1620 fixes that, but cat has no option for writing to an output file other than shell redirection.  So when both input and output files are specified, cat just treats them as two input files.
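
To make the two cases above concrete, here is a minimal, self-contained Java sketch of the token substitution.  It is not the ExternalParser code itself: the class name, the {{expand}} helper, and the file names are hypothetical, and the token spellings {{${INPUT}}} and {{${OUTPUT}}} are assumed for illustration.  Passing null for the output path models the behaviour described above, where the output token is never replaced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not Tika's ExternalParser implementation.
class TokenSubstitutionSketch {
    // Assumed token spellings, used here for illustration.
    static final String INPUT_TOKEN = "${INPUT}";
    static final String OUTPUT_TOKEN = "${OUTPUT}";

    /**
     * Returns a copy of the command with the tokens replaced by file paths.
     * A null path means "leave that token untouched".
     */
    static List<String> expand(List<String> command, String inputPath, String outputPath) {
        List<String> expanded = new ArrayList<>();
        for (String arg : command) {
            String result = arg;
            if (inputPath != null) {
                // String.replace does literal (non-regex) replacement, so "$" is safe.
                result = result.replace(INPUT_TOKEN, inputPath);
            }
            if (outputPath != null) {
                result = result.replace(OUTPUT_TOKEN, outputPath);
            }
            expanded.add(result);
        }
        return expanded;
    }

    public static void main(String[] args) {
        List<String> command = Arrays.asList("cat", INPUT_TOKEN, OUTPUT_TOKEN);

        // Output token never replaced: cat is handed the literal token as a file name.
        System.out.println(expand(command, "in.txt", null));

        // Both tokens replaced: cat receives two file arguments and reads both as input.
        System.out.println(expand(command, "in.txt", "out.txt"));
    }
}
```

In the first case cat is asked to open a file literally named after the token, which would explain the message on standard error.  In the second case the command line is simply {{cat in.txt out.txt}}, so nothing is ever written to the output file.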


was (Author: ahmedowian):
I'm creating a unit test for ExternalParser with the intent to display the concurrency issue.  I started by using {{cat}}

Firstly, when not putting the output token, the parser puts the full output from standard out into the ContentHandler, but it doesn't parse the metadata from standard out.
When putting the output token, the parser gives an error from cat, because the output token itself is never replaced with anything.  https://issues.apache.org/jira/browse/TIKA-1620 fixes that, but cat does not have an option for an output file other than the system redirects.

> Command Line Parser for Metadata Extraction
> -------------------------------------------
>
>                 Key: TIKA-634
>                 URL: https://issues.apache.org/jira/browse/TIKA-634
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 0.9
>            Reporter: Nick Burch
>            Assignee: Nick Burch
>            Priority: Minor
>              Labels: new-parser
>
> As discussed on the mailing list:
> http://mail-archives.apache.org/mod_mbox/tika-dev/201104.mbox/%3Calpine.DEB.2.00.1104052028380.29085@urchin.earth.li%3E
> This issue is to track improvements in the ExternalParser support to handle metadata extraction, and probably easier configuration of an external parser too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)