Posted to dev@tika.apache.org by "Tyler Palsulich (JIRA)" <ji...@apache.org> on 2015/03/20 22:02:38 UTC

[jira] [Commented] (TIKA-1351) Parser implementations should accept null content handlers

    [ https://issues.apache.org/jira/browse/TIKA-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372099#comment-14372099 ] 

Tyler Palsulich commented on TIKA-1351:
---------------------------------------

I think this would be a nice feature, but updating every Parser is a large task. I believe there is a DummyContentHandler which just discards everything (though extraction is still done)? I forget the name.

> Parser implementations should accept null content handlers
> ----------------------------------------------------------
>
>                 Key: TIKA-1351
>                 URL: https://issues.apache.org/jira/browse/TIKA-1351
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Sergey Beryozkin
>            Priority: Minor
>
> Applications that let users search documents based only on their metadata do not need the content to be parsed.
> The only workaround I've found so far is to pass a no-op content handler that ignores the content events, but this does not stop parsers such as PDFParser from parsing the content.
> Proposal: update the parser API docs to tell implementers that the ContentHandler can be null, and update the shipped implementations to parse only the metadata when the ContentHandler is null.
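The difference between the workaround and the proposal can be illustrated with a small, self-contained Java sketch. It does not use Tika. `ToyParser`, its `parse(String, ContentHandler, Map)` signature and the `contentEventsProduced` counter are hypothetical stand-ins, and a plain `Map` stands in for Tika's `Metadata`. The no-op handler is the JDK's `org.xml.sax.helpers.DefaultHandler`, whose `ContentHandler` callbacks do nothing. With that handler the toy parser still walks the whole body and emits every event. With a `null` handler it returns as soon as the metadata is filled in.

```java
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class NullHandlerSketch {

    /** Hypothetical toy parser; NOT Tika's Parser interface. */
    static class ToyParser {
        // Counts body events produced, to show how much content work was done.
        int contentEventsProduced = 0;

        void parse(String document, ContentHandler handler, Map<String, String> metadata)
                throws SAXException {
            // Metadata extraction always happens.
            metadata.put("length", Integer.toString(document.length()));

            // Proposed contract (assumption): a null handler means "metadata only".
            if (handler == null) {
                return;
            }

            // Otherwise extract the body and stream it as SAX character events.
            handler.startDocument();
            for (String line : document.split("\n")) {
                char[] chars = line.toCharArray();
                handler.characters(chars, 0, chars.length);
                contentEventsProduced++;
            }
            handler.endDocument();
        }
    }

    public static void main(String[] args) throws SAXException {
        String doc = "first line\nsecond line";

        // Workaround: the no-op handler discards events, but the body is still processed.
        ToyParser withNoOp = new ToyParser();
        Map<String, String> m1 = new HashMap<>();
        withNoOp.parse(doc, new DefaultHandler(), m1);
        System.out.println("no-op handler: events=" + withNoOp.contentEventsProduced + " metadata=" + m1);

        // Proposal: a null handler skips body extraction entirely.
        ToyParser withNull = new ToyParser();
        Map<String, String> m2 = new HashMap<>();
        withNull.parse(doc, null, m2);
        System.out.println("null handler: events=" + withNull.contentEventsProduced + " metadata=" + m2);
    }
}
```

Both calls yield the same metadata (`{length=22}`), but only the no-op handler run produces body events (2 here). In a real format such as PDF, that skipped work would be the text extraction itself.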



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)