Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2011/04/21 18:06:05 UTC

[jira] [Commented] (TIKA-645) Parsers can't get at an underlying TikaInputStream to get the file if they wanted one

    [ https://issues.apache.org/jira/browse/TIKA-645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022803#comment-13022803 ] 

Nick Burch commented on TIKA-645:
---------------------------------

One solution that springs to mind is to move the hasFile and getFile methods into an interface that TikaInputStream, TaggedInputStream and CountingInputStream all implement, with the two wrappers delegating to the stream they wrap. That way, if the underlying stream is a TikaInputStream, the parser can still find out and grab the file. If it isn't, nothing changes.
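
As a rough illustration of that idea, here is a minimal self-contained sketch. It does not use the real Tika classes: the interface name FileBacked, the stand-in classes SourceStream and DelegatingWrapper, and the helper fileOrNull are all hypothetical, chosen only to show how a parser could reach the file through several layers of wrapping.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FilterInputStream;
import java.io.InputStream;

// Hypothetical interface; the name is illustrative and not part of Tika.
interface FileBacked {
    boolean hasFile();
    File getFile();
}

// Stand-in for a stream that may know its backing file
// (the role TikaInputStream plays in the discussion above).
class SourceStream extends FilterInputStream implements FileBacked {
    private final File file; // null when the stream is not file backed

    SourceStream(InputStream in, File file) {
        super(in);
        this.file = file;
    }

    public boolean hasFile() { return file != null; }
    public File getFile() { return file; }
}

// Stand-in for a wrapper such as TaggedInputStream or CountingInputStream:
// it answers by asking the stream it wraps.
class DelegatingWrapper extends FilterInputStream implements FileBacked {
    DelegatingWrapper(InputStream in) {
        super(in);
    }

    public boolean hasFile() {
        return in instanceof FileBacked && ((FileBacked) in).hasFile();
    }

    public File getFile() {
        return hasFile() ? ((FileBacked) in).getFile() : null;
    }
}

public class FileBackedSketch {
    // What a parser could do: take the file shortcut if one is available.
    static File fileOrNull(InputStream stream) {
        if (stream instanceof FileBacked && ((FileBacked) stream).hasFile()) {
            return ((FileBacked) stream).getFile();
        }
        return null; // fall back to reading the stream as usual
    }

    public static void main(String[] args) {
        InputStream raw = new ByteArrayInputStream(new byte[0]);
        InputStream wrapped = new DelegatingWrapper(
                new DelegatingWrapper(new SourceStream(raw, new File("example.doc"))));

        System.out.println(fileOrNull(wrapped));                    // file found through two wrappers
        System.out.println(fileOrNull(new DelegatingWrapper(raw))); // no file, so null
    }
}
```

The parser only needs an instanceof check against the interface, so it never has to know how many wrappers sit between it and the original stream.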

> Parsers can't get at an underlying TikaInputStream to get the file if they wanted one
> -------------------------------------------------------------------------------------
>
>                 Key: TIKA-645
>                 URL: https://issues.apache.org/jira/browse/TIKA-645
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.9
>            Reporter: Nick Burch
>
> Spotted this with the office parser, but it should be general. The user creates a TikaInputStream and passes it off to the parser framework. The Parser that is called may wish to spot that the input is a file-backed TikaInputStream, and take a shortcut by using the file instead of the InputStream.
> However, what the parser gets is a TaggedInputStream wrapping a CountingInputStream wrapping the original TikaInputStream. As such, it can't get at the file.
