Posted to dev@tika.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2011/09/22 07:53:26 UTC

[jira] [Resolved] (TIKA-709) Tika network server does not print anything in response to, for example, Word documents

     [ https://issues.apache.org/jira/browse/TIKA-709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-709.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 0.10
         Assignee: Jukka Zitting

Good catch, thanks! This problem was caused by some of our parser classes closing the input stream, even though the Parser interface assumes that they won't. I fixed the parsers in revision 1173951 by making them use CloseShieldInputStream where appropriate.
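
For readers unfamiliar with the close-shield idea, here is a minimal sketch of why it helps. It is not the code from the revision mentioned above. CloseShield is a simplified stand-in for Commons IO's CloseShieldInputStream, and TrackingStream, parseAndClose and closedAfterParse are hypothetical names made up for this illustration. Since closing the stream returned by Socket.getInputStream() also closes the socket, a parser that closes its input would plausibly leave the server unable to send a response.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class CloseShieldDemo {

    // Simplified stand-in for Commons IO's CloseShieldInputStream (assumption:
    // the real class does more). Reads pass through to the wrapped stream,
    // but close() never reaches it.
    static class CloseShield extends FilterInputStream {
        CloseShield(InputStream in) {
            super(in);
        }

        @Override
        public void close() {
            // Intentionally empty: the wrapped stream stays open.
        }
    }

    // Hypothetical stream that records whether close() was called on it.
    // It plays the role of the socket's input stream.
    static class TrackingStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Hypothetical misbehaving parser: consumes the stream, then closes it.
    static int parseAndClose(InputStream in) throws IOException {
        int count = 0;
        while (in.read() != -1) {
            count++;
        }
        in.close();
        return count;
    }

    // Returns true if the underlying stream ended up closed after parsing.
    static boolean closedAfterParse(boolean shielded) {
        TrackingStream source = new TrackingStream(new byte[] {1, 2, 3});
        InputStream in = shielded ? new CloseShield(source) : source;
        try {
            parseAndClose(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return source.closed;
    }

    public static void main(String[] args) {
        System.out.println("closed without shield: " + closedAfterParse(false));
        System.out.println("closed with shield: " + closedAfterParse(true));
    }
}
```

With the shield in place, the caller stays in control of when the underlying stream (and, in the server case, the socket) is closed, regardless of what an individual parser does.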

> Tika network server does not print anything in response to, for example, Word documents
> ---------------------------------------------------------------------------------------
>
>                 Key: TIKA-709
>                 URL: https://issues.apache.org/jira/browse/TIKA-709
>             Project: Tika
>          Issue Type: Bug
>          Components: cli
>    Affects Versions: 0.9
>         Environment: Debian Linux Sid
>            Reporter: Vitaliy Filippov
>            Assignee: Jukka Zitting
>             Fix For: 0.10
>
>
> When trying to use Tika Server (java -jar tika-app-0.9.jar -t -p PORT) to parse MS Word DOC/DOCX files, the server reads the file and then does nothing more; it simply hangs, probably blocked on a socket read. This does not happen with, for example, HTML documents. I don't know the mechanics of this bug, but the following change definitely fixes the issue:
> Change
> type.process(socket.getInputStream(), output);
> to
> type.process(new CloseShieldInputStream(socket.getInputStream()), output);

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira