Posted to dev@tika.apache.org by Ronan KERDUDOU - VirageGroup <rk...@viragegroup.com> on 2010/02/18 20:41:51 UTC

[BUG ?] MimeType "IOException: Stream closed" with VFS streams

Hi all,

 

I'm currently adding full-text indexing to my product and have run into a small issue.

I use Tika 0.6, and this is the stack trace:

Caused by: java.io.IOException: Stream closed
      at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:145)
      at java.io.BufferedInputStream.reset(BufferedInputStream.java:414)
      at org.apache.tika.mime.MimeTypes.detect(MimeTypes.java:532)

 

I'm using VFS, and the stream that the FileObject returns is of class:

org.apache.commons.vfs.provider.DefaultFileContent$FileContentInputStream

 

This stream supports mark(), but it automatically closes itself when the
last byte is reached. So if the file is tiny (size < limit), the following
code fails:

 

stream.mark(limit);
mimeTypes.detect(stream, metadata);
stream.reset();
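
To make the failure mode concrete, here is a minimal, self-contained sketch.
It does not use VFS or Tika: SelfClosingStream is a hypothetical stand-in
that only mimics the behaviour described above (mark is supported, but the
stream closes itself at end of data), and readHeader is an assumed
simplification of what a detector does when it peeks at the first bytes.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class AutoCloseOnEofDemo {

    /** Hypothetical stand-in: supports mark(), but closes itself at end of data. */
    static class SelfClosingStream extends BufferedInputStream {
        private boolean finished;

        SelfClosingStream(InputStream in) {
            super(in);
        }

        @Override
        public synchronized int read() throws IOException {
            if (finished) {
                return -1;
            }
            int c = super.read();
            if (c == -1) {
                finished = true;
                close(); // the behaviour that breaks a later reset()
            }
            return c;
        }

        @Override
        public synchronized int read(byte[] b, int off, int len) throws IOException {
            if (finished) {
                return -1;
            }
            int n = super.read(b, off, len);
            if (n == -1) {
                finished = true;
                close();
            }
            return n;
        }
    }

    /** Assumed detector behaviour: read up to limit bytes, stopping at end of data. */
    static int readHeader(InputStream in, int limit) throws IOException {
        byte[] buf = new byte[limit];
        int total = 0;
        while (total < limit) {
            int n = in.read(buf, total, limit - total);
            if (n == -1) {
                break;
            }
            total += n;
        }
        return total;
    }

    /** Returns true if reset() throws after peeking at a stream of dataSize bytes. */
    static boolean resetFails(int dataSize, int limit) {
        InputStream stream = new SelfClosingStream(new ByteArrayInputStream(new byte[dataSize]));
        try {
            stream.mark(limit);
            readHeader(stream, limit);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            stream.reset();
            return false;
        } catch (IOException e) {
            return true; // "Stream closed"
        }
    }

    public static void main(String[] args) {
        System.out.println("tiny file (3 bytes, limit 8), reset fails: " + resetFails(3, 8));
        System.out.println("larger file (100 bytes, limit 8), reset fails: " + resetFails(100, 8));
    }
}
```

In this sketch only the tiny input should fail, because only there does the
peek run into the end of the data before the limit is reached.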

 

It would be safer in this specific case if MimeTypes didn't read the last
byte.

 

Can we solve this in Tika, or do you think it's a VFS bug that I should
report to them instead?

 

One more detail:

In my case, I'm using AutoDetectParser.

To work around the issue, I currently add the following code before calling it:

stream = new BufferedInputStream(stream);
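
As a rough illustration of why the extra wrapper helps, the sketch below
peeks at a tiny stream through a BufferedInputStream and then resets it. As
before, SelfClosingStream is a hypothetical stand-in for a stream that
closes itself at end of data, not the real VFS class, and readUpTo is an
assumed simplification of the detector's peek. The stand-in also reports
available() == 0 once finished; that is an assumption of this sketch.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class WrapWorkaroundDemo {

    /** Hypothetical stand-in: supports mark(), but closes itself at end of data. */
    static class SelfClosingStream extends BufferedInputStream {
        private boolean finished;

        SelfClosingStream(InputStream in) {
            super(in);
        }

        @Override
        public synchronized int read() throws IOException {
            if (finished) {
                return -1;
            }
            int c = super.read();
            if (c == -1) {
                finished = true;
                close();
            }
            return c;
        }

        @Override
        public synchronized int read(byte[] b, int off, int len) throws IOException {
            if (finished) {
                return -1;
            }
            int n = super.read(b, off, len);
            if (n == -1) {
                finished = true;
                close();
            }
            return n;
        }

        @Override
        public synchronized int available() throws IOException {
            // Assumption of this sketch: report 0 instead of failing once finished.
            return finished ? 0 : super.available();
        }
    }

    /** Reads up to limit bytes, stopping early at end of data. */
    static byte[] readUpTo(InputStream in, int limit) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[limit];
        int n;
        while (out.size() < limit && (n = in.read(buf, 0, limit - out.size())) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    /** Peeks at the data through a wrapper, resets, and returns what is re-read. */
    static byte[] peekThenReread(byte[] data, int limit) {
        try {
            InputStream stream =
                    new BufferedInputStream(new SelfClosingStream(new ByteArrayInputStream(data)));
            stream.mark(limit);
            readUpTo(stream, limit); // the inner stream closes itself here for tiny inputs
            stream.reset();          // served from the wrapper's own buffer
            return readUpTo(stream, limit);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] tiny = {1, 2, 3};
        System.out.println(Arrays.equals(tiny, peekThenReread(tiny, 8)));
    }
}
```

The outer wrapper keeps the peeked bytes in its own buffer, so its reset()
does not depend on the inner stream still being open.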

 

I got this idea from reading the following in AutoDetectParser.parse():

        if (!stream.markSupported()) {
            stream = new BufferedInputStream(stream);
        }

 

Regards,

 

KERDUDOU Ronan

VIRAGE Group (France)

+33 2 53 55 10 22

 <ma...@viragegroup.com> rk@viragegroup.com

 <http://www.viragegroup.com/> www.viragegroup.com

 

 


Re: [BUG ?] MimeType "IOException: Stream closed" with VFS streams

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Thu, Feb 18, 2010 at 8:41 PM, Ronan KERDUDOU - VirageGroup
<rk...@viragegroup.com> wrote:
> Can we solve this in Tika or do you think it's a VFS bug and I should tell
> them instead of you?

IMHO it's a VFS bug: a reset() call should restore the stream to the
state it was in when mark() was called (assuming the limit wasn't
exceeded, etc.). Otherwise there is no way for a client to really rely
on the reset() method.

> To solve the issue, I actually add the following code before calling it:
>
> stream = new BufferedInputStream(stream);

Yep. BufferedInputStream does restore the stream state correctly on reset().

> I had this idea when reading this in the AutoDetectParser.parse():
>
>        if (!stream.markSupported()) {
>            stream = new BufferedInputStream(stream);
>        }

Perhaps Tika should be more defensive and simply always wrap the
stream into a BufferedInputStream regardless of whether the original
stream claims to support the mark feature. This way we'd avoid the
trouble you encountered.
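
A minimal sketch of the difference between the two policies, for
illustration only: wrapIfNeeded and wrapAlways are hypothetical helper
names, not Tika API. The first mirrors the markSupported() check quoted
above, the second is the unconditional variant.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.InputStream;

final class DefensiveWrapSketch {

    /** Conditional policy: trust any stream that claims to support mark(). */
    static InputStream wrapIfNeeded(InputStream stream) {
        return stream.markSupported() ? stream : new BufferedInputStream(stream);
    }

    /** Defensive policy: always rely on BufferedInputStream's own mark/reset. */
    static InputStream wrapAlways(InputStream stream) {
        return new BufferedInputStream(stream);
    }

    public static void main(String[] args) {
        // ByteArrayInputStream claims mark support, like the VFS stream in question.
        InputStream original = new ByteArrayInputStream(new byte[] {1, 2, 3});
        System.out.println("conditional keeps original: " + (wrapIfNeeded(original) == original));
        System.out.println("defensive adds wrapper: " + (wrapAlways(original) != original));
    }
}
```

The trade-off is an extra buffer for streams whose mark/reset already works
correctly, in exchange for not depending on how each stream implements it.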

BR,

Jukka Zitting