Posted to user@tika.apache.org by PRANEESH KUMAR <pr...@gmail.com> on 2014/06/30 09:17:15 UTC
Getting IOException: Resetting to invalid mark while reseting the stream
Using Tika 1.5, I am getting java.io.IOException: Resetting to invalid mark
when I reset the stream I passed in.
The IOException occurs mostly when parsing PDF and ZIP files.
The code snippet I used is:
try {
    // I have set the stream as a BufferedInputStream of some sample.pdf
    stream.mark(Integer.MAX_VALUE);
    Tika t = new Tika();
    String content = t.parseToString(stream);
} finally {
    if (stream != null) {
        stream.reset();
    }
}
Has anybody else run into this? Is it a bug or expected behaviour?
Thanks,
Praneesh
Re: Getting IOException: Resetting to invalid mark while reseting the stream
Posted by PRANEESH KUMAR <pr...@gmail.com>.
> What kind of stream is underneath that though?
It internally uses a ByteArrayInputStream.
> Tika will normally consume all of the stream when it parses a file
But in my case, the stream used for parsing is also used for some other
processing.
Praneesh
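Editor's sketch: since the data ultimately sits in a ByteArrayInputStream, one way to share it between a parser and other processing without relying on mark/reset is to keep the bytes and give each consumer its own stream. This uses only the JDK; the class and method names (ReplayableBytes, readAll, freshStream) are made up for illustration, and it assumes the document fits comfortably in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ReplayableBytes {

    /** Reads the whole stream into memory once (hypothetical helper). */
    public static byte[] readAll(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Each call returns an independent stream positioned at byte 0. */
    public static InputStream freshStream(byte[] data) {
        return new ByteArrayInputStream(data);
    }

    public static void main(String[] args) {
        byte[] data = readAll(new ByteArrayInputStream(
                "sample document".getBytes(StandardCharsets.UTF_8)));

        // First consumer: stands in for a parser that reads to end of stream.
        readAll(freshStream(data));

        // Second consumer: still sees the content from the beginning.
        byte[] again = readAll(freshStream(data));
        System.out.println(new String(again, StandardCharsets.UTF_8));
    }
}
```

With this arrangement it does not matter whether a consumer reads the stream to the end, closes it, or sets its own marks, because the next consumer never touches the same stream object.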
Re: Getting IOException: Resetting to invalid mark while reseting the stream
Posted by Nick Burch <ap...@gagravarr.org>.
On Mon, 30 Jun 2014, PRANEESH KUMAR wrote:
>> What kind of thing is the stream you're passing?
>
> I am passing BufferedInputStream
What kind of stream is underneath that though?
> As TikaInputStream not resetting the pos of the stream to zero for not all
> the document types, so I need to do stream reset.
Tika will normally consume all of the stream when it parses a file
Nick
Re: Getting IOException: Resetting to invalid mark while reseting the stream
Posted by PRANEESH KUMAR <pr...@gmail.com>.
Hi Nick,
Thank you,
> What kind of thing is the stream you're passing?
I am passing a BufferedInputStream.
> Does your stream support marking? And does it support marking that much?
Yes, it supports marking, and marking the stream is not the problem.
> Also, you could consider wrapping it with a TikaInputStream, which handles
> marking / buffering to files / etc if needed
By default the Tika parser uses AutoDetectParser, which internally wraps the
stream passed to it in a TikaInputStream.
However, TikaInputStream does not reset the stream position to zero for all
document types, so I need to reset the stream myself.
For example, the stream is not reset for file types such as txt, xml, htm and sh.
Thanks,
Praneesh
On Mon, Jun 30, 2014 at 1:27 PM, Nick Burch <ap...@gagravarr.org> wrote:
> On Mon, 30 Jun 2014, PRANEESH KUMAR wrote:
>
>> Using Tika 1.5 getting java.io.IOException: Resetting to invalid mark
>> while
>> reseting the stream passed.
>>
>
> What kind of thing is the stream you're passing?
>
> stream.mark(Integer.MAX_VALUE);
>>
>
> Does your stream support marking? And does it support marking that much?
>
> Also, you could consider wrapping it with a TikaInputStream, which handles
> marking / buffering to files / etc if needed
>
> Nick
>
Re: Getting IOException: Resetting to invalid mark while reseting the stream
Posted by Nick Burch <ap...@gagravarr.org>.
On Mon, 30 Jun 2014, PRANEESH KUMAR wrote:
> Using Tika 1.5 getting java.io.IOException: Resetting to invalid mark while
> reseting the stream passed.
What kind of thing is the stream you're passing?
> stream.mark(Integer.MAX_VALUE);
Does your stream support marking? And does it support marking that much?
Also, you could consider wrapping it with a TikaInputStream, which handles
marking / buffering to files / etc if needed
Nick
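Editor's sketch: the two marking questions above matter because a BufferedInputStream remembers only one mark. The JDK-only example below shows one way "Resetting to invalid mark" can arise: a later mark() call with a small read limit replaces the caller's mark, and reading past that limit invalidates it. Whether this is what happens inside Tika 1.5 is an assumption that the thread does not confirm; the class name, method name, the 64-byte input and the 8-byte buffer are all chosen purely for illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;

public class MarkOverrideDemo {

    /**
     * Marks a stream with Integer.MAX_VALUE, optionally lets a hypothetical
     * consumer set its own mark, reads to the end, then calls reset().
     *
     * @param innerReadLimit read limit of the consumer's mark; 0 or less means
     *                       the consumer does not call mark() at all
     * @return null if reset() succeeded, otherwise the IOException message
     */
    public static String resetAfterInnerMark(int innerReadLimit) {
        byte[] data = new byte[64];
        // A deliberately tiny 8-byte buffer so the effect shows up quickly.
        BufferedInputStream stream =
                new BufferedInputStream(new ByteArrayInputStream(data), 8);
        try {
            stream.mark(Integer.MAX_VALUE);      // the caller's mark
            if (innerReadLimit > 0) {
                stream.mark(innerReadLimit);     // replaces the caller's mark
            }
            while (stream.read() != -1) {
                // consume the whole stream, as a parser would
            }
            stream.reset();
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("no inner mark: " + resetAfterInnerMark(0));
        System.out.println("inner mark(4): " + resetAfterInnerMark(4));
    }
}
```

In the first call the caller's mark survives and reset() succeeds. In the second call the stream only honours the most recent mark(4), so once more bytes than the buffer can hold have been read, reset() fails with BufferedInputStream's "Resetting to invalid mark" message. Checking markSupported() alone therefore does not guarantee that a mark set before handing a stream to other code will still be valid afterwards.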