Posted to user@tika.apache.org by PRANEESH KUMAR <pr...@gmail.com> on 2014/06/30 09:17:15 UTC

Getting IOException: Resetting to invalid mark while resetting the stream

Using Tika 1.5, I am getting java.io.IOException: Resetting to invalid mark
when I reset the stream that I passed in.

The IOException occurs mostly when parsing PDF and ZIP files.

The code snippet I used is:


try {
    // stream is a BufferedInputStream wrapping some sample.pdf
    stream.mark(Integer.MAX_VALUE);
    Tika t = new Tika();

    String content = t.parseToString(stream);
} finally {
    if (stream != null) {
        stream.reset();
    }
}


Has anybody else experienced this? Is it a bug or expected behaviour?


Thanks,
Praneesh
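
One possible explanation for the exception above, not confirmed anywhere in this thread: a BufferedInputStream remembers only one mark, so if the code consuming the stream calls mark() again with a small read limit and then reads past it, the caller's earlier mark is gone and reset() fails. The sketch below reproduces that with plain java.io only. The consume() method is a hypothetical stand-in for such a consumer and is not Tika code.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class NestedMarkDemo {

    // Hypothetical consumer: optionally peeks at a 16-byte header using its own
    // mark/reset, then drains the stream to the end.
    static void consume(InputStream in, boolean remark) throws IOException {
        if (remark) {
            in.mark(16);              // replaces any mark the caller set earlier
            in.read(new byte[16]);    // peek at the header
            in.reset();               // rewind to the consumer's own mark
        }
        byte[] chunk = new byte[4096];
        while (in.read(chunk) != -1) {
            // drain to end of stream
        }
    }

    // Returns null if the caller's reset() succeeds, else the exception message.
    static String tryReset(boolean remark) throws IOException {
        byte[] data = new byte[64 * 1024];   // larger than the default 8 KB buffer
        InputStream stream = new BufferedInputStream(new ByteArrayInputStream(data));
        stream.mark(Integer.MAX_VALUE);      // the caller's mark, as in the snippet above
        consume(stream, remark);
        try {
            stream.reset();
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("consumer does not re-mark: " + tryReset(false));
        System.out.println("consumer re-marks:         " + tryReset(true));
    }
}
```

If this is what happens here, it would also fit the observation that the failure depends on the file type, since different consumers read different amounts after re-marking.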

Re: Getting IOException: Resetting to invalid mark while resetting the stream

Posted by PRANEESH KUMAR <pr...@gmail.com>.
> What kind of stream is underneath that though?

It internally uses a ByteArrayInputStream.

> Tika will normally consume all of the stream when it parses a file

But in my case, the stream used for parsing is also used for some other
processing.


Praneesh
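
When the data is already held in memory, as with a ByteArrayInputStream, one way to share it between parsing and other processing without relying on mark/reset is to keep the bytes and hand each consumer its own fresh stream. This is a minimal sketch under that assumption; the class name ReplayableBytes and its methods are made up for illustration and are not part of Tika.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: reads the source once, then serves independent streams.
class ReplayableBytes {
    private final byte[] data;

    ReplayableBytes(InputStream source) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = source.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        this.data = out.toByteArray();
    }

    // Each call returns a new stream positioned at the first byte, so one
    // consumer reading or closing its stream cannot affect another.
    InputStream newStream() {
        return new ByteArrayInputStream(data);
    }
}
```

With this, the parse call would become something like t.parseToString(replay.newStream()), and the other processing would call newStream() again. The trade-off is that the whole document stays in memory.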

Re: Getting IOException: Resetting to invalid mark while resetting the stream

Posted by Nick Burch <ap...@gagravarr.org>.
On Mon, 30 Jun 2014, PRANEESH KUMAR wrote:
>> What kind of thing is the stream you're passing?
>
> I am passing BufferedInputStream

What kind of stream is underneath that though?

> As TikaInputStream not resetting the pos of the stream to zero for not all
> the document types, so I need to do stream reset.

Tika will normally consume all of the stream when it parses a file

Nick

Re: Getting IOException: Resetting to invalid mark while resetting the stream

Posted by PRANEESH KUMAR <pr...@gmail.com>.
Hi Nick,

Thank you.

> What kind of thing is the stream you're passing?

I am passing a BufferedInputStream.

> Does your stream support marking? And does it support marking that much?

Yes, it supports marking, and marking the stream is not the problem.

> Also, you could consider wrapping it with a TikaInputStream, which handles
> marking / buffering to files / etc if needed

By default, the Tika parser uses AutoDetectParser, which internally wraps the
stream it is passed in a TikaInputStream.

TikaInputStream does not reset the position of the stream to zero for all
document types, so I need to reset the stream myself.

For example, the stream is not reset for file types such as txt, xml, htm, sh, etc.



Thanks,
Praneesh






Re: Getting IOException: Resetting to invalid mark while resetting the stream

Posted by Nick Burch <ap...@gagravarr.org>.
On Mon, 30 Jun 2014, PRANEESH KUMAR wrote:
> Using Tika 1.5 getting java.io.IOException: Resetting to invalid mark while
> reseting the stream passed.

What kind of thing is the stream you're passing?

> stream.mark(Integer.MAX_VALUE);

Does your stream support marking? And does it support marking that much?

Also, you could consider wrapping it with a TikaInputStream, which handles 
marking / buffering to files / etc if needed

Nick
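
The first question above can be checked in code with InputStream.markSupported(). As a rough sketch using only java.io, the hypothetical helper below (the name MarkSupport is made up here) wraps a stream in a BufferedInputStream only when it cannot be marked. Note that the read limit passed to mark() is only a promise about how many bytes may be read before reset() is called; it does not protect the mark from being replaced by a later mark() call.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;

class MarkSupport {

    // Returns the stream unchanged if it already supports mark/reset,
    // otherwise wraps it in a BufferedInputStream, which does.
    static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    public static void main(String[] args) throws IOException {
        InputStream inMemory = new ByteArrayInputStream(new byte[] {1, 2, 3});
        // SequenceInputStream inherits the default markSupported() from InputStream.
        InputStream sequence = new SequenceInputStream(
                new ByteArrayInputStream(new byte[] {4, 5}),
                new ByteArrayInputStream(new byte[] {6}));

        System.out.println("ByteArrayInputStream supports mark: " + inMemory.markSupported());
        System.out.println("SequenceInputStream supports mark:  " + sequence.markSupported());
        System.out.println("after wrapping:                     "
                + ensureMarkSupported(sequence).markSupported());
    }
}
```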