Posted to dev@tika.apache.org by "K, Baraneetharan" <ba...@hp.com> on 2012/06/06 12:30:56 UTC

TikaInputStream customization

Can anyone please let me know how to customize TikaInputStream to read only the first 1000 bytes from a given InputStream?

Regards,
Baranee

Re: TikaInputStream customization

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Wed, Jun 6, 2012 at 2:15 PM, Baranee <ba...@hp.com> wrote:
> Can you please tell me how to use the beforeRead() method in TikaInputStream to
> set a read limit for reading bytes from a stream?

http://people.apache.org/~hossman/#xyproblem

Why do you want to use TikaInputStream like this?

BR,

Jukka Zitting

Re: TikaInputStream customization

Posted by Baranee <ba...@hp.com>.
Thanks, Jukka, for your reply.

Can you please tell me how to use the beforeRead() method in TikaInputStream to
set a read limit for reading bytes from a stream?

Baranee

Re: TikaInputStream customization

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Wed, Jun 6, 2012 at 12:30 PM, K, Baraneetharan
<ba...@hp.com> wrote:
> Can anyone please let me know how to customize TikaInputStream to read only the first
> 1000 bytes from a given InputStream?

You can use the BoundedInputStream [1] class from Commons IO:

    TikaInputStream.get(new BoundedInputStream(stream, 1000));

However, see the concern in TIKA-307 [2]. Passing a truncated stream
to Tika may produce unexpected results.
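
To illustrate what bounding a stream means here, the following is a minimal sketch that uses only the JDK. LimitedInputStream is a hypothetical stand-in written for this example, not the Commons IO class and not part of Tika; in real code the BoundedInputStream call above is the simpler choice. The 5000-byte input and the 1000-byte limit are assumptions chosen for the demo.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical stand-in for Commons IO's BoundedInputStream:
 * reports end-of-stream once `limit` bytes have been consumed.
 */
class LimitedInputStream extends FilterInputStream {
    private long remaining;

    LimitedInputStream(InputStream in, long limit) {
        super(in);
        this.remaining = limit;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) {
            return -1; // limit reached: pretend the stream has ended
        }
        int b = super.read();
        if (b >= 0) {
            remaining--;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (remaining <= 0) {
            return -1;
        }
        // Never ask the underlying stream for more than the remaining budget.
        int n = super.read(buf, off, (int) Math.min(len, remaining));
        if (n > 0) {
            remaining -= n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        if (n <= 0 || remaining <= 0) {
            return 0;
        }
        long skipped = super.skip(Math.min(n, remaining));
        remaining -= skipped; // skipped bytes count against the limit too
        return skipped;
    }

    @Override
    public int available() throws IOException {
        return (int) Math.min(super.available(), remaining);
    }

    // mark/reset are disabled so the byte count cannot be rewound.
    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public void mark(int readLimit) {
        // intentionally a no-op
    }

    @Override
    public void reset() throws IOException {
        throw new IOException("mark/reset not supported");
    }

    public static void main(String[] args) throws IOException {
        InputStream source = new ByteArrayInputStream(new byte[5000]);
        int count = 0;
        try (InputStream in = new LimitedInputStream(source, 1000)) {
            while (in.read() != -1) {
                count++;
            }
        }
        System.out.println(count); // prints 1000
    }
}
```

A consumer of such a stream cannot tell a truncated document from a complete one, which is the reason for the caution about TIKA-307 above.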

[1] http://commons.apache.org/io/api-release/org/apache/commons/io/input/BoundedInputStream.html
[2] https://issues.apache.org/jira/browse/TIKA-307

BR,

Jukka Zitting