Posted to user@commons.apache.org by "Allison, Timothy B." <ta...@mitre.org> on 2015/12/16 13:19:24 UTC

[commons-io] TeeInputStream that ignores skip/reset?

All,
  Over on Tika, we'd like a DigestingInputStream that ignores skip/reset (unlike Java's DigestInputStream in versions <= 1.8 [0]).  Before we reinvent the wheel, is there an InputStream similar to TeeInputStream that ignores skip/reset, so that the Digester would only see the stream as if it were read sequentially without skip/reset?
  If we do reinvent the wheel, should we contribute this InputStream to commons-io as an alternative to TeeInputStream?
  Or, even more generally, are there other recommendations for handling this?  Thank you!
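
A minimal sketch of the kind of stream described above, assuming the goal is for the digest to see each byte exactly once and in stream order, however the consumer calls skip/mark/reset. The class name SequentialDigestInputStream and its fields are hypothetical; this is not part of commons-io or Tika. It tracks the current offset and the number of leading bytes already digested, skips by reading (so skipped bytes are still digested), and does not digest bytes again when they are re-read after a reset.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;

// Hypothetical name; illustrates the idea only.
class SequentialDigestInputStream extends FilterInputStream {
    private final MessageDigest digest;
    private long position;      // current logical offset in the stream
    private long digested;      // count of leading bytes already fed to the digest
    private long markPos = -1;  // offset recorded by the last mark(), -1 if none

    SequentialDigestInputStream(InputStream in, MessageDigest digest) {
        super(in);
        this.digest = digest;
    }

    @Override
    public int read() throws IOException {
        int b = in.read();
        if (b >= 0) {
            if (position >= digested) {   // first time this offset is seen
                digest.update((byte) b);
                digested++;
            }
            position++;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = in.read(buf, off, len);
        if (n > 0) {
            long end = position + n;
            if (end > digested) {
                // Only the tail of the buffer that lies past 'digested' is new.
                int fresh = (int) (end - digested);
                digest.update(buf, off + n - fresh, fresh);
                digested = end;
            }
            position = end;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        // Skip by reading, so the digest still sees the skipped bytes.
        byte[] buf = new byte[4096];
        long done = 0;
        while (done < n) {
            int r = read(buf, 0, (int) Math.min(buf.length, n - done));
            if (r <= 0) {
                break;
            }
            done += r;
        }
        return done;
    }

    @Override
    public synchronized void mark(int readLimit) {
        in.mark(readLimit);
        markPos = position;
    }

    @Override
    public synchronized void reset() throws IOException {
        if (markPos < 0) {
            throw new IOException("mark not set");
        }
        in.reset();
        position = markPos;  // bytes before 'digested' will not be digested again
    }
}
```

With this approach the caller would read the result from the MessageDigest it passed in once the consumer has finished with the stream; bytes the consumer never reaches are not included in the digest.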

         Best,

                 Tim

[0] http://mail-archives.apache.org/mod_mbox/commons-user/201508.mbox/%3CDM2PR09MB07135F86C7AC6981F1BB216BC78A0%40DM2PR09MB0713.namprd09.prod.outlook.com%3E

RE: [commons-io] TeeInputStream that ignores skip/reset?

Posted by "Allison, Timothy B." <ta...@mitre.org>.
Right, that's the use case.  In Tika, we have no control over what our dependencies are doing to the stream.  

The current implementation uses mark/reset to digest and then parse the stream, up to a certain size limit; beyond that limit, we cache to disk and then digest and parse the tmp file separately.
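
As a rough illustration of the mark/reset approach just described (not Tika's actual code), the sketch below digests a stream up to a limit and then rewinds it for the parser. The class name MarkResetDigester, the method digestWithinLimit, and the convention of returning null when the stream exceeds the limit are all assumptions made for this example; a null return stands in for the point where the caller would fall back to caching to disk.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;

// Hypothetical helper; names and the null-return convention are illustrative.
final class MarkResetDigester {
    private MarkResetDigester() {
    }

    /**
     * Digests the whole stream if it is at most 'limit' bytes long, then
     * rewinds it so a parser can read it from the start. Returns null
     * (after rewinding) if the stream is longer than 'limit'.
     */
    static byte[] digestWithinLimit(InputStream in, MessageDigest md, int limit)
            throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(limit + 1);  // one extra byte to detect streams past the limit
        byte[] buf = new byte[4096];
        long total = 0;
        boolean eof = false;
        while (total <= limit) {
            int want = (int) Math.min(buf.length, (long) limit + 1 - total);
            int n = in.read(buf, 0, want);
            if (n < 0) {
                eof = true;
                break;
            }
            md.update(buf, 0, n);
            total += n;
        }
        in.reset();          // rewind for the parser
        if (!eof) {
            md.reset();      // partial digest is useless; caller falls back
            return null;
        }
        return md.digest();
    }
}
```

Note that this reads to the end of the stream before the parser sees any of it, which is the behavior behind the problem described next.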

The downside to this (TIKA-1701) is that, for truncated zip/package files, the digester reads to the end of the stream for an embedded file and hits the zip exception. The parser then fails to extract the contents of as many files as it would have if it had been parsing the file without the digester.

If skip/reset don't make any sense for a DigestingInputStream generally, I'll keep our modified TeeInputStream over in Tika land.

If there are other recommendations for handling this, let me know.

Thank you!

Best,

          Tim

-----Original Message-----
From: sebb [mailto:sebbaz@gmail.com] 
Sent: Wednesday, December 16, 2015 1:07 PM
To: Commons Users List <us...@commons.apache.org>
Subject: Re: [commons-io] TeeInputStream that ignores skip/reset?

I'm not sure what the use case for this is, apart from avoiding the bug in DigestingInputStream, which can be avoided by not using skip/reset.

I'm not sure that skip/reset make any sense for a DigestingInputStream anyway.


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
For additional commands, e-mail: user-help@commons.apache.org


Re: [commons-io] TeeInputStream that ignores skip/reset?

Posted by sebb <se...@gmail.com>.
I'm not sure what the use case for this is, apart from avoiding the
bug in DigestingInputStream, which can be avoided by not using skip/reset.

I'm not sure that skip/reset make any sense for a DigestingInputStream anyway.

