Posted to user@tika.apache.org by Public Network Services <pu...@gmail.com> on 2013/03/28 12:56:19 UTC

Releasing TikaInputStream resources

Folks,

Is there any workaround for releasing the resources used by
TikaInputStream, when it wraps around a "normal" InputStream, WITHOUT
closing the stream?

For example, to correctly detect certain Office formats (e.g., .xlsx
files), one has to use a TikaInputStream. But what if the method doing the
detection can only be invoked via an InputStream which it cannot close? In
that case, the temporary files are left behind in the /tmp directory,
progressively filling up the hard disk.

Would it be difficult to, say, add a release() method to the
TikaInputStream class?

Re: Releasing TikaInputStream resources

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Thu, Mar 28, 2013 at 3:09 PM, Public Network Services
<pu...@gmail.com> wrote:
> Correct, but if a method uses an already instantiated TikaInputStream, it
> cannot get a handle on the temporary resources to dispose of them.

Can you describe the scenario where you'd need to do something like this?

The code that instantiates the TikaInputStream should also take care
of disposing it properly. If that happens, you shouldn't experience
the filling up of the /tmp space that you described.

BR,

Jukka Zitting

Re: Releasing TikaInputStream resources

Posted by Public Network Services <pu...@gmail.com>.
Correct, but if a method uses an already instantiated TikaInputStream, it
cannot get a handle on the temporary resources to dispose of them.

So, a release() method could just free the temporary resources internally,
without closing the stream (so that the wrapped InputStream remains
open).

Alternatively, for simple scenarios, the release() method could use
hasFile()/getFile() and just delete the corresponding file, with an
additional file.deleteOnExit() call if the delete does not succeed.
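
To make the alternative above concrete, here is a minimal sketch using only
java.io.File. The class and method names (TempFileRelease, release) are
hypothetical and not part of Tika; only File.delete() and File.deleteOnExit()
are real JDK calls. In a Tika setting the File would presumably be the one
obtained through hasFile()/getFile() on the TikaInputStream, which is stood in
for here by an ordinary temporary file.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper sketching the proposed release() behaviour; not part of Tika.
final class TempFileRelease {

    /**
     * Deletes the given temporary file without touching any stream.
     * Returns true if the file is gone now, false if deletion had to be
     * deferred to JVM exit.
     */
    static boolean release(File tempFile) {
        if (tempFile == null || !tempFile.exists()) {
            return true; // nothing to clean up
        }
        if (tempFile.delete()) {
            return true;
        }
        // Immediate deletion failed, so ask the JVM to retry at shutdown.
        tempFile.deleteOnExit();
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the file a caller would get via hasFile()/getFile().
        File spooled = File.createTempFile("tika-sketch-", ".tmp");
        boolean deletedNow = release(spooled);
        System.out.println("deleted immediately: " + deletedNow);
    }
}
```

Whether it is safe to delete the file while the stream is still being read is
not addressed in this thread, so treat this as an illustration of the proposal
only.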


On Thu, Mar 28, 2013 at 5:51 AM, Jukka Zitting <ju...@gmail.com> wrote:

> Hi,
>
> On Thu, Mar 28, 2013 at 1:56 PM, Public Network Services
> <pu...@gmail.com> wrote:
> > Is there any workaround for releasing the resources used by TikaInputStream,
> > when it wraps around a "normal" InputStream, WITHOUT closing the stream?
>
> The TemporaryResources class [1] is designed for this purpose. See the
> javadocs of the TikaInputStream.get(InputStream, TemporaryResources)
> method [2] for instructions on how to use it.
>
> [1] http://tika.apache.org/1.3/api/org/apache/tika/io/TemporaryResources.html
> [2] http://tika.apache.org/1.3/api/org/apache/tika/io/TikaInputStream.html#get(java.io.InputStream, org.apache.tika.io.TemporaryResources)
>
> BR,
>
> Jukka Zitting
>

Re: Releasing TikaInputStream resources

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Thu, Mar 28, 2013 at 1:56 PM, Public Network Services
<pu...@gmail.com> wrote:
> Is there any workaround for releasing the resources used by TikaInputStream,
> when it wraps around a "normal" InputStream, WITHOUT closing the stream?

The TemporaryResources class [1] is designed for this purpose. See the
javadocs of the TikaInputStream.get(InputStream, TemporaryResources)
method [2] for instructions on how to use it.

[1] http://tika.apache.org/1.3/api/org/apache/tika/io/TemporaryResources.html
[2] http://tika.apache.org/1.3/api/org/apache/tika/io/TikaInputStream.html#get(java.io.InputStream, org.apache.tika.io.TemporaryResources)

BR,

Jukka Zitting
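
The ownership pattern behind this suggestion can be sketched with the JDK
alone: the caller creates a tracker for temporary files, passes it along with
the stream, and disposes the tracker in a finally block, leaving the stream
itself open. TempTracker and spoolAndMeasure below are hypothetical stand-ins
written for illustration; they are not Tika's TemporaryResources or
TikaInputStream, whose actual usage is described in the javadocs linked as
[2].

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// JDK-only illustration; TempTracker is a hypothetical stand-in for the role
// that TemporaryResources plays in the thread, not Tika's implementation.
final class TempOwnershipSketch {

    /** Tracks temporary files so they can be deleted independently of any stream. */
    static final class TempTracker {
        private final List<Path> files = new ArrayList<>();

        Path createTemporaryFile() throws IOException {
            Path file = Files.createTempFile("sketch-", ".tmp");
            files.add(file);
            return file;
        }

        /** Returns a snapshot of the files currently tracked. */
        List<Path> trackedFiles() {
            return new ArrayList<>(files);
        }

        /** Deletes every tracked file; never touches an InputStream. */
        void dispose() throws IOException {
            for (Path file : files) {
                Files.deleteIfExists(file);
            }
            files.clear();
        }
    }

    /**
     * Spools the caller's stream into a tracked temporary file and returns the
     * file size, standing in for work that needs the data on disk.
     * The stream is read but deliberately not closed.
     */
    static long spoolAndMeasure(InputStream stream, TempTracker tmp) throws IOException {
        Path spooled = tmp.createTemporaryFile();
        Files.copy(stream, spooled, StandardCopyOption.REPLACE_EXISTING);
        return Files.size(spooled);
    }

    public static void main(String[] args) throws IOException {
        InputStream callerOwned =
                new ByteArrayInputStream("example bytes".getBytes(StandardCharsets.UTF_8));
        TempTracker tmp = new TempTracker();
        try {
            long size = spoolAndMeasure(callerOwned, tmp);
            System.out.println("spooled " + size + " bytes");
        } finally {
            tmp.dispose(); // temporary file removed; callerOwned is still open
        }
        // Closing callerOwned remains the responsibility of whoever created it.
    }
}
```

The point of the design is that cleanup is tied to the tracker rather than to
the stream, so a method that must not close its InputStream can still remove
whatever temporary files were created on its behalf.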