Posted to dev@tika.apache.org by "Femi (JIRA)" <ji...@apache.org> on 2015/11/17 17:26:11 UTC

[jira] [Created] (TIKA-1796) Issues with tika jar and Microsoft documents like doc, ppt, xls, etc.

Femi created TIKA-1796:
--------------------------

             Summary: Issues with tika jar and Microsoft documents like doc, ppt, xls, etc.
                 Key: TIKA-1796
                 URL: https://issues.apache.org/jira/browse/TIKA-1796
             Project: Tika
          Issue Type: Bug
    Affects Versions: 0.9
         Environment: UNIX server
            Reporter: Femi


We have had a problem with tika-app-0.9.jar when processing Microsoft Office documents (we have no issues with PDFs or images). Tika creates temporary files that remain held open by our WebLogic Java process.

For example, running the command lsof -p 27305 | grep deleted shows:

java      27305  oracle  330r      REG              253,1   295674         68 /tmp/apache-tika-5125182301796025972.tmp (deleted)
java      27305  oracle  334r      REG              253,1   272896         69 /tmp/apache-tika-8997882426533237375.tmp (deleted)
java      27305  oracle  335r      REG              253,1   295674         78 /tmp/apache-tika-5232377327199509251.tmp (deleted)
java      27305  oracle  336r      REG              253,1    45327         43 /tmp/apache-tika-6884061409786039638.tmp (deleted)
java      27305  oracle  339r      REG              253,1   272895         41 /tmp/apache-tika-6752501215118342524.tmp (deleted)
java      27305  oracle  340r      REG              253,1   272895         41 /tmp/apache-tika-6752501215118342524.tmp (deleted)
java      27305  oracle  341r      REG              253,1    45327         75 /tmp/apache-tika-7548218713808428132.tmp (deleted)

The listing above shows Tika temporary files created from Microsoft documents: they are in the deleted state but are still held open by the WebLogic process.
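To illustrate what the lsof listing shows, here is a minimal Java sketch (not Tika's actual code) of how a temp file can be deleted and still be held while a stream on it stays open. The class name, method name and the demo-tika- prefix are made up for illustration, and it assumes Java 9+ and UNIX file semantics. Whether our leak really comes from an unclosed stream like this is only an assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class DeletedTempFileDemo {

    // Writes content to a temp file, deletes the file while a stream on it
    // is still open, and returns what can still be read through that stream.
    static String readAfterDelete(String content) {
        try {
            // "demo-tika-" is a hypothetical prefix, not the one Tika uses.
            Path tmp = Files.createTempFile("demo-tika-", ".tmp");
            Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));

            try (InputStream in = Files.newInputStream(tmp)) {
                // The directory entry is gone after this call, but the open
                // descriptor keeps the data on disk; lsof would now list the
                // file as "(deleted)".
                Files.delete(tmp);
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            } // Closing the stream here is what actually frees the space.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readAfterDelete("still readable after delete"));
    }
}
```

If the stream were never closed, the entry would stay in the lsof output until the JVM exits, which matches what we see until WebLogic is restarted.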

The only way we can get these files closed and released is to restart the WebLogic server.

This has cost us money, as we have had to stop the server to get rid of the Tika files filling up our tmp folder.
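To estimate how much tmp space is being held, the lsof output can be summed up. The following is a rough Java sketch under stated assumptions: the class and method names are hypothetical, the column layout is taken from the listing in this report (size in the 7th column, path in the 9th), and paths are assumed to contain no spaces. Each path is counted once, since several descriptors can point at the same file.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class HeldTempSpace {

    // Sums the SIZE/OFF column over "(deleted)" apache-tika temp files,
    // counting each path once even if several descriptors reference it.
    static long heldBytes(List<String> lsofLines) {
        Map<String, Long> sizeByPath = new LinkedHashMap<>();
        for (String line : lsofLines) {
            String[] f = line.trim().split("\\s+");
            // Assumed columns:
            // COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME (deleted)
            if (f.length < 10 || !"(deleted)".equals(f[f.length - 1])) {
                continue;
            }
            String path = f[8];
            if (!path.contains("apache-tika-")) {
                continue;
            }
            sizeByPath.put(path, Long.parseLong(f[6]));
        }
        long total = 0;
        for (long size : sizeByPath.values()) {
            total += size;
        }
        return total;
    }

    public static void main(String[] args) {
        // Three lines from the listing in this report; the last two are
        // two descriptors on the same file.
        List<String> sample = Arrays.asList(
            "java      27305  oracle  336r      REG              253,1    45327         43 /tmp/apache-tika-6884061409786039638.tmp (deleted)",
            "java      27305  oracle  339r      REG              253,1   272895         41 /tmp/apache-tika-6752501215118342524.tmp (deleted)",
            "java      27305  oracle  340r      REG              253,1   272895         41 /tmp/apache-tika-6752501215118342524.tmp (deleted)");
        System.out.println(heldBytes(sample)); // prints 318222
    }
}
```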

We have had this issue for almost 3 years now. I have been searching the web to see whether an upgraded Tika jar resolves it, but that does not seem to be the case.

Is there any solution to this issue?

Regards,

Femi Balogun
Application Support Engineer




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)