You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tika.apache.org by "Rob Tulloh (Closed) (JIRA)" <ji...@apache.org> on 2011/12/30 13:35:30 UTC

[jira] [Closed] (TIKA-835) TNEF parsing unstable

     [ https://issues.apache.org/jira/browse/TIKA-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rob Tulloh closed TIKA-835.
---------------------------

    Resolution: Won't Fix

moving to POI
                
> TNEF parsing unstable
> ---------------------
>
>                 Key: TIKA-835
>                 URL: https://issues.apache.org/jira/browse/TIKA-835
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.0
>         Environment: CentOS 4.x/5.x/6.x 
> Java 6
>            Reporter: Rob Tulloh
>
> We are seeing problems in Solr with tika throwing exceptions. Sometimes we see OOM like this:
> {noformat}
> SEVERE: java.lang.OutOfMemoryError: Java heap space
>         at org.apache.poi.hmef.attribute.TNEFAttribute.<init>(TNEFAttribute.java:50)
>         at org.apache.poi.hmef.attribute.TNEFAttribute.create(TNEFAttribute.java:76)
>         at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:74)
>         at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:98)
>         at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:98)
>         at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:98)
>         at org.apache.poi.hmef.HMEFMessage.<init>(HMEFMessage.java:63)
>         at org.apache.tika.parser.microsoft.TNEFParser.parse(TNEFParser.java:79)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:129)
>         at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:195)
>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>         at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:244)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1478)
>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
>         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>         at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>         at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>         at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>         at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>         at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>         at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>         at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>         at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> {noformat}
> Other times, we see errors like this one:
> {noformat}
> Caused by: org.apache.poi.util.LittleEndian$BufferUnderrunException: buffer underrun
>         at org.apache.poi.util.LittleEndian.readUShort(LittleEndian.java:302)
>         at org.apache.poi.hmef.attribute.TNEFAttribute.<init>(TNEFAttribute.java:53)
>         at org.apache.poi.hmef.attribute.TNEFAttribute.create(TNEFAttribute.java:76)
>         at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:74)
>         at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:98)
>         at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:98)
>         at org.apache.poi.hmef.HMEFMessage.process(HMEFMessage.java:98)
>         at org.apache.poi.hmef.HMEFMessage.<init>(HMEFMessage.java:63)
>         at org.apache.tika.parser.microsoft.TNEFParser.parse(TNEFParser.java:79)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>         ... 26 more
> {noformat}
> I am able to reproduce these failures with tika-app-1.0.jar. I am not able to share the content as the content is proprietary in nature. The OOM error is particularly problematic as it crashes Solr and causes our document indexing pipeline to get congested while it waits for Solr to restart. Please see also Solr ticket https://issues.apache.org/jira/browse/SOLR-2990 as this contains the original posting of the problem and some details of our environment where the tests are being performed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira