Posted to dev@tika.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2010/04/14 15:50:53 UTC

[jira] Resolved: (TIKA-397) Parser crashes on very simple file

     [ https://issues.apache.org/jira/browse/TIKA-397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-397.
--------------------------------

      Assignee: Jukka Zitting
    Resolution: Duplicate

This was fixed in Tika 0.5 as a side-effect of other changes. Solr trunk has already upgraded to a more recent Tika version (see SOLR-1819), so the fix will also be included in the next Solr release.

> Parser crashes on very simple file
> ----------------------------------
>
>                 Key: TIKA-397
>                 URL: https://issues.apache.org/jira/browse/TIKA-397
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.4
>         Environment: Solr 1.4 on Ubuntu 9.10.  OpenJDK Runtime Environment (IcedTea6 1.6.1) (6b16-1.6.1-3ubuntu1)
>            Reporter: Ross Keatinge
>            Assignee: Jukka Zitting
>
> Sorry but I can only talk about this from a Solr user's point of view. I'm using Solr's ExtractingRequestHandler (Solr Cell) to index some text files. In general it's working fine, but Tika crashes when parsing a text file with certain upper case short words near the start of the file. I haven't been able to discover the pattern of what works and what doesn't, but here's a real simple example.
> A file with just the letters XE and nothing else crashes. If I edit the file and change it to any of XA, XB, XC, XD or XF it works but XE always crashes. Lower case works.
> I discovered this with certain five letter words that unfortunately are very common in my documents.
> Here's the error message from Solr.
> Apache Tomcat/6.0.20 - Error report
> HTTP Status 500 - org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.TXTParser@a51027
> org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.TXTParser@a51027
> 	at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
> 	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
> 	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> 	at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
> 	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
> 	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> 	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
> 	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> 	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> 	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> 	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> 	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
> 	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> 	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> 	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
> 	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
> 	at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
> 	at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
> 	at java.lang.Thread.run(Thread.java:636)
> Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.txt.TXTParser@a51027
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
> 	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
> 	at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
> 	... 18 more
> Caused by: java.lang.NullPointerException
> 	at java.io.Reader.<init>(Reader.java:78)
> 	at java.io.BufferedReader.<init>(BufferedReader.java:93)
> 	at java.io.BufferedReader.<init>(BufferedReader.java:108)
> 	at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:59)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:119)
> 	... 20 more
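
The innermost frames of the trace show the NullPointerException being raised inside the java.io.Reader constructor, reached through the BufferedReader constructors called from TXTParser.parse. That is the standard JDK behaviour when BufferedReader is handed a null Reader. The report does not say why the reader was null in Tika 0.4, so the sketch below only illustrates the JDK side of the failure; NullReaderDemo and failsWithNpe are illustrative names and are not part of Tika or Solr.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;

public class NullReaderDemo {

    /**
     * Returns true if wrapping the given reader in a BufferedReader
     * throws a NullPointerException, false if construction succeeds.
     */
    public static boolean failsWithNpe(Reader reader) {
        try {
            // BufferedReader(Reader) passes its argument up to Reader(Object lock),
            // which rejects a null lock with a NullPointerException.
            new BufferedReader(reader);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // A null reader, as the stack trace suggests TXTParser supplied.
        System.out.println(failsWithNpe(null));
        // A non-null reader over the two-letter sample text from the report.
        System.out.println(failsWithNpe(new StringReader("XE")));
    }
}
```

If this reading of the trace is right, any caller that may obtain a null Reader would need to check for it before constructing the BufferedReader, or fall back to a default reader.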

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira