Posted to dev@tika.apache.org by "Rob Tulloh (JIRA)" <ji...@apache.org> on 2012/05/29 21:59:24 UTC

[jira] [Commented] (TIKA-934) Tika in server mode stops responding and reports NPE over and over in logs

    [ https://issues.apache.org/jira/browse/TIKA-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285068#comment-13285068 ] 

Rob Tulloh commented on TIKA-934:
---------------------------------

Additional evidence of re-entrancy issues:

2012-05-22_19:10:39.31249 Caused by: java.util.ConcurrentModificationException
2012-05-22_19:10:39.31253       at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
2012-05-22_19:10:39.31257       at java.util.HashMap$KeyIterator.next(HashMap.java:828)
2012-05-22_19:10:39.31262       at java.util.AbstractCollection.toArray(AbstractCollection.java:171)
2012-05-22_19:10:39.31266       at org.apache.tika.metadata.Metadata.names(Metadata.java:171)
2012-05-22_19:10:39.31270       at org.apache.tika.sax.XHTMLContentHandler.lazyEndHead(XHTMLContentHandler.java:156)
2012-05-22_19:10:39.31275       at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:237)
2012-05-22_19:10:39.31280       at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:281)
2012-05-22_19:10:39.31285       at org.apache.tika.parser.pdf.PDF2XHTML.startPage(PDF2XHTML.java:128)
2012-05-22_19:10:39.31289       at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:420)
2012-05-22_19:10:39.31293       at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:366)
2012-05-22_19:10:39.31296       at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:322)
2012-05-22_19:10:39.31300       at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:63)
2012-05-22_19:10:39.31304       at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:140)
2012-05-22_19:10:39.31308       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
2012-05-22_19:10:39.31312       ... 4 more
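
For readers unfamiliar with this exception: the trace shows a HashMap key iterator failing inside Metadata.names(). The sketch below is not Tika code. It is a minimal, single-threaded illustration of the same fail-fast check, using a plain HashMap as a stand-in for the metadata store, with hypothetical class and method names. In the server, the assumption is that the conflicting write would come from a second thread rather than from the same method.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class SharedMapDemo {
    // Returns true if the key iterator fails after the map is
    // structurally modified while the iteration is in progress.
    static boolean failsWhenModifiedDuringIteration() {
        // Stand-in for a metadata map shared between requests (assumption).
        Map<String, String> metadata = new HashMap<>();
        metadata.put("Content-Type", "application/pdf");
        metadata.put("title", "example");

        Iterator<String> names = metadata.keySet().iterator();
        try {
            names.next();
            // Stand-in for a write arriving from another thread.
            metadata.put("author", "someone");
            // The iterator notices the modification on its next step.
            names.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("CME raised: " + failsWhenModifiedDuringIteration());
    }
}
```

With real threads the failure is timing dependent, which would be consistent with the server misbehaving only some of the time.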

                
> Tika in server mode stops responding and reports NPE over and over in logs
> --------------------------------------------------------------------------
>
>                 Key: TIKA-934
>                 URL: https://issues.apache.org/jira/browse/TIKA-934
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.1
>         Environment: CentOS 5.x
>            Reporter: Rob Tulloh
>            Priority: Critical
>
> We run Tika in server mode via:
> /usr/java/jdk/bin/java -Dlog4j.app.name=-server -Djavax.xml.soap.MessageFactory=com.sun.xml.messaging.saaj.soap.ver1_1.SOAPMessageFactory1_1Impl -Dfile.encoding=UTF-8 -Djava.net.preferIPv4Stack=true -server -Xms256M -Xmx768M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/oom/content-extractor-8983.dump.1 -server -Xms500M -Xmx500M -jar /opt/tika/tika-app-1.1.jar --text --encoding=UTF-8 --server 8983
> Our client talks to this server over port 8983: we pass data via the socket and read the responses back. Occasionally, however, Tika gets into a bad state and stops responding.
> When this happens, the following appears in the logs over and over:
> 2012-05-24_20:12:33.88573 Caused by: java.lang.NullPointerException
> 2012-05-24_20:12:33.88576       at org.apache.tika.sax.XHTMLContentHandler.lazyEndHead(XHTMLContentHandler.java:157)
> 2012-05-24_20:12:33.88580       at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:237)
> 2012-05-24_20:12:33.88584       at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:274)
> 2012-05-24_20:12:33.88589       at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:186)
> 2012-05-24_20:12:33.88593       at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:97)
> 2012-05-24_20:12:33.88597       at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:185)
> 2012-05-24_20:12:33.88602       at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:160)
> 2012-05-24_20:12:33.88606       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> 2012-05-24_20:12:33.88611       ... 4 more
> 2012-05-24_20:12:49.28441 org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@6906daba
> 2012-05-24_20:12:49.28458       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
> 2012-05-24_20:12:49.28466       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> 2012-05-24_20:12:49.28477       at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> 2012-05-24_20:12:49.28489       at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:130)
> 2012-05-24_20:12:49.28497       at org.apache.tika.cli.TikaCLI$TikaServer$1.run(TikaCLI.java:735)
> 2012-05-24_20:12:49.28509 Caused by: java.lang.NullPointerException
> 2012-05-24_20:12:49.28516       at org.apache.tika.sax.XHTMLContentHandler.lazyEndHead(XHTMLContentHandler.java:157)
> 2012-05-24_20:12:49.28524       at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:237)
> 2012-05-24_20:12:49.28532       at org.apache.tika.sax.XHTMLContentHandler.startElement(XHTMLContentHandler.java:274)
> 2012-05-24_20:12:49.28541       at org.apache.tika.parser.microsoft.WordExtractor.handleParagraph(WordExtractor.java:186)
> 2012-05-24_20:12:49.28550       at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:97)
> 2012-05-24_20:12:49.28558       at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:185)
> 2012-05-24_20:12:49.28565       at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:160)
> 2012-05-24_20:12:49.28577       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> 2012-05-24_20:12:49.28585       ... 4 more
> We have tried to determine what causes this, with no success. We only know that once the server gets into this state, the only recourse is to restart the Tika service.
> Other Tika instances running in our test environment continue to work. Some combination of content or workload causes
> Tika to destabilize. Our working theory is that the Tika server may not be thread safe and that this may be causing the behavior.
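
If the thread-safety theory holds, one common way to avoid this class of failure is to give each request its own mutable state instead of sharing it across threads. The sketch below is an assumption-laden illustration, not a description of how Tika or TikaCLI is implemented or was fixed: PerRequestStateDemo, handleRequest and runRequests are hypothetical names, and a plain HashMap stands in for per-request metadata.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PerRequestStateDemo {
    // Hypothetical request handler: it builds and reads a map that no
    // other thread can see, so iteration cannot race with a writer.
    static int handleRequest(int id) {
        Map<String, String> metadata = new HashMap<>();
        for (int i = 0; i < 100; i++) {
            metadata.put("key-" + i, "request-" + id);
        }
        return metadata.keySet().toArray(new String[0]).length;
    }

    // Runs the given number of requests on a small thread pool and
    // returns the total number of keys read across all of them.
    static int runRequests(int requests) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int r = 0; r < requests; r++) {
                final int id = r;
                Callable<Integer> task = () -> handleRequest(id);
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("request failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("keys read: " + runRequests(8));
    }
}
```

Because nothing mutable is shared, the concurrent requests cannot interfere with each other's iteration. Whether the server can be restructured this way depends on where its shared state actually lives, which this thread has not yet established.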
