Posted to dev@tika.apache.org by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2017/05/21 15:40:10 UTC

[jira] [Updated] (TIKA-1953) tika-server NullPointerException while processing rtfs

     [ https://issues.apache.org/jira/browse/TIKA-1953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated TIKA-1953:
------------------------------------
    Fix Version/s:     (was: 1.15)
                   1.16

> tika-server NullPointerException while processing rtfs
> ------------------------------------------------------
>
>                 Key: TIKA-1953
>                 URL: https://issues.apache.org/jira/browse/TIKA-1953
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.12
>         Environment: Python 2.7.11 :: Anaconda 4.0.0 (64-bit)
> Red Hat Enterprise Linux Server release 6.7 (Santiago)
> java version "1.7.0_95"
> OpenJDK Runtime Environment (rhel-2.6.4.0.el6_7-x86_64 u95-b00)
> OpenJDK 64-Bit Server VM (build 24.95-b01, mixed mode)
>            Reporter: Ravi
>            Assignee: Tim Allison
>              Labels: newbie, rtf, tika-python, tika-server, xmlContent,
>             Fix For: 1.16
>
>         Attachments: officeinstallations3.rtf
>
>
> Setting the xmlContent=True flag appears to cause this warning: "tika.py: Warn: Tika server returned status: 422".
> I start the tika server and then run the following code in a Python interpreter launched from bash:
> import tika
> from tika import parser
> parsed = parser.from_file('/path/to/file.rtf', 'http://localhost:9003', xmlContent=True)
> I get: tika.py: Warn: Tika server returned status: 422
> Note: The parser seems to work fine without the xmlContent=True flag; I get the right output. Setting the flag triggers the NullPointerException below.
> Looking at the tika-server log, I get the following dump:
> ------------------------------------------------------------------------------
> Apr 15, 2016 2:36:55 PM org.apache.tika.server.resource.TikaResource logRequest
> INFO: rmeta/xml (autodetecting type)
> Apr 15, 2016 2:36:55 PM org.apache.tika.server.resource.TikaResource parse
> WARNING: rmeta/xml: Text extraction failed
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.rtf.RTFParser@21f0dbb9
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
>         at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:177)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>         at org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:158)
>         at org.apache.tika.server.resource.TikaResource.parse(TikaResource.java:281)
>         at org.apache.tika.server.resource.RecursiveMetadataResource.parseMetadata(RecursiveMetadataResource.java:138)
>         at org.apache.tika.server.resource.RecursiveMetadataResource.getMetadata(RecursiveMetadataResource.java:119)
>         at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.cxf.service.invoker.AbstractInvoker.performInvocation(AbstractInvoker.java:181)
>         at org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:97)
>         at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:200)
>         at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:99)
>         at org.apache.cxf.interceptor.ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:59)
>         at org.apache.cxf.interceptor.ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:96)
>         at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
>         at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
>         at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:251)
>         at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:261)
>         at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:70)
>         at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1088)
>         at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1024)
>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>         at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
>         at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>         at org.eclipse.jetty.server.Server.handle(Server.java:370)
>         at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
>         at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:982)
>         at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1043)
>         at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:865)
>         at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
>         at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
>         at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:696)
>         at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:53)
>         at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>         at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
>         at org.apache.tika.sax.ToXMLContentHandler$ElementInfo.access$000(ToXMLContentHandler.java:38)
>         at org.apache.tika.sax.ToXMLContentHandler.endElement(ToXMLContentHandler.java:195)
>         at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
>         at org.apache.tika.sax.SecureContentHandler.endElement(SecureContentHandler.java:256)
>         at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
>         at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
>         at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
>         at org.apache.tika.sax.SafeContentHandler.endElement(SafeContentHandler.java:273)
>         at org.apache.tika.sax.XHTMLContentHandler.endDocument(XHTMLContentHandler.java:226)
>         at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:478)
>         at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:439)
>         at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:87)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>         ... 38 more
> ------------------------------------------------------------------------------
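
To narrow the failure down to the xmlContent flag, the reproduction steps above can be wrapped in a small comparison helper. This is only a sketch: the helper name compare_xml_and_text is hypothetical, and it assumes that a failed parse comes back from tika-python as an empty result or one without a usable 'content' entry, which may differ between tika-python versions.

```python
def compare_xml_and_text(path, endpoint, parse=None):
    """Parse `path` with and without xmlContent and report which calls returned content.

    `parse` defaults to tika.parser.from_file; it is injectable so the helper
    can be exercised without a running tika-server.
    """
    if parse is None:
        # Imported lazily so this module loads even if tika-python is absent.
        from tika import parser
        parse = parser.from_file

    results = {}
    for xml in (False, True):
        # Assumption: a failed request (e.g. HTTP 422) yields None, an empty
        # dict, or a dict whose 'content' is missing or empty.
        parsed = parse(path, endpoint, xmlContent=xml) or {}
        results["xml" if xml else "text"] = bool(parsed.get("content"))
    return results
```

Called as compare_xml_and_text('/path/to/file.rtf', 'http://localhost:9003') against the server from the report, a result where "text" is True and "xml" is False would be consistent with the behaviour described above.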



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)