Posted to solr-user@lucene.apache.org by sonam mittal <so...@gmail.com> on 2019/01/23 10:47:52 UTC

Solr indexing raises error while posting PDF

I am using Solr 6.6.4 on Ubuntu 16. I have created a collection in Solr
using the configuration files of the Solr example *techproducts*. I am
trying to post a PDF to Solr, but it raises an error. I have also installed
Apache Tika through Maven, but it still shows the following error.

SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/ifarm_tech/update...
Entering auto mode. File endings considered are
xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file Types.pdf (application/pdf) to [base]/extract
SimplePostTool: WARNING: Solr returned an error #500 (Server Error) for url: http://localhost:8983/solr/ifarm_tech/update/extract?resource.name=%2Fhome%2Fubuntu%2Fpdf_cancer%2FTypes.pdf&literal.id=%2Fhome%2Fubuntu%2Fpdf_cancer%2FTypes.pdf
    SimplePostTool: WARNING: Response: <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
    <title>Error 500 Server Error</title>
    </head>
    <body><h2>HTTP ERROR 500</h2>
    <p>Problem accessing /solr/ifarm_tech/update/extract. Reason:
    <pre>    Server Error</pre></p><h3>Caused by:</h3><pre>java.lang.NoClassDefFoundError: Could not initialize class org.apache.pdfbox.pdmodel.PDDocument
            at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:149)
            at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
            at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
            at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
            at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)
            at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
            at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
            at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
            at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
            at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
            at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
            at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
            at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
            at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
            at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
            at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
            at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
            at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
            at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
            at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
            at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
            at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
            at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
            at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
            at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
            at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
            at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
            at org.eclipse.jetty.server.Server.handle(Server.java:534)
            at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
            at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
            at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
            at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
            at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
            at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
            at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
            at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
            at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
            at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
            at java.lang.Thread.run(Thread.java:748)
    </pre>

    </body>
    </html>
    SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 500 for URL: http://localhost:8983/solr/ifarm_tech/update/extract?resource.name=%2Fhome%2Fubuntu%2Fpdf_cancer%2FTypes.pdf&literal.id=%2Fhome%2Fubuntu%2Fpdf_cancer%2FTypes.pdf
    1 files indexed.
    COMMITting Solr index changes to http://localhost:8983/solr/ifarm_tech/update...
    Time spent: 0:00:00.354
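
For anyone who wants to reproduce the request without the post tool, the call in the log above can be approximated as below. This is only a minimal sketch: the helper names build_extract_url and post_pdf are made up for illustration, and it assumes the tool simply sends the raw file bytes with Content-Type application/pdf to the /extract URL shown in the log, with resource.name and literal.id both set to the file path.

```python
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_extract_url(base_url, file_path):
    """Build the /extract URL with the two parameters seen in the log.

    Hypothetical helper: both resource.name and literal.id are set to
    the file path, as in the logged request.
    """
    params = urlencode({"resource.name": file_path, "literal.id": file_path})
    return f"{base_url}/extract?{params}"


def post_pdf(base_url, file_path):
    """POST the raw PDF bytes and return (status code, response text).

    Assumption: the body is the unmodified file content sent as
    application/pdf. Error responses such as the 500 above are
    returned rather than raised, so the stack trace can be inspected.
    """
    with open(file_path, "rb") as fh:
        body = fh.read()
    req = Request(
        build_extract_url(base_url, file_path),
        data=body,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )
    try:
        with urlopen(req) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except HTTPError as err:
        return err.code, err.read().decode("utf-8", "replace")
```

Calling post_pdf("http://localhost:8983/solr/ifarm_tech/update", "/home/ubuntu/pdf_cancer/Types.pdf") on the same machine should hit the same handler as the post tool, which makes it easier to check whether the NoClassDefFoundError comes from the server side rather than from the tool.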