Posted to solr-user@lucene.apache.org by MBD <mb...@gmail.com> on 2011/10/20 23:54:33 UTC

org.apache.pdfbox.pdmodel.PDPage Error

Hi, I'm new to Solr and trying to get it to index PDFs, but I'm having trouble getting started. I'm following the examples on the ExtractingRequestHandler wiki page <http://wiki.apache.org/solr/ExtractingRequestHandler>.

I've got Solr running and it indexes HTML, XML & TXT files just fine, but when I feed it a .pdf it barfs back an "Error 500 Could not initialize class org.apache.pdfbox.pdmodel.PDPage" error:

  $ curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@index.pdf"
  <html>
  <head>
  <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
  <title>Error 500 Could not initialize class org.apache.pdfbox.pdmodel.PDPage

  java.lang.NoClassDefFoundError: Could not initialize class org.apache.pdfbox.pdmodel.PDPage
  ...

I thought maybe it's because Tika isn't installed/included, so I tried downloading and installing Tika separately, but even the Tika install fails with:

  -------------------------------------------------------------------------------
  Test set: org.apache.tika.parser.pdf.PDFParserTest
  -------------------------------------------------------------------------------
  Tests run: 5, Failures: 0, Errors: 5, Skipped: 0, Time elapsed: 0.63 sec <<< FAILURE!
  testVarious(org.apache.tika.parser.pdf.PDFParserTest)  Time elapsed: 0.165 sec  <<< ERROR!
  java.lang.NoClassDefFoundError: Could not initialize class org.apache.pdfbox.pdmodel.PDPage

I don't know Java (hopefully I won't need to just to get basic indexing up and running, since my ultimate goal is to query this via Sunspot from a Rails app), so go easy on me.

Let me know if you want/need more of the error dump.

Any help would be greatly appreciated!
-Mike

Re: org.apache.pdfbox.pdmodel.PDPage Error

Posted by Mike Sokolov <so...@ifactory.com>.
On 10/24/2011 02:35 PM, MBD wrote:
> Is this really a stumper? This is my first experience with Solr, and after only an hour or so with it I hit this barrier (below). I'm sure *I* am doing something completely wrong; I'm just hoping someone more familiar with the platform can help me identify & fix it.
>
> For starters, what does "Could not initialize class ..." mean in Java, exactly? Maybe that the class (i.e. code) itself doesn't exist, so perhaps I haven't downloaded all the pieces of the project? Or could it be a hint that my kit just isn't configured correctly? Sorry, I'm not a Java expert, but I'd like to get this stabilized if possible.
>
>
Yeah, that's the problem. It looks like the pdfbox jar is not installed in
a place where Solr can find it (i.e. on its classpath).
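To test the classpath theory directly, here is a small sketch. It assumes you can compile one Java file and run it with the same jars Solr loads. It tells "the class isn't on the classpath at all" apart from "the class was found but failed while initializing". `Class.forName(name, false, loader)` locates and loads a class without running its static initializers, and `Class.forName(name, true, loader)` also initializes it. `ClassCheck` is a hypothetical helper written for this thread and is not part of Solr, Tika or PDFBox.

```java
public class ClassCheck {

    // Hypothetical diagnostic: separates "cannot be found" from
    // "found, but its static initialization fails".
    static String check(String className) {
        ClassLoader loader = ClassCheck.class.getClassLoader();
        try {
            // initialize=false: load and link the class, but do not run static initializers.
            Class.forName(className, false, loader);
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        } catch (LinkageError e) {
            // e.g. a superclass it needs is missing, or the class file is unusable.
            return "found, but could not be loaded/linked: " + e;
        }
        try {
            // initialize=true: now run the static initializers as well.
            Class.forName(className, true, loader);
            return "ok";
        } catch (ClassNotFoundException e) {
            // Declared by forName; not expected once the first call succeeded.
            return "not on classpath";
        } catch (LinkageError e) {
            // ExceptionInInitializerError and NoClassDefFoundError are both LinkageErrors.
            return "found, but failed to initialize: " + e;
        }
    }

    public static void main(String[] args) {
        // Defaults to the class from the error message; pass another name as args[0].
        String name = args.length > 0 ? args[0] : "org.apache.pdfbox.pdmodel.PDPage";
        System.out.println(name + ": " + check(name));
    }
}
```

For example, run `java -cp <Solr's jars plus this class> ClassCheck org.apache.pdfbox.pdmodel.PDPage`. If it prints "not on classpath", that supports the missing-jar explanation. If it prints "failed to initialize", the jar is there, and the printed error shows what the class's setup tripped over.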
> If this is the wrong mailing list then just tell me and I'll go away...
>
> Thanks!
>

Re: org.apache.pdfbox.pdmodel.PDPage Error

Posted by MBD <mb...@gmail.com>.
Is this really a stumper? This is my first experience with Solr, and after only an hour or so with it I hit this barrier (below). I'm sure *I* am doing something completely wrong; I'm just hoping someone more familiar with the platform can help me identify & fix it.

For starters, what does "Could not initialize class ..." mean in Java, exactly? Maybe that the class (i.e. code) itself doesn't exist, so perhaps I haven't downloaded all the pieces of the project? Or could it be a hint that my kit just isn't configured correctly? Sorry, I'm not a Java expert, but I'd like to get this stabilized if possible.
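A minimal standalone sketch of what that message means in Java. "Could not initialize class X" is a NoClassDefFoundError that the JVM throws when it has already tried to run X's static initialization and that attempt failed. The class file itself was located. What failed was its static setup, for example because something that setup needs is missing. The first failure (an ExceptionInInitializerError, or an Error such as a NoClassDefFoundError naming some other class) carries the real cause. Every later use of X gets only the short "Could not initialize class" message. `Fragile` below is a hypothetical stand-in for PDPage, not PDFBox code.

```java
import java.util.ArrayList;
import java.util.List;

public class InitFailureDemo {

    // Hypothetical stand-in for a class whose static setup fails at runtime.
    static class Fragile {
        // Not a compile-time constant, so reading it triggers class initialization.
        static final int VALUE = setUp();

        static int setUp() {
            // Simulates static setup failing, e.g. because something it needs is unavailable.
            throw new IllegalStateException("static setup failed");
        }
    }

    // Touches Fragile twice and records what each attempt threw (and its cause, if any).
    static List<String> describeAttempts() {
        List<String> outcomes = new ArrayList<>();
        for (int attempt = 0; attempt < 2; attempt++) {
            try {
                outcomes.add("value: " + Fragile.VALUE);
            } catch (Throwable t) {
                Throwable cause = t.getCause();
                outcomes.add(t + (cause != null ? " caused by " + cause : ""));
            }
        }
        return outcomes;
    }

    public static void main(String[] args) {
        describeAttempts().forEach(System.out::println);
    }
}
```

Running `main` prints two lines. The first reports `java.lang.ExceptionInInitializerError` caused by the IllegalStateException. The second starts with `java.lang.NoClassDefFoundError: Could not initialize class InitFailureDemo$Fragile`. When this error shows up, the earliest related error in the log is usually the one worth reading.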

If this is the wrong mailing list then just tell me and I'll go away...

Thanks!

On Oct 20, 2011, at 2:54 PM, MBD wrote:

> Hi, I'm new to Solr and trying to get it to index PDFs, but I'm having trouble getting started. I'm following the examples on the ExtractingRequestHandler wiki page <http://wiki.apache.org/solr/ExtractingRequestHandler>.
> 
> I've got Solr running and it indexes HTML, XML & TXT files just fine, but when I feed it a .pdf it spits out an "Error 500 Could not initialize class org.apache.pdfbox.pdmodel.PDPage":
> 
>  $ curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@index.pdf"
>  <html>
>  <head>
>  <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
>  <title>Error 500 Could not initialize class org.apache.pdfbox.pdmodel.PDPage
> 
>  java.lang.NoClassDefFoundError: Could not initialize class org.apache.pdfbox.pdmodel.PDPage
>  ...
> 
> I thought maybe it's because Tika isn't installed/included, so I tried downloading and installing Tika separately, but even the Tika install fails with:
> 
>  -------------------------------------------------------------------------------
>  Test set: org.apache.tika.parser.pdf.PDFParserTest
>  -------------------------------------------------------------------------------
>  Tests run: 5, Failures: 0, Errors: 5, Skipped: 0, Time elapsed: 0.63 sec <<< FAILURE!
>  testVarious(org.apache.tika.parser.pdf.PDFParserTest)  Time elapsed: 0.165 sec  <<< ERROR!
>  java.lang.NoClassDefFoundError: Could not initialize class org.apache.pdfbox.pdmodel.PDPage
> 
> I don't know Java (hopefully I won't need to just to get basic indexing up and running, since my ultimate goal is to query this via Sunspot from a Rails app), so go easy on me.
> 
> Let me know if you want/need more of the error dump.
> 
> Any help would be greatly appreciated!
> -Mike