Posted to solr-user@lucene.apache.org by Tim AtLee <ti...@gmail.com> on 2010/07/25 08:21:35 UTC
Tika, Solr running under Tomcat 6 on Debian
Hello
I desperately hope someone can help me; I'm a bit out of my league
here.
I am trying to implement content extraction using Tika and Solr as part of a
search package for a product I am using. I have been successful in getting
Solr to index text and return search results; however, I am hitting a wall
when I try to use Tika for content extraction.
I added the following configuration to solrconfig.xml:
<requestHandler name="/extract/tika"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
  </lst>
  <!-- This path only extracts - never updates -->
  <lst name="invariants">
    <bool name="extractOnly">true</bool>
  </lst>
</requestHandler>
During a test, I receive the following error:
org.apache.solr.common.SolrException: Error loading class
'org.apache.solr.handler.extraction.ExtractingRequestHandler'
The full text of this error is listed below.
As indicated in the subject line, I am using Debian Linux Squeeze
(testing). Tomcat is at version 6.0.26 and was installed via apt.
Solr was also installed via apt, and is at version
1.4.0.2010.04.24.07.20.22.
The output of java -version looks like this:
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
The JDK is at the same version, and also from apt.
I built Tika from source (nightly build) using mvn2, and placed the
compiled jars in the Solr site's lib directory, /var/solr/site/lib, which
sits alongside /var/solr/site/conf and /var/solr/site/data. Hopefully this
is the right place to put the jars.
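One way to check whether the jars in that directory actually provide the class named in the error is to look inside each jar for the matching .class entry. The following is a minimal sketch using only the JDK (Java 8 or later); the class name JarClassFinder and its methods are illustrative and not part of Solr or Tika, and the default path is simply the one mentioned above.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

/** Hypothetical helper: lists the jars in a directory that contain a given class. */
final class JarClassFinder {

    /** Converts "a.b.C" to the jar entry name "a/b/C.class". */
    static String entryNameFor(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns the jars directly under libDir that hold className. */
    static List<File> jarsContaining(File libDir, String className) {
        List<File> hits = new ArrayList<File>();
        File[] files = libDir.listFiles();
        if (files == null) {
            return hits; // not a directory, or not readable
        }
        String entry = entryNameFor(className);
        for (File f : files) {
            if (!f.isFile() || !f.getName().endsWith(".jar")) {
                continue;
            }
            try (JarFile jar = new JarFile(f)) {
                if (jar.getJarEntry(entry) != null) {
                    hits.add(f);
                }
            } catch (IOException e) {
                throw new UncheckedIOException("Cannot read " + f, e);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Defaults mirror the directory and class name from the message above.
        File libDir = new File(args.length > 0 ? args[0] : "/var/solr/site/lib");
        String cls = args.length > 1 ? args[1]
                : "org.apache.solr.handler.extraction.ExtractingRequestHandler";
        List<File> hits = jarsContaining(libDir, cls);
        System.out.println(hits.isEmpty()
                ? "No jar in " + libDir + " contains " + cls
                : cls + " found in " + hits);
    }
}
```

If no jar is reported, the directory does not supply the class at all; if one is reported, the remaining question is whether Solr's class loader is actually reading that directory.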
I also tried building Solr from source (also the nightly build), and was
able to get Solr sort of working (not Tika). I could run a single instance,
but getting multiple instances running didn't seem to be in the cards, and I
didn't pursue this any further. If this is the route I should go down and
anyone can direct me on how to install a built Solr war and configure it for
multiple instances, I'll gladly try it out.
I found a similar issue to mine at
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200911.mbox/%3CD2B0462D72664840B72118CB4437CBD403E2A718@ndhamrexm22.amer.pfizer.com%3E
Following that email, I tried copying the built Solr jars into the Solr
site's lib directory, then realized that the likelihood of that working was
pretty slim: jars from a nightly build were probably not going to work with
a .war from 1.4.0. As you might have guessed, it didn't. This is when I
tried building Solr from source (thinking that if all the Solr stuff was at
the same revision, it might work).
I have not tried any of this under Jetty. It's my understanding that Jetty
won't let me run multiple instances, and since that is a requirement for
what I'm doing, I'm more or less constrained to Tomcat.
I have also seen references to using OpenJDK instead of the Sun JDK. Trying
that resulted in the same error (I don't recall the site where I saw this
referenced).
Any help would be greatly appreciated. I am new to Tomcat and Solr, so I
may have some dumb follow-up questions, which I will google thoroughly
first. Sorry in advance.
Tim
--
-------------------------------------------------------------
org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.extraction.ExtractingRequestHandler'
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:373)
    at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:414)
    at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:450)
    at org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:152)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:557)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
    at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
    at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:115)
    at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3838)
    at org.apache.catalina.core.StandardContext.start(StandardContext.java:4488)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:546)
    at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:637)
    at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:563)
    at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:498)
    at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1277)
    at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:321)
    at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
    at org.apache.catalina.core.StandardHost.start(StandardHost.java:785)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
    at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
    at org.apache.catalina.core.StandardService.start(StandardService.java:519)
    at org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
    at org.apache.catalina.startup.Catalina.start(Catalina.java:581)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
    at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
Caused by: java.lang.ClassNotFoundException: org.apache.solr.handler.extraction.ExtractingRequestHandler
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:627)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:357)
    ... 33 more
Re: Tika, Solr running under Tomcat 6 on Debian
Posted by Lance Norskog <go...@gmail.com>.
Ah! Did you set the UTF-8 parameter in Tomcat?
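The message does not name the parameter. Assuming it means the URIEncoding attribute on the HTTP Connector in Tomcat's server.xml, that setting controls which charset Tomcat uses to decode percent-escapes in the request URI, and Tomcat 6 defaults to ISO-8859-1. A small sketch of why this matters for non-ASCII query terms, using only the JDK's URLDecoder (the class name UriDecodingDemo is illustrative):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

/** Hypothetical demo: the same percent-escapes decoded with two charsets. */
final class UriDecodingDemo {

    /** Decodes raw with the given charset, wrapping the checked exception. */
    static String decode(String raw, String charset) {
        try {
            return URLDecoder.decode(raw, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // "cafe" with an acute accent on the e, percent-encoded as UTF-8.
        String raw = "caf%C3%A9";
        // Decoded as UTF-8, the two bytes form one accented character.
        System.out.println(decode(raw, "UTF-8"));
        // Decoded as ISO-8859-1, each byte becomes its own character,
        // so the term no longer matches what was indexed.
        System.out.println(decode(raw, "ISO-8859-1"));
    }
}
```

A pure-ASCII request is decoded identically under both charsets, so this setting only affects requests that carry non-ASCII characters.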
On Mon, Apr 11, 2011 at 2:49 AM, Mike <sa...@gmail.com> wrote:
> Could you please let me know if I am doing anything wrong? It worked fine
> when I tried with the default Jetty server, prior to integrating with Tomcat 6.
--
Lance Norskog
goksron@gmail.com
Re: Tika, Solr running under Tomcat 6 on Debian
Posted by Mike <sa...@gmail.com>.
Hi Roy,
Thank you for the quick reply. When I tried to index the PDF file, I was able
to see the response:
status: 0
QTime: 479
Query:
http://localhost:8080/solr/update/extract?stream.file=D:\mike\lucene\apache-solr-1.4.1\example\exampledocs\Struts%202%20Design%20and%20Programming1.pdf&stream.contentType=application/pdf&literal.id=Struts%202%20Design%20and%20Programming1.pdf&defaultField=text&commit=true
But when I tried to search for content from the PDF, I did not get any
results:
status: 0
QTime: 2
params: indent=on, start=0, q=struts, rows=10, version=2.2
Could you please let me know if I am doing anything wrong? It worked fine
when I tried with the default Jetty server, prior to integrating with Tomcat 6.
I have followed the installation steps from
http://wiki.apache.org/solr/SolrTomcat
(Tomcat on Windows Single Solr app).
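The request above passes a Windows path with raw backslashes and a colon in stream.file. Whether that has anything to do with the empty search result is not established in this thread, but percent-encoding every parameter value removes one variable. A minimal sketch using the JDK's URLEncoder follows; the class name ExtractUrlBuilder is illustrative, and the base URL and parameters are taken from the query above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: builds a request URL with encoded query parameters. */
final class ExtractUrlBuilder {

    /** Form-encodes s as UTF-8 (spaces become '+', reserved characters %XX). */
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Appends params to baseUrl in insertion order. */
    static String build(String baseUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl);
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep).append(encode(e.getKey()))
              .append('=').append(encode(e.getValue()));
            sep = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("stream.file", "D:\\mike\\lucene\\apache-solr-1.4.1\\example"
                + "\\exampledocs\\Struts 2 Design and Programming1.pdf");
        params.put("stream.contentType", "application/pdf");
        params.put("literal.id", "Struts 2 Design and Programming1.pdf");
        params.put("defaultField", "text");
        params.put("commit", "true");
        System.out.println(build("http://localhost:8080/solr/update/extract", params));
    }
}
```

URLEncoder produces form encoding, so spaces appear as '+' rather than %20; servlet containers decode both to a space in query strings.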
Thanks,
Mike
--
View this message in context: http://lucene.472066.n3.nabble.com/Tika-Solr-running-under-Tomcat-6-on-Debian-tp993295p2805974.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Tika, Solr running under Tomcat 6 on Debian
Posted by Roy Liu <li...@gmail.com>.
\apache-solr-3.1.0\contrib\extraction\lib\tika*.jar
--
Best Regards,
Roy Liu
On Mon, Apr 11, 2011 at 3:10 PM, Mike <sa...@gmail.com> wrote:
> Hi All,
>
> I have the same issue. I have installed a Solr instance on Tomcat 6. When I
> try to index a PDF, I run into the exception below:
>
> 11 Apr, 2011 12:11:55 PM org.apache.solr.common.SolrException log
> SEVERE: java.lang.NoClassDefFoundError:
> org/apache/tika/exception/TikaException
> I could not find any Tika jar file.
> Could you please help me fix the above issue?
Re: Tika, Solr running under Tomcat 6 on Debian
Posted by Mike <sa...@gmail.com>.
Hi All,
I have the same issue. I have installed a Solr instance on Tomcat 6. When I
try to index a PDF, I run into the exception below:
11 Apr, 2011 12:11:55 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NoClassDefFoundError: org/apache/tika/exception/TikaException
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:359)
    at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413)
    at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:449)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:240)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.ClassNotFoundException: org.apache.tika.exception.TikaException
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
    ... 22 more
I could not find any Tika jar file.
Could you please help me fix the above issue?
Thanks,
Mike
--
View this message in context: http://lucene.472066.n3.nabble.com/Tika-Solr-running-under-Tomcat-6-on-Debian-tp993295p2805615.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Tika, Solr running under Tomcat 6 on Debian
Posted by Lance Norskog <go...@gmail.com>.
I would start over from the Solr 1.4.1 binary distribution and follow
the instructions on the wiki:
http://wiki.apache.org/solr/ExtractingRequestHandler
(Java classpath stuff is notoriously difficult, especially when
dynamically configured and loaded. I often cannot tell if Java cannot
load the class it prints, or if that class requires others.)
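The two cases can often be told apart by which throwable comes back: a ClassNotFoundException from the load call means the requested class itself was not visible, while a NoClassDefFoundError means the class was located but something it needs to link (for example its superclass) was not. A minimal sketch of such a probe, using only the JDK; the class name LoadProbe is illustrative, and the plain URLClassLoader built here is an assumption that only approximates the loader hierarchy Tomcat and Solr set up.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical probe: reports why a class fails to load from a jar directory. */
final class LoadProbe {

    /** Builds a loader over every .jar file directly under libDir. */
    static ClassLoader loaderFor(File libDir) {
        List<URL> urls = new ArrayList<URL>();
        File[] files = libDir.listFiles();
        if (files != null) {
            for (File f : files) {
                if (f.isFile() && f.getName().endsWith(".jar")) {
                    try {
                        urls.add(f.toURI().toURL());
                    } catch (MalformedURLException e) {
                        throw new IllegalArgumentException("Bad path: " + f, e);
                    }
                }
            }
        }
        return new URLClassLoader(urls.toArray(new URL[0]),
                LoadProbe.class.getClassLoader());
    }

    /** Tries to load and initialize className; returns a one-line diagnosis. */
    static String probe(String className, ClassLoader loader) {
        try {
            Class.forName(className, true, loader);
            // Dependencies referenced only inside method bodies may still
            // fail later, when the code that uses them first runs.
            return "OK: " + className;
        } catch (ClassNotFoundException e) {
            // The requested class itself is not visible to this loader.
            return "MISSING: " + e.getMessage();
        } catch (NoClassDefFoundError e) {
            // The class was found, but a class it depends on was not.
            return "MISSING DEPENDENCY: " + e.getMessage();
        } catch (LinkageError e) {
            // Other link or initialization failures, such as version mismatches.
            return "LINKAGE PROBLEM: " + e;
        }
    }

    public static void main(String[] args) {
        File libDir = new File(args.length > 0 ? args[0] : "lib");
        String cls = args.length > 1 ? args[1]
                : "org.apache.solr.handler.extraction.ExtractingRequestHandler";
        System.out.println(probe(cls, loaderFor(libDir)));
    }
}
```

Both traces in this thread fit that pattern: the first reports ClassNotFoundException for the handler itself, and the second reports NoClassDefFoundError for a Tika class while the handler was being loaded.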
On Sat, Jul 24, 2010 at 11:21 PM, Tim AtLee <ti...@gmail.com> wrote:
> During a test, I receive the following error:
> org.apache.solr.common.SolrException: Error loading class
> 'org.apache.solr.handler.extraction.ExtractingRequestHandler'
--
Lance Norskog
goksron@gmail.com