Posted to solr-user@lucene.apache.org by Jörg Agatz <jo...@googlemail.com> on 2011/01/14 12:07:49 UTC
Problem with Tika and ExtractingRequestHandler (How to from lucidimagination)
Hello,
I want to index full-text documents, and I read that Tika is a good idea :-)
So I tried the how-to from lucidimagination (
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika
)
First of all, I installed Maven2 and built Tika with mvn. I tested Tika from
the shell and got results from the program,
like: java -jar tika-app-0.8.jar -t test.pdf
Then I downloaded the example from Lucid, ran mvn clean install and waited.
I did exactly what the how-to says, and I tested it 3 times,
but when I start Solr, I get an error.
I don't know what I have to do to index documents.
What is going wrong between Solr and Tika?
__________________________
__________________________
HTTP ERROR: 500
Severe errors in solr configuration.
Check your log files for more detailed information on what may be wrong.
If you want solr to continue after configuration errors, change:
<abortOnConfigurationError>false</abortOnConfigurationError>
in solrconfig.xml
-------------------------------------------------------------
org.apache.solr.common.SolrException: Error loading class
'org.apache.solr.handler.extraction.ExtractingRequestHandler'
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413)
at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:449)
at org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:152)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:556)
at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
at org.mortbay.jetty.Server.doStart(Server.java:210)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.mortbay.start.Main.invokeMain(Main.java:183)
at org.mortbay.start.Main.start(Main.java:497)
at org.mortbay.start.Main.main(Main.java:115)
Caused by: java.lang.ClassNotFoundException:
org.apache.solr.handler.extraction.ExtractingRequestHandler
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:627)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:359)
... 30 more
RequestURI=/solr/
*Powered by Jetty:// <http://jetty.mortbay.org/>*
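The ClassNotFoundException in the trace above means the JVM could not find org.apache.solr.handler.extraction.ExtractingRequestHandler on the classpath Solr was started with. One way to narrow that down is to check which jars in a given directory actually contain the class. The following is only a sketch: the function name is made up for illustration, it relies on the `unzip` tool being installed, and the thread does not say which directory the jar belongs in for this setup.

```shell
#!/bin/sh
# Print every jar under a directory that contains the given class.
# Returns 0 if at least one jar matches, 1 otherwise.
# find_class_in_jars is a hypothetical helper, not part of Solr or Tika.
find_class_in_jars() {
    dir=$1
    # Turn a.b.C into a/b/C.class, the entry name used inside a jar.
    class_path=$(printf '%s' "$2" | tr '.' '/').class
    found=1
    for jar in "$dir"/*.jar; do
        # Skip the unexpanded pattern when the directory has no jars.
        [ -f "$jar" ] || continue
        if unzip -l "$jar" 2>/dev/null | grep -qF "$class_path"; then
            printf '%s\n' "$jar"
            found=0
        fi
    done
    return "$found"
}
```

It could be called as `find_class_in_jars ./lib org.apache.solr.handler.extraction.ExtractingRequestHandler`, where `./lib` is a placeholder for whatever directory Solr loads its extra jars from. If nothing is printed, the class is not in any jar there.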
Re: Problem with Tika and ExtractingRequestHandler (How to from lucidimagination)
Posted by Jörg Agatz <jo...@googlemail.com>.
No, I don't know.
This is the request handler:
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler"
                startup="lazy">
  <lst name="defaults">
    <str name="ext.map.Last-Modified">last_modified</str>
    <bool name="ext.ignore.und.fl">true</bool>
    <str name="fmap.content">text</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>
and I call it like this:
curl "http://192.168.105.66:8983/solr/update/extract?ext.idx.attr=true\&ext.def.fl=text" -F "myfile=@test.txt"
With this part: "ext.idx.attr=true\"
I think it's OK? But nothing happens.
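A note on the `\&` in that URL: inside double quotes a POSIX shell does not treat `&` as special, and a backslash in front of it is kept as a literal character, so the backslash stays in the URL that curl receives. A minimal sketch that shows the difference (the variable names are illustrative only):

```shell
#!/bin/sh
# Inside double quotes, a backslash before '&' is NOT removed by the shell.
quoted_with_backslash="ext.idx.attr=true\&ext.def.fl=text"
# Without the backslash the quoted string is passed through unchanged.
quoted_plain="ext.idx.attr=true&ext.def.fl=text"

# Print both strings exactly as a program such as curl would receive them.
printf '%s\n' "$quoted_with_backslash"
printf '%s\n' "$quoted_plain"
```

The first line printed still contains the backslash, the second does not. The backslash is only needed when the URL is not quoted at all.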
Re: Problem with Tika and ExtractingRequestHandler (How to from lucidimagination)
Posted by Stefan Matheis <ma...@googlemail.com>.
Pass a value for your id field, as you already do for all the other
fields?
http://search.lucidimagination.com/search/document/ca95d06e700322ed/missing_required_field_id_using_extractingrequesthandler
On Fri, Jan 14, 2011 at 12:59 PM, Jörg Agatz <jo...@googlemail.com> wrote:
> OK, now on the 4th try it works? OK.. I don't know... it works.. but now I
> have another problem: I can't send content to the server.
>
>
>
>
> When I try to send content to Solr I get:
>
> <html>
> <head>
> <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
> <title>Error 400 </title>
> </head>
> <body><h2>HTTP ERROR: 400</h2><pre>Document [null] missing required field:
> id</pre>
> <p>RequestURI=/solr/update/extract</p><p><i><small><a href="
> http://jetty.mortbay.org/">Powered by Jetty://</a></small></i></p><br/>
>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
>
> </body>
> </html>
>
>
> I do:
> curl "http://192.168.105.66:8983/solr/update/extract?ext.idx.attr=true\&ext.def.fl=text" -F "myfile=@test.txt"
>
> Any ideas?
>
Re: Problem with Tika and ExtractingRequestHandler (How to from lucidimagination)
Posted by Lance Norskog <go...@gmail.com>.
You need to add another parameter which defines the 'id' field. 'id'
is required and must be unique for every document. Usually you can use
the filename.
Lance
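A minimal sketch of what such a request could look like, assuming a Solr release whose ExtractingRequestHandler accepts `literal.<fieldname>` parameters (parameter names in older builds may differ) and using the file name as the id. Host, port and file name are the ones from this thread; `commit=true` is an added assumption so that the document becomes searchable right away. The script only prints the command; the real call is left commented out.

```shell
#!/bin/sh
# Values taken from the thread; DOC_ID is an illustrative choice (the file name).
SOLR_URL="http://192.168.105.66:8983/solr/update/extract"
FILE="test.txt"
DOC_ID="$FILE"

# literal.id supplies the required 'id' field for the extracted document.
# An id containing spaces or other special characters would need URL encoding.
REQUEST_URL="${SOLR_URL}?literal.id=${DOC_ID}&commit=true"

# Show the command that would be run.
printf 'curl "%s" -F "myfile=@%s"\n' "$REQUEST_URL" "$FILE"

# Uncomment to send the file to a running Solr instance:
# curl "$REQUEST_URL" -F "myfile=@$FILE"
```

Because the URL is inside double quotes, the `&` needs no backslash.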
On Fri, Jan 14, 2011 at 3:59 AM, Jörg Agatz <jo...@googlemail.com> wrote:
> OK, now on the 4th try it works? OK.. I don't know... it works.. but now I
> have another problem: I can't send content to the server.
>
>
>
>
> When I try to send content to Solr I get:
>
> <html>
> <head>
> <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
> <title>Error 400 </title>
> </head>
> <body><h2>HTTP ERROR: 400</h2><pre>Document [null] missing required field:
> id</pre>
> <p>RequestURI=/solr/update/extract</p><p><i><small><a href="
> http://jetty.mortbay.org/">Powered by Jetty://</a></small></i></p><br/>
>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
> <br/>
>
> </body>
> </html>
>
>
> I do:
> curl "http://192.168.105.66:8983/solr/update/extract?ext.idx.attr=true\&ext.def.fl=text" -F "myfile=@test.txt"
>
> Any ideas?
>
--
Lance Norskog
goksron@gmail.com
Re: Problem with Tika and ExtractingRequestHandler (How to from lucidimagination)
Posted by Jörg Agatz <jo...@googlemail.com>.
OK, now on the 4th try it works? OK.. I don't know... it works.. but now I
have another problem: I can't send content to the server.
When I try to send content to Solr I get:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 400 </title>
</head>
<body><h2>HTTP ERROR: 400</h2><pre>Document [null] missing required field:
id</pre>
<p>RequestURI=/solr/update/extract</p><p><i><small><a href="
http://jetty.mortbay.org/">Powered by Jetty://</a></small></i></p><br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
</body>
</html>
I do:
curl "http://192.168.105.66:8983/solr/update/extract?ext.idx.attr=true\&ext.def.fl=text" -F "myfile=@test.txt"
Any ideas?