Posted to solr-user@lucene.apache.org by Jörg Agatz <jo...@googlemail.com> on 2011/01/14 12:07:49 UTC

Problem with Tika and ExtractingRequestHandler (How to from lucidimagination)

Hello,

I want to index full-text documents, and I read that Tika is a good idea :-)

So I tried the how-to from Lucid Imagination (
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika
)


First of all, I installed Maven 2 and built Tika with mvn. I tested Tika from
the shell and got results from the program,
like: java -jar tika-app-0.8.jar -t test.pdf

Then I downloaded the example from Lucid, ran mvn clean install, and waited.

I did exactly what the how-to says, and I tested it 3 times.

But when I start Solr, I get an error.

I don't know what I have to do to index documents.
What is going wrong between Solr and Tika?


__________________________
__________________________

HTTP ERROR: 500

Severe errors in solr configuration.

Check your log files for more detailed information on what may be wrong.

If you want solr to continue after configuration errors, change:

 <abortOnConfigurationError>false</abortOnConfigurationError>

in solrconfig.xml

-------------------------------------------------------------
org.apache.solr.common.SolrException: Error loading class
'org.apache.solr.handler.extraction.ExtractingRequestHandler'
	at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
	at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413)
	at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:449)
	at org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:152)
	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:556)
	at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
	at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
	at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
	at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
	at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
	at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
	at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
	at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
	at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
	at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
	at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
	at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
	at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
	at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
	at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
	at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
	at org.mortbay.jetty.Server.doStart(Server.java:210)
	at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
	at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.mortbay.start.Main.invokeMain(Main.java:183)
	at org.mortbay.start.Main.start(Main.java:497)
	at org.mortbay.start.Main.main(Main.java:115)
Caused by: java.lang.ClassNotFoundException:
org.apache.solr.handler.extraction.ExtractingRequestHandler
	at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
	at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:627)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:247)
	at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:359)
	... 30 more

RequestURI=/solr/

*Powered by Jetty:// <http://jetty.mortbay.org/>*
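
A note on the trace above: a ClassNotFoundException for org.apache.solr.handler.extraction.ExtractingRequestHandler means the JVM could not find that class on the classpath Solr was started with. A minimal check, assuming the handler ships in a jar whose file name contains "solr-cell" (the name pattern, the helper function, and the SOLR_DIR variable are all assumptions made for illustration), is to look for such a jar under the Solr installation:

```shell
#!/bin/sh
# Hypothetical helper: list jars under a directory whose file name
# suggests they contain the extraction handler.
# The '*solr-cell*.jar' pattern is an assumption about the jar name.
find_cell_jars() {
  find "$1" -type f -name '*solr-cell*.jar' 2>/dev/null
}

# SOLR_DIR is an assumed install location; adjust it to your setup.
SOLR_DIR="${SOLR_DIR:-.}"

jars=$(find_cell_jars "$SOLR_DIR")
if [ -n "$jars" ]; then
  echo "Found candidate jars:"
  echo "$jars"
else
  echo "No Solr Cell jar found under $SOLR_DIR"
fi
```

If nothing is found, the jar (and the Tika jars it depends on) probably still has to be copied into a lib directory that Solr loads at startup.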

Re: Problem with Tika and ExtractingRequestHandler (How to from lucidimagination)

Posted by Jörg Agatz <jo...@googlemail.com>.
No, I don't know.

This is the request handler:

<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler"
                startup="lazy">
  <lst name="defaults">
    <str name="ext.map.Last-Modified">last_modified</str>
    <bool name="ext.ignore.und.fl">true</bool>
    <str name="fmap.content">text</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>

And I call it like this:
curl "http://192.168.105.66:8983/solr/update/extract?ext.idx.attr=true\&ext.def.fl=text" -F "myfile=@test.txt"

with this: "ext.idx.attr=true\"
I think that is OK? But nothing happens.
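
One detail in the command above may be worth checking: inside double quotes the shell does not strip a backslash in front of "&", so the backslash is sent to the server as part of the URL. A small sketch that shows the difference (the variable names are only for illustration):

```shell
#!/bin/sh
# Inside double quotes a backslash is only special before $, `, ", \
# and newline. Before '&' it is kept, so it ends up in the query string.
escaped="ext.idx.attr=true\&ext.def.fl=text"
plain="ext.idx.attr=true&ext.def.fl=text"

printf '%s\n' "$escaped"   # query string with the backslash still in it
printf '%s\n' "$plain"     # query string without a backslash
```

When the whole URL is already in double quotes, writing a plain "&" is enough; the backslash is only needed when the URL is not quoted at all.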

Re: Problem with Tika and ExtractingRequestHandler (How to from lucidimagination)

Posted by Stefan Matheis <ma...@googlemail.com>.
Pass a value for your id field, as you already do for all the other
fields?

http://search.lucidimagination.com/search/document/ca95d06e700322ed/missing_required_field_id_using_extractingrequesthandler

On Fri, Jan 14, 2011 at 12:59 PM, Jörg Agatz <jo...@googlemail.com> wrote:

> OK, now on the 4th test it works? I don't know why, but it works. But now I
> have another problem: I can't send content to the server.
>
> When I try to send content to Solr I get:
>
> <html>
> <head>
> <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
> <title>Error 400 </title>
> </head>
> <body><h2>HTTP ERROR: 400</h2><pre>Document [null] missing required field:
> id</pre>
> <p>RequestURI=/solr/update/extract</p><p><i><small><a href="
> http://jetty.mortbay.org/">Powered by Jetty://</a></small></i></p><br/>
>
> </body>
> </html>
>
>
> I run:
> curl "http://192.168.105.66:8983/solr/update/extract?ext.idx.attr=true\&ext.def.fl=text" -F "myfile=@test.txt"
>
> Any ideas?
>

Re: Problem with Tika and ExtractingRequestHandler (How to from lucidimagination)

Posted by Lance Norskog <go...@gmail.com>.
You need to add another parameter that supplies a value for the 'id' field.
'id' is required and must be unique for every document. Usually you can use
the filename.

Lance
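
As a minimal sketch of that advice, assuming the handler accepts literal.<fieldname> parameters (released versions of the extracting handler do; a build that still uses the older ext.-prefixed parameters may expect a different spelling), the id can be passed on the URL like this. The build_extract_url helper is hypothetical, and commit=true is added here only so the document becomes searchable right away:

```shell
#!/bin/sh
# Assumed values taken from the thread; adjust for your server and file.
SOLR_URL="http://192.168.105.66:8983/solr"
FILE="test.txt"

# Hypothetical helper: build the extract URL with literal.id set to the
# given document id. Ids with spaces or other special characters would
# need URL encoding, which this sketch does not do.
build_extract_url() {
  printf '%s/update/extract?literal.id=%s&commit=true' "$1" "$2"
}

# Use the filename as the unique id.
url=$(build_extract_url "$SOLR_URL" "$FILE")

# Dry run: print the command instead of sending it.
# Remove the leading "echo" to post the file for real.
echo curl "$url" -F "myfile=@$FILE"
```

Using the filename as the id only works while filenames are unique; two files with the same name would overwrite each other in the index.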

On Fri, Jan 14, 2011 at 3:59 AM, Jörg Agatz <jo...@googlemail.com> wrote:
> OK, now on the 4th test it works? I don't know why, but it works. But now I
> have another problem: I can't send content to the server.
>
> When I try to send content to Solr I get:
>
> <html>
> <head>
> <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
> <title>Error 400 </title>
> </head>
> <body><h2>HTTP ERROR: 400</h2><pre>Document [null] missing required field:
> id</pre>
> <p>RequestURI=/solr/update/extract</p><p><i><small><a href="
> http://jetty.mortbay.org/">Powered by Jetty://</a></small></i></p><br/>
>
> </body>
> </html>
>
>
> I run:
> curl "http://192.168.105.66:8983/solr/update/extract?ext.idx.attr=true\&ext.def.fl=text" -F "myfile=@test.txt"
>
> Any ideas?
>



-- 
Lance Norskog
goksron@gmail.com

Re: Problem with Tika and ExtractingRequestHandler (How to from lucidimagination)

Posted by Jörg Agatz <jo...@googlemail.com>.
OK, now on the 4th test it works? I don't know why, but it works. But now I
have another problem: I can't send content to the server.

When I try to send content to Solr I get:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 400 </title>
</head>
<body><h2>HTTP ERROR: 400</h2><pre>Document [null] missing required field:
id</pre>
<p>RequestURI=/solr/update/extract</p><p><i><small><a href="
http://jetty.mortbay.org/">Powered by Jetty://</a></small></i></p><br/>

</body>
</html>


I run:
curl "http://192.168.105.66:8983/solr/update/extract?ext.idx.attr=true\&ext.def.fl=text" -F "myfile=@test.txt"

Any ideas?