Posted to solr-user@lucene.apache.org by chris3001 <ch...@hotmail.com> on 2012/03/28 03:57:03 UTC
Solr with UIMA
I am having a hard time integrating UIMA with Solr. I downloaded the
Solr 3.5 distribution and have it running successfully with Nutch and Tika on
Windows 7, using SolrCell and curl via Cygwin. To begin, I copied the 6 jars
from solr/contrib/uima/lib to the working /lib in Solr. Next, I read the
readme.txt file in solr/contrib/uima/lib and edited both my solrconfig.xml
and schema.xml accordingly, to no avail. I then found this link, which seemed
a bit more applicable since I didn't care to use Alchemy or OpenCalais:
http://code.google.com/a/apache-extras.org/p/rondhuit-uima/?redir=1
Still, when I run a curl command that imports a PDF via SolrCell, I do not get
the additional UIMA fields, nor do I see anything in my logs. The test.pdf is
parsed, though, and I see the PDF in Solr using:
curl 'http://localhost:8080/solr/update/extract?fmap.content=content&literal.id=doc1&commit=true' -F "file=@test.pdf"
What I added to my solrconfig.xml:
<updateRequestProcessorChain name="uima">
  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
    <lst name="uimaConfig">
      <lst name="runtimeParameters">
      </lst>
      <str name="analysisEngine">C:\web\solrcelluimacrawler\com\rondhuit\uima\desc\KeyphraseExtractAnnotatorDescriptor.xml</str>
      <bool name="ignoreErrors">true</bool>
      <str name="logField">id</str>
      <lst name="analyzeFields">
        <bool name="merge">false</bool>
        <arr name="fields">
          <str>content</str>
        </arr>
      </lst>
      <lst name="fieldMappings">
        <lst name="type">
          <str name="name">com.rondhuit.uima.yahoo.Keyphrase</str>
          <lst name="mapping">
            <str name="feature">keyphrase</str>
            <str name="field">UIMAname</str>
          </lst>
        </lst>
      </lst>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
I also adjusted my requestHandler:
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.processor">uima</str>
  </lst>
</requestHandler>
Finally, the entries I added to my schema.xml:
<field name="UIMAname" type="string" indexed="true" stored="true"
       multiValued="true" required="false"/>
<dynamicField name="*_sm" type="string" indexed="true" stored="true"/>
All I am trying to do is test *any* UIMA AE in Solr, and I cannot figure
out what I am doing wrong. Thank you in advance for reading this.
--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-with-UIMA-tp3863324p3863324.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr with UIMA
Posted by chris3001 <ch...@hotmail.com>.
Tommaso,
I apologize for my delayed response. Thank you very much for your time
looking into this!!
I will try to replicate your efforts on my end this week.
Respectfully,
Chris
--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-with-UIMA-tp3863324p3898094.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr with UIMA
Posted by debdoot <de...@gmail.com>.
A further observation on the error:
All requests to add documents through the /update URL end up with the same
error, irrespective of the fields contained in the document. If I don't use
the UIMAUpdateRequestProcessor, I can add/update documents successfully.
Here are the snippets relevant to the updateRequestProcessor declarations in
my solrconfig.xml:
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.processor">uima</str>
  </lst>
</requestHandler>
<updateRequestProcessorChain name="uima">
  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
    <lst name="uimaConfig">
      <lst name="runtimeParameters">
      </lst>
      <str name="analysisEngine">C:\ex1\RoomNumberAnnotator.xml</str>
      <bool name="ignoreErrors">false</bool>
      <lst name="analyzeFields">
        <bool name="merge">false</bool>
        <arr name="fields">
          <str>content</str>
        </arr>
      </lst>
      <lst name="fieldMappings">
        <lst name="type">
          <str name="name">org.apache.uima.tutorial.RoomNumber</str>
          <lst name="mapping">
            <str name="feature">building</str>
            <str name="field">UIMAname</str>
          </lst>
        </lst>
      </lst>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
Please help.
Thanks
Debdoot
--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-with-UIMA-tp3863324p3987083.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr with UIMA
Posted by Tommaso Teofili <to...@gmail.com>.
Hi all,
2012/6/1 Jack Krupansky <ja...@basetechnology.com>
> Is it failing on the first document? I see "uid 5", suggests that it is
> not. If not, how is this document different from the others?
>
> I see the exception
> org.apache.uima.resource.ResourceInitializationException, suggesting
> that some file cannot be loaded.
>
> It sounds like it may be having trouble loading "aePath"
> ("analysisEngine"). Or maybe some other file?
>
Thanks Jack, that's correct; it's most likely what's causing the reported
error.
Tommaso
Re: Solr with UIMA
Posted by Jack Krupansky <ja...@basetechnology.com>.
Is it failing on the first document? I see "uid=5", which suggests that it is
not. If not, how is this document different from the others?
I see the exception
org.apache.uima.resource.ResourceInitializationException, suggesting that
some file cannot be loaded.
It sounds like it may be having trouble loading the "aePath" ("analysisEngine")
file, or maybe some other file?
-- Jack Krupansky
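The innermost cause in the trace debdoot posted is a NullPointerException thrown by the XMLInputSource constructor, called from OverridingParamsAEProvider.getAE, which fits Jack's reading that the descriptor named by analysisEngine was never found. One hedged guess, not confirmed anywhere in this thread, is that the Solr 3.5 provider resolves that value as a classpath resource rather than a file-system path, in which case an absolute Windows path such as C:\ex1\RoomNumberAnnotator.xml would resolve to nothing. Under that assumption, a configuration would look roughly like this; both paths are placeholders, so check where the descriptor actually sits inside its jar before relying on it.

```xml
<!-- Sketch only: assumes analysisEngine is resolved on Solr's classpath. -->

<!-- 1. Top level of solrconfig.xml: put the jar that contains the descriptor
        on Solr's classpath (placeholder directory; point it at your UIMA SDK lib folder). -->
<lib dir="/path/to/apache-uima/lib" />

<!-- 2. Inside uimaConfig: name the descriptor by its resource path within
        that jar (hypothetical path) instead of a C:\ file-system path. -->
<str name="analysisEngine">/path/inside/jar/RoomNumberAnnotator.xml</str>
```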
Re: Solr with UIMA
Posted by debdoot <de...@gmail.com>.
Hi Tommaso,
I have followed the steps you have listed to try to deploy the example
RoomNumberAnnotator with Solr 3.5.
Here is the error trace that I get:
org.apache.solr.common.SolrException: processing error: null. uid=5,
text="Test Room HAW GN-K35..."
at
org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processAdd(UIMAUpdateRequestProcessor.java:107)
at
org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:158)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at
com.ibm.ws.webcontainer.filter.FilterInstanceWrapper.doFilter(FilterInstanceWrapper.java:192)
at
com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:89)
at
com.ibm.ws.webcontainer.filter.WebAppFilterManager.doFilter(WebAppFilterManager.java:919)
at
com.ibm.ws.webcontainer.filter.WebAppFilterManager.invokeFilters(WebAppFilterManager.java:1016)
at
com.ibm.ws.webcontainer.webapp.WebApp.handleRequest(WebApp.java:3703)
at
com.ibm.ws.webcontainer.webapp.WebGroup.handleRequest(WebGroup.java:304)
at
com.ibm.ws.webcontainer.WebContainer.handleRequest(WebContainer.java:953)
at
com.ibm.ws.webcontainer.WSWebContainer.handleRequest(WSWebContainer.java:1655)
at
com.ibm.ws.webcontainer.channel.WCChannelLink.ready(WCChannelLink.java:195)
at
com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.handleDiscrimination(HttpInboundLink.java:452)
at
com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.handleNewRequest(HttpInboundLink.java:511)
at
com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.processRequest(HttpInboundLink.java:305)
at
com.ibm.ws.http.channel.inbound.impl.HttpInboundLink.ready(HttpInboundLink.java:276)
at
com.ibm.ws.tcp.channel.impl.NewConnectionInitialReadCallback.sendToDiscriminators(NewConnectionInitialReadCallback.java:214)
at
com.ibm.ws.tcp.channel.impl.NewConnectionInitialReadCallback.complete(NewConnectionInitialReadCallback.java:113)
at
com.ibm.ws.tcp.channel.impl.AioReadCompletionListener.futureCompleted(AioReadCompletionListener.java:165)
at
com.ibm.io.async.AbstractAsyncFuture.invokeCallback(AbstractAsyncFuture.java:217)
at
com.ibm.io.async.AsyncChannelFuture.fireCompletionActions(AsyncChannelFuture.java:161)
at com.ibm.io.async.AsyncFuture.completed(AsyncFuture.java:138)
at com.ibm.io.async.ResultHandler.complete(ResultHandler.java:204)
at
com.ibm.io.async.ResultHandler.runEventProcessingLoop(ResultHandler.java:775)
at com.ibm.io.async.ResultHandler$2.run(ResultHandler.java:905)
at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1650)
Caused by: org.apache.uima.resource.ResourceInitializationException
at
org.apache.solr.uima.processor.ae.OverridingParamsAEProvider.getAE(OverridingParamsAEProvider.java:86)
at
org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processText(UIMAUpdateRequestProcessor.java:144)
at
org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processAdd(UIMAUpdateRequestProcessor.java:77)
... 30 more
Caused by: java.lang.NullPointerException
at
org.apache.uima.util.XMLInputSource.<init>(XMLInputSource.java:118)
at
org.apache.solr.uima.processor.ae.OverridingParamsAEProvider.getAE(OverridingParamsAEProvider.java:58)
... 32 more
at
com.ibm.ws.webcontainer.webapp.WebAppDispatcherContext.sendError(WebAppDispatcherContext.java:624)
at
com.ibm.ws.webcontainer.webapp.WebAppDispatcherContext.sendError(WebAppDispatcherContext.java:642)
at
com.ibm.ws.webcontainer.srt.SRTServletResponse.sendError(SRTServletResponse.java:1235)
at
org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:380)
at
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:326)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265)
....
....
Please let me know if you have any insights on what could be the issue.
Thanks in advance,
Debdoot
--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-with-UIMA-tp3863324p3987056.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr with UIMA
Posted by Tommaso Teofili <to...@gmail.com>.
Hi again Chris,
I finally managed to find some time to test your configuration.
The first thing to note is that it worked for me, provided the following
prerequisites were satisfied:
- the jar containing the AnalysisEngine for RoomNumberAnnotator.xml is in
  the libraries section of your solrconfig.xml (this is actually
  uimaj-examples.jar, which ships with the UIMA SDK under lib [1]);
- the solr-uima jar is in your libraries.
Both are done by adding the following lines to solrconfig.xml (usually at
the top of the file, just beneath the <luceneMatchVersion> element):
<lib dir="../../dist/" regex="apache-solr-uima-\d.*\.jar" />
<lib dir="../../contrib/uima/lib" regex=".*\.jar" />
<lib dir="/path/to/apache-uima/lib" />
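For orientation, here is a sketch of where these directives sit at the top of solrconfig.xml. LUCENE_35 is the value used by the stock Solr 3.5 example, and /path/to/apache-uima/lib is a placeholder. Relative dir values are resolved against the core's instance directory, so the ../../ prefixes assume the stock example layout.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<config>
  <!-- Value from the stock Solr 3.5 example; keep whatever yours already has -->
  <luceneMatchVersion>LUCENE_35</luceneMatchVersion>

  <!-- The solr-uima contrib jar itself -->
  <lib dir="../../dist/" regex="apache-solr-uima-\d.*\.jar" />
  <!-- Its dependencies, shipped in contrib/uima/lib -->
  <lib dir="../../contrib/uima/lib" regex=".*\.jar" />
  <!-- Placeholder: the UIMA SDK lib directory holding uimaj-examples.jar,
       which contains the RoomNumberAnnotator tutorial example -->
  <lib dir="/path/to/apache-uima/lib" />

  <!-- rest of the existing configuration follows unchanged -->
</config>
```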
If you want to know what's going wrong, I'd advise not ignoring errors
in the UIMAUpdateProcessor configuration:
<bool name="ignoreErrors">false</bool>
Running your same curl command and then a *:* query, I get:
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2</int>
    <lst name="params">
      <str name="wt">xml</str>
      <str name="start">0</str>
      <str name="q">*:*</str>
      <str name="rows">10</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id">4</str>
      <str name="content">Test Room HAW GN-K35</str>
      <arr name="UIMAname">
        <str>Hawthorne</str>
      </arr>
    </doc>
  </result>
</response>
which looks OK to me.
Hope this helps.
Tommaso
[1] : http://mirror.switch.ch/mirror/apache/dist//uima///uimaj-2.3.1-bin.zip
Re: Solr with UIMA
Posted by chris3001 <ch...@hotmail.com>.
Tommaso,
Thank you so much for looking into this, I am very grateful!
Chris
--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-with-UIMA-tp3863324p3865291.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr with UIMA
Posted by Tommaso Teofili <to...@gmail.com>.
Hi Chris,
I have never tried the Nutch integration, so I can't help with that.
However, I'll try to reproduce your setup and let you know what comes out
on my end.
Tommaso
Re: Solr with UIMA
Posted by chris3001 <ch...@hotmail.com>.
Still not getting there on Solr with UIMA...
Has anyone taken example 1 (RoomNumberAnnotator) and successfully tested it,
by any chance?
Thanks to Tommaso, my curl command now posts to /update:
curl http://localhost:8080/solr/update?commit=true -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">4</field><field name="content">Test Room HAW GN-K35</field></doc></add>'
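The inline payload in this command is an ordinary Solr XML update message. Kept in its own file (doc.xml is a hypothetical name), it is easier to edit and can be posted with curl's --data-binary @doc.xml form instead of the quoted string:

```xml
<!-- Hypothetical doc.xml: the same document as the inline curl payload -->
<add>
  <doc>
    <field name="id">4</field>
    <!-- "content" is the field listed under analyzeFields, so UIMA analyzes this text -->
    <field name="content">Test Room HAW GN-K35</field>
  </doc>
</add>
```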
Next, my solrconfig.xml has these two parts:
Part 1:
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.processor">uima</str>
  </lst>
</requestHandler>
Part 2:
<updateRequestProcessorChain name="uima">
  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
    <lst name="uimaConfig">
      <lst name="runtimeParameters">
      </lst>
      <str name="analysisEngine">C:\uima\examples\descriptors\tutorial\ex1\RoomNumberAnnotator.xml</str>
      <bool name="ignoreErrors">true</bool>
      <str name="logField">id</str>
      <lst name="analyzeFields">
        <bool name="merge">false</bool>
        <arr name="fields">
          <str>content</str>
        </arr>
      </lst>
      <lst name="fieldMappings">
        <lst name="type">
          <str name="name">org.apache.uima.tutorial.RoomNumber</str>
          <lst name="mapping">
            <str name="feature">building</str>
            <str name="field">UIMAname</str>
          </lst>
        </lst>
      </lst>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
Finally, my schema.xml:
<field name="UIMAname" type="string" indexed="true" stored="true"
       multiValued="true" required="false"/>
When I run this example AE XML descriptor in the Document Analyzer, I see the
token GN-K35 highlighted. However, when I integrate it into Solr using the
settings above and search for *:* at http://localhost:8080/solr/admin/, I do
not see the UIMAname field at all, let alone any data in it (namely, GN-K35
in this example).
Thank you for your time in reading this.
--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-with-UIMA-tp3863324p3864810.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr with UIMA
Posted by chris3001 <ch...@hotmail.com>.
Tommaso,
Thank you so much for your reply and for pointing this out! I will look into it.
However, when I run Nutch I still don't see the new fields:
$ bin/nutch crawl urls -solr http://localhost:8080/solr/ -depth 1 -topN 2
Does that still have to do with the update/extract call?
Thanks again for your time.
Chris
--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-with-UIMA-tp3863324p3864418.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr with UIMA
Posted by Tommaso Teofili <to...@gmail.com>.
Hi Chris,
If I understood things correctly, the problem is that you're using the
/update/extract call, which goes through the SolrCell ExtractingRequestHandler,
while the UIMA update processor chain is only wired to the /update path; see:
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.processor">uima</str>
  </lst>
</requestHandler>
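Routing SolrCell uploads through the same chain would mean giving the extracting handler the same default. A hedged sketch follows; it assumes ExtractingRequestHandler honors update.processor the way XmlUpdateRequestHandler does (in Solr 3.x both build on ContentStreamHandlerBase, which shows up in the stack traces in this thread), and its fmap.content default simply mirrors the parameter on Chris's curl command.

```xml
<!-- Sketch: send SolrCell (Tika) uploads through the "uima" chain as well -->
<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- Map Tika's extracted body text into the field the UIMA chain analyzes -->
    <str name="fmap.content">content</str>
    <!-- Same chain name as the /update handler above -->
    <str name="update.processor">uima</str>
  </lst>
</requestHandler>
```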
HTH
Tommaso
Re: Solr with UIMA
Posted by dsy99 <ds...@rediffmail.com>.
Hi Rahul,
Thank you for the reply. I tried modifying the
updateRequestProcessorChain as follows:
<updateRequestProcessorChain name="uima" default="true">
But I am still not able to see the UIMA fields in the result. I executed
the following curl command to index a file named "test.docx":
curl "http://localhost:8983/solr/update/extract?fmap.content=content&literal.id=doc47&commit=true" -F "file=@test.docx"
When I searched for the same document with the
"http://localhost:8983/solr/select?q=id:doc47" request, I got the following
result:
<result name="response" numFound="1" start="0">
  <doc>
    <str name="author">divakar</str>
    <arr name="content_type">
      <str>application/vnd.openxmlformats-officedocument.wordprocessingml.document</str>
    </arr>
    <str name="id">doc47</str>
    <date name="last_modified">2012-04-18T14:19:00Z</date>
  </doc>
</result>
Could you please help me find where I am going wrong?
With Thanks & Regards,
Divakar
--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-with-UIMA-tp3863324p3925670.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr with UIMA
Posted by introfini <in...@gmail.com>.
Rahul Warawdekar wrote
>
> Hi Divakar,
>
> Try making your updateRequestProcessorChain as default. Simply add
> default="true" as follows and check if that works.
>
> <updateRequestProcessorChain name="uima" default="true">
>
>
Rahul,
This fixed my problem, you saved my week!
I was following the README.txt instructions and they didn't work; after
adding default="true", it started working immediately.
Maybe that should go into the README.txt?
Thank you.
--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-with-UIMA-tp3863324p4001014.html
Re: Solr with UIMA
Posted by Rahul Warawdekar <ra...@gmail.com>.
Hi Divakar,
Try making your updateRequestProcessorChain the default. Simply add
default="true" as follows and check whether that works:
<updateRequestProcessorChain name="uima" default="true">
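For reference, here is a sketch of the whole chain with that attribute in place. With default="true", request handlers that do not name a chain explicitly (including /update/extract) should pick it up, and only one chain should be marked as the default. The processor list mirrors the configurations posted in this thread; the UIMA settings are elided in a comment and must come from your own setup.

```xml
<!-- Sketch: mark the UIMA chain as the default for all update handlers -->
<updateRequestProcessorChain name="uima" default="true">
  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
    <lst name="uimaConfig">
      <!-- runtimeParameters, analysisEngine, analyzeFields and fieldMappings
           as in the configurations posted in this thread -->
    </lst>
  </processor>
  <!-- log the update request -->
  <processor class="solr.LogUpdateProcessorFactory" />
  <!-- performs the actual indexing, so processors that modify documents belong before it -->
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```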
On Thu, Apr 19, 2012 at 12:01 PM, dsy99 <ds...@rediffmail.com> wrote:
--
Thanks and Regards
Rahul A. Warawdekar
Re: Solr with UIMA
Posted by dsy99 <ds...@rediffmail.com>.
Hi Chris,
Have you been able to integrate UIMA with Solr successfully?
I also tried to integrate UIMA with Solr by following the instructions
in the README, i.e. the following four steps:
Step 1. I set <lib/> tags in solrconfig.xml to point to the jar
files:
<lib dir="../../contrib/uima/lib" />
<lib dir="../../dist/" regex="apache-solr-uima-\d.*\.jar" />
Step2. modified my "schema.xml" adding the fields I wanted to hold metadata
specifying proper values for type, indexed, stored and multiValued options
as follows:
<field name="language" type="string" indexed="true" stored="true"
required="false"/>
<field name="concept" type="string" indexed="true" stored="true"
multiValued="true" required="false"/>
<field name="sentence" type="text" indexed="true" stored="true"
multiValued="true" required="false" />
Step 3. I modified my solrconfig.xml, adding the following snippet:
<updateRequestProcessorChain name="uima">
<processor
class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
<lst name="uimaConfig">
<lst name="runtimeParameters">
<str name="keyword_apikey">VALID_ALCHEMYAPI_KEY</str>
<str name="concept_apikey">VALID_ALCHEMYAPI_KEY</str>
<str name="lang_apikey">VALID_ALCHEMYAPI_KEY</str>
<str name="cat_apikey">VALID_ALCHEMYAPI_KEY</str>
<str name="entities_apikey">VALID_ALCHEMYAPI_KEY</str>
<str name="oc_licenseID">VALID_OPENCALAIS_KEY</str>
</lst>
<str
name="analysisEngine">/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</str>
<bool name="ignoreErrors">true</bool>
<lst name="analyzeFields">
<bool name="merge">false</bool>
<arr name="fields">
<str>text</str>
</arr>
</lst>
<lst name="fieldMappings">
<lst name="type">
<str
name="name">org.apache.uima.alchemy.ts.concept.ConceptFS</str>
<lst name="mapping">
<str name="feature">text</str>
<str name="field">concept</str>
</lst>
</lst>
<lst name="type">
<str
name="name">org.apache.uima.alchemy.ts.language.LanguageFS</str>
<lst name="mapping">
<str name="feature">language</str>
<str name="field">language</str>
</lst>
</lst>
<lst name="type">
<str name="name">org.apache.uima.SentenceAnnotation</str>
<lst name="mapping">
<str name="feature">coveredText</str>
<str name="field">sentence</str>
</lst>
</lst>
</lst>
</lst>
</processor>
<processor class="solr.LogUpdateProcessorFactory" />
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
Step 4. Finally, I created a new UpdateRequestHandler with the following:
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
<lst name="defaults">
<str name="update.processor">uima</str>
</lst>
</requestHandler>
Then I indexed a Word file called text.docx using the following command:
curl "http://localhost:8983/solr/update/extract?literal.id=doc1&uprefix=attr_&fmap.content=attr_content&commit=true" -F "myfile=@UIMA_sample_test.docx"
When I searched for the file, I could not see the additional UIMA fields.
Could you please help, if you have been able to solve the problem?
With Regards & Thanks
Divakar
--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-with-UIMA-tp3863324p3923443.html