Posted to commits@lucene.apache.org by "Steve Rowe (Confluence)" <co...@apache.org> on 2013/10/01 21:37:00 UTC
[CONF] Apache Solr Reference Guide > UIMA Integration
Space: Apache Solr Reference Guide (https://cwiki.apache.org/confluence/display/solr)
Page: UIMA Integration (https://cwiki.apache.org/confluence/display/solr/UIMA+Integration)
Change Comment:
---------------------------------------------------------------------
apache-solr-* -> solr-*
Edited by Steve Rowe:
---------------------------------------------------------------------
You can integrate the Apache Unstructured Information Management Architecture ([UIMA|https://uima.apache.org/]) with Solr. UIMA lets you define custom pipelines of Analysis Engines that incrementally add metadata to your documents as annotations.
For more information about Solr UIMA integration, see [https://wiki.apache.org/solr/SolrUIMA].
h2. Configuring UIMA
The SolrUIMA UpdateRequestProcessor is a custom update request processor that takes documents being indexed, sends them to a UIMA pipeline, and then returns the documents enriched with the specified metadata. To configure UIMA for Solr, follow these steps:
# Copy {{solr-uima-4.x.y.jar}} (under {{/solr-4.x.y/dist/}}) and its libraries (under {{contrib/uima/lib}}) to a Solr libraries directory, or set {{<lib/>}} tags in {{solrconfig.xml}} appropriately to point to those jar files: \\
\\
{code:xml|borderStyle=solid|borderColor=#666666}
<lib dir="../../contrib/uima/lib" />
<lib dir="../../dist/" regex="solr-uima-\d.*\.jar" />
{code}
# Modify {{schema.xml}}, adding the metadata fields you want and specifying appropriate values for the type, indexed, stored, and multiValued options. For example: \\
\\
{code:xml|borderStyle=solid|borderColor=#666666}
<field name="language" type="string" indexed="true" stored="true" required="false"/>
<field name="concept" type="string" indexed="true" stored="true" multiValued="true" required="false"/>
<field name="sentence" type="text" indexed="true" stored="true" multiValued="true" required="false" />
{code} \\
# Add the following snippet to {{solrconfig.xml}}: \\
\\
{code:xml|borderStyle=solid|borderColor=#666666}
<updateRequestProcessorChain name="uima">
<processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
<lst name="uimaConfig">
<lst name="runtimeParameters">
<str name="keyword_apikey">VALID_ALCHEMYAPI_KEY</str>
<str name="concept_apikey">VALID_ALCHEMYAPI_KEY</str>
<str name="lang_apikey">VALID_ALCHEMYAPI_KEY</str>
<str name="cat_apikey">VALID_ALCHEMYAPI_KEY</str>
<str name="entities_apikey">VALID_ALCHEMYAPI_KEY</str>
<str name="oc_licenseID">VALID_OPENCALAIS_KEY</str>
</lst>
<str name="analysisEngine">/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</str>
<!-- Set to true if you want to continue indexing even if text processing fails.
Default is false: Solr throws a RuntimeException and
none of the documents in your session are indexed. -->
<bool name="ignoreErrors">true</bool>
<!-- This is optional. It is used for logging when text processing fails.
If logField is not specified, uniqueKey will be used as logField.
<str name="logField">id</str>
-->
<lst name="analyzeFields">
<bool name="merge">false</bool>
<arr name="fields">
<str>text</str>
</arr>
</lst>
<lst name="fieldMappings">
<lst name="type">
<str name="name">org.apache.uima.alchemy.ts.concept.ConceptFS</str>
<lst name="mapping">
<str name="feature">text</str>
<str name="field">concept</str>
</lst>
</lst>
<lst name="type">
<str name="name">org.apache.uima.alchemy.ts.language.LanguageFS</str>
<lst name="mapping">
<str name="feature">language</str>
<str name="field">language</str>
</lst>
</lst>
<lst name="type">
<str name="name">org.apache.uima.SentenceAnnotation</str>
<lst name="mapping">
<str name="feature">coveredText</str>
<str name="field">sentence</str>
</lst>
</lst>
</lst>
</lst>
</processor>
<processor class="solr.LogUpdateProcessorFactory" />
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
{code}
\\
{note}
{{VALID_ALCHEMYAPI_KEY}} is your AlchemyAPI Access Key. You need to register an AlchemyAPI Access key to use AlchemyAPI services: [http://www.alchemyapi.com/api/register.html]. \\
\\
{{VALID_OPENCALAIS_KEY}} is your Calais Service Key. You need to register a Calais Service key to use the Calais services: [http://www.opencalais.com/apikey]. \\
\\
{{analysisEngine}} must point to an Analysis Engine (AE) descriptor located at the specified path on the classpath. \\
\\
{{analyzeFields}} must list the input fields that UIMA should analyze. If {{merge=true}}, the content of those fields is merged and analyzed only once. \\
\\
{{fieldMappings}} describes which feature of each UIMA type is written to which Solr field.
{note}
\\
# In your {{solrconfig.xml}}, replace the existing default UpdateRequestHandler or create a new UpdateRequestHandler that uses the {{uima}} chain: \\
{code:xml|borderStyle=solid|borderColor=#666666}
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
<lst name="defaults">
<str name="update.chain">uima</str>
</lst>
</requestHandler>
{code}
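The {{analyzeFields}} block in the processor chain can list more than one input field. The following is a minimal sketch, not a tested configuration: it assumes a hypothetical {{title}} field exists in your schema alongside {{text}}. With {{merge}} set to {{true}}, the content of both fields is merged and analyzed once, as the note above describes.
{code:xml|borderStyle=solid|borderColor=#666666}
<lst name="analyzeFields">
  <!-- merge=true: the listed fields are concatenated and sent through the pipeline once -->
  <bool name="merge">true</bool>
  <arr name="fields">
    <!-- "title" is a hypothetical field used only for illustration -->
    <str>title</str>
    <str>text</str>
  </arr>
</lst>
{code}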
Once you are done with the configuration, your documents will be automatically enriched with the specified fields when you index them.
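As a rough illustration, a document could be sent to the {{/update}} handler configured above as a standard Solr XML update message. The sketch below assumes your schema has an {{id}} uniqueKey field (an assumption; the steps above do not define one) and uses the {{text}} field listed in {{analyzeFields}}. The field values are made up.
{code:xml|borderStyle=solid|borderColor=#666666}
<add>
  <doc>
    <!-- "id" is assumed to be the uniqueKey of the schema -->
    <field name="id">doc-1</field>
    <!-- "text" is the input field analyzed by the UIMA pipeline -->
    <field name="text">Apache Solr is an open source search platform. It is built on Apache Lucene.</field>
  </doc>
</add>
{code}
If the pipeline runs successfully, the indexed document should also carry values in the {{language}}, {{concept}}, and {{sentence}} fields defined in {{schema.xml}}. The actual values depend on the external annotation services.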
{scrollbar}