Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2011/03/09 09:49:09 UTC
[Solr Wiki] Update of "SolrUIMA" by TommasoTeofili

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "SolrUIMA" page has been changed by TommasoTeofili.
http://wiki.apache.org/solr/SolrUIMA?action=diff&rev1=8&rev2=9

--------------------------------------------------

  If the merge attribute is false, each specified field is analyzed separately; if merge is true, the contents of the listed fields are merged and analyzed only once.
  
  
+ see [[https://issues.apache.org/jira/browse/SOLR-2129|SOLR-2129]]
  
- see [[https://issues.apache.org/jira/browse/SOLR-2129|SOLR-2129]]
+ ==== UIMA components used ====
+ UIMA supports the use of existing analysis engines (see [[http://uima.apache.org/sandbox.html|here]] and [[http://uima.apache.org/external-resources.html|here]]) as well as the creation of custom components.
+ 
+ The current contrib/uima module uses a predefined set of components:
+  1. [[http://uima.apache.org/sandbox.html#whitespace.tokenizer|WhitespaceTokenizer]]
+  2. [[http://uima.apache.org/sandbox.html#tagger.annotator|HMMTagger]]
+  3. [[http://uima.apache.org/sandbox.html#opencalais.annotator|OpenCalaisAnnotator]]
+  4. [[http://uima.apache.org/sandbox.html#alchemy.annotator|AlchemyAPIAnnotator]]
+ 
+ These components are arranged in a pipeline inside the [[https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/contrib/uima/src/main/resources/org/apache/uima/desc/OverridingParamsExtServicesAE.xml|OverridingParamsExtServicesAE]] Analysis Engine descriptor. As the following descriptor fragment shows,
+ {{{
+         <node>AggregateSentenceAE</node>
+         <node>OpenCalaisAnnotator</node>
+         <node>TextKeywordExtractionAEDescriptor</node>
+         <node>TextLanguageDetectionAEDescriptor</node>
+         <node>TextCategorizationAEDescriptor</node>
+         <node>TextConceptTaggingAEDescriptor</node>
+         <node>TextRankedEntityExtractionAEDescriptor</node>
+ }}}
+ the first node represents an aggregate Analysis Engine that includes the Whitespace Tokenizer and HMM Tagger (recognizing sentences); the second node uses the Open Calais Annotator to extract named entities; the remaining nodes use different Alchemy API Annotator services to detect keywords, language, document category, concepts and named entities.
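+ 
+ As a rough sketch of where such a node list lives, a UIMA aggregate Analysis Engine descriptor generally declares its delegates and then orders them in a fixed flow. The example below is illustrative only: the descriptor name and the import names are hypothetical placeholders, not copied from the actual OverridingParamsExtServicesAE descriptor, and only two delegates are shown.
+ {{{
+ <analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
+   <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
+   <primitive>false</primitive>
+   <delegateAnalysisEngineSpecifiers>
+     <!-- each key is the name referenced by a <node> in the flow below -->
+     <delegateAnalysisEngine key="AggregateSentenceAE">
+       <import name="hypothetical.desc.AggregateSentenceAE"/>  <!-- hypothetical import name -->
+     </delegateAnalysisEngine>
+     <delegateAnalysisEngine key="OpenCalaisAnnotator">
+       <import name="hypothetical.desc.OpenCalaisAnnotator"/>  <!-- hypothetical import name -->
+     </delegateAnalysisEngine>
+   </delegateAnalysisEngineSpecifiers>
+   <analysisEngineMetaData>
+     <name>ExampleAggregateAE</name>  <!-- hypothetical descriptor name -->
+     <flowConstraints>
+       <!-- delegates run in the order in which their nodes are listed -->
+       <fixedFlow>
+         <node>AggregateSentenceAE</node>
+         <node>OpenCalaisAnnotator</node>
+       </fixedFlow>
+     </flowConstraints>
+   </analysisEngineMetaData>
+ </analysisEngineDescription>
+ }}}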
+ 
+ ===== Using other UIMA components =====
+ To use different UIMA components inside the contrib/uima module you need to:
+  1. import the component jar
+  2. change the descriptor inside the solrconfig/uimaConfig/analysisEngine element
+  3. optionally adjust the Analysis Engine configuration
+  4. change the types and features' mapping inside solrconfig/uimaConfig/fieldMapping
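+ 
+ The element paths named in the steps above could be laid out roughly as follows. This is only a sketch: it assumes the analysisEngine element holds the classpath location of the descriptor (derived here from the repository path of OverridingParamsExtServicesAE.xml), and it leaves the content of fieldMapping out rather than guessing at it.
+ {{{
+ <uimaConfig>
+   <!-- step 2: point this element at the descriptor of the Analysis Engine to run
+        (assumed to be a classpath location) -->
+   <analysisEngine>/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</analysisEngine>
+   <!-- step 4: map the types and features produced by the new components to Solr fields -->
+   <fieldMapping>
+     <!-- mapping entries for your component's types go here -->
+   </fieldMapping>
+ </uimaConfig>
+ }}}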
+ 
+ ====== Import the component jar ======
+ If you're using Ant, you only need to put the component jar inside the solr/contrib/uima/lib directory.
+ 
+ If you're using Maven, you need to declare the component you want to use inside the <dependencies> element of the generated pom.xml.
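+ 
+ A minimal sketch of such a declaration follows. The groupId, artifactId and version are hypothetical placeholders; replace them with the coordinates of the component you actually want to use.
+ {{{
+ <dependencies>
+   <!-- existing dependencies stay as they are -->
+   <dependency>
+     <groupId>org.example.uima</groupId>          <!-- hypothetical -->
+     <artifactId>my-uima-annotator</artifactId>   <!-- hypothetical -->
+     <version>1.0.0</version>                     <!-- hypothetical -->
+   </dependency>
+ </dependencies>
+ }}}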
+ 
+ ====== Change the descriptor ======
+ 
+ ====== Adjust AE configuration (optional) ======
+ 
+ ====== Change the types and features' mapping ======
  
  
  == Solrcas ==