Posted to solr-user@lucene.apache.org by vempap <ph...@emc.com> on 2012/11/02 21:10:56 UTC

Solr-UIMA integration : analyzing multi-fields

Hello all,

How can I analyze multiple fields using UIMA when the UIMA update chain is
added to the update handler, and how do I map each analyzed field to its own
output field?

For instance, let's say there are two text fields, text1 and text2, for which
I need to generate POS tags using UIMA. In the analyzeFields section I can
list both of them:

<lst name="analyzeFields">
    <bool name="merge">false</bool>
    <arr name="fields">
        <str>text1</str>
        <str>text2</str>
    </arr>
</lst>

and in the fieldMappings section I can map the posTag feature to one field:

<lst name="type">
    <str name="name">org.apache.uima.TokenAnnotation</str>
    <lst name="mapping">
        <str name="feature">posTag</str>
        <str name="field">postags1</str>
    </lst>
</lst>

But how do I specify that I also need POS tags for text2, and that they
should go into the postags2 field? If there is a schema/DTD for these
configuration settings, please let me know.
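
For context, here is a minimal sketch of how the two fragments above would sit
together in solrconfig.xml, assuming the usual
UIMAUpdateRequestProcessorFactory layout from the Solr UIMA contrib. The chain
name and the analysis engine descriptor path are placeholders, not values from
a real setup, and other settings the factory may expect (such as
runtimeParameters) are left out.

```xml
<!-- Sketch only: the chain name "uima" and the descriptor path are placeholders. -->
<updateRequestProcessorChain name="uima">
  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
    <lst name="uimaConfig">
      <!-- Placeholder: path to the aggregate analysis engine descriptor. -->
      <str name="analysisEngine">/path/to/AggregateAE.xml</str>
      <!-- Both input fields are analyzed; merge=false keeps them separate. -->
      <lst name="analyzeFields">
        <bool name="merge">false</bool>
        <arr name="fields">
          <str>text1</str>
          <str>text2</str>
        </arr>
      </lst>
      <!-- The only mapping: posTag feature to postags1. -->
      <lst name="fieldMappings">
        <lst name="type">
          <str name="name">org.apache.uima.TokenAnnotation</str>
          <lst name="mapping">
            <str name="feature">posTag</str>
            <str name="field">postags1</str>
          </lst>
        </lst>
      </lst>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

As far as I can tell, nothing in this layout ties the mapping to a particular
input field: the mapping names only the output field postags1, which is the
gap I am asking about.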

Also, is there a way, either by configuration or by changing the code, to
generate POS tags from the token stream produced by an analyzer? Currently,
the update processor takes the text from the input field and generates POS
tags into the postags1 field using the WhitespaceTokenizer that the XML
configuration files define by default. How can I change the tokenizer so that
it uses a Solr Analyzer/Tokenizer?


