Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2013/02/20 23:00:43 UTC
[Solr Wiki] Update of "TextProfileSignature" by EustacheFelenc
The "TextProfileSignature" page has been changed by EustacheFelenc:
http://wiki.apache.org/solr/TextProfileSignature?action=diff&rev1=4&rev2=5
TextProfileSignature operates on raw text, without the filtering that Analyzers provide. It therefore does not strip HTML, normalize diacritics, apply stemming or account for word semantics, or weigh the relative importance of different tokens. It also treats the text as a bag of words and ignores word order.
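To make the bag-of-words behaviour concrete, here is a minimal Python sketch of a quantized word-frequency signature. It is not Solr's implementation: the function name, the tokenization rule, the tie-breaking order, and the use of MD5 are assumptions made for illustration, and the quantization step reflects only my understanding of what `quantRate` controls.

```python
import hashlib
from collections import Counter


def text_profile_signature(text, quant_rate=0.01, min_token_len=2):
    """Hash a quantized bag-of-words profile of raw text (illustrative sketch)."""
    # Lowercase and treat every non-alphanumeric character as a separator.
    # Nothing here strips HTML, folds diacritics, or stems words.
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    # Assumption: tokens shorter than min_token_len are ignored.
    tokens = [t for t in cleaned.split() if len(t) >= min_token_len]
    counts = Counter(tokens)
    if not counts:
        return hashlib.md5(b"").hexdigest()

    # Assumption: the quantization step is a fraction of the top frequency,
    # with a floor of 2 (or 1 when no token repeats).
    max_freq = max(counts.values())
    quant = int(max_freq * quant_rate + 0.5)
    if quant < 2:
        quant = 2 if max_freq > 1 else 1

    profile = []
    for token, freq in counts.items():
        freq = (freq // quant) * quant  # round down to a multiple of quant
        if freq >= quant:               # drop tokens below one quantum
            profile.append((token, freq))

    # Sort by descending frequency; ties are broken alphabetically so the
    # sketch is deterministic (an assumption, not necessarily Solr's order).
    profile.sort(key=lambda item: (-item[1], item[0]))
    body = "\n".join(f"{tok} {freq}" for tok, freq in profile)
    return hashlib.md5(body.encode("utf-8")).hexdigest()


# Word order does not matter: only the frequency profile is hashed.
same_words_a = text_profile_signature("the quick brown fox the quick")
same_words_b = text_profile_signature("quick the fox brown quick the")

# Markup is not stripped, so tag names count as ordinary tokens.
with_markup = text_profile_signature("<em>spam spam</em>")
without_markup = text_profile_signature("spam spam")
```

In this sketch the first two signatures match, while the markup example yields a different signature because `em` is counted as a word. Rare words that fall below the quantization step are dropped, which is what makes such a signature tolerant of small differences between documents.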
+ == Configuration ==
+
+ === solrconfig.xml ===
+
+ Example settings:
+ {{{
+ <!-- An example dedup update processor that creates the "id" field on the fly
+      based on the hash code of some other fields. This example has overwriteDupes
+      set to false since we are using the id field as the signatureField and Solr
+      will maintain uniqueness based on that anyway. -->
+ <updateRequestProcessorChain name="dedupe">
+   <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
+     <bool name="enabled">true</bool>
+     <bool name="overwriteDupes">false</bool>
+     <str name="signatureField">id</str>
+     <str name="fields">name,features,cat</str>
+     <str name="signatureClass">org.apache.solr.update.processor.TextProfileSignature</str>
+     <str name="quantRate">.2</str>
+   </processor>
+   <processor class="solr.LogUpdateProcessorFactory" />
+   <processor class="solr.RunUpdateProcessorFactory" />
+ </updateRequestProcessorChain>
+ }}}
+
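The chain above is named "dedupe" and is not marked as the default, so an update request has to select it explicitly. The Python sketch below builds such a request URL. It assumes a Solr version in which the selecting parameter is `update.chain`. The base URL and the helper name `update_url` are hypothetical and depend on the deployment.

```python
from urllib.parse import urlencode

# Hypothetical base URL: host, port, and core path depend on the deployment.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def update_url(chain="dedupe", commit=True):
    """Build an update URL that routes documents through the named chain."""
    # Assumption: the request parameter that selects a chain is update.chain.
    params = {"update.chain": chain}
    if commit:
        params["commit"] = "true"
    return SOLR_UPDATE_URL + "?" + urlencode(params)


dedupe_url = update_url()
```

Documents posted to the resulting URL would pass through SignatureUpdateProcessorFactory before being logged and indexed. Alternatively, the chain can be attached to an update handler in solrconfig.xml so that clients do not need to pass the parameter.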