Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2009/07/07 20:42:43 UTC
[Solr Wiki] Update of "AnalyzersTokenizersTokenFilters" by ShalinMangar
The following page has been changed by ShalinMangar:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
The comment on the change is:
Added note on preserveOriginal and splitOnNumerics
------------------------------------------------------------------------------
* '''splitOnCaseChange="1"''' causes lowercase => uppercase transitions to generate a new part [Solr 1.3]:
* `"PowerShot" => "Power" "Shot"`
* `"TransAM" => "Trans" "AM"`
+ * '''splitOnNumerics="1"''' causes letter => number (and number => letter) transitions to generate a new part [Solr 1.3]:
+ * `"j2se" => "j" "2" "se"`
Note that this is the default behaviour in all released versions of Solr.
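The transition rules listed above can be imitated in a few lines of Python. This is only a rough sketch for reasoning about the examples, not Solr's actual implementation: the function name `split_subwords` and its keyword arguments are hypothetical, and it assumes plain ASCII-style letters and digits with no delimiter handling.

```python
def split_subwords(token, split_on_case_change=True, split_on_numerics=True):
    """Split a token at case-change and letter/number transitions.

    Hypothetical helper that mimics the documented examples only;
    it is not the WordDelimiterFilter algorithm itself.
    """
    parts = []
    current = ""
    for ch in token:
        if current:
            prev = current[-1]
            # lowercase => uppercase transition, e.g. "r" -> "S" in "PowerShot"
            case_change = split_on_case_change and prev.islower() and ch.isupper()
            # letter <=> digit transition, e.g. "j" -> "2" and "2" -> "s" in "j2se"
            numeric_change = split_on_numerics and (
                (prev.isalpha() and ch.isdigit()) or (prev.isdigit() and ch.isalpha())
            )
            if case_change or numeric_change:
                parts.append(current)
                current = ""
        current += ch
    if current:
        parts.append(current)
    return parts


print(split_subwords("PowerShot"))  # ['Power', 'Shot']
print(split_subwords("TransAM"))    # ['Trans', 'AM']
print(split_subwords("j2se"))       # ['j', '2', 'se']
print(split_subwords("j2se", split_on_numerics=False))  # ['j2se']
```

Turning either flag off leaves the corresponding transitions intact, which is the effect a value of "0" for that option would be expected to have.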
There are also a number of parameters that affect what tokens are present in the final output and whether subwords are combined:
@@ -372, +374 @@
* `"500-42" => "50042"`
* '''catenateAll="1"''' causes all subword parts to be catenated:
* `"wi-fi-4000" => "wifi4000"`
+ * '''preserveOriginal="1"''' causes the original token to be indexed without modifications (in addition to the tokens produced by the other options)
These parameters may be combined in any way.
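To see how such options might interact, here is a small Python sketch that combines catenateAll-style and preserveOriginal-style behaviour. It is an illustration under stated assumptions, not Solr's code: `emit_tokens` and its arguments are hypothetical names, only "-" is treated as a delimiter, `emit_parts` is a stand-in for the part-generating options, and token order and positions in the real filter are not modelled.

```python
def emit_tokens(original, emit_parts=False, catenate_all=False,
                preserve_original=False):
    """Return the tokens a simplified word-delimiter step might emit.

    Hypothetical helper: splits on "-" only and ignores token positions.
    """
    # Assumption: "-" is the only delimiter considered in this sketch.
    parts = [p for p in original.split("-") if p]

    tokens = []
    if preserve_original:
        # Keep the unmodified token in addition to anything else produced.
        tokens.append(original)
    if emit_parts:
        tokens.extend(parts)
    if catenate_all:
        # Join every subword part into one token, e.g. "wifi4000".
        tokens.append("".join(parts))

    # Drop duplicates while keeping first-seen order.
    seen = set()
    unique = []
    for tok in tokens:
        if tok not in seen:
            seen.add(tok)
            unique.append(tok)
    return unique


print(emit_tokens("wi-fi-4000", catenate_all=True))
# ['wifi4000']
print(emit_tokens("wi-fi-4000", catenate_all=True, preserve_original=True))
# ['wi-fi-4000', 'wifi4000']
```

The second call shows the point of preserveOriginal: the unmodified token survives alongside whatever the other options generate.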
* Example of generateWordParts="1" and catenateWords="1":
@@ -391, +394 @@
catenateWords="0"
catenateNumbers="0"
catenateAll="0"
+ preserveOriginal="1"
/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory"/>
@@ -404, +408 @@
catenateWords="1"
catenateNumbers="1"
catenateAll="0"
+ preserveOriginal="1"
/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory"/>