Posted to solr-user@lucene.apache.org by Tom Weber <to...@rtl.lu> on 2006/07/12 10:16:03 UTC

docFreq disable / disable end of word letter removal

Hello,

   For my specific project, I would like to ask whether the following
settings can be made in Solr:

   - Currently, I see that docFreq also plays a part in the
scoring. Is it possible to disable this so that it is not
included in the score?
   - I see that Solr strips some characters from the end of
search words. This is okay, but when I search for, say,
"comed", it searches for "come". Can I choose which letters the system
removes, and when? Or where can I disable this
behavior? Removing the trailing "s" is great, but in some
circumstances removing the "d" from "comed" is not ideal.

   Thanks for the help,

   and thanks for the really great tool that Solr is.

   Tom

Re: docFreq disable / disable end of word letter removal

Posted by Chris Hostetter <ho...@fucit.org>.
:    - Currently, I see that the docFreq is also playing in the
: scoring. Is it possible to disable this feature so that this is not
: calculated in the score ?

This is a fairly core aspect of the Lucene scoring calculation, but it can
be changed with a small bit of Java coding.  If you write your own
subclass of "Similarity" you can override the "idf" function to return a
constant value regardless of the docFreq.  You can then specify your new
Similarity class by name in your schema.xml and Solr will use it instead
of the default...

http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html
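A minimal sketch of the idea, written so that it compiles without Lucene on
the classpath.  BaseSimilarity below is a hypothetical stand-in for Lucene's
Similarity class, and its formula is only an assumption in the style of the
usual log-based idf; in a real project you would extend the Lucene class
itself and check the exact "idf" signature for your version.

```java
class ConstantIdfSketch {

    // Hypothetical stand-in for Lucene's Similarity (NOT the real class).
    // Rare terms (low docFreq) get a higher idf than common ones.
    static class BaseSimilarity {
        float idf(int docFreq, int numDocs) {
            return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
        }
    }

    // The override described above: ignore docFreq and return a constant,
    // so idf no longer influences the score.
    static class ConstantIdfSimilarity extends BaseSimilarity {
        @Override
        float idf(int docFreq, int numDocs) {
            return 1.0f;
        }
    }

    public static void main(String[] args) {
        BaseSimilarity standard = new BaseSimilarity();
        BaseSimilarity constant = new ConstantIdfSimilarity();

        // Compare a rare term (1 doc) with a common term (900 docs).
        System.out.println("standard: rare=" + standard.idf(1, 1000)
                + " common=" + standard.idf(900, 1000));
        System.out.println("constant: rare=" + constant.idf(1, 1000)
                + " common=" + constant.idf(900, 1000));
    }
}
```

With the constant version, a rare term and a common term contribute the same
idf factor, which is the effect asked for in the question.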

:    - I see that solr is stripping some characters at the end of the
: search words. This is okay, but I try, for example, a search on
: "comed", and it is searching for "come". Can I select when the system
: will remove which letters and when ? Or where can I disable this
: system ? The removal of the trailing "s" is great, but for some
: circumstances, the "d" removal of "comed" is not the ideal way.

This is all determined by the Analyzer used for each field (or more
generally: field type) ... this is also configured via the schema.xml.  As
with Similarity, you can write your own Java subclass to use if you want
extremely customized behavior, or you can use any of the Analyzers that
come with Lucene (by name), or you can build up an Analyzer in your
schema.xml using Solr TokenizerFactories and TokenFilterFactories.
Docs on all of the Solr Factories can be found in the wiki...

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/Analyzer.html
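
To illustrate the "tokenizer plus a chain of filters" idea, here is a rough
standalone sketch.  It does not use the Lucene or Solr classes: the tokenizer,
the filters, and the analyze() helper are all hypothetical stand-ins, and the
trailing-"s" filter is a toy, not the stemmer Solr ships with.  The point is
only that the behavior depends on which filters you put in the chain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

class AnalyzerChainSketch {

    // Tokenizer stand-in: split the text on whitespace.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Filter stand-in: lowercase each token.
    static final UnaryOperator<String> LOWERCASE =
            t -> t.toLowerCase(Locale.ROOT);

    // Toy filter: strip one trailing "s" (but not "ss") from tokens longer
    // than three characters. It never touches a trailing "d" or "ed".
    static final UnaryOperator<String> STRIP_TRAILING_S =
            t -> (t.length() > 3 && t.endsWith("s") && !t.endsWith("ss"))
                    ? t.substring(0, t.length() - 1)
                    : t;

    // Analyzer stand-in: tokenize, then apply each filter in order.
    static List<String> analyze(String text, List<UnaryOperator<String>> filters) {
        List<String> out = new ArrayList<>();
        for (String token : tokenize(text)) {
            for (UnaryOperator<String> filter : filters) {
                token = filter.apply(token);
            }
            out.add(token);
        }
        return out;
    }
}
```

Calling analyze("Comed words", ...) with both filters leaves "comed" alone and
reduces "words" to "word"; with only LOWERCASE in the list, no suffix is
stripped at all.  In Solr the equivalent choice is made by which
TokenFilterFactories you list for the field type in schema.xml.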



-Hoss