Posted to solr-user@lucene.apache.org by Kashish <it...@gmail.com> on 2014/04/25 20:49:07 UTC

Not allowing exact match with WordDelimiterFilterFactory

Hi,

I am having a problem with WordDelimiterFilterFactory. This is my
fieldType:

<fieldType name="text_general" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Now, if I search for a word like fast-five across this field, the debug
output shows:

<str name="rawquerystring">(((titleName:fast-five)^20 OR
(akaName:fast-five)^10))</str>
<str name="querystring">(((titleName:fast-five)^20 OR
(akaName:fast-five)^10))</str>
<str name="parsedquery">(+(((((titleName:fast-five titleName:fast)/no_coord)
((titleName:five titleName:fastfive)/no_coord))^20.0) ((((akaName:fast-five
akaName:fast)/no_coord) ((akaName:five
akaName:fastfive)/no_coord))^10.0)))/no_coord</str>
<str name="parsedquery_toString">+((((titleName:fast-five titleName:fast)
(titleName:five titleName:fastfive))^20.0) (((akaName:fast-five
akaName:fast) (akaName:five akaName:fastfive))^10.0))</str>

which is perfect. Now, when I search for the word in double quotes, like
"fast-five", I expect it to return only those titles that contain this
word exactly as written. But with this analyzer I am not able to do so.
I tried separating the index-time and query-time analyzers and removing the
WDF at query time, but then whatever I search for, it always searches
for the term 'fast-five' as a whole only.
Please suggest what the solution could be.





-----
Thanks,
Kashish
--
View this message in context: http://lucene.472066.n3.nabble.com/Not-allowing-exact-match-with-WordDelimiterFilterFactory-tp4133193.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Not allowing exact match with WordDelimiterFilterFactory

Posted by Kashish <it...@gmail.com>.

Hi Jack,

Setting autoGeneratePhraseQueries="true" on the text field type will make it
treat the input as a phrase every time, right? I want it treated as a phrase
only when it is given within double quotes, and not otherwise (i.e. if the
input is fast-five, search for fast, five, fast five, etc.). For now I have
separated the analyzers as you mentioned and removed all but one of the WDF
attributes at query time.

<fieldType name="text_general_Title" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="false">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

If I give the search string "fast-five" within quotes, the query is parsed
as follows:

<str name="rawquerystring">(((titleName:"fast-five")^20 OR
(akaName:"fast-five")^10))</str>
<str name="querystring">(((titleName:"fast-five")^20 OR
(akaName:"fast-five")^10))</str>
<str name="parsedquery">(+(PhraseQuery(titleName:"fast five"^20.0)
PhraseQuery(akaName:"fast five"^10.0)))/no_coord</str>
<str name="parsedquery_toString">+(titleName:"fast five"^20.0 akaName:"fast
five"^10.0)</str>

Why is the hyphen getting removed? I have no clue.
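
One likely reading of the debug output above, assuming the filter's documented defaults apply to every attribute left unset: the query-time WordDelimiterFilterFactory still splits on the hyphen, so the quoted input becomes the two adjacent tokens fast and five, which the parser then reports as the phrase "fast five". A sketch of the same query-time filter with those assumed defaults spelled out:

```xml
<!-- Assumed equivalent of the query-time filter above, which sets only
     preserveOriginal="0". All other values are the documented defaults,
     written out here for illustration. With generateWordParts="1" the
     token fast-five is split into fast and five; with preserveOriginal="0"
     the hyphenated original is not emitted. -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1"
        generateNumberParts="1"
        catenateWords="0"
        catenateNumbers="0"
        catenateAll="0"
        splitOnCaseChange="1"
        splitOnNumerics="1"
        stemEnglishPossessive="1"
        preserveOriginal="0"/>
```

The Analysis screen in the Solr admin UI can be used to check which tokens each analyzer actually emits for fast-five.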




-----
Thanks,
Kashish
--
View this message in context: http://lucene.472066.n3.nabble.com/Not-allowing-exact-match-with-WordDelimiterFilterFactory-tp4133193p4133235.html

Re: Not allowing exact match with WordDelimiterFilterFactory

Posted by Jack Krupansky <ja...@basetechnology.com>.

Generally, if you are using the word delimiter filter, you need to have
separate index and query analyzers, and set preserveOriginal="true" for the
index analyzer but preserveOriginal="false" for the query analyzer.

Also, set autoGeneratePhraseQueries="true" for your text field type.
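
A minimal sketch of what that advice might look like in schema.xml, reusing the tokenizer and filters from the original post. The field type name text_wdf_example is hypothetical, and turning the catenate options off on the query side is an assumption rather than part of the advice above:

```xml
<!-- Sketch only: "text_wdf_example" is a hypothetical name. -->
<fieldType name="text_wdf_example" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <!-- Index side: keep the original token alongside its parts. -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <!-- Query side: emit only the split parts, not the original token.
         catenate* set to 0 here is an assumption, not from the thread. -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            preserveOriginal="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Documents need to be reindexed after changing the index analyzer for the new tokens to take effect.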

-- Jack Krupansky
