Posted to solr-user@lucene.apache.org by "damian.pawski" <dp...@gmail.com> on 2018/05/31 15:04:24 UTC

Solr 7, exact phrase search, empty results for some records

Hi, 

I have updated Solr from 5.4.1 to 7.2.1.

I have updated the settings accordingly, but in some cases when I am
searching for an exact phrase surrounded by quotes I am getting 0 results.

In 5.4.1 I have:
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
                preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        <filter class="solr.EnglishMinimalStemFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="0"
                catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        <filter class="solr.EnglishMinimalStemFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

In 7.2.1 I have:
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt"/>
        <filter class="solr.WordDelimiterGraphFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
                preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        <filter class="solr.EnglishMinimalStemFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.FlattenGraphFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt"/>
        <filter class="solr.WordDelimiterGraphFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="0"
                catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        <filter class="solr.EnglishMinimalStemFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

I couldn't find any pattern explaining why searches with quotes work fine
for some records but return 0 results for others (I have checked that the
missing records were imported, as I can find them by Id).

Could you point me in the right direction on how to investigate this?

I have checked the results of the "..analysis..." pages on both instances of
Solr for the problematic records and in both cases I am getting the same
outcome.

Thank you
Damian




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Solr 7, exact phrase search, empty results for some records

Posted by "damian.pawski" <dp...@gmail.com>.
Thank you for the quick response.

I have moved the

    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>

from the <analyzer type="index"> section to the <analyzer type="query">
section and it is working fine.
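
A sketch of how the 7.2.1 field type might look after that move, assuming every other filter stays exactly as posted in the original message. The position of the synonym filter inside the query chain (directly after the tokenizer, mirroring where it sat in the index chain) is an assumption; only the move itself is stated here.

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- SynonymGraphFilterFactory removed from the index chain -->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <!-- Kept as posted: the index chain still contains a graph filter -->
    <filter class="solr.FlattenGraphFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Assumed position: synonyms are now expanded at query time only -->
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

Changing the index-time analyzer only affects documents indexed afterwards, so a full reindex is normally needed before comparing results.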

Once again

Thank you
Damian




Re: Solr 7, exact phrase search, empty results for some records

Posted by Erick Erickson <er...@gmail.com>.
There is one major thing to be aware of with the analysis page: it shows
what the analysis chain does to the text _after_ query parsing, so it does
not show how the query parser itself handles your input. I applaud your
use of it, it's where lots of problems are found ;).

Try adding &debug=query to the request in both cases. In particular, look
at parsedquery_toString in the response and compare the two.

And I don't _think_ this is the issue since you're specifying phrases,
but the default for split-on-whitespace (the sow parameter) has changed, see:
https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/

Good luck,
Erick
