Posted to java-user@lucene.apache.org by JiaJun Zhu <JZ...@alexanderstreet.com> on 2019/04/03 08:23:24 UTC

Using "CommonGramsFilterFactory" and "StopFilterFactory" in the query analyzer chain breaks phrase queries

Hello,

I followed the steps in LUCENE-7698 and found that the query "hello with an accent" returns an empty result, although it should match the field "features":"Good unicode support: héllo (hello with an accent over the e)" of the document with id SOLR1000. I am trying to use "CommonGramsFilterFactory" and "StopFilterFactory" in the query analyzer of our Solr environment, but this issue causes some queries to return empty results.

The issue can be reproduced with the steps from LUCENE-7698 by changing only the query string to "hello with an accent". The steps are:


1.) Download and extract Solr (in my test case version 6.4.1) somewhere.
2.) Edit server/solr/configsets/sample_techproducts_configs/conf/managed-schema and change the text_general fieldType by adding the CommonGrams(Query)Filter before the stop word filter:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.CommonGramsFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.CommonGramsQueryFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

3.) Add "with" to server/solr/configsets/sample_techproducts_configs/conf/stopwords.txt and make sure the file has correct line endings (as extracted from the Solr zip, it seems to contain DOS/Windows line endings, which may break things).

4.) Run the techproducts example with "bin/solr -e techproducts"

5.) Browse to <http://localhost:8983/solr/techproducts/select?q=%22hello%20with%20an%20accent%22&debugQuery=true>

6.) Observe that parsedquery in the debug output is empty
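
For step 3, the stopwords file can be prepared from a script. The following is only a minimal sketch, assuming the file is UTF-8 text; the function names normalize_stopwords and fix_stopwords_file are hypothetical helpers for illustration, not part of Solr.

```python
from pathlib import Path


def normalize_stopwords(raw: bytes, extra_word: str = "with") -> bytes:
    """Return stopwords content with LF line endings and extra_word present."""
    # Convert DOS/Windows (CRLF) and stray CR line endings to LF.
    text = raw.decode("utf-8").replace("\r\n", "\n").replace("\r", "\n")
    lines = text.split("\n")
    # Drop the trailing empty element produced by a final newline.
    if lines and lines[-1] == "":
        lines.pop()
    # Append the word only if no existing line already lists it.
    if extra_word not in (line.strip() for line in lines):
        lines.append(extra_word)
    return ("\n".join(lines) + "\n").encode("utf-8")


def fix_stopwords_file(path: Path, extra_word: str = "with") -> None:
    """Rewrite the given stopwords file in place (hypothetical helper)."""
    path.write_bytes(normalize_stopwords(path.read_bytes(), extra_word))
```

It would be run before step 4, for example as fix_stopwords_file(Path("server/solr/configsets/sample_techproducts_configs/conf/stopwords.txt")) from the Solr directory.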

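Steps 5 and 6 can also be checked without a browser. This is a rough sketch, assuming the techproducts example is running on the default port; it adds wt=json to the request (not in the URL above) so that the debug section can be read as JSON, and the helper names are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the techproducts example is running locally on the default port.
SOLR_SELECT = "http://localhost:8983/solr/techproducts/select"


def build_debug_url(phrase: str, base: str = SOLR_SELECT) -> str:
    """Build a select URL for a quoted phrase query with debug output as JSON."""
    params = {"q": f'"{phrase}"', "debugQuery": "true", "wt": "json"}
    return f"{base}?{urlencode(params)}"


def parsed_query(response: dict) -> str:
    """Return debug.parsedquery from a decoded response ('' if absent)."""
    return response.get("debug", {}).get("parsedquery", "")


def fetch_parsed_query(phrase: str) -> str:
    """Run the query against a live Solr and return its parsed form."""
    with urlopen(build_debug_url(phrase)) as resp:
        return parsed_query(json.load(resp))
```

With Solr running, fetch_parsed_query("hello with an accent") should return an empty string if the problem described in step 6 is present.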



Best regards,

JiaJun
Manager Technology
Alexander Street, a ProQuest Company
No. 201 NingXia Road, Room 6J Shanghai China P.R.
200063