Posted to solr-user@lucene.apache.org by Razvan Serban <ra...@Termene.ro.INVALID> on 2020/11/17 11:25:15 UTC

SOLR fuzzy search not behaving as expected when analysers are used

Hello everyone,

I am using the fuzzy search capability of Solr 8.7 and have run into a specific case in which the search does not behave as I expect.

This is the analyzer (as JSON) on the field that I search against:

        "analyzer" : {
            "filters":[
                {
                    "class":"solr.ASCIIFoldingFilterFactory",
                    "preserveOriginal":"false"
                },
                {
                    "class":"solr.LowerCaseFilterFactory"
                },
                {
                    "class":"solr.PatternReplaceCharFilterFactory",
                    "replacement":"",
                    "pattern":"[^A-Za-z0-9]"
                }
            ],
            "tokenizer": {
                "class":"solr.KeywordTokenizerFactory"
            }
        }
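
As a rough illustration of what this chain is expected to do to a value, here is a minimal Python sketch. It is not Solr code: the function name normalize is hypothetical, the ASCII folding is only approximated with Unicode decomposition, and the steps are applied in the order they are listed in the config, which is an assumption about how Solr runs them.

```python
import re
import unicodedata


def normalize(value: str) -> str:
    """Approximate the analyzer chain above on one keyword token (sketch only)."""
    # Rough stand-in for ASCIIFoldingFilter: decompose accented characters
    # and drop the combining marks. The real filter covers more cases.
    decomposed = unicodedata.normalize("NFKD", value)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Stand-in for LowerCaseFilter.
    lowered = folded.lower()
    # Same pattern as in the config: drop everything except ASCII letters and digits.
    return re.sub(r"[^A-Za-z0-9]", "", lowered)


print(normalize("abcdefghi"))          # abcdefghi
print(normalize("A.B.C.D.E.F.G.H.I"))  # abcdefghi
```

Under these assumptions the dotted and undotted forms reduce to the same token, which is why a regular search is expected to match.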

Suppose the field holds the value:

abcdefghi

A normal (non-fuzzy) search for the following term matches it:

a.b.c.d.e.f.g.h.i

It matches because the PatternReplaceCharFilterFactory discards the dots.

The problem appears when I use fuzzy search instead of a normal search. The search term then looks like this (the ~2 suffix sets the maximum edit distance to 2):

a.b.c.d.e.f.g.i~2

This query never matches the original value without dots.

Why is that? My first guess was that filters are not applied at all when a fuzzy query runs, but the lowercase and ASCII folding filters do work as intended in that case. Only the pattern replacement seems to be skipped.
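
To make that suspicion concrete, the sketch below compares edit distances in Python. It assumes, without confirmation, that the pattern replacement is skipped for the fuzzy term, and it uses plain Levenshtein distance as a simplification of whatever measure Solr applies. The helper name levenshtein is hypothetical.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute), row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


indexed = "abcdefghi"          # value as stored after analysis
raw_query = "a.b.c.d.e.f.g.i"  # fuzzy term if the dots are NOT stripped (assumption)
stripped_query = re.sub(r"[^A-Za-z0-9]", "", raw_query)  # term if they are stripped

print(levenshtein(raw_query, indexed))       # 7
print(levenshtein(stripped_query, indexed))  # 1
```

If the dots really do survive into the fuzzy term, the distance is far above the limit of 2, which would explain the missing match. With the dots stripped, the distance would be 1 and the query should match.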