Posted to solr-user@lucene.apache.org by Monique Monteiro <mo...@gmail.com> on 2018/08/03 18:19:12 UTC
Problem with fuzzy search and accentuation
Hi all,
I'm having a problem when I search for a word with some non-ASCII
characters in combination with fuzzy search.
For example, if I type 'administração' or 'contratação' (both words end
with 'ção'), the search results are returned correctly. However, if I type
'administração~', no result is returned. For other terms, I haven't found
any problem.
My Solr version is 6.6.3.
Has anyone any idea about what may cause this issue?
Thanks in advance.
--
Monique Monteiro
Twitter: http://twitter.com/monilouise
Re: Problem with fuzzy search and accentuation
Posted by Monique Monteiro <mo...@gmail.com>.
Hi Erick,
Indeed, stemming was the culprit.
Thanks!
Monique Monteiro
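
[Editor's note] The parsed queries quoted in this thread suggest why stemming breaks the fuzzy match. The plain query is stemmed to "administr", so that is presumably the term stored in the index, while the fuzzy query keeps the full word "administração" with a maximum of 2 edits. The sketch below is an illustration only: it uses a plain Levenshtein distance (Lucene's own fuzzy matching also counts transpositions, which does not matter for these words) and assumes the indexed terms are exactly the ones shown in the parsedquery output.

```python
import unicodedata


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute)."""
    # Normalize so that 'ç' and 'ã' each count as a single character.
    a = unicodedata.normalize("NFC", a)
    b = unicodedata.normalize("NFC", b)
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


MAX_EDITS = 2  # the "~2" that appears in the parsed fuzzy queries

# (fuzzy query term, term assumed to be in the index after stemming)
pairs = [
    ("administração", "administr"),
    ("tribubal", "tribunal"),
]

for query_term, indexed_term in pairs:
    distance = edit_distance(query_term, indexed_term)
    verdict = "match" if distance <= MAX_EDITS else "no match"
    print(f"{query_term} vs {indexed_term}: {distance} edits, {verdict}")
```

Under these assumptions "administração" is 4 edits away from "administr", which exceeds the limit of 2, whereas "tribubal" is 1 edit away from "tribunal", a term the stemmer appears to leave unchanged.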
On Fri, Aug 3, 2018 at 3:45 PM Erick Erickson <er...@gmail.com>
wrote:
> Stemming is getting in the way here. You could probably use copyField
> to a field that doesn't stem and fuzzy search against that field
> rather than the stemmed one.
>
> Best,
> Erick
--
Monique Monteiro
Twitter: http://twitter.com/monilouise
Re: Problem with fuzzy search and accentuation
Posted by Erick Erickson <er...@gmail.com>.
Stemming is getting in the way here. You could probably use copyField
to a field that doesn't stem and fuzzy search against that field
rather than the stemmed one.
Best,
Erick
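
[Editor's note] A minimal sketch of the copyField suggestion above, assuming a managed schema that accepts Schema API commands. The names text_exact and text_nostem are hypothetical; only the source field "text" comes from the debug output in this thread. The script only builds and prints the JSON payload; it does not contact Solr.

```python
import json

SOURCE_FIELD = "text"           # field name seen in the parsedquery output
UNSTEMMED_FIELD = "text_exact"  # hypothetical name for the unstemmed copy
UNSTEMMED_TYPE = "text_nostem"  # hypothetical field type without a stemmer

# Schema API commands: a field type that tokenizes and lowercases but does
# not stem, a field of that type, and a copyField rule that fills it.
schema_commands = {
    "add-field-type": {
        "name": UNSTEMMED_TYPE,
        "class": "solr.TextField",
        "positionIncrementGap": "100",
        "analyzer": {
            "tokenizer": {"class": "solr.StandardTokenizerFactory"},
            "filters": [{"class": "solr.LowerCaseFilterFactory"}],
        },
    },
    "add-field": {
        "name": UNSTEMMED_FIELD,
        "type": UNSTEMMED_TYPE,
        "indexed": True,
        "stored": False,
    },
    "add-copy-field": {"source": SOURCE_FIELD, "dest": UNSTEMMED_FIELD},
}

print(json.dumps(schema_commands, indent=2, ensure_ascii=False))
```

The payload would be POSTed to the collection's /schema endpoint, and the documents would need to be reindexed so that the new field is populated. A fuzzy query such as text_exact:administração~ would then be compared against unstemmed terms. If the source field is multiValued, the destination field would need to be multiValued as well.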
On Fri, Aug 3, 2018 at 11:31 AM, Monique Monteiro
<mo...@gmail.com> wrote:
> By adding debug=true, I get the following:
>
>
> - administração (correct result):
>
> "debug":{
> "rawquerystring":"administração",
> "querystring":"administração",
> "parsedquery":"text:administr",
> "parsedquery_toString":"text:administr",
> "QParser":"LuceneQParser"}}
>
>
> - administração~ (incorrect behaviour, no results):
>
> "debug":{
> "rawquerystring":"administração~",
> "querystring":"administração~",
> "parsedquery":"text:administração~2",
> "parsedquery_toString":"text:administração~2",
> "QParser":"LuceneQParser"}}
>
>
> - tribunal (correct result):
>
> "debug":{
> "rawquerystring":"tribunal",
> "querystring":"tribunal",
> "parsedquery":"text:tribunal",
> "parsedquery_toString":"text:tribunal",
> "QParser":"LuceneQParser"}}
>
>
> - tribubal~ (correct result, no accents):
>
> "debug":{
> "rawquerystring":"tribubal~",
> "querystring":"tribubal~",
> "parsedquery":"text:tribubal~2",
> "parsedquery_toString":"text:tribubal~2",
> "QParser":"LuceneQParser"}}
Re: Problem with fuzzy search and accentuation
Posted by Monique Monteiro <mo...@gmail.com>.
By adding debug=true, I get the following:

- administração (correct result):

  "debug":{
    "rawquerystring":"administração",
    "querystring":"administração",
    "parsedquery":"text:administr",
    "parsedquery_toString":"text:administr",
    "QParser":"LuceneQParser"}}

- administração~ (incorrect behaviour, no results):

  "debug":{
    "rawquerystring":"administração~",
    "querystring":"administração~",
    "parsedquery":"text:administração~2",
    "parsedquery_toString":"text:administração~2",
    "QParser":"LuceneQParser"}}

- tribunal (correct result):

  "debug":{
    "rawquerystring":"tribunal",
    "querystring":"tribunal",
    "parsedquery":"text:tribunal",
    "parsedquery_toString":"text:tribunal",
    "QParser":"LuceneQParser"}}

- tribubal~ (correct result, no accents):

  "debug":{
    "rawquerystring":"tribubal~",
    "querystring":"tribubal~",
    "parsedquery":"text:tribubal~2",
    "parsedquery_toString":"text:tribubal~2",
    "QParser":"LuceneQParser"}}

On Fri, Aug 3, 2018 at 3:26 PM Erick Erickson <er...@gmail.com>
wrote:
> What does adding &debug=query show you the parsed query is in the two
> cases?
>
> My guess is that accent folding is kicking in one case but not the
> other, but that's
> a blind guess.
--
Monique Monteiro
Twitter: http://twitter.com/monilouise
Re: Problem with fuzzy search and accentuation
Posted by Erick Erickson <er...@gmail.com>.
What does adding &debug=query show as the parsed query in the two cases?
My guess is that accent folding is kicking in in one case but not the
other, but that's a blind guess.
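
[Editor's note] A small sketch of how the two debug requests could be built, with the accented characters percent-encoded as UTF-8. The host, port and collection name are hypothetical; the debug=query parameter is the one mentioned above. The script only prints the URLs and does not send any request.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical Solr endpoint; replace with the real host and collection.
BASE_URL = "http://localhost:8983/solr/mycollection/select"


def debug_query_url(query: str) -> str:
    """Return a select URL that asks Solr to include the parsed query."""
    params = {"q": query, "debug": "query", "wt": "json"}
    # urlencode percent-encodes non-ASCII characters as UTF-8 bytes.
    return BASE_URL + "?" + urlencode(params)


for q in ("administração", "administração~"):
    print(debug_query_url(q))
```

Comparing the "parsedquery" entries of the two responses shows whether the same analysis was applied with and without the fuzzy operator.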
On Fri, Aug 3, 2018 at 11:19 AM, Monique Monteiro
<mo...@gmail.com> wrote:
> Hi all,
>
> I'm having a problem when I search for a word with some non-ASCII
> characters in combination with fuzzy search.
>
> For example, if I type 'administração' or 'contratação' (both words end
> with 'ção'), the search results are returned correctly. However, if I type
> 'administração~', no result is returned. For other terms, I haven't found
> any problem.
>
> My Solr version is 6.6.3.
>
> Has anyone any idea about what may cause this issue?
>
> Thanks in advance.
>
> --
> Monique Monteiro
> Twitter: http://twitter.com/monilouise