Posted to users@solr.apache.org by ozatomic <oz...@levelbelow.net> on 2022/04/01 02:26:53 UTC

Search query with uppercase getting different results

Hi,

While testing my instance of Solr, I noticed that if I mix cases in a
search string, it is treated like two words. Is this the expected
behavior, or a config issue?


Examples

"myword" returns results with:
- myword
- myWord
- MYWord


"myWORD" returns results with:
- myword
- myWord
- MYWord
- My Word
- my, word


Re: Search query with uppercase getting different results

Posted by Dominique Bejean <do...@eolya.fr>.
Hi,

I suppose you are using the word delimiter filter with splitOnCaseChange
enabled.
https://solr.apache.org/guide/8_6/filter-descriptions.html#word-delimiter-graph-filter

Disable this option.

Dominique


Re: Search query with uppercase getting different results

Posted by Andy C <an...@gmail.com>.
I suspect that the fieldType of the field you are searching against is
configured to use the Word Delimiter Graph Filter
(https://solr.apache.org/guide/8_11/filter-descriptions.html#word-delimiter-graph-filter)
or perhaps the older variant of this filter, the Word Delimiter Filter.

If that is the case, you can change the behavior by modifying the
"splitOnCaseChange" property of the filter. The filter can be configured
to apply at index time only, at query time only, or both. If you change
the index-time configuration, you will have to reindex all your documents
for the change to take effect.
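
As a rough sketch of what such a change could look like, the Python
snippet below only builds (and prints, without sending) a JSON body for
the Schema API "replace-field-type" command with splitOnCaseChange turned
off in both analyzers. The field type name "text_example", the tokenizer,
and the other filters are assumptions made for illustration; your real
field type will differ, and since the command replaces the whole
definition, the body would have to carry your complete analyzer chain.

```python
import json


def word_delimiter(split_on_case_change):
    """Return a Word Delimiter Graph Filter entry in Schema API JSON form."""
    return {
        "class": "solr.WordDelimiterGraphFilterFactory",
        # "0" disables splitting at lowercase-to-uppercase transitions.
        "splitOnCaseChange": "1" if split_on_case_change else "0",
    }


payload = {
    "replace-field-type": {
        "name": "text_example",  # hypothetical field type name
        "class": "solr.TextField",
        # Index-time chain (assumed): changing it requires a full reindex.
        "indexAnalyzer": {
            "tokenizer": {"class": "solr.WhitespaceTokenizerFactory"},
            "filters": [
                word_delimiter(False),
                # Graph filters are flattened at index time.
                {"class": "solr.FlattenGraphFilterFactory"},
                {"class": "solr.LowerCaseFilterFactory"},
            ],
        },
        # Query-time chain (assumed): takes effect without reindexing.
        "queryAnalyzer": {
            "tokenizer": {"class": "solr.WhitespaceTokenizerFactory"},
            "filters": [
                word_delimiter(False),
                {"class": "solr.LowerCaseFilterFactory"},
            ],
        },
    }
}

body = json.dumps(payload, indent=2)
print(body)
```

The printed body would be POSTed to the collection's /schema endpoint;
the same setting can also be edited directly as the splitOnCaseChange
attribute on the filter element in the schema file.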

- Andy -

Re: Search query with uppercase getting different results

Posted by James Greene <ja...@jamesaustingreene.com>.
Yes, you can change the field type from a TextField to a string field
(StrField, which is not analyzed at all), or change the analyzer:

https://solr.apache.org/guide/8_11/analyzers.html

Cheers,
JAG


Re: Search query with uppercase getting different results

Posted by ozatomic <oz...@levelbelow.net>.
Thanks JAG,

Is this something that can be configured so that it does not tokenize?

Re: Search query with uppercase getting different results

Posted by James Greene <ja...@jamesaustingreene.com>.
This is expected. Your search term is getting tokenized: the
lowercase-to-uppercase transition is treated like a natural-language
'word break', so the term gets tokenized to 'my word'.
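
To make the split concrete, here is a small Python sketch. It is a
simplified stand-in for the case-change rule, not Solr's actual
implementation: it breaks a term at each lowercase-to-uppercase
transition and then lowercases the pieces. The function name
split_on_case_change is made up for this example.

```python
import re


def split_on_case_change(term):
    """Break at every lowercase-to-uppercase transition, then lowercase."""
    # Zero-width match: position preceded by a-z and followed by A-Z.
    parts = re.split(r"(?<=[a-z])(?=[A-Z])", term)
    return [part.lower() for part in parts if part]


for query in ("myword", "myWORD"):
    print(query, "->", split_on_case_change(query))
# myword -> ['myword']
# myWORD -> ['my', 'word']
```

With the term split into 'my' and 'word', documents containing
"My Word" or "my, word" can match as well, which is the extra set of
results in the original question.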

Cheers,
JAG
