Posted to users@solr.apache.org by "WU, Zhiqing" <zw...@ennov.com> on 2022/01/07 16:09:26 UTC

Range query on TextField

Hello,
I am learning Solr.
In "The Standard Query Parser" documentation, I found:
Range queries are not limited to date fields or even numerical fields, but
can also be used with non-date fields (e.g. title:{Aida TO Carmen})

I tried a range query on a Solr 8.3 database:
staffName_txt:[ "Gross Bob" TO "Lindmar Deborah"]
staffName_txt is defined as a TextField.
Most of the results are correct, but "Mr Kenyon John" is also in the
result list. Since 'M' comes after 'L', I think it should not be included.
May I ask what is wrong with my query? Is there a way to avoid the problem?
Many thanks in advance.
Kind regards,
Zhiqing
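A small Python sketch of how a range comparison behaves when a whole value is indexed as one term. It assumes, as Lucene does for indexed terms, that strings are compared as raw UTF-8 bytes, so every uppercase ASCII letter sorts before every lowercase one. The helper name `in_term_range` is illustrative only.

```python
def in_term_range(term, lower, upper):
    """Inclusive lexicographic check on UTF-8 bytes (an approximation of
    how Lucene orders indexed terms)."""
    t = term.encode("utf-8")
    return lower.encode("utf-8") <= t <= upper.encode("utf-8")


# Uppercase ASCII letters have smaller code points than lowercase ones.
print([(ch, ord(ch)) for ch in "GLMl"])
# [('G', 71), ('L', 76), ('M', 77), ('l', 108)]

# Whole-string comparison: 'M' sorts after 'L', so this is outside the range.
print(in_term_range("Mr Kenyon John", "Gross Bob", "Lindmar Deborah"))  # False
# 'M' (77) sorts before 'l' (108), so a lowercase upper bound lets it in.
print(in_term_range("Mr Kenyon John", "Gross Bob", "lindmar Deborah"))  # True
```

For whole, case-preserved strings the intuition in the question holds; the replies in this thread explain why a tokenized TextField does not compare whole strings.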

Re: Range query on TextField

Posted by "WU, Zhiqing" <zw...@ennov.com>.
Hi Andy,
Many thanks for your quick reply.
Yes, you are right. According to the Solr 8.5 documentation, I should not
edit "managed-schema" directly. However, when I create a new core
(bin/solr create -c newcore), I can only find managed-schema in the
server/solr/newcore/conf folder; I cannot find schema.xml in any folder
belonging to the core. Some webpages mention renaming "managed-schema" to
schema.xml. What I can change in managed-schema via the Schema API seems
limited: I can add fields, but I could not find out how to change
"solr.StandardTokenizerFactory" to "solr.KeywordTokenizerFactory" via the
Schema API. In the Solr UI (after clicking "Schema" above "Segments info"),
I only find "Add Field", "Add Dynamic Field" and "Add Copy Field"; I have
not found anything like "Add FieldType".
After I installed Solr, it did not have any core, so I created one
(newcore, empty, without any documents) and then added 4 new documents via
the Solr "Documents" page. After the documents have been added, do I need to
do anything else to index them?

Yes, I understand I should create a new fieldType rather than modifying the
text_general fieldType. If I create a new fieldType, can I set the class of
its tokenizer to "solr.KeywordTokenizerFactory"?
I will remove the StopFilterFactory and SynonymGraphFilterFactory filters. I
am new to Solr, so some of my operations might be wrong.

Zhiqing

Re: Range query on TextField

Posted by Andy C <an...@gmail.com>.
Also, it doesn't make sense to use the StopFilterFactory or
SynonymGraphFilterFactory filters in conjunction with the
KeywordTokenizerFactory, so these should be removed from the fieldType
definition (personally, I would never use the StopFilterFactory except in
specialized situations).
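A fieldType along those lines can also be created without hand-editing managed-schema, through the Schema API's `add-field-type` command. The following is a minimal Python sketch, assuming Solr listens on `http://localhost:8983` and the core is named `newcore` (both assumptions); the type name `text_keyword_lc` is hypothetical. It keeps only KeywordTokenizerFactory and LowerCaseFilterFactory, and a single `analyzer` entry is used for both indexing and querying.

```python
import json
import urllib.request

# Assumed host and core name; adjust to your installation.
SOLR_CORE_URL = "http://localhost:8983/solr/newcore"


def keyword_field_type_command(name="text_keyword_lc"):
    """Build a Schema API 'add-field-type' command for a TextField whose whole
    value becomes one lowercased token (no StopFilter or SynonymGraphFilter).
    The default type name is a hypothetical example."""
    return {
        "add-field-type": {
            "name": name,
            "class": "solr.TextField",
            # A single "analyzer" applies at both index and query time.
            "analyzer": {
                "tokenizer": {"class": "solr.KeywordTokenizerFactory"},
                "filters": [{"class": "solr.LowerCaseFilterFactory"}],
            },
        }
    }


def post_schema_command(command, core_url=SOLR_CORE_URL):
    """POST one or more Schema API commands to /solr/<core>/schema and
    return the decoded JSON response."""
    req = urllib.request.Request(
        core_url + "/schema",
        data=json.dumps(command).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With Solr running, `post_schema_command(keyword_field_type_command())` should create the type; documents already in the index keep their old terms until they are re-added.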

- Andy -

Re: Range query on TextField

Posted by Andy C <an...@gmail.com>.
How are you changing the managed-schema? I have never used the managed
schema feature myself, but according to the documentation (
https://solr.apache.org/guide/8_5/overview-of-documents-fields-and-schema-design.html#solrs-schema-file)
it should never be directly edited. Not sure how it is supposed to be
updated.

Did you recreate your indexes after changing the schema (delete the
existing indexes and re-add your 4 documents)? This would be necessary, as
the schema configuration at the time the documents are ingested would
determine how they are indexed.
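Since the indexed terms are produced when a document is ingested, a schema change only affects documents indexed afterwards, which is consistent with the unchanged results reported after switching tokenizers. A hedged Python sketch of "delete and re-add" using Solr's JSON update syntax; the core URL and the document ids are assumptions, as is the idea that the four documents contain only `id` and `staffName_txt`.

```python
import json
import urllib.request

# Assumed host and core name; adjust to your installation.
SOLR_CORE_URL = "http://localhost:8983/solr/newcore"

# Delete every document in the core.
DELETE_ALL = {"delete": {"query": "*:*"}}

# Hypothetical ids; only the staffName_txt values come from this thread.
DOCS = [
    {"id": "1", "staffName_txt": "Lindmar Deborah"},
    {"id": "2", "staffName_txt": "Mr Kenyon John"},
    {"id": "3", "staffName_txt": " Saab Jerry"},
    {"id": "4", "staffName_txt": "Gross Bob"},
]


def post_update(body, core_url=SOLR_CORE_URL):
    """POST a JSON body to the core's /update handler and commit."""
    req = urllib.request.Request(
        core_url + "/update?commit=true",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def reindex(docs=DOCS):
    """Remove the old documents, then send them again so that the current
    schema's analyzers produce the indexed terms."""
    post_update(DELETE_ALL)
    return post_update(docs)
```

Calling `reindex()` after each schema change, and before re-running the range queries, would make the new analyzer visible in the results.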

Also, you may want to consider creating a new fieldType rather than
modifying the text_general fieldType, and explicitly mapping the
staffName_txt field to it. Otherwise you will change how searching works for
all fields that use the text_general fieldType (you would no longer be able
to retrieve documents by searching for individual words in the text). If you
want to support both behaviors, you might want to create multiple versions
of the field using the copyField feature.
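One way to keep both behaviors, sketched as a Schema API request body in Python: leave `staffName_txt` on text_general for word search, and copy it into a second field that uses a single-token type. `staffName_kw` and `text_keyword_lc` are hypothetical names; the type is assumed to exist already, and the source name is assumed to resolve through the `*_txt` dynamic field. The copy is filled at index time, so existing documents must be re-added.

```python
import json


def copy_field_commands(source="staffName_txt",
                        dest="staffName_kw",
                        field_type="text_keyword_lc"):
    """Schema API body adding a single-token field plus a copyField rule.
    Several commands may be sent together in one POST to /solr/<core>/schema.
    All default names are hypothetical."""
    return {
        "add-field": {
            "name": dest,
            "type": field_type,
            "indexed": True,
            "stored": False,
        },
        "add-copy-field": {"source": source, "dest": dest},
    }


body = copy_field_commands()
print(json.dumps(body, indent=2))

# Range queries would target the copy; word searches keep the original field.
RANGE_QUERY = 'staffName_kw:["Gross Bob" TO "Lindmar Deborah"]'
WORD_QUERY = "staffName_txt:Kenyon"
```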

Hope this helps.
- Andy -

Re: Range query on TextField

Posted by "WU, Zhiqing" <zw...@ennov.com>.
Hi Andy,

Loads of thanks for your reply. I am trying to figure out my problem by
following your advice.

I have installed Solr (8.5) on my computer and added 4 documents to a
core.

In the 4 documents, the staffName_txt field is set to "Lindmar Deborah",
"Mr Kenyon John", " Saab Jerry" and "Gross Bob" respectively.

First, without changing anything in managed-schema, I ran two range
queries:

q: staffName_txt:["Gross Bob" TO "Lindmar Deborah"]
result: "Lindmar Deborah", "Mr Kenyon John" and " Saab Jerry"

q: staffName_txt:[* TO "Lindmar Deborah"]
result: "Lindmar Deborah", "Mr Kenyon John", " Saab Jerry" and "Gross Bob"

After that, I found the "text_general" fieldType in managed-schema:

  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

...

  <dynamicField name="*_txt" type="text_general" indexed="true" stored="true"/>

...

and changed both "solr.StandardTokenizerFactory" entries to
"solr.KeywordTokenizerFactory". I restarted Solr and repeated the two range
queries:

q: staffName_txt:["Gross Bob" TO "Lindmar Deborah"]
result: "Lindmar Deborah", "Mr Kenyon John" and " Saab Jerry"

q: staffName_txt:[* TO "Lindmar Deborah"]
result: "Lindmar Deborah", "Mr Kenyon John", " Saab Jerry" and "Gross Bob"

It seems nothing has changed in the results.

Is there anything else I could change?

Looking forward to your reply.

Zhiqing

On Fri, 7 Jan 2022 at 18:12, Andy C <an...@gmail.com> wrote:

> The behavior of the range query would depend on how the fieldType used by
> the staffName_txt is configured.
>
> I believe you will find that TextField is not the fieldType, but the base
> class your fieldType is implemented on.
>
> To use an example from one of the provided example schemas, the "_text"
> field is defined as using the "text_general" fieldType
>
>    <field name="_text_" type="text_general" indexed="true" stored="false"
> multiValued="true"/>
>
> The text_general fieldType is defined as:
>
>     <fieldType name="text_general" class="solr.TextField"
> positionIncrementGap="100" multiValued="true">
>       <analyzer type="index">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" />
>         <filter class="solr.LowerCaseFilterFactory"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" />
>         <filter class="solr.SynonymGraphFilterFactory"
> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>       </analyzer>
>     </fieldType>
>
> This fieldType definition splits the contents of the field into multiple
> tokens which each get indexed. So for example "Mr Kenyon John" would
> generate 3 tokens: "Mr", "Kenyon" and "John".
>
> If you performed your range query on this field, it would check each token
> separately to see if it was in the specified range. If any token was, the
> document would be retrieved.
>
> If you want the entire contents of the field to be treated as a single
> token, which seems to be your intent, then you should look at using a
> fieldType that is based on the Keyword Tokenizer (see
> https://solr.apache.org/guide/8_3/tokenizers.html#keyword-tokenizer).
>
> - Andy -
>
> On Fri, Jan 7, 2022 at 12:35 PM WU, Zhiqing <zw...@ennov.com> wrote:
>
> > Many thanks for your reply. I have changed my query to
> > staffName_txt:["GROSS BOB" TO "LINDMAR DEBORAH"]
> > staffName_txt:["gross bob" TO "lindmar deborah"]
> > staffName_txt:["Gross Bob" TO "lindmar Deborah"]
> > Their "numFound" are identical (177). Apart from "Mr Kenyon John", my
> > search result contains " Saab Jerry", which is very confusing.
> > Therefore, I think the problem is probably not because of "character
> case"
> >
> > On Fri, 7 Jan 2022 at 17:12, Srijan <sh...@gmail.com> wrote:
> >
> > > My guess is inconsistent "character case" (uppercase/lowercase) in your
> > > indexed data vs your search query. For example, I would expect
> something
> > > like  staffName_txt:[ "Gross Bob" TO "lindmar Deborah"]   to return "Mr
> > > Kenyon John" as M indeed does lie between G and l.
> > >
> > > On Fri, Jan 7, 2022 at 11:10 AM WU, Zhiqing <zw...@ennov.com> wrote:
> > >
> > > > Hello,
> > > > I am learning Solr.
> > > > In "The Standard Query Parser", I find:
> > > > Range queries are not limited to date fields or even numerical
> fields,
> > > but
> > > > also use with non-date fields (e.g. title:{Aida TO Carmen})
> > > >
> > > > I tried a range query in a Solr database (8.3)
> > > > staffName_txt:[ "Gross Bob" TO "Lindmar Deborah"]
> > > > staffName_txt is defined as a TextField.
> > > > Most searched results are correct but "Mr Kenyon John" is also in the
> > > > result list.
> > > > I think 'M' is after 'L' and should not be included in the result.
> > > > May I ask what is wrong in my query? Is there a way to avoid the
> > problem?
> > > > Many thanks in advance.
> > > > Kind regards,
> > > > Zhiqing
> > > >
> > >
> >
>

Re: Range query on TextField

Posted by Andy C <an...@gmail.com>.
The behavior of the range query would depend on how the fieldType used by
the staffName_txt field is configured.

I believe you will find that TextField is not the fieldType itself, but the
base class your fieldType is built on.

To use an example from one of the provided example schemas, the "_text_"
field is defined using the "text_general" fieldType:

   <field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>

The text_general fieldType is defined as:

    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

This fieldType definition splits the contents of the field into multiple
tokens, each of which is indexed. So, for example, "Mr Kenyon John" would
generate 3 tokens: "mr", "kenyon" and "john" (lowercased by the
LowerCaseFilterFactory).

If you performed your range query on this field, it would check each token
separately to see if it was in the specified range. If any token was, the
document would be retrieved.
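A rough Python simulation of that token-by-token check shows why both "Mr Kenyon John" and "Saab Jerry" come back. It is a sketch under stated assumptions, not Solr's code: a regex split stands in for StandardTokenizer (adequate for plain ASCII names), the range endpoints are assumed to be lowercased as whole strings, and Python string comparison stands in for Lucene's term order (UTF-8 byte order, which agrees with code-point order). "Adams Tom" is a made-up non-matching name.

```python
import re


def analyze_text_general(value: str) -> list[str]:
    """Rough stand-in for StandardTokenizer + LowerCaseFilter.

    Splits on runs of word characters and lowercases each token. The real
    StandardTokenizer follows Unicode word-break rules; for plain ASCII
    names like these the tokens come out the same.
    """
    return [tok.lower() for tok in re.findall(r"\w+", value)]


def token_range_match(value: str, lower: str, upper: str) -> bool:
    """Inclusive range check applied to each indexed token separately.

    The document matches if ANY of its tokens falls inside [lower, upper].
    """
    return any(lower <= tok <= upper for tok in analyze_text_general(value))


# Assumption: the endpoints "Gross Bob" / "Lindmar Deborah" are compared
# as whole, lowercased strings.
LOWER, UPPER = "gross bob", "lindmar deborah"

for name in ["Mr Kenyon John", "Saab Jerry", "Adams Tom"]:
    tokens = analyze_text_general(name)
    hits = [tok for tok in tokens if LOWER <= tok <= UPPER]
    print(f"{name}: tokens={tokens} in-range={hits} "
          f"match={token_range_match(name, LOWER, UPPER)}")
```

Here "kenyon", "john" and "jerry" all sort between "gross bob" and "lindmar deborah", so a single in-range token is enough to retrieve the document even though "mr" and "saab" are outside the range.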

If you want the entire contents of the field to be treated as a single
token, which seems to be your intent, then you should look at using a
fieldType that is based on the Keyword Tokenizer (see
https://solr.apache.org/guide/8_3/tokenizers.html#keyword-tokenizer).
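The same kind of simulation with the whole value kept as one lowercased token (KeywordTokenizerFactory followed by LowerCaseFilterFactory) excludes both surprising names. The sketch also builds a Schema API `add-field-type` request body for such a type. The type name `text_keyword_lc`, the sample name "Hill Anna" and the core placeholder in the URL comment are hypothetical, endpoints are again assumed to be lowercased, and the payload is only printed, not sent.

```python
import json

LOWER, UPPER = "gross bob", "lindmar deborah"  # endpoints, assumed lowercased


def analyze_keyword_lc(value: str) -> list[str]:
    # KeywordTokenizer emits the entire value as a single token;
    # LowerCaseFilter then lowercases that one token.
    return [value.lower()]


def keyword_range_match(value: str, lower: str, upper: str) -> bool:
    # With only one token per value, the whole name is compared to the range.
    return any(lower <= tok <= upper for tok in analyze_keyword_lc(value))


# Schema API request body for a keyword-based type. "text_keyword_lc" is a
# hypothetical name; the factory classes are standard Solr ones.
ADD_FIELD_TYPE = {
    "add-field-type": {
        "name": "text_keyword_lc",
        "class": "solr.TextField",
        "analyzer": {
            "tokenizer": {"class": "solr.KeywordTokenizerFactory"},
            "filters": [{"class": "solr.LowerCaseFilterFactory"}],
        },
    }
}

if __name__ == "__main__":
    for name in ["Mr Kenyon John", "Saab Jerry", "Hill Anna"]:
        print(name, keyword_range_match(name, LOWER, UPPER))
    # POST this body (Content-type: application/json) to
    # http://localhost:8983/solr/<core>/schema, where <core> is your core.
    print(json.dumps(ADD_FIELD_TYPE, indent=2))
```

Existing documents have to be reindexed after the field is switched to such a type, because the tokens already in the index were produced by the old analyzer.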

- Andy -

Re: Range query on TextField

Posted by "WU, Zhiqing" <zw...@ennov.com>.
Many thanks for your reply. I have tried the following variants of my query:
staffName_txt:["GROSS BOB" TO "LINDMAR DEBORAH"]
staffName_txt:["gross bob" TO "lindmar deborah"]
staffName_txt:["Gross Bob" TO "lindmar Deborah"]
All three return the same numFound (177). Apart from "Mr Kenyon John", the
results also contain "Saab Jerry", which is very confusing.
Therefore, I think the problem is probably not caused by character case.
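One possible reason the three variants return identical counts: if Solr lowercases each range endpoint as a single term before comparing it, all three collapse to the same range. I believe TextField runs range endpoints through the field's multi-term analysis, which includes LowerCaseFilterFactory in text_general, but treat that as an assumption. A minimal check:

```python
# The three query variants from the message above, as (lower, upper) endpoints.
variants = [
    ("GROSS BOB", "LINDMAR DEBORAH"),
    ("gross bob", "lindmar deborah"),
    ("Gross Bob", "lindmar Deborah"),
]


def normalize_endpoint(term: str) -> str:
    # Assumed multi-term analysis: keep the endpoint as one term, then lowercase it.
    return term.lower()


normalized = {(normalize_endpoint(lo), normalize_endpoint(hi)) for lo, hi in variants}
print(normalized)  # {('gross bob', 'lindmar deborah')}
```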

Re: Range query on TextField

Posted by Srijan <sh...@gmail.com>.
My guess is inconsistent character case (uppercase/lowercase) between your
indexed data and your search query. For example, I would expect a query
like staffName_txt:["Gross Bob" TO "lindmar Deborah"] to return "Mr
Kenyon John", because in raw character order an uppercase M does lie
between an uppercase G and a lowercase l.
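This ordering argument holds for raw, unanalyzed strings, where every uppercase ASCII letter sorts before every lowercase one. A quick Python check (Python compares strings by code point, which matches the UTF-8 byte order Lucene uses for terms):

```python
# Raw (unanalyzed) comparison: no lowercasing is applied here.
lower, upper = "Gross Bob", "lindmar Deborah"
candidate = "Mr Kenyon John"

print(ord("G"), ord("M"), ord("l"))  # 71 77 108
print(lower <= candidate <= upper)   # True: 'M' (77) is between 'G' (71) and 'l' (108)
print("Gross Bob" <= candidate <= "Lindmar Deborah")  # False: 'M' sorts after 'L' (76)
```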
