Posted to users@solr.apache.org by Teresa McMains <te...@t14-consulting.com> on 2022/03/30 18:25:53 UTC

query with quoted string unexpected results

Hello all.

I've been looking through some old questions and answers and didn't quite see what I was looking for.

I have a client using a third party piece of software that leverages solr for some global-search type capabilities. The user can enter any search string and we want to return any documents of any type that have a match on the search string, checking any fields in the document. Quoted strings should return an exact match.

So in their examples, searching for Joe Smith, unquoted, could return customers with the name Joe Smith or Emily Smith or Joe Jones, etc. But if it were quoted, then we only want the exact match "Joe Smith" or, I guess, "Joe Smithland" or something.

The problem we're having is that the unquoted search returns everything correctly, and quoted strings with two search terms seem okay, but a quoted search string with more than two terms, like "Lead to Succeed", fails. We're trying to have it return the document "Lead to Succeed Inc, DBA". Perhaps it's failing because it's only a partial match? But even if I search for the quoted string "Lead to Succeed Inc, DBA", I do not get a match.

There are no stopwords.
There are no synonyms.

Now unfortunately I don't have access to the solr admin UI because the customer put it behind a firewall and won't give me access. So that's fun. But I've been playing around with the query URL just trying to get anything to work and I can't.

So for example:
https://localhost:8343/MyAppURL/rest/solr/select?q=LEAD%2520TO%2520SUCCEED&rows=100&start=0&wt=json
returns 107 matches, including one with the name we're looking for.
but 
https://localhost:8343/MyAppURL/rest/solr/select?q=%2522LEAD%2520TO%2520SUCCEED%2522&rows=100&start=0&wt=json
returns 0 matches

I've tried replacing %2520 with %26%26 or %2526%2526 (&&) or with %2B or %252B (+) but no luck there either -- whether I include the quotes or not.
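
For reference, the %2520 and %2522 sequences in the URLs above are what a space and a double quote look like after being percent-encoded twice. A minimal Python sketch using only the standard library shows the two rounds of encoding. Whether the application's REST wrapper in front of Solr expects double-encoded input is not stated in this thread, so treat that as an open assumption.

```python
from urllib.parse import quote, unquote

# The quoted phrase as a user would type it.
phrase = '"LEAD TO SUCCEED"'

# First round: '"' becomes %22 and ' ' becomes %20.
once = quote(phrase, safe="")

# Second round: every '%' from the first round becomes %25,
# which turns %22 into %2522 and %20 into %2520.
twice = quote(once, safe="")

print(once)   # %22LEAD%20TO%20SUCCEED%22
print(twice)  # %2522LEAD%2520TO%2520SUCCEED%2522

# Decoding twice recovers the original phrase.
assert unquote(unquote(twice)) == phrase
```

If the wrapper decodes only once before handing the query to Solr, a double-encoded request would reach Solr with literal %22 and %20 characters in it instead of quotes and spaces.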

I know there's a debug parameter (&debug=all or &debugQuery=true), but when I include it in my URL nothing changes in the results, so I'm not seeing any debug output. Is there something else I need to do to enable it?
If this is a matter of needing a "fuzzier" match, how do I include that in the search query URL? It wasn't clear to me from the documentation.
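
One way to put together a query string with the debug flag and a looser phrase match is sketched below in Python. debugQuery=true is the parameter mentioned above, and "..."~N is the standard Lucene proximity (phrase slop) syntax. The helper name build_query and its arguments are made up for illustration, and it is an assumption that the application's wrapper passes extra parameters through to Solr. The thread does not confirm that a sloppy phrase returns the wanted document.

```python
from urllib.parse import urlencode, parse_qs

def build_query(q, slop=None, debug=False):
    """Build a Solr-style query string (hypothetical helper).

    q     -- the raw query text, including any double quotes
    slop  -- if given, appended as ~N after a quoted phrase
    debug -- if True, adds debugQuery=true
    """
    if slop is not None:
        q = f"{q}~{slop}"
    params = {"q": q, "rows": 100, "start": 0, "wt": "json"}
    if debug:
        params["debugQuery"] = "true"
    # urlencode percent-encodes each value exactly once.
    return urlencode(params)

qs = build_query('"lead succeed"', slop=1, debug=True)
print(qs)

# Round-trip check: decoding the query string gives back the raw values.
decoded = parse_qs(qs)
assert decoded["q"] == ['"lead succeed"~1']
assert decoded["debugQuery"] == ["true"]
```

The resulting string would be appended after the ? in the select URL. Letting a library do the encoding avoids mixing up single and double percent-encoding by hand.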

Many many thanks!
Teresa

From solrconfig -- if it's helpful:

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="q">*:*</str>
      <str name="defType">edismax</str>
      <str name="stopwords">true</str>
      <str name="lowercaseOperators">true</str>
      <str name="rows">10</str>
      <str name="df">_tokens</str>

      <!-- phrase boosting...only affects relevancy, not inclusion -->
      <str name="pf">_tokens^3</str>
      <str name="ps">10</str>
      <str name="pf2">_tokens^2</str>
      <str name="ps2">1</str>

      <!-- field boosting -->
      <str name="qf">_primaryLabels^10 doc_id^50 telephoneNumbers^5 nationalIdentifiers^5 _tokens</str>
      <str name="f._primaryLabels.qf">
        alertid
        account_name
        account_number
        bank_name
        party_name
        party_number
        head_of_household_name
        ext_party_number
        full_name
        associate_full_name
        attachment_name
        title
        filename
      </str>
    </lst>
  </requestHandler>


From schema.xml:

<field name="account_customer_name" type="text_general" indexed="true" stored="true" multiValued="false" required="false"/>
<field name="account_name" type="text_general" indexed="true" stored="true" multiValued="false" required="false"/>
<field name="account_number" type="string" indexed="true" stored="true" multiValued="false" required="false"/>
...
These search fields are all text_general or string.

        <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
            <analyzer type="index">
                <tokenizer class="solr.KeywordTokenizerFactory"/>
                <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" splitOnCaseChange="0" splitOnNumerics="0"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" enablePositionIncrements="true" />
            </analyzer>
            <analyzer type="query">
                <tokenizer class="solr.KeywordTokenizerFactory"/>
                <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" splitOnCaseChange="0" splitOnNumerics="0"/>
                <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
                <filter class="solr.LowerCaseFilterFactory"/>
            </analyzer>
        </fieldType>



RE: query with quoted string unexpected results

Posted by Teresa McMains <te...@t14-consulting.com>.
An update:
I did get access to the Solr admin page, and I'm running two simple queries:

Lead to succeed
vs.
"Lead to succeed"

The first returns records, including the one I want; the second returns none.

Can someone explain the debug output for the quoted search string? Specifically, why do fields like full_name (the field we're expecting to match) look like this: full_name:\"(lead to succeed lead) to succeed\"? full_name is a text_general field. Also, what is the /no_coord doing?

"debug": {
    "rawquerystring": "\"lead to succeed\"",
    "querystring": "\"lead to succeed\"",
    "parsedquery": "(+DisjunctionMaxQuery((((account_number:LEADTOSUCCEED | associate_full_name:\"(lead to succeed lead) to succeed\" | attachment_name:lead to succeed | title:\"lead ? succeed\" | party_number:lead to succeed | party_name:\"(lead to succeed lead) to succeed\" | account_name:\"(lead to succeed lead) to succeed\" | bank_name:\"lead ? succeed\" | head_of_household_name:\"(lead to succeed lead) to succeed\" | entity_id:lead to succeed | ext_party_number:lead to succeed | full_name:\"(lead to succeed lead) to succeed\" | filename:\"lead ? succeed\" | alertid:\"lead ? succeed\" | telephoneNumbers:^5.0 | _tokens:\"lead to succeed\" | doc_id:lead to succeed^50.0 | nationalIdentifiers:lead to succeed^5.0)) ())/no_coord",
    "parsedquery_toString": "+(((account_number:LEADTOSUCCEED | associate_full_name:\"(lead to succeed lead) to succeed\" | attachment_name:lead to succeed | title:\"lead ? succeed\" | party_number:lead to succeed | party_name:\"(lead to succeed lead) to succeed\" | account_name:\"(lead to succeed lead) to succeed\" | bank_name:\"lead ? succeed\" | head_of_household_name:\"(lead to succeed lead) to succeed\" | entity_id:lead to succeed | ext_party_number:lead to succeed | full_name:\"(lead to succeed lead) to succeed\" | filename:\"lead ? succeed\" | alertid:\"lead ? succeed\" | telephoneNumbers:^5.0 | _tokens:\"lead to succeed\" | doc_id:lead to succeed^50.0 | nationalIdentifiers:lead to succeed^5.0) ()",
    "explain": {},
    "QParser": "ExtendedDismaxQParser",
   "altquerystring": null,
    "boost_queries": null,
    "parsed_boost_queries": [],
    "boostfuncs": null,
    "filter_queries": [
      "-doc_type:trxn"
    ]

When I look at the quoted string "Lead to Succeed" on the Analysis tab in the Admin tool, and parse it with text_general or as full_name, it matches exactly between the index and the query.

Many thanks,
Teresa



Re: query with quoted string unexpected results

Posted by Walter Underwood <wu...@wunderwood.org>.
Removing stopwords means that “lead to succeed” is corrupted in the index to “lead <blank_position> succeed”.

Don’t remove stopwords.
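
A toy model of this effect, in Python rather than Lucene, is sketched below. In the posted text_general type the index analyzer has a StopFilterFactory and the query analyzer does not, so the indexed value loses "to" but keeps its position as a gap, while the query phrase still asks for "to" in that slot. The debug output earlier in the thread shows both forms: "lead ? succeed" for some fields and a phrase that still contains "to" for the text_general ones. Assumptions here: plain whitespace tokenization stands in for the real analyzer chain, "to" is taken to be in lang/stopwords_en.txt, and the function names are hypothetical.

```python
STOPWORDS = {"to"}  # assumption: "to" is listed in lang/stopwords_en.txt

def analyze(text, remove_stopwords):
    """Return {position: term}. A removed stopword leaves a position gap."""
    positions = {}
    for pos, term in enumerate(text.lower().split()):
        if remove_stopwords and term in STOPWORDS:
            continue  # the term is dropped, but later positions do not shift
        positions[pos] = term
    return positions

def phrase_matches(index_positions, query_positions):
    """True if every query term is found at the same relative offset."""
    query = sorted(query_positions.items())
    first_pos = query[0][0]
    for start in index_positions:
        if all(index_positions.get(start + (pos - first_pos)) == term
               for pos, term in query):
            return True
    return False

# Index side: stopwords removed, as in the posted index analyzer.
indexed = analyze("Lead to Succeed Inc, DBA", remove_stopwords=True)

# Query side without a stop filter, as in the posted query analyzer.
query_keeps_to = analyze("lead to succeed", remove_stopwords=False)

# Query side with the same stop filter applied (symmetric analysis).
query_drops_to = analyze("lead to succeed", remove_stopwords=True)

print(phrase_matches(indexed, query_keeps_to))  # False
print(phrase_matches(indexed, query_drops_to))  # True
```

In this model the phrase only matches when both sides treat "to" the same way, either by keeping it on both sides or by dropping it on both.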

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)
