Posted to solr-user@lucene.apache.org by Jay Potharaju <js...@gmail.com> on 2019/01/14 05:30:29 UTC

Search query with & without question mark

Hi,
When searching, I get different results depending on whether the query
contains a question mark. The field I am searching on does not contain
any question marks.
Any suggestions?

<fieldType name="text_no_specialchars" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="['!#\$%'\(\)\*+,-\./:;=?@\[\]^_`{|}~]" replacement=" " replace="all" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SuggestStopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
  </analyzer>
</fieldType>

Thanks
Jay

Re: Search query with & without question mark

Posted by Elizabeth Haubert <eh...@opensourceconnections.com>.
Because the standard query parser treats '?' as a single-character wildcard:
https://lucene.apache.org/solr/guide/6_6/the-standard-query-parser.html

So in the first case, q="how do I add a field", the word "field" in your
document matches. In the second case, q="how do I add a field?", the parser
looks for tokens like "fields" or "fielde". The term without a trailing
one-character suffix no longer matches, which is why it is no longer
included in the scoring.

https://lucene.apache.org/solr/guide/7_6/the-standard-query-parser.html#wildcard-searches
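
The single-character wildcard behaviour described above can be sketched
outside Solr. This is only an illustration, assuming Python's glob-style
fnmatch as a stand-in for the query parser's '?' handling; the token list
and the wildcard_matches helper are hypothetical and not taken from the
index discussed in this thread.

```python
from fnmatch import fnmatchcase

# Hypothetical indexed tokens for a title field (illustration only).
indexed_tokens = ["how", "do", "i", "add", "a", "field", "fields"]


def wildcard_matches(pattern, tokens):
    """Return the tokens that match a glob-style pattern.

    In the pattern, '?' stands for exactly one character, which mirrors
    the single-character wildcard semantics described in the reply.
    """
    return [t for t in tokens if fnmatchcase(t, pattern)]


# Without the question mark, the plain term matches itself.
print(wildcard_matches("field", indexed_tokens))

# With a trailing '?', exactly one extra character is required,
# so "field" itself is no longer a match.
print(wildcard_matches("field?", indexed_tokens))
```

Under this reading, a term such as field? can only match tokens that are
one character longer than "field".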

Elizabeth


On Mon, Jan 14, 2019 at 2:07 AM Jay Potharaju <js...@gmail.com> wrote:

> The parsedquery is the same in both cases when debugging, but different
> terms are taken into consideration when the scores are calculated. Why
> would that be the case? My guess is that SuggestStopFilterFactory is not
> working as I expect it to and is causing this weird situation.

Re: Search query with & without question mark

Posted by Jay Potharaju <js...@gmail.com>.
The parsedquery is the same in both cases when debugging, but different
terms are taken into consideration when the scores are calculated. Why
would that be the case? My guess is that SuggestStopFilterFactory is not
working as I expect it to and is causing this weird situation.

Updated field type definition:
<fieldType name="text_no_specialchars" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="['!#\$%'\(\)\*+,-\./:;=?@\[\]\^_`{|}~!@#$%^*]" />
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SuggestStopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
  </analyzer>
</fieldType>

Debug Query:
"rawquerystring":"how do i add a field",
    "querystring":"how do i add a field",
    "parsedquery":"(+(DisjunctionMaxQuery((topic_title_plain:how)) DisjunctionMaxQuery((topic_title_plain:do)) DisjunctionMaxQuery((topic_title_plain:i)) DisjunctionMaxQuery((topic_title_plain:add)) DisjunctionMaxQuery((topic_title_plain:a)) DisjunctionMaxQuery((topic_title_plain:field))))/no_coord",
    "parsedquery_toString":"+((topic_title_plain:how) (topic_title_plain:do) (topic_title_plain:i) (topic_title_plain:add) (topic_title_plain:a) (topic_title_plain:field))",
    "explain":{
      "1":"
6.1034017 = sum of:
  2.0065408 = weight(topic_title_plain:add in 107) [SchemaSimilarity], result of:
    2.0065408 = score(doc=107,freq=1.0 = termFreq=1.0
), product of:
      2.1391609 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
        32.0 = docFreq
        275.0 = docCount
      0.9380037 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
        1.0 = termFreq=1.0
        1.2 = parameter k1
        0.75 = parameter b
        3.4436364 = avgFieldLength
        4.0 = fieldLength
  4.096861 = weight(topic_title_plain:field in 107) [SchemaSimilarity], result of:
    4.096861 = score(doc=107,freq=1.0 = termFreq=1.0
), product of:
      4.367638 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
        3.0 = docFreq
        275.0 = docCount
      0.9380037 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
        1.0 = termFreq=1.0
        1.2 = parameter k1
        0.75 = parameter b
        3.4436364 = avgFieldLength
        4.0 = fieldLength
"},

"rawquerystring":"how do i add a field?",
    "querystring":"how do i add a field?",
    "parsedquery":"(+(DisjunctionMaxQuery((topic_title_plain:how)) DisjunctionMaxQuery((topic_title_plain:do)) DisjunctionMaxQuery((topic_title_plain:i)) DisjunctionMaxQuery((topic_title_plain:add)) DisjunctionMaxQuery((topic_title_plain:a)) DisjunctionMaxQuery((topic_title_plain:field))))/no_coord",
    "parsedquery_toString":"+((topic_title_plain:how) (topic_title_plain:do) (topic_title_plain:i) (topic_title_plain:add) (topic_title_plain:a) (topic_title_plain:field))",
    "explain":{
      "2":"
3.798876 = sum of:
  2.033249 = weight(topic_title_plain:how in 202) [SchemaSimilarity], result of:
    2.033249 = score(doc=202,freq=1.0 = termFreq=1.0
), product of:
      2.4634004 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
        23.0 = docFreq
        275.0 = docCount
      0.82538307 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
        1.0 = termFreq=1.0
        1.2 = parameter k1
        0.75 = parameter b
        3.4436364 = avgFieldLength
        5.2244897 = fieldLength
  1.7656271 = weight(topic_title_plain:add in 202) [SchemaSimilarity], result of:
    1.7656271 = score(doc=202,freq=1.0 = termFreq=1.0
), product of:
      2.1391609 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
        32.0 = docFreq
        275.0 = docCount
      0.82538307 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
        1.0 = termFreq=1.0
        1.2 = parameter k1
        0.75 = parameter b
        3.4436364 = avgFieldLength
        5.2244897 = fieldLength
"},
Thanks
Jay
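
The idf and tfNorm formulas printed in the explain output above can be
checked by hand. The following is a minimal sketch that plugs the reported
values for doc 107 into those two formulas; the function names are
hypothetical, and small differences from Solr's figures are expected,
assuming Solr computes them in single precision.

```python
import math


def bm25_idf(doc_freq, doc_count):
    """idf as printed in the explain output:
    log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))
    """
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf_norm(freq, k1, b, field_length, avg_field_length):
    """tfNorm as printed in the explain output:
    (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength))
    """
    return (freq * (k1 + 1)) / (
        freq + k1 * (1 - b + b * field_length / avg_field_length)
    )


def term_score(doc_freq, doc_count, freq, k1, b, field_length, avg_field_length):
    """Per-term score: the product of idf and tfNorm."""
    return bm25_idf(doc_freq, doc_count) * bm25_tf_norm(
        freq, k1, b, field_length, avg_field_length
    )


# Values shared by both terms in doc 107, copied from the explain output.
common = dict(doc_count=275.0, freq=1.0, k1=1.2, b=0.75,
              field_length=4.0, avg_field_length=3.4436364)

score_add = term_score(doc_freq=32.0, **common)    # explain reports 2.0065408
score_field = term_score(doc_freq=3.0, **common)   # explain reports 4.096861
total = score_add + score_field                    # explain reports 6.1034017

print(score_add, score_field, total)
```

The same functions can be applied to doc 202 by using fieldLength
5.2244897 with docFreq 23.0 for "how" and 32.0 for "add".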



On Sun, Jan 13, 2019 at 10:32 PM Erick Erickson <er...@gmail.com>
wrote:

> What does adding &debug=query show in both cases?
>
> Best,
> Erick

Re: Search query with & without question mark

Posted by Erick Erickson <er...@gmail.com>.
What does adding &debug=query show in both cases?

Best,
Erick

On Sun, Jan 13, 2019 at 9:30 PM Jay Potharaju <js...@gmail.com> wrote:
>
> Hi,
> When searching, I get different results depending on whether the query
> contains a question mark. The field I am searching on does not contain
> any question marks.
> Any suggestions?