Posted to users@solr.apache.org by Arif Shaon <ar...@gmail.com> on 2022/04/04 07:09:37 UTC

Help with stopwords filter

Hello list,

I am trying the following two queries, which should return the same result.
However, the first contains the stop word "is" and, as a result, it is
returning 0 results. So it seems to me that the stopword filter is not working
as expected. Could someone please look at the debug reports of the two queries
and advise what I am doing wrong? Any help would be appreciated.

Query 1:

"rawquerystring":"thim day is gone",
      "querystring":"thim day is gone",
      "parsedquery":"(+(+DisjunctionMaxQuery(((i18n_content_ar:thim)^3.0 |
(i18n_content_en:thim)^3.0 | (i18n_label_ar:thim)^5.0 |
(i18n_label_en:thim)^5.0 | (shelf_mark:thim)^80.0 |
(content:thim)^0.04)~0.01) +DisjunctionMaxQuery(((i18n_content_ar:day)^3.0
| (i18n_content_en:day)^3.0 | (i18n_label_ar:day)^5.0 |
(i18n_label_en:day)^5.0 | (shelf_mark:day)^80.0 | (content:day)^0.04)~0.01)
+DisjunctionMaxQuery(((i18n_content_ar:is)^3.0 | (i18n_label_ar:is)^5.0 |
(shelf_mark:is)^80.0)~0.01)
+DisjunctionMaxQuery(((i18n_content_ar:gone)^3.0 |
(i18n_content_en:gone)^3.0 | (i18n_label_ar:gone)^5.0 |
(i18n_label_en:gone)^5.0 | (shelf_mark:gone)^80.0 |
(content:gone)^0.04)~0.01)) (+DisjunctionMaxQuery(((content:\"thim day ?
gone\"~10)^2.0)~0.01)) (+record_type:logical^15.0)
(+record_type:essay^17.0))/no_coord",
      "parsedquery_toString":"+(+((i18n_content_ar:thim)^3.0 |
(i18n_content_en:thim)^3.0 | (i18n_label_ar:thim)^5.0 |
(i18n_label_en:thim)^5.0 | (shelf_mark:thim)^80.0 |
(content:thim)^0.04)~0.01 +((i18n_content_ar:day)^3.0 |
(i18n_content_en:day)^3.0 | (i18n_label_ar:day)^5.0 |
(i18n_label_en:day)^5.0 | (shelf_mark:day)^80.0 | (content:day)^0.04)~0.01
+((i18n_content_ar:is)^3.0 | (i18n_label_ar:is)^5.0 |
(shelf_mark:is)^80.0)~0.01 +((i18n_content_ar:gone)^3.0 |
(i18n_content_en:gone)^3.0 | (i18n_label_ar:gone)^5.0 |
(i18n_label_en:gone)^5.0 | (shelf_mark:gone)^80.0 |
(content:gone)^0.04)~0.01) (+((content:\"thim day ? gone\"~10)^2.0)~0.01)
(+(record_type:logical)^15.0) (+(record_type:essay)^17.0)",
      "facet-debug":{
         "elapse":0,


Query 2:

"rawquerystring":"thim day gone",
      "querystring":"thim day gone",
      "parsedquery":"(+(+DisjunctionMaxQuery(((i18n_content_ar:thim)^3.0 |
(i18n_content_en:thim)^3.0 | (i18n_label_ar:thim)^5.0 |
(i18n_label_en:thim)^5.0 | (shelf_mark:thim)^80.0 |
(content:thim)^0.04)~0.01) +DisjunctionMaxQuery(((i18n_content_ar:day)^3.0
| (i18n_content_en:day)^3.0 | (i18n_label_ar:day)^5.0 |
(i18n_label_en:day)^5.0 | (shelf_mark:day)^80.0 | (content:day)^0.04)~0.01)
+DisjunctionMaxQuery(((i18n_content_ar:gone)^3.0 |
(i18n_content_en:gone)^3.0 | (i18n_label_ar:gone)^5.0 |
(i18n_label_en:gone)^5.0 | (shelf_mark:gone)^80.0 |
(content:gone)^0.04)~0.01)) (+DisjunctionMaxQuery(((content:\"thim day
gone\"~10)^2.0)~0.01)) (+record_type:logical^15.0)
(+record_type:essay^17.0))/no_coord",
      "parsedquery_toString":"+(+((i18n_content_ar:thim)^3.0 |
(i18n_content_en:thim)^3.0 | (i18n_label_ar:thim)^5.0 |
(i18n_label_en:thim)^5.0 | (shelf_mark:thim)^80.0 |
(content:thim)^0.04)~0.01 +((i18n_content_ar:day)^3.0 |
(i18n_content_en:day)^3.0 | (i18n_label_ar:day)^5.0 |
(i18n_label_en:day)^5.0 | (shelf_mark:day)^80.0 | (content:day)^0.04)~0.01
+((i18n_content_ar:gone)^3.0 | (i18n_content_en:gone)^3.0 |
(i18n_label_ar:gone)^5.0 | (i18n_label_en:gone)^5.0 |
(shelf_mark:gone)^80.0 | (content:gone)^0.04)~0.01) (+((content:\"thim day
gone\"~10)^2.0)~0.01) (+(record_type:logical)^15.0)
(+(record_type:essay)^17.0)",
      "facet-debug":{
         "elapse":1,

Many thanks in advance.

Best
Arif

Re: Help with stopwords filter

Posted by Arif Shaon <ar...@gmail.com>.
Hi Dominique,

Thanks for replying to my query.
"is" is defined as a stopword for the _en (English) fields, but the _ar
(Arabic) fields use a custom analyser, and it is not clear how that analyser
deals with stopwords. I will look into it.
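
To illustrate why Query 1 returns nothing, here is a toy model in Python. It is
not Solr's implementation, only a sketch of the behaviour visible in the debug
output: every term clause is required ("+"), and the clause for "is" survives
only on i18n_content_ar, i18n_label_ar and shelf_mark. The per-field stopword
sets and the sample document below are assumptions inferred from that output.

```python
# Assumed per-field stopword sets, inferred from which fields still carry
# the term "is" in the parsed query; the real analysers may differ.
STOPWORDS_BY_FIELD = {
    "i18n_content_ar": set(),
    "i18n_content_en": {"is"},
    "i18n_label_ar": set(),
    "i18n_label_en": {"is"},
    "shelf_mark": set(),
    "content": {"is"},
}


def index_tokens(text_by_field, stopwords_by_field):
    """Tokenise each field on whitespace and drop that field's stopwords."""
    return {
        field: [t for t in text.split() if t not in stopwords_by_field[field]]
        for field, text in text_by_field.items()
    }


def build_clauses(query, stopwords_by_field):
    """Build one (term, fields) clause per query term.

    A field is listed only if its stopword set keeps the term. A term that
    every field drops produces no clause at all.
    """
    clauses = []
    for term in query.split():
        fields = [f for f, stops in stopwords_by_field.items() if term not in stops]
        if fields:
            clauses.append((term, fields))
    return clauses


def matches(indexed_doc, clauses):
    """Every clause is required; a clause is met if any listed field has the term."""
    return all(
        any(term in indexed_doc.get(field, []) for field in fields)
        for term, fields in clauses
    )


# Hypothetical document with English text in the English and catch-all fields.
doc = index_tokens(
    {"i18n_content_en": "thim day is gone", "content": "thim day is gone"},
    STOPWORDS_BY_FIELD,
)

query1 = build_clauses("thim day is gone", STOPWORDS_BY_FIELD)
query2 = build_clauses("thim day gone", STOPWORDS_BY_FIELD)

print(query1[2])              # the clause for "is"
print(matches(doc, query1))   # required "is" clause cannot be satisfied
print(matches(doc, query2))
```

In this model the "is" clause is required but can only match fields that do not
hold the English text, so the whole query fails. Making stopword handling
consistent across all fields in the query would avoid that.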

Thanks again.
Best
Arif

On Mon, Apr 4, 2022 at 3:42 PM Dominique Bejean <do...@eolya.fr>
wrote:

> Hi,
>
> Are you sure "is" is defined as a stopword in both the index and query
> analyzers?
>
> Dominique

Re: Help with stopwords filter

Posted by Dominique Bejean <do...@eolya.fr>.
Hi,

Are you sure "is" is defined as a stopword in both the index and query
analyzers?
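
One way to check this per field is Solr's field analysis handler, which reports
the tokens produced by the index-time and query-time chains. The Python sketch
below only builds the request URLs. The base URL and collection name are
placeholders, and it assumes the handler is reachable at /analysis/field on
your installation.

```python
from urllib.parse import urlencode


def field_analysis_url(base_url, collection, field_name, text):
    """Build a field analysis request URL for one field and one input text."""
    params = {
        "analysis.fieldname": field_name,
        "analysis.fieldvalue": text,  # run through the index-time analyzer
        "analysis.query": text,       # run through the query-time analyzer
        "wt": "json",
    }
    return f"{base_url}/{collection}/analysis/field?{urlencode(params)}"


# Placeholder host and collection; the field names come from the parsed query.
for field in ("i18n_content_en", "i18n_content_ar", "shelf_mark"):
    print(field_analysis_url(
        "http://localhost:8983/solr", "mycollection", field, "thim day is gone"
    ))
```

Comparing the responses for an _en field and an _ar field should show whether
"is" survives in one and is removed in the other.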

Dominique
