Posted to solr-user@lucene.apache.org by Shamik Bandopadhyay <sh...@gmail.com> on 2014/02/19 02:53:49 UTC

Weird behavior of stopwords in search query

Hi,

  I'm observing a weird behavior when using stopwords as part of the
search query. I'm able to replicate it in a standalone Solr instance as well.
The issue pops up when I use the stopwords "other" and "and" together in a
query string: the query doesn't return any results. It works with any other
combination. For example:

1. query yields no result -->
http://localhost:8983/solr/collection1/browse?q=AWS+other+and+Search&debugQuery=true&wt=xml


Debug Query :
--------------------

<str name="rawquerystring">AWS other and Search</str>

<str name="querystring">AWS other and Search</str><str
name="parsedquery">(+(DisjunctionMaxQuery((id:AWS^10.0 | author:aws^2.0 |
title:aws^10.0 | text:aws^0.5 | cat:AWS^1.4 | keywords:aws^5.0 |
manu:aws^1.1 | description:aws^5.0 | resourcename:aws | name:aws^1.2 |
features:aws | sku:aw^1.5)) +DisjunctionMaxQuery((id:other^10.0 |
cat:other^1.4 | sku:other^1.5)) +DisjunctionMaxQuery((id:Search^10.0 |
author:search^2.0 | title:search^10.0 | text:search^0.5 | cat:Search^1.4 |
keywords:search^5.0 | manu:search^1.1 | description:search^5.0 |
resourcename:search | name:search^1.2 | features:search |
sku:search^1.5))))/no_coord</str>

<str name="parsedquery_toString">+((id:AWS^10.0 | author:aws^2.0 |
title:aws^10.0 | text:aws^0.5 | cat:AWS^1.4 | keywords:aws^5.0 |
manu:aws^1.1 | description:aws^5.0 | resourcename:aws | name:aws^1.2 |
features:aws | sku:aw^1.5) +(id:other^10.0 | cat:other^1.4 | sku:other^1.5)
+(id:Search^10.0 | author:search^2.0 | title:search^10.0 | text:search^0.5
| cat:Search^1.4 | keywords:search^5.0 | manu:search^1.1 |
description:search^5.0 | resourcename:search | name:search^1.2 |
features:search | sku:search^1.5))</str>





2. query yields result -->
http://localhost:8983/solr/collection1/browse?q=AWS+other+an+Search&debugQuery=true&wt=xml

Debug Query
---------------------

<str name="rawquerystring">AWS other an Search</str>

<str name="querystring">AWS other an Search</str><str
name="parsedquery">(+(DisjunctionMaxQuery((id:AWS^10.0 | author:aws^2.0 |
title:aws^10.0 | text:aws^0.5 | cat:AWS^1.4 | keywords:aws^5.0 |
manu:aws^1.1 | description:aws^5.0 | resourcename:aws | name:aws^1.2 |
features:aws | sku:aw^1.5)) DisjunctionMaxQuery((id:other^10.0 |
cat:other^1.4 | sku:other^1.5)) DisjunctionMaxQuery((id:an^10.0 |
cat:an^1.4)) DisjunctionMaxQuery((id:Search^10.0 | author:search^2.0 |
title:search^10.0 | text:search^0.5 | cat:Search^1.4 | keywords:search^5.0
| manu:search^1.1 | description:search^5.0 | resourcename:search |
name:search^1.2 | features:search | sku:search^1.5))))/no_coord</str>

<str name="parsedquery_toString">+((id:AWS^10.0 | author:aws^2.0 |
title:aws^10.0 | text:aws^0.5 | cat:AWS^1.4 | keywords:aws^5.0 |
manu:aws^1.1 | description:aws^5.0 | resourcename:aws | name:aws^1.2 |
features:aws | sku:aw^1.5) (id:other^10.0 | cat:other^1.4 | sku:other^1.5)
(id:an^10.0 | cat:an^1.4) (id:Search^10.0 | author:search^2.0 |
title:search^10.0 | text:search^0.5 | cat:Search^1.4 | keywords:search^5.0
| manu:search^1.1 | description:search^5.0 | resourcename:search |
name:search^1.2 | features:search | sku:search^1.5))</str>

Both "other" and "and" are part of the stopwords list.

I ran an analysis on the text_general field. Both stopwords were shown as
ignored at both index and query time, but that is not what happens during the
actual search.

Not sure what I'm missing here, any pointers will be appreciated.

- Thanks,
Shamik

Re: Weird behavior of stopwords in search query

Posted by Jack Krupansky <ja...@basetechnology.com>.
Simply add the lowercaseOperators=false parameter to the request, or add it to the 
"defaults" section of the request handler in solrconfig.xml, and then "and" will 
not be treated as "AND".

The wiki is confusing - it shouldn't be advising you how to set the 
parameter to achieve the default setting! Rather, it should tell you how to 
override the default setting.

-- Jack Krupansky
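
A minimal sketch of the second option, for illustration only. The handler name /browse is taken from the URLs in the original post; the handler class and the defType entry are assumptions about a typical edismax setup, not the poster's actual solrconfig.xml:

```xml
<!-- Assumed handler definition; only the lowercaseOperators entry is the point here -->
<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- Treat lowercase "and" / "or" as ordinary query terms, not as AND / OR -->
    <str name="lowercaseOperators">false</str>
  </lst>
</requestHandler>
```

For a one-off test, the same setting can be passed on the request instead, for example by appending &lowercaseOperators=false to the query URL.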

-----Original Message----- 
From: Ahmet Arslan
Sent: Wednesday, February 19, 2014 4:16 AM
To: solr-user@lucene.apache.org
Subject: Re: Weird behavior of stopwords in search query

Hi Shamik,

Please see the parameters of edismax: 
https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser
If lowercaseOperators=true, then "and" is treated as AND. The stopwords 
parameter could also be used.

Stopwords and edismax have had issues in the past (when mm=100%). I'm not sure 
about the current situation, but you may need to apply the same set of stopwords 
to all fields listed in the qf parameter, even to string types. A string type 
should be replaced with a KeywordTokenizer + StopwordFilter combo.







Re: Weird behavior of stopwords in search query

Posted by Ahmet Arslan <io...@yahoo.com>.
Hi Shamik,

Please see the parameters of edismax: https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser
If lowercaseOperators=true, then "and" is treated as AND. The stopwords parameter could also be used.

Stopwords and edismax have had issues in the past (when mm=100%). I'm not sure about the current situation, but you may need to apply the same set of stopwords to all fields listed in the qf parameter, even to string types. A string type should be replaced with a KeywordTokenizer + StopwordFilter combo.
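
As a rough sketch of the KeywordTokenizer + stop filter combination mentioned above: the type name string_stop is hypothetical, and stopwords.txt is assumed to be the same list used by text_general. The factory class names are the standard Solr ones:

```xml
<!-- Hypothetical replacement for the plain string type on fields such as id or cat -->
<fieldType name="string_stop" class="solr.TextField">
  <analyzer>
    <!-- Emit the whole field value or query term as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Drop that token when it exactly matches an entry in the stopword list -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>
```

With a type like this, a query term such as "other" should produce no clause for the field at all, instead of a clause that can only match the literal value "other". Existing documents would need to be reindexed after changing a field's type.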





On Wednesday, February 19, 2014 7:48 AM, shamik <sh...@gmail.com> wrote:
Jack, thanks for the pointer. I should have checked this closely. I'm using
edismax and here's my qf entry :

<str name="qf">
          id^10.0 cat^1.4 text^0.5 features^1.0 name^1.2 sku^1.5 manu^1.1
title^10.0 description^5.0 keywords^5.0 author^2.0 resourcename^1.0

       </str>

As you can see, I was boosting id and cat, which are of type string and of
course don't go through the stopwords filter. Removing them returned one
result, which is based on the AND operator.

The part I'm not clear on is why "and" is treated as an operator even though
it's a stopword and the default operator is OR. Shouldn't it be ignored?



--
View this message in context: http://lucene.472066.n3.nabble.com/Weird-behavior-of-stopwords-in-search-query-tp4118156p4118188.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Weird behavior of stopwords in search query

Posted by shamik <sh...@gmail.com>.
Jack, thanks for the pointer. I should have checked this closely. I'm using
edismax and here's my qf entry :

<str name="qf">
          id^10.0 cat^1.4 text^0.5 features^1.0 name^1.2 sku^1.5 manu^1.1
title^10.0 description^5.0 keywords^5.0 author^2.0 resourcename^1.0
       </str>

As you can see, I was boosting id and cat, which are of type string and of
course don't go through the stopwords filter. Removing them returned one
result, which is based on the AND operator.

The part I'm not clear on is why "and" is treated as an operator even though
it's a stopword and the default operator is OR. Shouldn't it be ignored?




Re: Weird behavior of stopwords in search query

Posted by Jack Krupansky <ja...@basetechnology.com>.
Does "other" appear in the id, cat, or sku fields? This clause requires it 
to appear in at least one of those fields:

+DisjunctionMaxQuery((id:other^10.0 | cat:other^1.4 | sku:other^1.5))

The "and" is treated as the "AND" operator. What query parser are you using?

Without "and", the terms are OR'ed, which is the default query operator.

-- Jack Krupansky
