Posted to solr-user@lucene.apache.org by Andreas Hubold <an...@coremedia.com> on 2015/09/23 21:00:32 UTC

Dismax and StandardTokenizer: OR queries despite mm=100%

Hi,

we're using Solr 4.10.4 and the dismax query parser to search across 
multiple fields. One of the fields is configured with a 
StandardTokenizer (type "text_general"). I set mm=100% to only get hits 
that match all terms.

This does not seem to work for queries that are split into multiple 
tokens. For example, a query for "CC-WAV-001" (tokenized to "cc", "wav", 
"001") returns documents that only have "cc" in them. I need a result with 
documents that contain all tokens - as returned by the /select handler.

Is there a way to force AND semantics for such dismax queries? I also 
tried to set q.op=AND but it did not help.

The query is parsed as:

(+DisjunctionMaxQuery(((textbody:cc textbody:wav textbody:001) | 
productCode:CC-WAV-001)~0.1) DisjunctionMaxQuery((textbody:"cc wav 
001")~0.1))/no_coord

Thanks in advance!

Regards,
Andreas

Re: Dismax and StandardTokenizer: OR queries despite mm=100%

Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.
Hi Andreas,

You are correct, no re-indexing is required for autoGeneratePhraseQueries.

Ahmet



On Thursday, September 24, 2015 3:52 PM, Andreas Hubold <an...@coremedia.com> wrote:
Thank you, autoGeneratePhraseQueries did the job.

I assume that this setting just affects query generation and I don't 
need to reindex after changing the field type accordingly. Is this correct?

BTW, I just found SOLR-3589 where the same issue was reported and fixed 
for the edismax parser. It seems it was fixed for edismax but not for 
dismax.

Andreas



Re: Dismax and StandardTokenizer: OR queries despite mm=100%

Posted by Andreas Hubold <an...@coremedia.com>.
Thank you, autoGeneratePhraseQueries did the job.

I assume that this setting just affects query generation and I don't 
need to reindex after changing the field type accordingly. Is this correct?

BTW, I just found SOLR-3589 where the same issue was reported and fixed 
for the edismax parser. It seems it was fixed for edismax but not for 
dismax.

Andreas



Re: Dismax and StandardTokenizer: OR queries despite mm=100%

Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.
Hi Andreas,

That's weird. It looks like the mm calculation is done before tokenization takes place.
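
A toy model may help to picture this. The sketch below is not Solr code; it only assumes that clauses are counted per whitespace-separated chunk of the query string and that the field analysis runs afterwards. The names toy_parse, analyze and required_clauses are made up for illustration, and analyze is a crude stand-in for StandardTokenizer plus lowercasing.

```python
import math
import re


def analyze(chunk):
    """Crude stand-in for StandardTokenizer + lowercasing (assumption):
    split on anything that is not a letter or digit."""
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", chunk) if t]


def required_clauses(clause_count, mm_percent):
    """Number of top-level clauses that must match; percentages round down."""
    return math.floor(clause_count * mm_percent / 100)


def toy_parse(user_query, mm_percent=100, auto_generate_phrase_queries=False):
    """Return (required clause count, top-level clauses) for one text field."""
    clauses = []
    # Step 1: clauses are taken from whitespace-separated chunks.
    for chunk in user_query.split():
        # Step 2: each chunk is analyzed separately, after the split.
        tokens = analyze(chunk)
        if not tokens:
            continue
        if len(tokens) == 1:
            clauses.append("textbody:" + tokens[0])
        elif auto_generate_phrase_queries:
            # All tokens of the chunk must occur together, in order.
            clauses.append('textbody:"%s"' % " ".join(tokens))
        else:
            # Optional terms inside ONE clause: any single token satisfies it.
            clauses.append("(" + " ".join("textbody:" + t for t in tokens) + ")")
    return required_clauses(len(clauses), mm_percent), clauses


if __name__ == "__main__":
    print(toy_parse("CC-WAV-001"))
    print(toy_parse("CC-WAV-001", auto_generate_phrase_queries=True))
    print(toy_parse("CC WAV 001"))
```

In this model "CC-WAV-001" is a single clause, so mm=100% asks for 1 of 1 clauses, and that clause is already satisfied by "cc" alone. With whitespace instead of dashes there are three clauses and all three are required.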

You can try setting autoGeneratePhraseQueries to true 
or replacing dashes with whitespace on the client side.

Ahmet
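
The second suggestion, replacing dashes on the client side, could look roughly like the sketch below. It is only an illustration: dashes_to_spaces and build_dismax_params are hypothetical helper names, and the qf value is an assumption taken from the two field names visible in the parsed query (textbody and productCode), not from the actual handler configuration.

```python
from urllib.parse import urlencode


def dashes_to_spaces(user_query):
    """Replace dashes with spaces and collapse repeated whitespace."""
    return " ".join(user_query.replace("-", " ").split())


def build_dismax_params(user_query):
    """Request parameters for a dismax query with mm=100%.

    The qf value is an assumption based on the fields seen in the
    parsed query from the original post.
    """
    return {
        "defType": "dismax",
        "q": dashes_to_spaces(user_query),
        "qf": "textbody productCode",
        "mm": "100%",
    }


# Query string to append to the select URL of the core in use.
query_string = urlencode(build_dismax_params("CC-WAV-001"))

if __name__ == "__main__":
    print(query_string)
```

One trade-off to keep in mind: after the rewrite the query no longer contains the literal string "CC-WAV-001", so an exact match on a non-tokenized field such as productCode may no longer contribute.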




Re: Dismax and StandardTokenizer: OR queries despite mm=100%

Posted by bi...@gmail.com.
Use fq 

Bill Bell
Sent from mobile
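
The reply does not say which filter to use. One possible reading is to keep the dismax query as it is and add a filter query that requires every token in the tokenized field. The sketch below assumes that reading; tokens_of is a crude stand-in for the field's analysis, build_params_with_fq is a hypothetical helper, and the default field name textbody is taken from the parsed query in the original post.

```python
import re


def tokens_of(user_query):
    """Crude stand-in for the field's analysis (assumption):
    split on non-alphanumerics and lowercase."""
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", user_query) if t]


def build_params_with_fq(user_query, field="textbody"):
    """Dismax request parameters plus an fq that ANDs all tokens on one field."""
    params = {"defType": "dismax", "q": user_query, "mm": "100%"}
    tokens = tokens_of(user_query)
    if tokens:
        # fq uses the standard query syntax, so an explicit AND is available.
        params["fq"] = "%s:(%s)" % (field, " AND ".join(tokens))
    return params


if __name__ == "__main__":
    print(build_params_with_fq("CC-WAV-001"))
```

Note that such a filter also drops documents that match only on another field (for example productCode), so it narrows the result set more than mm=100% alone would.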

