Posted to solr-user@lucene.apache.org by Andreas Hubold <an...@coremedia.com> on 2015/09/23 21:00:32 UTC
Dismax and StandardTokenizer: OR queries despite mm=100%
Hi,
We're using Solr 4.10.4 and the dismax query parser to search across
multiple fields. One of the fields is configured with a
StandardTokenizer (type "text_general"). I set mm=100% to only get hits
that match all terms.
This does not seem to work for query terms that are split into multiple
tokens. For example, a query for "CC-WAV-001" (tokenized to "cc", "wav",
"001") returns documents that contain only "cc". I need a result with
documents that contain all tokens, as returned by the /select handler.
Is there a way to force AND semantics for such dismax queries? I also
tried to set q.op=AND but it did not help.
The query is parsed as:
(+DisjunctionMaxQuery(((textbody:cc textbody:wav textbody:001) | productCode:CC-WAV-001)~0.1) DisjunctionMaxQuery((textbody:"cc wav 001")~0.1))/no_coord
Thanks in advance!
Regards,
Andreas
Re: Dismax and StandardTokenizer: OR queries despite mm=100%
Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.
Hi Andreas,
You are correct: no re-indexing is required after changing autoGeneratePhraseQueries.
Ahmet
On Thursday, September 24, 2015 3:52 PM, Andreas Hubold <an...@coremedia.com> wrote:
Thank you, autoGeneratePhraseQueries did the job.
I assume that this setting just affects query generation and I don't
need to reindex after changing the field type accordingly. Is this correct?
BTW, I just found SOLR-3589 where the same issue was reported and fixed
for the edismax parser. It seems it was fixed for edismax but not for
dismax.
Andreas
Re: Dismax and StandardTokenizer: OR queries despite mm=100%
Posted by Andreas Hubold <an...@coremedia.com>.
Thank you, autoGeneratePhraseQueries did the job.
I assume that this setting just affects query generation and I don't
need to reindex after changing the field type accordingly. Is this correct?
BTW, I just found SOLR-3589 where the same issue was reported and fixed
for the edismax parser. It seems it was fixed for edismax but not for
dismax.
Andreas
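For readers following along, the field type change described above might look roughly like the sketch below. It is not taken from Andreas's configuration: the schema fragment is a hypothetical, trimmed-down text_general definition, and the helper name enable_auto_phrase_queries is made up for illustration. The only part that comes from the thread is the autoGeneratePhraseQueries attribute on the field type. With it enabled, a single query chunk that analyzes into several tokens is turned into a phrase query instead of optional term clauses.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified schema.xml fragment (an assumption for this sketch).
# A real text_general definition usually lists more filters.
SCHEMA_FRAGMENT = """
<schema name="example" version="1.5">
  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
</schema>
""".strip()


def enable_auto_phrase_queries(schema_xml: str, type_name: str) -> str:
    """Return schema_xml with autoGeneratePhraseQueries="true" set on the named fieldType."""
    root = ET.fromstring(schema_xml)
    for field_type in root.iter("fieldType"):
        if field_type.get("name") == type_name:
            field_type.set("autoGeneratePhraseQueries", "true")
    return ET.tostring(root, encoding="unicode")


updated = enable_auto_phrase_queries(SCHEMA_FRAGMENT, "text_general")
print(updated)
```

In practice the attribute would normally be added to schema.xml by hand (ElementTree drops XML comments when it rewrites a file), followed by a core reload so the new setting is picked up. Note that a phrase query is stricter than plain AND semantics: the tokens must also appear next to each other and in order.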
Ahmet Arslan wrote on 09/23/2015 09:25 PM:
> Hi Andreas,
>
> That's weird. It looks like the mm calculation is done before tokenization takes place.
>
> You can try setting autoGeneratePhraseQueries to true,
> or replace dashes with whitespace on the client side.
>
> Ahmet
Re: Dismax and StandardTokenizer: OR queries despite mm=100%
Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.
Hi Andreas,
That's weird. It looks like the mm calculation is done before tokenization takes place.
You can try setting autoGeneratePhraseQueries to true,
or replace dashes with whitespace on the client side.
Ahmet
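The second suggestion, replacing dashes on the client, could be sketched as follows. This is only an illustration under assumptions: the function names are invented, the field list is copied from the parsed query in the original post, and the regular expression deliberately touches only dashes that sit between two word characters, so a leading "-" used as a prohibit operator is left alone.

```python
import re
from urllib.parse import urlencode


def split_on_dashes(user_query: str) -> str:
    """Replace dashes between word characters with a space.

    "CC-WAV-001" becomes "CC WAV 001", so the parser sees three separate
    clauses. A dash that starts a term (e.g. "-bar") is not changed.
    """
    return re.sub(r"(?<=\w)-+(?=\w)", " ", user_query)


def build_dismax_params(user_query: str, qf=("textbody", "productCode")) -> str:
    """Build a URL-encoded parameter string for a dismax request with mm=100%."""
    return urlencode({
        "defType": "dismax",
        "q": split_on_dashes(user_query),
        "qf": " ".join(qf),
        "mm": "100%",
    })


params = build_dismax_params("CC-WAV-001")
print(params)
```

One trade-off to keep in mind: once the dashes are gone, a field that holds the code as a single value (productCode appears to be such a field in the parsed query above) may no longer match the query as a whole, so this workaround relies on the tokenized field for matching.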
Re: Dismax and StandardTokenizer: OR queries despite mm=100%
Posted by bi...@gmail.com.
Use fq
Bill Bell
Sent from mobile