Posted to solr-user@lucene.apache.org by "Villemos, Gert" <ge...@logica.com> on 2009/09/10 17:39:15 UTC

Facet fields and the DisMax query handler

I'm trying to understand the DisMax query handler. I originally
configured it so that the query is mapped onto several fields
in the documents, with a boost assigned when a field matches. That
works pretty smoothly.
 
However, when it comes to faceted searches, the results perplex me.
Consider the following example:
 
Document A:
    <field name="Staff">John Doe</field>
 
Document B:
    <field name="ProjectManager">John Doe</field>
 
The following queries do not return anything:
    Staff:Doe
    Staff:Doe*
    Staff:John
    Staff:John*
 
The query:
    Staff:"John"
 
returns Documents A and B, even though Document B doesn't even contain the
field 'Staff' (which is optional)! Through the "qf" parameter, dismax has
been configured to search over the field 'ProjectManager', but I expected
that using a facet value would exclude that field... Looking at the
scores, Document A does score much higher than Document
B (by a factor of 20), but I would expect not to see B at all. I have changed
the dismax minimum match setting to 1, to ensure that all hits
with a single match are returned, without effect. I have also changed the tie
to 0, with no effect.
 
What am I missing here? I would like queries such as 'Staff:Doe' to
return document A, and only A.
 
Cheers,
Gert.
 


Please help Logica to respect the environment by not printing this email  / Pour contribuer comme Logica au respect de l'environnement, merci de ne pas imprimer ce mail /  Bitte drucken Sie diese Nachricht nicht aus und helfen Sie so Logica dabei, die Umwelt zu schützen. /  Por favor ajude a Logica a respeitar o ambiente nao imprimindo este correio electronico.



This e-mail and any attachment is for authorised use by the intended recipient(s) only. It may contain proprietary material, confidential information and/or be subject to legal privilege. It should not be copied, disclosed to, retained or used by, any other party. If you are not an intended recipient then please promptly delete this e-mail and any attachment and all copies and inform the sender. Thank you.


RE: Facet fields and the DisMax query handler

Posted by "Villemos, Gert" <ge...@logica.com>.
Thanks. Maybe I'm misusing the dismax request handler, but the ability to search all fields is just too good a feature.

I found the following description of how to do faceted queries with dismax. I have not tried it yet, but I will.

http://fisk.stjernesludd.net/archives/16-Solr-Using-the-dismax-Query-Handler-and-Still-Limit-a-Specific-Field.html 

Cheers,
Gert.





-----Original Message-----
From: Lance Norskog [mailto:goksron@gmail.com] 
Sent: Friday, 11 September 2009 04:56
To: solr-user@lucene.apache.org
Subject: Re: Facet fields and the DisMax query handler

Facets are not involved here. These are only simple searches.

The DisMax parser does not use field names in the query. DisMax creates a
nice simple syntax for people to type into a web browser search field. The
various parameters let you sculpt the relevance in order to tune the user
experience.

There are ways to intermix dismax parsing in the standard query parser
syntax, but I am no expert. You can also use these field queries as filter
queries; this is a hack but does work. Also, using wildcards interferes with
upper/lower case handling.



-- 
Lance Norskog
goksron@gmail.com




Re: Facet fields and the DisMax query handler

Posted by Lance Norskog <go...@gmail.com>.
Facets are not involved here. These are only simple searches.

The DisMax parser does not use field names in the query. DisMax creates a
nice simple syntax for people to type into a web browser search field. The
various parameters let you sculpt the relevance in order to tune the user
experience.
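
To illustrate the point above, here is a toy model in Python (not Solr's
actual implementation) of how a DisMax-style search takes the user's text and
tries each term against every field listed in qf, keeping the best-scoring
field per term. The field boosts (Staff^20, ProjectManager^1), the whitespace
tokenizer and the function names are assumptions made up for this sketch; the
two documents are the ones from the question.

```python
# Toy model only: real Solr scoring is more involved than a flat boost.
DOCS = {
    "A": {"Staff": "John Doe"},
    "B": {"ProjectManager": "John Doe"},
}
QF = {"Staff": 20.0, "ProjectManager": 1.0}  # hypothetical qf boosts


def dismax_score(field_scores, tie=0.0):
    """Disjunction-max: the best field score plus tie times the other scores."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)


def search(text, docs, qf, tie=0.0):
    """Try every term of the user's text against every qf field."""
    terms = text.lower().split()
    results = {}
    for doc_id, fields in docs.items():
        total = 0.0
        for term in terms:
            # A field "scores" its boost if it contains the term as a whole word.
            per_field = [
                boost
                for field, boost in qf.items()
                if term in fields.get(field, "").lower().split()
            ]
            total += dismax_score(per_field, tie)
        if total > 0:
            results[doc_id] = total
    return results


print(search("john", DOCS, QF))
```

Under these assumed boosts the text "john" matches both documents, with A
ranked well above B. That mirrors what the question describes: the text is
searched in every qf field, so B is found through ProjectManager.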

There are ways to intermix dismax parsing in the standard query parser
syntax, but I am no expert. You can also use these field queries as filter
queries; this is a hack but does work. Also, using wildcards interferes with
upper/lower case handling.
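
The filter-query workaround could look roughly like the sketch below: the free
text the user typed goes in q, for DisMax to spread over the qf fields, while
the field restriction travels separately in fq, which is parsed with the
standard syntax. The host URL, the qf value and the helper names are
assumptions for illustration only. The wildcard helper lowercases the prefix
by hand, on the assumption that the field is lowercased at index time and that
wildcard terms are not run through the analyzer.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SOLR_SELECT = "http://localhost:8983/solr/select"  # hypothetical endpoint


def wildcard_filter(field, prefix):
    """Build a prefix filter such as Staff:doe*, lowercasing the prefix by hand."""
    return f"{field}:{prefix.lower()}*"


def build_dismax_url(user_text, filter_query, qf, base_url=SOLR_SELECT):
    """Put the user's text in q (DisMax) and the field restriction in fq."""
    params = [
        ("defType", "dismax"),  # DisMax parsing applies to q only
        ("q", user_text),
        ("qf", qf),
        ("fq", filter_query),   # field:value restriction, standard syntax
    ]
    return base_url + "?" + urlencode(params)


url = build_dismax_url("Doe", "Staff:Doe", "Staff^20 ProjectManager")
print(url)

# Round-trip check: the filter survives URL encoding unchanged.
print(parse_qs(urlsplit(url).query)["fq"])

# A prefix variant of the same filter, lowercased on the client side.
print(wildcard_filter("Staff", "Doe"))
```

With a request of this shape, the fq clause should limit the results to
documents whose Staff field matches, so only Document A would be expected to
remain, assuming Staff is a tokenized text field.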



-- 
Lance Norskog
goksron@gmail.com