Posted to solr-user@lucene.apache.org by "Feak, Todd" <To...@smss.sony.com> on 2009/01/16 18:54:37 UTC
How to select *actual* match from a multi-valued field
At a high level, I'm trying to do some more intelligent searching using
an app that will send multiple queries to Solr. My current issue is
around multi-valued fields and determining which entry actually
generated the "hit" for a particular query.
For example, let's say that I have a multi-valued field containing
people's names, associated with the document (trying to be non-specific
on purpose). In one document, I have the following names:
Jane Smith, Bob Smith, Roger Smith, Jane Doe. If the user performs a
search for Bob Smith, this document is returned. What I want to know is
that this document was returned because of "Bob Smith", not because of
Jane or Roger. I've tried using the highlighting settings. They do
provide some help, as the Jane Doe entry doesn't come back highlighted,
but both Jane and Roger do. I've tried using hl.requireFieldMatch, but
that seems to pertain only to fields, not entries within a multi-valued
field.
Using Solr, is there a way to get the information I am looking for?
Specifically, that "Bob Smith" is the value in the multi-valued field
that triggered the hit?
-Todd Feak
Re: How to select *actual* match from a multi-valued field
Posted by Toby Cole <to...@semantico.com>.
We came across this problem. Unfortunately, we gave up and did our
hit-highlighting for multi-valued fields on the frontend. :-/
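A minimal sketch of what such frontend matching might look like, assuming the
client already has every value of the field and the user's phrase. The
tokenize() helper is a hypothetical stand-in for whatever analyzer the field
really uses, so results can differ from what Solr itself matched.

```python
import re


def tokenize(text):
    """Lowercase and split on non-alphanumerics (stand-in for the real analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def values_matching_phrase(values, phrase):
    """Return the values whose tokens contain the phrase tokens consecutively, in order."""
    wanted = tokenize(phrase)
    if not wanted:
        return []
    hits = []
    for value in values:
        tokens = tokenize(value)
        # Slide a window of len(wanted) tokens across this value.
        for start in range(len(tokens) - len(wanted) + 1):
            if tokens[start:start + len(wanted)] == wanted:
                hits.append(value)
                break
    return hits


names = ["Jane Smith", "Bob Smith", "Roger Smith", "Jane Doe"]
print(values_matching_phrase(names, "Bob Smith"))  # prints ['Bob Smith']
```

The drawback is that the matching logic lives in two places, so stemming,
synonyms or stop words configured in Solr would have to be mirrored by hand.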
One approach would be to extend Solr to return every value of a
multi-valued field in the highlighting, regardless of whether that
particular value matched.
Just an idea; I don't know if it's feasible or not. If anyone can point
me in the right direction, I could probably bash together a plugin and
some tests.
Toby.
On 20 Jan 2009, at 16:31, Feak, Todd wrote:
> Anyone that can shed some insight?
>
> -Todd
Toby Cole
Software Engineer
Semantico
Lees House, Floor 1, 21-23 Dyke Road, Brighton BN1 3FE
T: +44 (0)1273 358 238
F: +44 (0)1273 723 232
E: toby.cole@semantico.com
W: www.semantico.com
RE: How to select *actual* match from a multi-valued field
Posted by "Feak, Todd" <To...@smss.sony.com>.
Anyone that can shed some insight?
-Todd
Re: How to select *actual* match from a multi-valued field
Posted by Chris Hostetter <ho...@fucit.org>.
: At a high level, I'm trying to do some more intelligent searching using
: an app that will send multiple queries to Solr. My current issue is
: around multi-valued fields and determining which entry actually
: generated the "hit" for a particular query.
Strictly speaking, this isn't possible with normal queries: the underlying
data structures do not maintain any history about why a doc matches when
executing a Query. SpanQuery is a subclass of Query that can give you this
information, so a custom Solr plugin that used SpanTermQueries and
SpanNearQueries in place of TermQueries and PhraseQueries could generate
this kind of information -- but it comes at a cost (SpanQueries are not as
fast as their traditional counterparts).
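To illustrate why position information is enough to identify the matching
value, here is a toy model in Python. It is not Lucene's API (real SpanQuery
code would be Java inside a plugin); it only assumes that the values of a
multi-valued field are indexed as one token stream with a position gap between
values. POSITION_GAP and both function names are made up for this sketch.

```python
import re

POSITION_GAP = 100  # assumed gap between values, in the spirit of positionIncrementGap


def index_positions(values, gap=POSITION_GAP):
    """Flatten a multi-valued field into (position, token, value_index) triples."""
    postings = []
    position = 0
    for value_index, value in enumerate(values):
        for token in re.findall(r"[a-z0-9]+", value.lower()):
            postings.append((position, token, value_index))
            position += 1
        position += gap  # keeps a phrase from matching across two values
    return postings


def phrase_spans(postings, phrase):
    """Return (start, end, value_index) per in-order, zero-slop match; end is exclusive."""
    wanted = re.findall(r"[a-z0-9]+", phrase.lower())
    if not wanted:
        return []
    token_at = {pos: tok for pos, tok, _ in postings}
    spans = []
    for pos, _, value_index in postings:
        if all(token_at.get(pos + k) == w for k, w in enumerate(wanted)):
            spans.append((pos, pos + len(wanted), value_index))
    return spans


names = ["Jane Smith", "Bob Smith", "Roger Smith", "Jane Doe"]
postings = index_positions(names)
for start, end, value_index in phrase_spans(postings, "Bob Smith"):
    print(start, end, names[value_index])  # prints: 102 104 Bob Smith
```

Because each match carries its positions, it can be mapped back to the value
that produced it, which is exactly the information a plain term or phrase
match does not report.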
The best you can do is use things like score Explanations and hit
highlighting, which mimic the logic used during a query to determine why a
doc (already identified) matched.
: Jane Smith, Bob Smith, Roger Smith, Jane Doe. If the user performs a
: search for Bob Smith, this document is returned. What I want to know is
: that this document was returned because of "Bob Smith", not because of
: Jane or Roger. I've tried using the highlighting settings. They do
: provide some help, as the Jane Doe entry doesn't come back highlighted,
: but both Jane and Roger do. I've tried using hl.requireFieldMatch, but
: that seems to pertain only to fields, not entries within a multi-valued
: field.
FWIW: if you are using q=Bob+Smith then "Jane Smith" and "Roger Smith"
*are* contributing to the result.
However, even if you are using a phrase search (q="Bob+Smith"), I do seem
to recall that the traditional highlighter highlights all of the terms in
the fields, even if the whole phrase isn't there -- historically that was
considered a feature (for the purpose of snippet generation, people
frequently want to see that type of behavior), but I can understand why it
would cause you problems in your current use case.
As mentioned on the wiki, there is an "hl.usePhraseHighlighter" param you can
use to trigger a newer "SpanScorer" based highlighter -- which takes advantage
of the previously mentioned SpanQuery logic to determine what to
highlight (even if the queries themselves weren't SpanQueries) ... this
param gets its name because, when dealing with phrase queries, it only
highlights them if the whole phrase is there.
http://wiki.apache.org/solr/HighlightingParameters
Compare the results of these two URLs when using the example
configs/data...
http://localhost:8983/solr/select/?hl.fragsize=0&hl.usePhraseHighlighter=false&df=features&q=%22Solr+Search%22&hl.snippets=1000&hl.requireFieldMatch=true&fl=features&hl=true&hl.fl=features
http://localhost:8983/solr/select/?hl.fragsize=0&hl.usePhraseHighlighter=true&df=features&q=%22Solr+Search%22&hl.snippets=1000&hl.requireFieldMatch=true&fl=features&hl=true&hl.fl=features
I think that may solve your particular problem.
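For anyone scripting the comparison, a small sketch that rebuilds the two
request URLs from a parameter list, so the only difference between them
(hl.usePhraseHighlighter) is explicit. The highlight_url() helper is a
hypothetical name; the host, path and parameter values are the ones from the
URLs above, and nothing here contacts Solr.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8983/solr/select/"


def highlight_url(use_phrase_highlighter):
    """Build the example highlighting request, toggling only hl.usePhraseHighlighter."""
    params = [
        ("q", '"Solr Search"'),           # phrase query against the default field
        ("df", "features"),
        ("fl", "features"),
        ("hl", "true"),
        ("hl.fl", "features"),
        ("hl.fragsize", "0"),             # 0 = return whole field values, not fragments
        ("hl.snippets", "1000"),
        ("hl.requireFieldMatch", "true"),
        ("hl.usePhraseHighlighter", "true" if use_phrase_highlighter else "false"),
    ]
    return BASE_URL + "?" + urlencode(params)


print(highlight_url(False))
print(highlight_url(True))
```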
-Hoss