Posted to solr-user@lucene.apache.org by Max Lynch <ih...@gmail.com> on 2010/07/29 00:53:53 UTC

Know which terms are in a document

I would like to be able to search against my index, and then *know* which of a set
of given terms were found in each document.

For example, let's say I want to show articles with the word "pizza" or
"cake" in them, but would like to be able to say which of those two was
found.  I might use this to handle the article differently if it is about
pizza, or if it is about cake.  I understand I can do multiple queries but I
would like to avoid that.

One thought I had was to use a highlighter and only return a fragment with
the highlighted word, but I'm not sure how to do this with the various
highlighting options.

Is there a way?

Thanks.

Re: Know which terms are in a document

Posted by Max Lynch <ih...@gmail.com>.
Yeah, I've had mild success with the highlighting approach in Lucene, but
wasn't sure if there was another method available in Solr.

Thanks Mike.

On Thu, Jul 29, 2010 at 5:17 AM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> This is a fairly frequently requested and missing feature in Lucene/Solr...
>
> Lucene actually "knows" this information while it's scoring each
> document; it's just that it in no way tries to record that.
>
> If you will only do this on a few documents (eg the one page of
> results) then piggybacking on the highlighter is an OK approach.
>
> If you need it on more docs than that, then probably you should
> customize how your queries are scored to also tally up which docs had
> which terms.
>
> Mike

Re: Know which terms are in a document

Posted by Michael McCandless <lu...@mikemccandless.com>.
This is a fairly frequently requested and missing feature in Lucene/Solr...

Lucene actually "knows" this information while it's scoring each
document; it's just that it in no way tries to record that.

If you will only do this on a few documents (e.g. the one page of
results) then piggybacking on the highlighter is an OK approach.
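
A minimal sketch of the highlighter piggyback, in Python, assuming the
highlighter wraps each match in <em>...</em> and that the response has
already been parsed into a dict shaped like the "highlighting" section of a
Solr JSON response. The sample data, the document ids, the "body" field name
and the matched_terms helper are all made up for illustration.

```python
import re

# Hypothetical, hand-written sample shaped like the "highlighting" section of
# a Solr JSON response. Ids, field name and snippets are invented.
SAMPLE_HIGHLIGHTING = {
    "doc1": {"body": ["Fresh <em>pizza</em> straight from the oven"]},
    "doc2": {"body": ["A <em>cake</em> recipe", "serve the <em>Cake</em> cold"]},
    "doc3": {"body": ["<em>Pizza</em> night, then <em>cake</em>"]},
    "doc4": {},  # a matching doc for which no snippet came back
}

# Assumes matches are wrapped in <em>...</em>; adjust if other markers are used.
EM_TAG = re.compile(r"<em>(.*?)</em>")


def matched_terms(highlighting, wanted):
    """Return {doc_id: set of wanted terms that appear highlighted}."""
    wanted = {term.lower() for term in wanted}
    result = {}
    for doc_id, fields in highlighting.items():
        found = set()
        for snippets in fields.values():
            for snippet in snippets:
                for hit in EM_TAG.findall(snippet):
                    if hit.lower() in wanted:
                        found.add(hit.lower())
        result[doc_id] = found
    return result


if __name__ == "__main__":
    for doc_id, terms in matched_terms(SAMPLE_HIGHLIGHTING, ["pizza", "cake"]).items():
        print(doc_id, sorted(terms))
```

The comparison is on the highlighted surface text, so with stemming or
synonyms in the analysis chain a highlighted "pizzas" would not equal the
query term "pizza"; the sketch ignores that case.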

If you need it on more docs than that, then probably you should
customize how your queries are scored to also tally up which docs had
which terms.
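
To illustrate the idea of tallying terms while matching, here is a toy
inverted index in Python. It is not Lucene's API and does no scoring; it only
shows that the per-term postings walked during an OR query already carry the
"which term hit which doc" information, so recording it is cheap. The names
build_index, search_any and DOCS are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased whitespace token to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search_any(index, terms):
    """OR query: return {doc_id: set of query terms found in that doc}."""
    hits = defaultdict(set)
    for term in terms:
        term = term.lower()
        # Walk the postings for this term and record it against each doc.
        for doc_id in index.get(term, ()):
            hits[doc_id].add(term)
    return dict(hits)


# Invented sample documents.
DOCS = {
    1: "best pizza in town",
    2: "chocolate cake recipe",
    3: "pizza and cake party",
    4: "salad ideas",
}

if __name__ == "__main__":
    index = build_index(DOCS)
    for doc_id, terms in sorted(search_any(index, ["pizza", "cake"]).items()):
        print(doc_id, sorted(terms))
```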

Mike
