Posted to solr-user@lucene.apache.org by Mark Holland <ma...@zoopla.co.uk> on 2010/07/08 15:56:37 UTC

Determining matched tokens in original query

Hi,

I'm trying to find out which tokens in a user's query matched against each
result. I've been trying to use the highlight component for this, but it
doesn't quite fit the bill.

I'm using edismax, with mm set to 50%, and I want to extract for each
matching doc which tokens /didn't/ match (I then strip the matching tokens
from the search string and run the remaining query against a different Solr
index).

My problem is that the highlighter, naturally, applies highlighting to
fields after filters have been applied. This makes it tricky to map the
highlighted terms back to the original query, because the highlighted text
may be a synonym, a stemmed form or a possessive of the query token rather
than the token itself.

E.g. with the search string:
mr banana's shop

I could get a highlighted fragment like:
<em>Mister</em> <em>Banana</em>'s frozen <em>banana</em> stand
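
To make the mismatch concrete, here is a minimal Python sketch of the naive
comparison I'd like to be able to do. The helper names are purely
illustrative and not part of Solr, and it assumes the default <em> highlight
tags. It pulls the highlighted terms out of a fragment and checks the raw
query tokens against them:

```python
import re


def highlighted_terms(fragment):
    """Return the lowercased terms wrapped in <em>...</em> in a fragment."""
    return {term.lower() for term in re.findall(r"<em>(.*?)</em>", fragment)}


def unmatched_tokens(query, fragment):
    """Naively list whitespace-split query tokens with no highlighted twin."""
    matched = highlighted_terms(fragment)
    return [tok for tok in query.lower().split() if tok not in matched]


query = "mr banana's shop"
fragment = "<em>Mister</em> <em>Banana</em>'s frozen <em>banana</em> stand"

# Only "shop" genuinely failed to match, but every token is reported,
# because "mr" matched via a synonym and "banana's" via possessive stripping.
print(unmatched_tokens(query, fragment))  # ['mr', "banana's", 'shop']
```

So a plain string comparison reports all three tokens as unmatched, when the
only one I actually want back is "shop".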

Is there some other approach I could use?

Thanks,
Mark