Posted to solr-user@lucene.apache.org by Matthew Gwynne <ma...@gmail.com> on 2014/11/20 19:49:13 UTC

Can SOLR map query terms one-to-one with matched terms?

Hi,

I am currently working on a people search tool that uses SOLR for indexing
and fuzzy search across multiple fields (with edismax). It uses various
filters such as SynonymFilterFactory and WordDelimiterFilterFactory, and
TF-IDF is disabled.

This works very well, except for a few cases where a single search term
matches more than one word in a document. For example, searching for
"Martin XXXX" returns "Marvin Martin" as the top result because the fuzzy
term "Martin" matches both "Marvin" and "Martin", and both matches add to
the score.

Matching a search term against multiple words in a document makes a lot of
sense in general. For people search, however, I'd like each search term to
contribute only its maximum score, i.e. to map each search term to at most
one word in the document (the person's name or other information).
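
To make the desired behaviour concrete, here is a minimal client-side sketch
in Python. It is only an illustration of the scoring I am after, not
something SOLR provides: the function names are made up, and the similarity
formula (1 - edits / length of the shorter term, with a cap of 2 edits) is an
assumption chosen for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(query_term: str, doc_term: str, max_edits: int = 2) -> float:
    """Assumed fuzzy similarity: 0.0 if too far apart, else 1 - edits/min length."""
    if not query_term or not doc_term:
        return 0.0
    edits = edit_distance(query_term, doc_term)
    if edits > max_edits:
        return 0.0
    return 1.0 - edits / min(len(query_term), len(doc_term))


def summed_score(query_terms, doc_terms) -> float:
    """Current behaviour: every fuzzy match of a query term adds to the score."""
    return sum(similarity(q, d) for q in query_terms for d in doc_terms)


def max_per_term_score(query_terms, doc_terms) -> float:
    """Desired behaviour: each query term contributes only its best match."""
    return sum(
        max((similarity(q, d) for d in doc_terms), default=0.0)
        for q in query_terms
    )


if __name__ == "__main__":
    query = ["martin"]
    print(summed_score(query, ["marvin", "martin"]))        # both matches counted
    print(max_per_term_score(query, ["marvin", "martin"]))  # best match only
    print(max_per_term_score(query, ["martin", "smith"]))
```

With the second function, "Marvin Martin" and "Martin Smith" score the same
for the term "martin", which is the ranking I would expect.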

Is there a mechanism in SOLR/Lucene which would allow me to force a
one-to-one mapping between search term and matched term?

You can see the issue below in the debug for the query:

    0.27641854 = (MATCH) sum of:
      0.27641854 = (MATCH) sum of:
        0.15077375 = (MATCH) weight(FullName:martin in 118169) [NoTFIDFSimilarityClass], result of:
          0.15077375 = score(doc=118169,freq=1.0 = termFreq=1.0), product of:
            0.15077375 = queryWeight, product of:
              1.0 = idf(docFreq=1619, maxDocs=328317)
              0.15077375 = queryNorm
            1.0 = fieldWeight in 118169, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              1.0 = idf(docFreq=1619, maxDocs=328317)
              1.0 = fieldNorm(doc=118169)
        0.12564479 = (MATCH) weight(FullName:marvin^0.8333333 in 118169) [NoTFIDFSimilarityClass], result of:
          0.12564479 = score(doc=118169,freq=1.0 = termFreq=1.0), product of:
            0.12564479 = queryWeight, product of:
              0.8333333 = boost
              1.0 = idf(docFreq=105, maxDocs=328317)
              0.15077375 = queryNorm
            1.0 = fieldWeight in 118169, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              1.0 = idf(docFreq=105, maxDocs=328317)
              1.0 = fieldNorm(doc=118169)
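
The arithmetic in the debug output above can be reproduced with a few lines
of Python. This is only a check of the numbers shown; the constant and
function names are made up for the example. The 0.8333333 boost on "marvin"
would be consistent with one edit over a six-letter term (1 - 1/6), though
that is my reading rather than something the output states.

```python
# Values copied from the debug output above.
QUERY_NORM = 0.15077375
FUZZY_BOOST = 0.8333333   # boost shown on FullName:marvin
IDF = 1.0                 # TF-IDF is disabled, so idf is constant
FIELD_WEIGHT = 1.0        # tf * idf * fieldNorm, all 1.0 in the output


def term_score(boost: float) -> float:
    """score = queryWeight * fieldWeight, with queryWeight = boost * idf * queryNorm."""
    query_weight = boost * IDF * QUERY_NORM
    return query_weight * FIELD_WEIGHT


martin = term_score(1.0)          # exact match, no boost line in the output
marvin = term_score(FUZZY_BOOST)  # fuzzy match
total = martin + marvin           # the outer "sum of"

if __name__ == "__main__":
    print(f"martin={martin:.8f} marvin={marvin:.8f} total={total:.8f}")
```

So the document's score is simply the sum of the two per-term scores, which
is why a name containing both "Marvin" and "Martin" outranks a name
containing "Martin" alone.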

An example query:

http://domain/solr/peoplefinder/select?q=Martin~&wt=json&indent=true&defType=edismax&qf=FullName&stopwords=true&lowercaseOperators=true&debug=true
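
For reference, the same request can be assembled programmatically. A minimal
sketch in Python using only the standard library; the base URL is the
placeholder from the example above, and build_query_url is a hypothetical
helper name.

```python
from urllib.parse import urlencode

# Placeholder host and core name, as in the example URL above.
BASE_URL = "http://domain/solr/peoplefinder/select"


def build_query_url(name: str, field: str = "FullName") -> str:
    """Build an edismax fuzzy query URL with debug output enabled."""
    params = {
        "q": f"{name}~",           # trailing ~ requests a fuzzy match
        "wt": "json",
        "indent": "true",
        "defType": "edismax",
        "qf": field,
        "stopwords": "true",
        "lowercaseOperators": "true",
        "debug": "true",
    }
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query_url("Martin"))
```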

In the spirit of disclosure, I've also asked this question on StackOverflow
(see
http://stackoverflow.com/questions/26925811/can-solr-map-query-terms-one-to-one-with-matched-terms),
so if anyone wants some upvotes, feel free to post the answer there too
:).

Thanks in advance,

Matthew Gwynne