Posted to solr-user@lucene.apache.org by David '-1' Schmid <gd...@gmail.com> on 2019/02/19 07:58:19 UTC
How to suggest prefix matches over all tokens of a field (was Re: Suggest Component, prefix match (sur-)name)
On 2019-02-18T18:12:44, David '-1' Schmid wrote:
> Will report back if that's working out.
It's working!
If anybody wants to replicate this, here's what I ended up with.
.. managed-schema:
.
. <!-- the field(Type) where the original values are stored -->
. <fieldType name="important_strings" class="solr.StrField"
. sortMissingLast="true" docValues="true" indexed="true" stored="true"
. multiValued="true"/>
. <field name="author" type="important_strings"/>
.
. <!-- lowercase tokenization for case-insensitive matches -->
. <fieldType name="text_lower" class="solr.TextField" multiValued="true" positionIncrementGap="100">
. <analyzer>
. <tokenizer class="solr.WhitespaceTokenizerFactory"/>
. <filter class="solr.LowerCaseFilterFactory"/>
. </analyzer>
. </fieldType>
. <field name="author_lower" type="text_lower"/>
. <copyField source="author" dest="author_lower"/>
.
. <!-- as above but with added edgeNGrams -->
. <fieldType name="text_prefix" class="solr.TextField" multiValued="true" positionIncrementGap="100">
. <analyzer type="index">
. <tokenizer class="solr.WhitespaceTokenizerFactory"/>
. <filter class="solr.LowerCaseFilterFactory"/>
. <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
. </analyzer>
. <analyzer type="query">
. <tokenizer class="solr.WhitespaceTokenizerFactory"/>
. <filter class="solr.LowerCaseFilterFactory"/>
. </analyzer>
. </fieldType>
. <field name="author_ngram" type="text_prefix"/>
. <copyField source="author" dest="author_ngram"/>
.
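To make it easier to see what ends up in author_ngram, here is a rough Python sketch of the two analyzer chains of text_prefix. It only imitates the whitespace tokenizer, the lowercase filter and the edge n-gram filter with minGramSize=2 and maxGramSize=15; it is not Solr's implementation, and the function names are made up for illustration.

```python
def edge_ngrams(token, min_size=2, max_size=15):
    """Return the prefixes of `token` with lengths min_size..max_size.

    Tokens shorter than min_size yield nothing, as a rough stand-in for
    the edge n-gram filter's behaviour.
    """
    upper = min(len(token), max_size)
    return [token[:n] for n in range(min_size, upper + 1)]


def index_tokens(value, min_size=2, max_size=15):
    """Imitate the index-time chain: whitespace split, lowercase, edge n-grams."""
    tokens = []
    for word in value.split():
        tokens.extend(edge_ngrams(word.lower(), min_size, max_size))
    return tokens


def query_tokens(value):
    """Imitate the query-time chain: whitespace split and lowercase only."""
    return [word.lower() for word in value.split()]


if __name__ == "__main__":
    print(index_tokens("Franz J. Hauck"))
    print(query_tokens("Hauc"))
```

Because the query side is not n-grammed, a typed prefix such as "hauc" is compared as a whole against the stored prefixes, which is what makes the partial match work.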
The requestHandler uses the three fields above to provide suggestions.
.. solrconfig.xml:
.
. <requestHandler class="solr.SearchHandler" name="/suggest_author">
. <lst name="defaults">
. <str name="defType">edismax</str>
. <str name="rows">10</str>
. <str name="fl">author</str>
. <str name="qf">author_lower^10 author_ngram</str>
. </lst>
. </requestHandler>
.
If a token matches an author's first name or surname completely, the
complete match in author_lower (boosted by ^10) outranks the partial
match from author_ngram.
Let's say I want to find "Hauck" and have typed the first four characters:
.. curl 'http://localhost:8983/solr/dblp/suggest_author?q=hauc'
. "docs": [
. {
. "author": [
. "Gregor Hauc"
. ]
. },
. {
. "author": [
. "Andrej Kovacic",
. "Gregor Hauc",
. "Brina Buh",
. "Mojca Indihar Stemberger"
. ]
. },
. {
. "author": [
. "Franz J. Hauck",
. "Franz Johannes Hauck"
. ]
. },
. /* ... */
. ]
Once I type the last character, complete matches are boosted over
partial matches:
.. curl 'http://localhost:8983/solr/dblp/suggest_author?q=hauck'
. "docs": [
. {
. "author": [
. "Rainer Hauck"
. ]
. },
. {
. "author": [
. "Julia Hauck"
. ]
. },
. {
. "author": [
. "Bernd Hauck"
. ]
. },
. /* ... */
. ]
As these are not the people I was looking for, I start typing the
first name:
.. curl 'http://localhost:8983/solr/dblp/suggest_author?q=hauck%20fra'
. "docs": [
. {
. "author": [
. "Fra Angelico Viray"
. ]
. },
. {
. "author": [
. "Alberto Del Fra"
. ]
. },
. {
. "author": [
. "Alberto Del Fra"
. ]
. },
. /* ... */
. ]
Oh no, now my previous match has been replaced by other matches.
This can be circumvented by adding "q.op=AND" to require both terms to match:
.. curl 'http://localhost:8983/solr/dblp/suggest_author?q.op=AND&q=hauck%20fra'
. "docs": [
. {
. "author": [
. "Franz J. Hauck",
. "Franz Johannes Hauck"
. ]
. },
. /* ... */
. ]
Which achieves what I wanted, really.
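The difference between the two runs can be illustrated with a toy matcher in Python. This is only a sketch of which documents match, not of how Solr ranks them; the two sample documents are taken from the output above, while the dictionary keys and function names are invented for the example.

```python
def indexed_terms(authors):
    """Collect lowercased tokens and their 2..15 char prefixes for one document."""
    terms = set()
    for name in authors:
        for word in name.lower().split():
            terms.add(word)  # roughly what author_lower holds
            for n in range(2, min(len(word), 15) + 1):
                terms.add(word[:n])  # roughly what author_ngram holds
    return terms


def matches(authors, query, op="OR"):
    """Return True if the document matches the query under the given operator."""
    terms = indexed_terms(authors)
    hits = [word in terms for word in query.lower().split()]
    return all(hits) if op == "AND" else any(hits)


# Hypothetical document ids mapped to author values seen in the results above.
docs = {
    "viray": ["Fra Angelico Viray"],
    "hauck": ["Franz J. Hauck", "Franz Johannes Hauck"],
}

if __name__ == "__main__":
    for op in ("OR", "AND"):
        found = [key for key, authors in docs.items()
                 if matches(authors, "hauck fra", op)]
        print(op, found)
```

With OR, a document only needs one of the typed terms, so "Fra Angelico Viray" qualifies through "fra" alone. With AND, every typed term has to be found, which leaves only the Hauck entry.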
q.op can also be set in the requestHandler defaults in solrconfig.xml to always use AND.
Adding hl=true to the query will provide highlighting:
.. curl 'http://localhost:8983/solr/dblp/suggest_author?q.op=AND&q=hauck%20fra&hl=true'
. "highlighting": {
. "homepages/h/FranzJHauck": {
. "author_lower": [
. "Franz J. <em>Hauck</em>"
. ],
. "author_ngram": [
. "<em>Franz</em> J. <em>Hauck</em>"
. ]
. },
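For anyone calling the handler from a script rather than curl, the request URLs used above could be assembled along these lines. This is a minimal sketch using only the Python standard library; the function name and its keyword arguments are invented here, and the base URL is simply the one from my local setup.

```python
from urllib.parse import quote, urlencode


def suggest_url(query, require_all=True, highlight=False,
                base="http://localhost:8983/solr/dblp/suggest_author"):
    """Build a request URL for the /suggest_author handler.

    require_all adds q.op=AND, highlight adds hl=true.
    """
    params = []
    if require_all:
        params.append(("q.op", "AND"))
    params.append(("q", query))
    if highlight:
        params.append(("hl", "true"))
    # quote_via=quote encodes spaces as %20, as in the curl calls.
    return base + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(suggest_url("hauck fra", highlight=True))
```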
I'm pretty happy with this :D
The original idea came from the book "Solr in Action" by Trey Grainger
and Timothy Potter. It's from 2014 (builds on Solr 4.7) and might need some
adaptations :D
regards,
-1