Posted to dev@lucene.apache.org by "Erick Erickson (JIRA)" <ji...@apache.org> on 2018/03/24 17:58:00 UTC
[jira] [Created] (SOLR-12136) Document hl.q parameter

Erick Erickson created SOLR-12136:
-------------------------------------

             Summary: Document hl.q parameter
                 Key: SOLR-12136
                 URL: https://issues.apache.org/jira/browse/SOLR-12136
             Project: Solr
          Issue Type: Improvement
      Security Level: Public (Default Security Level. Issues are Public)
          Components: documentation
            Reporter: Erick Erickson
            Assignee: Erick Erickson


*********Original issue:
If I specify:

hl.fl=f1&hl.q=something

then "something" is analyzed against the default field (df) rather than f1.

So in this particular case, f1 did some diacritic folding
(GermanNormalizationFilterFactory specifically). But my guess is that
the df was still "text", or at least something that didn't reference
that filter.

In what follows, I'm defining "worked" as getting highlighting on "Kündigung".

so
Kündigung was indexed as Kundigung

So far so good. Now if I try to highlight on f1

These work
q=f1:Kündigung&hl.fl=f1
q=f1:Kündigung&hl.fl=f1&hl.q=Kundigung <= NOTE, without umlaut
q=f1:Kündigung&hl.fl=f1&hl.q=f1:Kündigung <= NOTE, with umlaut

This does not work
q=f1:Kündigung&hl.fl=f1&hl.q=Kündigung <= NOTE, with umlaut

Testing this locally, I'd get the highlighting if I defined df as "f1"
in all the above cases.
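
The four cases above can be reproduced with a toy model. This is only a sketch under loud assumptions: fold_diacritics is a rough stand-in for the folding done on f1 (it is not GermanNormalizationFilterFactory), the "text" field is assumed to do no folding at all, and ANALYZERS, parse_term and highlights are hypothetical names invented for illustration, not Solr APIs.

```python
import unicodedata


def fold_diacritics(token):
    """Rough stand-in for diacritic folding: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Assumption: f1 folds diacritics, the default field "text" leaves tokens alone.
ANALYZERS = {
    "f1": fold_diacritics,
    "text": lambda token: token,
}


def parse_term(raw, default_field):
    """Analyze 'field:term' with that field's analyzer, else with the default field's."""
    field, sep, term = raw.partition(":")
    if not sep:
        field, term = default_field, raw
    return ANALYZERS[field](term)


def highlights(hl_q, hl_fl, stored_value, df="text"):
    """True when the analyzed hl.q term equals the term indexed in hl_fl."""
    indexed_term = ANALYZERS[hl_fl](stored_value)
    return parse_term(hl_q, df) == indexed_term


for hl_q in ("Kundigung", "f1:Kündigung", "Kündigung"):
    print(hl_q, highlights(hl_q, "f1", "Kündigung"))
print("Kündigung with df=f1", highlights("Kündigung", "f1", "Kündigung", df="f1"))
```

In this model the unqualified term with the umlaut fails only because it is analyzed by the default field, which never folds it to the form stored in f1.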

**********David Smiley's analysis
BTW hl.q is parsed by the hl.qparser param which defaults to the defType param which defaults to "lucene".
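
As a minimal sketch of that fallback chain (resolve_hl_qparser is a hypothetical helper written for illustration, not Solr's code, and it assumes the request parameters are available as a plain dict):

```python
def resolve_hl_qparser(params):
    """Pick the parser for hl.q: hl.qparser, else defType, else "lucene"."""
    return params.get("hl.qparser") or params.get("defType") or "lucene"


print(resolve_hl_qparser({}))
print(resolve_hl_qparser({"defType": "edismax"}))
print(resolve_hl_qparser({"defType": "edismax", "hl.qparser": "lucene"}))
```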

In common cases, I think this is a non-issue.  One common case is defType=edismax and you specify a list of fields in 'qf' (thus your query has parts parsed on various fields) and then you set hl.fl to some subset of those fields.  This will use the correct analysis.

You make a compelling point in terms of what a user might expect -- my gut reaction aligned with your expectation and I thought maybe we should change this.  But it's not as easy as it seems at first blush, and there are bad performance implications.  How do you *generically* tell an arbitrary query parser which field it should parse the string with?  We have no such standard.  And let's say we did; then we'd have to re-parse the query string for each field in hl.fl (and consider that hl.fl might be a wildcard!).  Perhaps both are solvable or constrainable with yet more parameters, but I'm pessimistic that it would be a better outcome.

The documentation ought to clarify this matter.  Probably in hl.fl, to say that the fields listed are analyzed with the analyzer of their field type, and that this analysis ought to be "compatible" with (the same as or similar to) the analysis that parsed the query.

Perhaps, like spellcheck's spellcheck.collateParam.* param prefix, highlighting could add a means to specify additional parameters for parsing hl.q (not just the choice of query parser).  This isn't particularly pressing though, since such parameters can easily be added to the front of hl.q as local params, like hl.q={!edismax qf=$hl.fl v=$q}
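
A hedged sketch of sending that workaround from a client follows. The host, port and collection name in SOLR_SELECT_URL are assumptions, and build_highlight_url is a hypothetical helper; only the parameter names and the hl.q value come from the discussion above, plus the standard hl=true switch. The local-params string has to be URL-encoded because it contains braces, spaces and dollar signs.

```python
from urllib.parse import parse_qs, urlencode

# Assumption: a local Solr instance with a collection named "mycollection".
SOLR_SELECT_URL = "http://localhost:8983/solr/mycollection/select"


def build_highlight_url(q, hl_fl):
    """Build a select URL whose hl.q re-parses q with edismax over the hl.fl field."""
    params = {
        "q": q,
        "hl": "true",
        "hl.fl": hl_fl,
        # Local params: reuse $q as the query text and $hl.fl as the qf field.
        "hl.q": "{!edismax qf=$hl.fl v=$q}",
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_highlight_url("f1:Kündigung", "f1")
print(url)
# Round-trip check: decoding the query string recovers the original hl.q value.
print(parse_qs(url.split("?", 1)[1])["hl.q"][0])
```

The example uses a single field in hl.fl; whether a multi-field or wildcard hl.fl is acceptable as a qf value is not covered here.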




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
