Posted to dev@lucene.apache.org by Erick Erickson <er...@gmail.com> on 2013/02/28 17:28:02 UTC

Highlighting is attempted with q=field:*

Before I raise a JIRA, I thought I'd see what people think. Didn't see
anything like this on a quick search of the JIRAs:

A query like
q=*:*&hl=on&.....

doesn't attempt to highlight anything, as well it shouldn't. But
q=field1:*&hl=on&...

does try to highlight. Of course it highlights every last term in the
highlight fields, and is also very slow.

Re-forming the query as
q=*:*&fq=field1:*&hl=on&....
gets around the problem and is a better query anyway, but it still seems
like trying to highlight in the above case is wrong.
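
For anyone who wants to apply this workaround on the client side, here is a minimal Python sketch of the rewrite. It is only an illustration: the helper names and the simple regex for a bare field:* query are assumptions and are not part of Solr, and it assumes the request has no existing fq parameter.

```python
import re
from urllib.parse import urlencode

# Matches only a bare "field:*" query such as "field1:*". This pattern is an
# assumption for the sketch; real Solr query syntax is far richer.
BARE_FIELD_STAR = re.compile(r"^\w+:\*$")


def move_existence_clause_to_filter(params):
    """Return a copy of params with a bare field:* main query moved into fq.

    Assumes params holds no existing fq entry (it would be overwritten).
    """
    rewritten = dict(params)
    q = rewritten.get("q", "")
    if BARE_FIELD_STAR.match(q):
        rewritten["q"] = "*:*"   # match-all main query, nothing to highlight
        rewritten["fq"] = q      # the existence test becomes a filter
    return rewritten


def to_query_string(params):
    # Leave '*' and ':' unescaped so the output reads like the queries above.
    return urlencode(params, safe="*:")


if __name__ == "__main__":
    print(to_query_string(move_existence_clause_to_filter({"q": "field1:*", "hl": "on"})))
```

Queries such as field1:invest* are left alone, since only the bare single-asterisk form is rewritten.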

Worth a JIRA?

Erick

Re: Highlighting is attempted with q=field:*

Posted by Jack Krupansky <ja...@basetechnology.com>.
Could I subvert your “fix” by writing field1:* as field1:** or field1:?* ?

*:* is simply a shorthand for “MatchAllDocs”, with no implication that it is referencing any field values, while field1:* is an explicit wildcard query, so they are not really comparable other than at a superficial lexical level.

That said, somewhere there is a Jira that I filed that attempts to have * treated as a faster filter query for matching all docs that have any value (non-null) in a field. Your proposal makes more sense in that context since it is clear that * is semantically distinct from a true wildcard.

Back to my question above, I think it’s okay if only the strict single-asterisk wildcard is covered by your change. Any other wildcard or fuzzy query would continue to behave as before – although adding my suggested limit on term expansion might still be worthwhile. And I might still argue that your fix should be an option even if the default is as you have suggested.
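
To make the "strict single-asterisk" rule concrete, here is a rough Python sketch of how the cases discussed above might be told apart. It is a toy string check written for this illustration, not how the Solr query parser works; the function names and category labels are assumptions.

```python
def classify(q):
    """Classify a single-clause query string into one of four rough kinds."""
    if q == "*:*":
        return "match_all"           # shorthand for MatchAllDocs
    field, sep, value = q.partition(":")
    if not sep:
        return "term"                # no field prefix at all
    if value == "*":
        return "field_existence"     # strict single asterisk only
    if "*" in value or "?" in value:
        return "wildcard"            # field1:**, field1:?*, field1:invest*
    return "term"


def should_highlight(q):
    # Under the proposal, only match-all and the strict field:* form skip
    # highlighting; every other wildcard behaves as before.
    return classify(q) not in ("match_all", "field_existence")


if __name__ == "__main__":
    for q in ["*:*", "field1:*", "field1:**", "field1:?*", "field1:invest*"]:
        print(q, classify(q), should_highlight(q))
```

With this rule, field1:** and field1:?* still count as ordinary wildcards and would still be highlighted, which is the behavior described above.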

But, all these comments should be placed on a Jira!

-- Jack Krupansky

Re: Highlighting is attempted with q=field:*

Posted by Erick Erickson <er...@gmail.com>.
I was mostly thinking of this specific case, but a more general solution
makes sense. I can still argue that the case of field:* shouldn't ever try
to highlight, but field:some* could, as you say, actually be useful....

Mostly I'm drawing attention to the difference between *:* and field:*. I
think we should be consistent across both.

Erick


Re: Highlighting is attempted with q=field:*

Posted by Jack Krupansky <ja...@basetechnology.com>.
If you want to add a highlight option to suppress or limit highlighting for wildcard terms (or any multi-term query, including fuzzy query), that would seem reasonable, but I’d hate to lose the highlighting for useful wildcards such as field1:invest*.

Maybe if it was something like &hl.maxMultiTerms=15, that would provide the best of both worlds – a reasonable default to prevent really slow highlighting, but still give reasonable highlighting in reasonable cases, and give you the ultimate control to completely turn off all multi-term expansion highlighting if you so choose.
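
As a rough sketch of what such a limit could mean, the Python below expands a wildcard against a list of indexed terms and gives up on highlighting once the expansion exceeds the cap. Note that hl.maxMultiTerms is only the parameter proposed here, not an existing option, and the exact semantics (skip highlighting entirely when the cap is exceeded, 0 turns multi-term highlighting off) are assumptions made for this illustration. Python's fnmatchcase stands in for wildcard matching; it also treats [...] specially, which a Lucene wildcard does not.

```python
from fnmatch import fnmatchcase


def terms_to_highlight(pattern, index_terms, max_multi_terms=15):
    """Return the terms a wildcard expands to, or [] if the cap is exceeded.

    max_multi_terms mirrors the proposed hl.maxMultiTerms; a value of 0 (or
    less) disables multi-term highlighting altogether (assumed semantics).
    """
    if max_multi_terms <= 0:
        return []
    matches = []
    for term in index_terms:
        if fnmatchcase(term, pattern):
            matches.append(term)
            if len(matches) > max_multi_terms:
                return []  # too many expansions: skip highlighting this clause
    return matches


if __name__ == "__main__":
    terms = ["bank", "invest", "investment", "investor"]
    print(terms_to_highlight("invest*", terms))                  # small expansion
    print(terms_to_highlight("*", terms, max_multi_terms=3))     # over the cap
```

A selective wildcard like invest* stays under the cap and is highlighted, while a bare * blows past it and is skipped.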

-- Jack Krupansky
