You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Renaud Waldura <re...@library.ucsf.edu> on 2007/03/29 20:28:16 UTC

RE: More Precise Highlighting (MY SOLUTION)

After much head-scratching and re-reading of posts around this issue, I
found a solution by writing my own QueryTermExtractor. Thanks for the help
everyone. 

To recap, I wanted to get more "precise" highlighting by having just the
"right" fields highlighted. My original example was a field named "metadata"
which is the concatenation of some metadata about the document (author,
title, etc.) This field is indexed and not stored. Users are able to run
queries such as "metadata:myauthor", and expect the "title" field to be
highlighted in the search results when it contains "myauthor".

With the stock highlighter, I had 2 choices: a- highlight all occurrences of
"myauthor" in all the fields (not good) or b- highlight occurrences of
"myauthor" in just the "metadata" field. This field doesn't exist per se in
the document, and no highlighting was produced at all. Not good either.

Basically I needed to map "index-only" fields (indexed, not stored) to the
actual (stored) fields that make up this field. Mark gave my the key by
mentioning the QueryTermExtractor just produces a list of strings. 

I wrote a custom QueryTermExtractor (copy, paste, hack) that has knowledge
of my indexing schema and knows how to map index-only fields to actual
fields, therefore generating the right set of WeightedTerm[] passed to the
QueryScorer constructor. 

I hope this will make sense to the (future) reader who ran into this problem
also.

--Renaud

 

-----Original Message-----
From: mark harwood [mailto:markharw00d@yahoo.co.uk] 
Sent: Monday, March 05, 2007 2:03 AM
To: java-user@lucene.apache.org
Subject: Re: More Precise Highlighting

I think the solution is fairly simple.
Pass the "metadata" fieldname to the QueryTermExtractor - not the fieldname
"author". QueryTermExtractor effectively provides just a list of strings (no
fieldnames) which are then matched against strings found in the tokenStream
which represents your content.
Because your query did not contain any "author:apple" clauses
QueryTermExtractor produced no strings for highlighting when you asked for
the field.

Cheers
Mark 



On 3/2/07, Renaud Waldura <re...@library.ucsf.edu> wrote:
>
> Hello Mark:
>
> I apologize for not responding earlier, more urgent stuff took over. I 
> really appreciate your help with this. Since my first message was 
> somewhat terse, let me explain again.
>
> My index includes both "regular" fields (stored, indexed, tokenized) 
> and "search-only" fields used for searching only (unstored, indexed, 
> tokenized).
> In particular, I have one such field called "metadata" that's the 
> accumulation of all metadata about the document (title, authors, date, 
> etc.). Using this field, I can easily query "all metadata" without 
> searching the document text (that's a separate field). It has no 
> meaning other than making this kind of query possible.
>
> Say I query "+metadata:apple +banana". I would like to have the term 
> "apple"
> highlighted in all metadata fields; if Fiona Apple authored a 
> document, her lastname should be highlighted in the "author" field, 
> but not in the document text.
>
> From what I've experienced, when I pass NULL as fieldname to the 
> QueryTermExtractor, "apple" gets highlighted everywhere, both in any 
> metadata AND the document text. When I pass "author" as fieldname, it 
> doesn't highlight that field at all -- because the query was actually 
> for "metadata" not "author". It's not correct in either case, and I'm 
> not sure what to do.
>
> Thanks again for your help,
>
> --Renaud
>
>
>
>
> -----Original Message-----
> From: markharw00d [mailto:markharw00d@yahoo.co.uk]
> Sent: Tuesday, February 13, 2007 3:24 PM
> To: java-user@lucene.apache.org
> Subject: Re: More Precise Highlighting
>
> Not sure I fully understand the problem. The query is effectively 
> "allContent:someTitleText" and you want to highlight the string 
> "someTitleText" in the title field?
> If you pass null as a fieldname to the QueryTermExtractor it will use 
> all term values, regardless of field, as string to highlight.
> If you pass a fieldname it will only select highlight term values for 
> that field.
> If you want, you can use QueryTermExtractor to extract just the 
> "allContent"
> field values and pass a TokenStream for the "title" field to the 
> highlighter and it would highlight the appropriate values in the 
> title.
>
> Do any of these options work?
>
>
>
> Renaud Waldura wrote:
> > The old highlighter code used to highlight found terms in any field 
> > (too broad). The new highlighter lets one specify a field when 
> > highlighting, but it highlights that field only (too narrow).
> >
> > In my case we have an "all" field that is the concatenation of all 
> > data about the document. When I highlight e.g. the "title" field, 
> > nothing happens, because the Highlighter doesn't know the title is 
> > included in this "all" field.
> >
> > How can I tell the highlighter that my query fields can map to some 
> > document fields? It looks like I'd change the QueryTermExtractor, 
> > but it's conveniently all-static and final.
> >
> > --Renaud
> >
> >
> >
>
>
>
>
>
>
> ___________________________________________________________
> All new Yahoo! Mail "The new Interface is stunning in its simplicity 
> and ease of use." - PC Magazine 
> http://uk.docs.yahoo.com/nowyoucan.html
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>





	
	
		
___________________________________________________________
New Yahoo! Mail is the ultimate force in competitive emailing. Find out more
at the Yahoo! Mail Championships. Plus: play games and win prizes. 
http://uk.rd.yahoo.com/evt=44106/*http://mail.yahoo.net/uk 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org