
Posted to java-user@lucene.apache.org by Renaud Waldura <re...@library.ucsf.edu> on 2007/03/02 20:01:44 UTC

RE: More Precise Highlighting

Hello Mark:

I apologize for not responding earlier; more urgent stuff took over. I
really appreciate your help with this. Since my first message was somewhat
terse, let me explain again.

My index includes both "regular" fields (stored, indexed, tokenized) and
"search-only" fields (unstored, indexed, tokenized).
In particular, I have one such field called "metadata" that's the
accumulation of all metadata about the document (title, authors, date,
etc.). Using this field, I can easily query "all metadata" without searching
the document text (that's a separate field). It has no meaning other than
making this kind of query possible.

Say I query "+metadata:apple +banana". I would like to have the term "apple"
highlighted in all metadata fields; if Fiona Apple authored a document, her
last name should be highlighted in the "author" field, but not in the
document text.

From what I've experienced, when I pass NULL as fieldname to the
QueryTermExtractor, "apple" gets highlighted everywhere, both in any
metadata AND the document text. When I pass "author" as fieldname, it
doesn't highlight that field at all -- because the query was actually for
"metadata" not "author". It's not correct in either case, and I'm not sure
what to do. 

Thanks again for your help,

--Renaud

-----Original Message-----
From: markharw00d [mailto:markharw00d@yahoo.co.uk] 
Sent: Tuesday, February 13, 2007 3:24 PM
To: java-user@lucene.apache.org
Subject: Re: More Precise Highlighting

Not sure I fully understand the problem. The query is effectively
"allContent:someTitleText" and you want to highlight the string
"someTitleText" in the title field?
If you pass null as a fieldname to the QueryTermExtractor it will use all
term values, regardless of field, as strings to highlight.
If you pass a fieldname it will select only the term values for that
field to highlight.
If you want, you can use QueryTermExtractor to extract just the "allContent"
field values and pass a TokenStream for the "title" field to the highlighter
and it would highlight the appropriate values in the title.
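
A rough sketch of that last option, in plain Java rather than Lucene code:
the class, type and method names below are made up for illustration, and
"contents" as the default field is an assumption. The idea is to keep only
the query terms that were searched in one field ("metadata") and then apply
their text to the tokens of a different field ("author"). In a real setup
the terms would come from QueryTermExtractor and the text would reach the
Highlighter as a TokenStream instead of being split on whitespace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class FieldMappedHighlight {

    /** Hypothetical stand-in for a query term: the field searched and the term text. */
    public static final class QueryTerm {
        final String field;
        final String text;

        public QueryTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    /** Collect the text of terms searched in the given field; null keeps every term. */
    public static Set<String> termsFor(List<QueryTerm> terms, String field) {
        Set<String> out = new HashSet<>();
        for (QueryTerm t : terms) {
            if (field == null || field.equals(t.field)) {
                out.add(t.text.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    /** Wrap each whitespace-separated token that matches a term in <B> tags. */
    public static String highlight(String text, Set<String> terms) {
        List<String> out = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            boolean hit = terms.contains(token.toLowerCase(Locale.ROOT));
            out.add(hit ? "<B>" + token + "</B>" : token);
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        // Terms of the query "+metadata:apple +banana".
        List<QueryTerm> query = Arrays.asList(
                new QueryTerm("metadata", "apple"),
                new QueryTerm("contents", "banana"));

        // Filtering by "author" yields no terms: nothing was searched in that field.
        System.out.println(highlight("Fiona Apple", termsFor(query, "author")));
        // Filtering by "metadata" and applying the result to the author text works.
        System.out.println(highlight("Fiona Apple", termsFor(query, "metadata")));
    }
}
```

The field used to select terms and the field whose text gets highlighted are
deliberately separate arguments; that separation is the whole point of the
suggestion.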

Do any of these options work?

Renaud Waldura wrote:
> The old highlighter code used to highlight found terms in any field 
> (too broad). The new highlighter lets one specify a field when 
> highlighting, but it highlights that field only (too narrow).
>  
> In my case we have an "all" field that is the concatenation of all 
> data about the document. When I highlight e.g. the "title" field, 
> nothing happens, because the Highlighter doesn't know the title is 
> included in this "all" field.
>  
> How can I tell the highlighter that my query fields can map to some 
> document fields? It looks like I'd change the QueryTermExtractor, but 
> it's conveniently all-static and final.
>  
> --Renaud
>  
>
>   


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: More Precise Highlighting

Posted by Erick Erickson <er...@gmail.com>.

A couple of possibilities come to mind, whether or not they're
suitable.

First, you could re-form your query to cover all the meta-data fields.
If I read your description correctly, you have, say, author, city, title
fields. You search over metadata but want to highlight "apple"
in author, city and title. So you could re-form your
highlight query as author:apple city:apple title:apple (note
the implied OR) when you highlight.
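
A minimal sketch of building that highlight query string, assuming a single
simple term with no characters that need escaping. The class and method
names are hypothetical and nothing here calls Lucene; the resulting string
would still have to be parsed by your usual query parser before being
handed to the highlighter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HighlightQueryExpander {

    /** Build "field1:term field2:term ..." for one term across the given fields. */
    public static String expand(String term, List<String> fields) {
        List<String> clauses = new ArrayList<>();
        for (String field : fields) {
            clauses.add(field + ":" + term);
        }
        // No explicit operator between clauses, so the parser's default applies.
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        // The term searched in "metadata", spread over the fields to be highlighted.
        System.out.println(expand("apple", Arrays.asList("author", "city", "title")));
    }
}
```

The list of fields is the mapping from the search-only field to the stored
fields it was built from, so it has to be maintained alongside the indexing
code.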

Second, you could use MemoryIndex, which is intended for very fast
on-the-fly indexing of a presumed single document. Just add
the meta-data fields of your document to that, and go ahead and
highlight. Repeat if you want to highlight different hits in
your document text, and return the results combined however
makes sense.

I must admit that I'm not the most experienced Highlighter user,
but I found MemoryIndex extremely useful for solving highlighting
in my situation.

Best
Erick

