Posted to solr-user@lucene.apache.org by Rick Leir <ri...@canadiana.ca> on 2015/12/02 16:09:04 UTC

Re: highlight

For performance, if you have many large documents, you want to index the
whole document but only store some identifiers. (Maybe this is not a
consideration for you; if so, stop reading now.)

If you are not storing the whole document, then Solr cannot do the
highlighting.  You would get an id, then locate your source document (maybe
in your filesystem) and do highlighting yourself.
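
A minimal sketch of what "do highlighting yourself" could look like, assuming the source text has already been fetched by its id. The function name and the <em> tags are illustrative, not anything Solr provides, and this plain regex match does none of Solr's analysis (no stemming, no synonyms):

```python
import re


def highlight_phrase(text, phrase, pre="<em>", post="</em>"):
    """Wrap each occurrence of `phrase` in `text` with one pair of tags.

    Matching is case-insensitive and accepts any run of whitespace
    (including line breaks) between the words of the phrase.
    """
    words = phrase.split()
    if not words:
        return text
    # Escape each word, then allow flexible whitespace between the words.
    body = r"\s+".join(re.escape(w) for w in words)
    pattern = re.compile(r"\b" + body + r"\b", re.IGNORECASE)
    return pattern.sub(lambda m: pre + m.group(0) + post, text)


if __name__ == "__main__":
    doc = "The quick brown\nfox jumps over the lazy dog."
    print(highlight_phrase(doc, "quick brown fox"))
```

Because the whole phrase is matched as one unit, each occurrence gets exactly one pair of tags, even when the words are separated by a line break in the source text.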

> Can anyone offer any solutions for searching large documents and returning a
> single phrase highlight?

Re: highlight

Posted by Teague James <te...@insystechinc.com>.
Hello,

Thanks for replying! Yes, I am storing the whole document. The document is indexed with a unique id. There are only 3 fields in the schema - id, rawDocument, tikaDocument. Search uses the tikaDocument field. Against this I am throwing 2-5 word phrases and getting highlighting matches to each individual word in the phrases instead of just the phrase. The highlighted text that is matched is read by another application for display in the front end UI. Right now my app has logic to figure out that multiple highlights indicate a phrase, but it isn't perfect. 
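
For illustration, the kind of merge logic described above might look like the following sketch. It is an assumption about the approach, not the actual application code: it assumes the highlighter wraps each word in <em>...</em> and that the words of a phrase are separated only by whitespace.

```python
import re


def merge_adjacent_highlights(snippet, pre="<em>", post="</em>"):
    """Collapse highlights separated only by whitespace into one highlight."""
    # A closing tag, whitespace, then an opening tag means two highlighted
    # words are neighbours: drop that tag pair and keep the whitespace.
    gap = re.compile(re.escape(post) + r"(\s+)" + re.escape(pre))
    return gap.sub(r"\1", snippet)


if __name__ == "__main__":
    raw = "a <em>quick</em> <em>brown</em> <em>fox</em> ran"
    print(merge_adjacent_highlights(raw))
```

This also shows why such logic "isn't perfect": it merges any two neighbouring highlights, whether or not they came from the same phrase, and it cannot rejoin a phrase that was split across two separate snippets.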

In this case Solr is reporting a single 3 word phrase as 2 hits: one with 2 of the phrase words, the other with 1 of the phrase words. This only happens in large documents where the multi word phrase appears across the boundary of one of the document fragments that Solr is analyzing (this is a hunch - I really don't know the mechanics for certain, but the next statement makes evident how I came to this conclusion). However, if I make a one sentence document with the same multi word phrase, Solr will report 1 hit with all three words individually highlighted. At the very least I know Solr is getting the phrase correct. The problems are the method of highlighting (I'm trying to get one set of tags per phrase) and the occasional breaking of a single phrase into 2 hits.
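
To make the setup concrete, here is a sketch of a highlighting request that sets the standard highlighter parameters related to phrases and fragments. The host, the core name "docs", and the helper function are hypothetical, and whether these settings remove the split-phrase behaviour is untested here:

```python
from urllib.parse import urlencode


def build_highlight_query(base_url, phrase, field="tikaDocument"):
    """Build a Solr select URL for a phrase query with highlighting on."""
    params = {
        "q": f'{field}:"{phrase}"',         # quoted, so it is a phrase query
        "hl": "true",
        "hl.fl": field,
        "hl.usePhraseHighlighter": "true",  # highlight phrase matches only
        "hl.mergeContiguous": "true",       # merge contiguous fragments
        "hl.fragsize": "0",                 # 0 = whole field, no fragmenting
        "hl.maxAnalyzedChars": "1000000",   # look further into large documents
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical host and core name.
    print(build_highlight_query("http://localhost:8983/solr/docs/select",
                                "quick brown fox"))
```

With hl.fragsize=0 the whole field comes back as a single fragment, so a phrase cannot straddle a fragment boundary, but the response for a large document will be correspondingly large.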

Given that setup, what do you recommend? I'm not sure I understand the approach you're describing. I appreciate the help!

-Teague James

> On Dec 2, 2015, at 10:09 AM, Rick Leir <ri...@canadiana.ca> wrote:
> 
> For performance, if you have many large documents, you want to index the
> whole document but only store some identifiers. (Maybe this is not a
> consideration for you; if so, stop reading now.)
> 
> If you are not storing the whole document, then Solr cannot do the
> highlighting.  You would get an id, then locate your source document (maybe
> in your filesystem) and do highlighting yourself.
> 
>> Can anyone offer any solutions for searching large documents and returning a
>> single phrase highlight?