You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Michael Imbeault <mi...@sympatico.ca> on 2006/09/16 21:04:27 UTC

Better Highligther fragmenter?

I'm now using the excellent Hightlighter from within Solr and it works 
very well; except that the generated fragments sometimes begins with 
bad-looking characters (the "." of the end of the previous phrase, or a 
), /10, etc). The same is true for the fragments ends. I looked at both 
the dev and user lucene list in search for a better Fragmenter class, 
but it seems that there's none right now (just the simple and null 
fragmenters).

To me the 'simple' fragmenter is a bit too simple; anyone had success in 
implementing a more intelligent one? I have no java coding experience, 
sadly, so I don't know where to begin on this one. I don't think fancy 
phrase recognition is needed; just a better boundary algorithm (avoid 
beginning / ending fragments with bad looking characters) and the 
addition of "..." at the end and beginning of the fragment if 
fragmentation of a phrase took place.

Also, is it required that the highlighted field is 'stored'? I'm pretty 
sure it is, but just want confirmation.

Thanks,

-- 
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org