Posted to java-user@lucene.apache.org by govind bhardwaj <go...@gmail.com> on 2011/08/04 08:33:36 UTC
Re: highlighting

Hi Sabeer,

I tested your code with Lucene 3.3.0. (I doubt that Lucene 4.0 has been
released yet, since version 3.3.0 came out only recently, in July.)

In the second case there is no output because the match is exact:
sourceText contains "transplantation", not "transplant". One could try
changing the query to "transplant*", as I did, but then I got this error:

Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/lucene/index/memory/MemoryIndex
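
A NoClassDefFoundError like this usually means the class is not on the
runtime classpath. As far as I know, MemoryIndex ships in the separate
lucene-memory contrib jar rather than in the core or highlighter jars, so
that jar may simply be missing here; I have not confirmed this for this
setup. A minimal sketch for checking it, where ClasspathCheck and
isLoadable are hypothetical names of mine and not part of Lucene:

```java
// Diagnostic sketch: reports whether a class can be loaded from the classpath.
// ClasspathCheck and isLoadable are illustrative names, not Lucene API.
class ClasspathCheck {

    /** Returns true if the named class is loadable; the class is not initialized. */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The class itself, or something it depends on, is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the error message above.
        String memoryIndex = "org.apache.lucene.index.memory.MemoryIndex";
        System.out.println(memoryIndex + " loadable: " + isLoadable(memoryIndex));
    }
}
```

If it reports that the class is not loadable, adding the matching
lucene-memory jar to the classpath would be the first thing to try.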

Also, regarding highlighting and regular expressions, I found this bug (I'm
not sure whether it relates exactly to the problem you asked about):
http://exist.2174344.n4.nabble.com/exist-Bugs-3038780-match-highlighting-for-lucene-wildcard-and-regex-search-td2317647.html
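
To make the exact-match versus prefix-match difference concrete, here is a
small sketch in plain Java that does not use Lucene at all. It only
illustrates the matching rule; it is not how Highlighter or QueryScorer
work internally. It splits the text into runs of letters and lowercases
them, which is roughly what SimpleAnalyzer does. PrefixHighlightSketch and
highlight are hypothetical names of mine.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java illustration of exact vs. prefix term matching; not Lucene code.
class PrefixHighlightSketch {

    /**
     * Wraps every matching token in '*'.
     * prefix == false: the lowercased token must equal the term.
     * prefix == true:  the lowercased token must start with the term.
     */
    static String highlight(String text, String term, boolean prefix) {
        Matcher m = Pattern.compile("\\p{L}+").matcher(text); // tokens are runs of letters
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            String token = m.group().toLowerCase(Locale.ROOT);
            boolean hit = prefix ? token.startsWith(term) : token.equals(term);
            out.append(text, last, m.start()); // copy the text between tokens unchanged
            out.append(hit ? "*" + m.group() + "*" : m.group());
            last = m.end();
        }
        out.append(text.substring(last)); // trailing punctuation, if any
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "for liver transplantation.";
        System.out.println(highlight(text, "transplant", false)); // exact match
        System.out.println(highlight(text, "transplant", true));  // prefix match
    }
}
```

With exact matching the second text comes back without any highlight,
which is the empty-looking result you saw; with prefix matching the whole
token "transplantation" is wrapped, i.e. the second of the two results you
were expecting.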

Pretty much helpless after this :(

Govind

On Mon, Jul 18, 2011 at 4:50 PM, Sabeer Hussain <sh...@del.aithent.com> wrote:

> I am using Lucene 4.0 and trying to use its highlighting feature. I am not
> getting the desired result due to some mistake that I am not able to
> identify. My source code looks like
>
> String sourceText  = "liver disease kidney transplant";
> String termString ="\"transplant\"";
>
> SimpleAnalyzer simpleAnalyzer = new SimpleAnalyzer(Version.LUCENE_40);
> Query query = new QueryParser(Version.LUCENE_40,"contents",
> simpleAnalyzer).parse(termString);
>
> TokenStream tokenStream = simpleAnalyzer.tokenStream("contents", new
> StringReader(sourceText));
> QueryScorer scorer = new QueryScorer(query,"contents");
> scorer.setExpandMultiTermQuery(true);
> Fragmenter fragmenter = new SimpleSpanFragmenter(scorer);
>
> SimpleHTMLFormatter simpleHTMLFormatter = new SimpleHTMLFormatter( "*",
> "*") ;
> Highlighter highlighter = new Highlighter(simpleHTMLFormatter, scorer );
> highlighter.setTextFragmenter(fragmenter);
> highlighter.setMaxDocCharsToAnalyze(10000);
> String resultString =
> highlighter.getBestFragments(tokenStream,sourceText,1000, "...");
> System.out.println("Source Text1 = "+sourceText);
> System.out.println("Result Text1 = "+resultString);
>
> sourceText = "for liver transplantation.";
> tokenStream = simpleAnalyzer.tokenStream("contents", new
> StringReader(sourceText));
> resultString = highlighter.getBestFragments(tokenStream,sourceText,1000,
> "...");
>
> System.out.println("Source Text2 = "+sourceText);
> System.out.println("Result Text2 = "+resultString);
>
> For the first text, I am getting the result properly but not for the second
> one
>
> Source Text1 = liver disease kidney transplant
> Result Text1 = liver disease kidney *transplant*
>
> Source Text2 = for liver transplantation.
> Result Text2 =
>
> I am expecting the result for second one like
> for liver *transplant*ation
>
> or
> for liver *transplantation*
>
> What is wrong in my code?


-- 
No trees were harmed in the creation of this message, but several thousand
electrons were mildly inconvenienced.