Posted to java-user@lucene.apache.org by Ritu choudhary <ri...@gmail.com> on 2009/05/27 07:56:09 UTC

highlighting searched results in document

Hi there,
    I am using the Lucene highlighter to highlight search results,
but it shows only the query string, highlighted in bold.
Is there any way I can use it to show the highlighted text in the
document where it was found?
I need to show the searched terms highlighted in the
document where they are found, and I want to do it without using
org.apache.lucene.search.Hits.
Please help. Thanks in advance.
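What is being asked for here amounts to returning the full text of the matching document with each occurrence of a query term wrapped in highlight tags, instead of only the matched term. The following plain-JDK sketch illustrates that idea; it is not Lucene's highlighter API. The class and method names are hypothetical, and the regex word splitting with lowercase comparison is an assumption standing in for a Lucene analyzer. Within Lucene, the highlight package's NullFragmenter is documented as useful for highlighting the entire content of a document or field, and without Hits the text can be loaded from the TopDocs returned by IndexSearcher.search(query, null, n) via searcher.doc(scoreDoc.doc).get(field), assuming the field was stored.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-JDK illustration, not Lucene code; all names are hypothetical.
public class WholeTextHighlighter {
    // Assumption: a "word" is a run of letters or digits. Lucene would
    // use the field's analyzer to find tokens instead.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    // Returns the whole text, with each word matching a query term wrapped
    // in pre/post tags. Terms must be given in lowercase.
    public static String highlightAll(String text, Set<String> terms,
                                      String pre, String post) {
        StringBuilder out = new StringBuilder();
        Matcher m = WORD.matcher(text);
        int last = 0; // end of the previous word
        while (m.find()) {
            out.append(text, last, m.start()); // copy spaces/punctuation unchanged
            String word = m.group();
            if (terms.contains(word.toLowerCase(Locale.ROOT))) {
                out.append(pre).append(word).append(post);
            } else {
                out.append(word);
            }
            last = m.end();
        }
        out.append(text.substring(last)); // anything after the last word
        return out.toString();
    }

    public static void main(String[] args) {
        String doc = "A Lorem Ipsum passage, and going through the cites.";
        System.out.println(highlightAll(doc, Set.of("lorem", "ipsum"),
                "<span class=\"highlight\">", "</span>"));
    }
}
```

Because nothing is cut into fragments, every match appears in place inside the complete document text.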

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: highlighting searched results in document

Posted by Ritu choudhary <ri...@gmail.com>.
I want to check the output of the statement below. What I get in
"result" is just the word I am searching for (let's say the word is
"registered"). How can I get the whole fragment in which the word is
found, and show the highlighted word within that fragment or document?

String result =
    highlighter.getBestFragments(tokenStream, text, 5, "...");
System.out.println("result:" + result);
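The highlighter can only return pieces of the string passed as its second argument, so one likely cause (an assumption, since the surrounding code is not shown) is that text holds just the search word rather than the document's full stored text. The plain-JDK sketch below imitates, in simplified form, what SimpleFragmenter plus getBestFragments(tokenStream, text, 5, "...") do: cut text into fixed-size fragments, keep the ones containing a query term, wrap matches in the default <B> tags and join the kept fragments with a separator. It is not Lucene's algorithm (there is no scoring, and fragments stay in document order), and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified plain-JDK imitation of fragment-then-highlight; not Lucene's
// algorithm (no scoring, fragments kept in document order). Names are hypothetical.
public class FragmentSketch {
    // Assumption: a "word" is a run of letters or digits.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    // Cuts text into fragments of roughly fragmentSize chars, only ever
    // breaking right before a word so that no word is split.
    public static List<String> fragment(String text, int fragmentSize) {
        List<String> fragments = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        int start = 0;
        while (m.find()) {
            if (m.start() - start >= fragmentSize) {
                fragments.add(text.substring(start, m.start()));
                start = m.start();
            }
        }
        fragments.add(text.substring(start));
        return fragments;
    }

    // Wraps every word matching a (lowercase) query term in <B>...</B>.
    public static String highlight(String fragment, Set<String> terms) {
        StringBuilder out = new StringBuilder();
        Matcher m = WORD.matcher(fragment);
        int last = 0;
        while (m.find()) {
            out.append(fragment, last, m.start());
            String word = m.group();
            boolean hit = terms.contains(word.toLowerCase(Locale.ROOT));
            out.append(hit ? "<B>" + word + "</B>" : word);
            last = m.end();
        }
        out.append(fragment.substring(last));
        return out.toString();
    }

    // Keeps up to maxFragments fragments that contain a match, joined by separator.
    public static String bestFragments(String text, Set<String> terms, int fragmentSize,
                                       int maxFragments, String separator) {
        List<String> kept = new ArrayList<>();
        for (String frag : fragment(text, fragmentSize)) {
            String marked = highlight(frag, terms);
            // highlight() only changes a fragment when it inserted tags.
            if (!marked.equals(frag) && kept.size() < maxFragments) {
                kept.add(marked.trim());
            }
        }
        return String.join(separator, kept);
    }

    public static void main(String[] args) {
        Set<String> terms = Set.of("registered");
        // Only the search word passed as text: only the search word comes back.
        System.out.println(bestFragments("registered", terms, 50, 5, "..."));
        // Full document text passed: the surrounding fragment comes back too.
        System.out.println(bestFragments("The user registered yesterday.", terms, 50, 5, "..."));
    }
}
```

Running main prints <B>registered</B> and then The user <B>registered</B> yesterday. The query is the same in both calls; only the second supplies surrounding text to build a fragment from.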

On 27/05/2009, KK <di...@gmail.com> wrote:


Re: highlighting searched results in document

Posted by KK <di...@gmail.com>.
Hi,
AFAIK, the default is to bold the matched text: SimpleHTMLFormatter's
no-argument constructor wraps each match in <B>...</B>. If you want
something else, say highlighting with a background color, pass your own
start and end tags to SimpleHTMLFormatter instead of relying on the
default bolding.
The following is a working hit-highlighting example, based on the one in
Lucene in Action, 2nd Edition (LIA2).

import java.io.*;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.highlight.Fragmenter;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleFragmenter;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;

public class HighlightIt {
  private static final String text =
      "Contrary to popular belief, Lorem Ipsum is" +
      " not simply random text. It has roots in a piece of" +
      " classical Latin literature from 45 BC, making it over" +
      " 2000 years old. Richard McClintock, a Latin professor" +
      " at Hampden-Sydney College in Virginia, looked up one" +
      " of the more obscure Latin words, consectetur, from" +
      " a Lorem Ipsum passage, and going through the cites" +
      " of the word in classical literature, discovered the" +
      " undoubtable source. Lorem Ipsum comes from sections" +
      " 1.10.32 and 1.10.33 of \"de Finibus Bonorum et" +
      " Malorum\" (The Extremes of Good and Evil) by Cicero," +
      " written in 45 BC. This book is a treatise on the" +
      " theory of ethics, very popular during the" +
      " Renaissance. The first line of Lorem Ipsum, \"Lorem" +
      " ipsum dolor sit amet..\", comes from a line in" +
      " section 1.10.32."; // from http://www.lipsum.com/

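  // "throws Exception": getBestFragments throws IOException, and newer
  // Lucene versions also throw InvalidTokenOffsetsException.
  public static void main(String[] args) throws Exception {
    // Check args.length first: a missing argument makes args[0] throw
    // ArrayIndexOutOfBoundsException rather than return null.
    if (args.length != 1) {
      System.err.println("Usage: HighlightIt <filename>");
      System.exit(-1);
    }
    String filename = args[0];

    // Alternative: highlight a single term instead of a phrase.
    //TermQuery query = new TermQuery(new Term("f", "literature"));
    PhraseQuery phrase = new PhraseQuery();
    phrase.add(new Term("f", "lorem"));
    phrase.add(new Term("f", "ipsum"));
    phrase.add(new Term("f", "passage"));
    phrase.setSlop(0);

    // Scores each fragment by the query terms it contains.
    QueryScorer scorer = new QueryScorer(phrase);

    // Wrap each match in a styled span instead of the default <B>...</B>.
    SimpleHTMLFormatter formatter =
        new SimpleHTMLFormatter("<span class=\"highlight\">",
            "</span>");
    Highlighter highlighter = new Highlighter(formatter, scorer);

    // Break the text into fragments of about 50 characters each.
    Fragmenter fragmenter = new SimpleFragmenter(50);
    highlighter.setTextFragmenter(fragmenter);

    // Tokenize the full text; use the same analyzer as at index time.
    // (The no-argument StandardAnalyzer constructor is the Lucene 2.4-era
    // API; later versions require a Version argument.)
    TokenStream tokenStream = new StandardAnalyzer()
        .tokenStream("f", new StringReader(text));

    // Up to 5 of the best fragments of "text", joined with "...".
    String result =
        highlighter.getBestFragments(tokenStream, text, 5, "...");
    System.out.println("result:" + result);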

    //@Ritu, remove the following chunk for your requirement

    FileWriter writer = new FileWriter(filename);
    writer.write("<html><head>");
    writer.write("<style>\n" +
        ".highlight {\n" +
        " background: yellow;\n" +
        "}\n" +
        "</style>");
    writer.write("</head><body>");
    writer.write(result);
    writer.write("</body></html>");
    writer.close();
    // remove up to this point
  }
}
--------------
Make sure you have all the Lucene jars, including the contrib highlighter
jar, on your classpath. As you can see, the last part of the code writes
the final output to a file. For your requirement, remove that code,
including the part that adds the html and style tags.
The code now wraps every match in a <span class="highlight"> element. To
make those spans visible, add the matching CSS rule to the HTML page you
use to view the results in the browser. Put it in a <style> element inside
the page's <head>, not inside <script> </script> tags (the browser treats
the contents of <script> as script code, so a rule placed there is never
applied as CSS):
<style>
.highlight {
background: yellow;
}
</style>
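As a sketch of where that rule goes, the hypothetical helper below wraps the highlighter's output in a minimal page, with the .highlight rule in a <style> element inside <head>; placed inside <script> tags, the browser would treat it as script text and never apply it. The sample result string in main is a made-up stand-in for what getBestFragments returns, not real output.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: wraps highlighter output in a minimal HTML page.
public class HighlightPage {
    // The .highlight rule sits in a <style> element inside <head>.
    public static String buildPage(String highlighted) {
        return "<html>\n"
                + "<head>\n"
                + "<meta charset=\"utf-8\">\n"
                + "<style>\n"
                + ".highlight {\n"
                + "  background: yellow;\n"
                + "}\n"
                + "</style>\n"
                + "</head>\n"
                + "<body>\n"
                + highlighted + "\n"
                + "</body>\n"
                + "</html>\n";
    }

    public static void main(String[] args) throws IOException {
        // Made-up stand-in for the string returned by getBestFragments(...).
        String result = "Lorem <span class=\"highlight\">ipsum</span> dolor";
        String page = buildPage(result);
        if (args.length == 1) {
            // Write the page as UTF-8 to the file named on the command line.
            Files.write(Paths.get(args[0]), page.getBytes(StandardCharsets.UTF_8));
        } else {
            System.out.print(page);
        }
    }
}
```

If the results page is generated elsewhere (a JSP, say), only the <style> block in <head> needs to be copied; the spans produced by the formatter pick it up by class name.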

I hope this works. If you still have problems, post them here.

HTH,
KK

On Wed, May 27, 2009 at 11:26 AM, Ritu choudhary <ri...@gmail.com> wrote:
