Posted to java-user@lucene.apache.org by syedfa <fa...@gmail.com> on 2008/05/03 21:41:31 UTC

Using Highlighter to return search results

Dear Fellow Java/Lucene developers:

I have created an index from an XML document, which I would like to search
using Lucene.  Unfortunately, I am only able to return the number of times a
particular keyword is found in the document, instead of returning the
keyword together with the text that comes before and after it in the XML
document.

For example:

Suppose I am searching for the keyword "arrows" in the following XML document
(which is from Shakespeare's play, "Hamlet"):

<SPEECH>
<SPEAKER>HAMLET</SPEAKER>
<LINES>To be, or not to be: that is the question:
Whether 'tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take arms against a sea of troubles,
And by opposing end them? To die: to sleep;
No more; and by a sleep to say we end
The heart-ache and the thousand natural shocks
That flesh is heir to, 'tis a consummation
Devoutly to be wish'd. To die, to sleep;
To sleep: perchance to dream: ay, there's the rub;
For in that sleep of death what dreams may come
When we have shuffled off this mortal coil,
Must give us pause: there's the respect
That makes calamity of so long life;
For who would bear the whips and scorns of time,
The oppressor's wrong, the proud man's contumely,
The pangs of despised love, the law's delay,
The insolence of office and the spurns
That patient merit of the unworthy takes,
When he himself might his quietus make
With a bare bodkin? who would fardels bear,
To grunt and sweat under a weary life,
But that the dread of something after death,
The undiscover'd country from whose bourn
No traveller returns, puzzles the will
And makes us rather bear those ills we have
Than fly to others that we know not of?
Thus conscience does make cowards of us all;
And thus the native hue of resolution
Is sicklied o'er with the pale cast of thought,
And enterprises of great pith and moment
With this regard their currents turn awry,
And lose the name of action.--Soft you now!
The fair Ophelia! Nymph, in thy orisons
Be all my sins remember'd.</LINES>
</SPEECH>

I would like a set of results that returns the word "arrows" highlighted,
along with the line in which it appears (i.e. "The slings and arrows of
outrageous fortune").  Below is the code that I am using to search:

import java.io.File;
import java.io.FileReader;
import java.io.Reader;
import java.util.Date;
import java.io.IOException;
 
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Hits;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.queryParser.QueryParser;
 
public class Searcher {
    
    /** Creates a new instance of Searcher */
    
    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws Exception{ 
        
        File indexDir=new File("c:\\indexD");
        String q="arrows";
        
        if(!indexDir.exists() || !indexDir.isDirectory()){
            throw new Exception(indexDir + " does not exist or is not a directory.");
        }
        
        search(indexDir, q);
        
    }
    
    public static void search(File indexDir, String q) throws Exception {
         
        Directory fsDir=FSDirectory.getDirectory(indexDir);
        IndexSearcher is=new IndexSearcher(fsDir);
        
        // Parse the search string against the "LINES" field.
        Query query=new QueryParser("LINES", new StandardAnalyzer()).parse(q);
        long start=new Date().getTime();
        Hits hits=is.search(query);
        long end=new Date().getTime();
        
        System.err.println("Found " + hits.length() + " document(s) (in " +
                (end-start) + " milliseconds) that matched query '" + q + "':");
        
        for(int i=0; i<hits.length(); i++){
            // Each matching document is retrieved here, but nothing is done with it yet.
            Document doc=hits.doc(i);
        }
        
    }
    
}
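
To make the desired output concrete, here is a rough sketch in plain Java of
the kind of result I am after.  It does not use Lucene or its Highlighter at
all; the class name DesiredOutputSketch, the method highlightLines, and the
choice of <B> tags as the highlight markup are only assumptions for
illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: plain string matching, not the Lucene Highlighter.
public class DesiredOutputSketch {

    /** Returns every line containing the keyword, with the keyword wrapped in <B> tags. */
    static List<String> highlightLines(String text, String keyword) {
        // Whole-word, case-insensitive match; the keyword is quoted so it is taken literally.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(keyword) + "\\b",
                Pattern.CASE_INSENSITIVE);
        List<String> results = new ArrayList<>();
        for (String line : text.split("\\r?\\n")) {
            Matcher m = p.matcher(line);
            if (m.find()) {
                // replaceAll resets the matcher, so every occurrence on the line is wrapped.
                results.add(m.replaceAll("<B>$0</B>"));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        String lines = "Whether 'tis nobler in the mind to suffer\n"
                + "The slings and arrows of outrageous fortune,\n"
                + "Or to take arms against a sea of troubles,";
        for (String hit : highlightLines(lines, "arrows")) {
            System.out.println(hit);
        }
    }
}
```

For the three lines above and the keyword "arrows", this prints only the
second line, with the keyword wrapped in the tags.  I would like the same
kind of result, but driven by the Lucene query and index.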

What do I need to do to achieve this?  Thanks to everyone in advance.

Sincerely,
Fayyaz