Posted to java-user@lucene.apache.org by Katya <ka...@gmail.com> on 2008/02/10 11:41:04 UTC

Offsets-highlight newbie question

Dear All,

I'm relatively new to Java and especially new to Lucene. I study Computational
Linguistics and this term we are required to make projects using Lucene. 

I chose to write a user-friendly application that lets the user
create collections of text files and then search through them.

Lucene works great! I create the index and the analyzer, send a query, and get hits.
The problem is that one of the texts in the collection is always displayed in a
JTextArea, and I want to highlight (mark in color) the words
that match the query.

Java's Highlighter can do this, but it needs character offsets for each span to
highlight, and I would gladly supply them if I could get them!
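
For reference, here is a minimal sketch of the kind of offsets a Swing Highlighter expects: a start and an end character index into the displayed string. It does not use Lucene at all; it simply scans the text with a regular expression. The class name WordOffsets and the method find are hypothetical names made up for this illustration, and matching whole words case-insensitively is an assumption about what counts as a hit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Lucene or Swing.
class WordOffsets {

    /**
     * Returns one {start, end} pair of character offsets for every
     * whole-word, case-insensitive occurrence of word in text.
     * The end offset is exclusive, which is what addHighlight expects.
     */
    static List<int[]> find(String text, String word) {
        List<int[]> spans = new ArrayList<>();
        // Pattern.quote keeps regex metacharacters in the word literal.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher m = p.matcher(text);
        while (m.find()) {
            spans.add(new int[] { m.start(), m.end() });
        }
        return spans;
    }
}
```

Each pair could then be passed to the highlighter, for example hilit.addHighlight(span[0], span[1], painter), inside a try block that catches BadLocationException. Note that this matches the raw text, so it may disagree with what a Lucene analyzer treats as a match (stemming, stop words, and so on).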

I will give snippets of code here:
// read the index, store the terms

    try {
        inread = IndexReader.open(pathToIndex);
        terms = inread.terms();
    } catch (IOException e) {
        System.out.println(e.getMessage() + " IndexReader failed");
    }

// then get the positions for a term

    try {
        TermPositions pos = inread.termPositions(aTerm);

        while (pos.next()) {
            for (int i = 0; i < pos.freq(); i++) {
                // add position to a list
            }
        }
    } catch (IOException ex) {
        System.out.println(ex.getMessage());
    }

    hilit.removeAllHighlights();
        
// and later

    try {
        // get the text from the JTextArea
        String curText = text.getText();

        // start scanning it - okay, I guess there must be another way,
        // but this is the only one I could think of
        Scanner in = new Scanner(curText);
        int counter = 0;
        int point = 0;
        // yes, the while-loop looks ugly
        while (in.hasNext() && counter != curText.length()) {
            for (int i = 0; i < positions.size(); i++) {
                String word = in.next();
                if (counter == positions.get(i)) {
                    hilit.addHighlight(curText.indexOf(word, point),
                            curText.indexOf(word, point) + wordlength, painter);
                    point = curText.indexOf(word) + l;
                }
            }
            counter++;
        }
    } catch (BadLocationException e) {
        System.err.println("error");
        e.printStackTrace();
    }


I thought this would work, but when I give it a text, say, "blah blah
blah a little text", and query for "blah", I get strange numbers: not
0, 1, 2 or 1, 2, 3, and not even 0, 5, 11. What I get is 1, 5, 14! Where do
these numbers come from? Where can I retrieve "real" pointers into the text,
or is there another way to highlight hits?

Please give me some advice; the project is due February 14th, so I'm really
desperate.

Best Regards, 
Katya
