You are viewing a plain text version of this content. The canonical link for it is here.

Posted to java-user@lucene.apache.org by Alan Chandler <al...@chandlerfamily.org.uk> on 2005/12/09 21:18:56 UTC

Confused again ... Getting at results

I am slowly making may way through lucene, as witnessed by earlier threads to 
this mailing list.

But I am stuck again, going round in circles with the Javadocs.

I want to display the results of a user entered search where for each document 
I put out a small summary with the searched for words highlighted.

When I wrote the Analyzer for my documents, I produced the tokenstream  to 
generate Token objects with the start end end positions of each term in them

Now, from my Hits object I can find each document I need to output, but how do 
I get back to the Tokens I originally produced.

I suspect it has something to do with FilterIndexReader and its nested class 
FilterTermPositions, but I can't see how to link these to the seach I have 
just done

How is it done?



-- 
Alan Chandler
http://www.chandlerfamily.org.uk
Open Source. It's the difference between trust and antitrust.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Confused again ... Getting at results

Posted by Alan Chandler <al...@chandlerfamily.org.uk>.

On Saturday 10 Dec 2005 00:17, Erik Hatcher wrote:

> > When I wrote the Analyzer for my documents, I produced the
> > tokenstream  to
> > generate Token objects with the start end end positions of each
> > term in them
> >
> > Now, from my Hits object I can find each document I need to output,
> > but how do
> > I get back to the Tokens I originally produced.
>
> Are you using Lucene 1.4.3?  Or the latest Subversion version?
1.4.3


>
> The Lucene index does not keep all of the information in the Token's
> emitted by the analyzer (unless specified to do so, but 1.4.3 didn't
> support the fancier features).
>
> So, the fail-safe way is to re-tokenize the original text (perhaps
> stored in the Lucene index) and hand that TokenStream to the
> Highlighter.

I found a highlighter in the sandbox - I think that is doing something like 
this, so am going to experiment with that.

Initial attempt failed to compile because I think it was assuming a later 
version of the lucene code, but it looks like I just cut out the offending 
class and its ok.


-- 
Alan Chandler
http://www.chandlerfamily.org.uk
Open Source. It's the difference between trust and antitrust.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Confused again ... Getting at results

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Dec 9, 2005, at 3:18 PM, Alan Chandler wrote:
> I am slowly making may way through lucene, as witnessed by earlier  
> threads to
> this mailing list.
>
> But I am stuck again, going round in circles with the Javadocs.
>
> I want to display the results of a user entered search where for  
> each document
> I put out a small summary with the searched for words highlighted.
>
> When I wrote the Analyzer for my documents, I produced the  
> tokenstream  to
> generate Token objects with the start end end positions of each  
> term in them
>
> Now, from my Hits object I can find each document I need to output,  
> but how do
> I get back to the Tokens I originally produced.

Are you using Lucene 1.4.3?  Or the latest Subversion version?

The Lucene index does not keep all of the information in the Token's  
emitted by the analyzer (unless specified to do so, but 1.4.3 didn't  
support the fancier features).

So, the fail-safe way is to re-tokenize the original text (perhaps  
stored in the Lucene index) and hand that TokenStream to the  
Highlighter.

Or you can experiment with the additional Field constructors to  
enable the storage of token offsets and the Highlighter can use those  
for a little better performance, but it's likely to be unnoticeable  
for your application to simply re-tokenize on the fly for only the  
fields you're displaying.  Storing the token offsets increases the  
index size, of course.

	Erik

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org