
Posted to java-user@lucene.apache.org by Chris Sibert <ch...@attbi.com> on 2002/07/09 00:23:17 UTC

Index smarts

Does the Lucene index keep track of where in the original document it found each occurrence of a term?

For example: Lucene is indexing a file, one of the terms it finds is "banana", and "banana" occurs in the file 3 times. Does Lucene save in the index where it found each occurrence of "banana"? So, for example, could I go to offset 100 in the file and find "banana"? For that matter, does Lucene know that the term was found three times, or just that it was found?
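To make the question concrete, here is a minimal sketch in plain Java of a single-document index that records, for each term, the token position and character offsets of every occurrence. The term frequency then falls out as the number of recorded occurrences. This does not use Lucene and is not how Lucene lays out its index; the names `PositionalIndex` and `Occurrence` are hypothetical, and the tokenizer (runs of letters or digits, lower-cased) is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not Lucene's API: a positional index for ONE document.
class PositionalIndex {

    /** One occurrence of a term in the document. */
    static final class Occurrence {
        final int position;    // index of the token among all tokens (0-based)
        final int startOffset; // character offset of the token's first char
        final int endOffset;   // character offset just past the token's last char

        Occurrence(int position, int startOffset, int endOffset) {
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        @Override
        public String toString() {
            return "position " + position + ", chars [" + startOffset + ", " + endOffset + ")";
        }
    }

    // Assumed tokenizer: runs of Unicode letters or digits.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+");

    // term -> its occurrences, in document order
    private final Map<String, List<Occurrence>> postings = new TreeMap<>();

    PositionalIndex(String text) {
        Matcher m = TOKEN.matcher(text);
        int position = 0;
        while (m.find()) {
            String term = m.group().toLowerCase(Locale.ROOT);
            // Offsets come from the matcher, so they point into the original text.
            postings.computeIfAbsent(term, t -> new ArrayList<>())
                    .add(new Occurrence(position++, m.start(), m.end()));
        }
    }

    /** Term frequency: how many times the term occurs in the document. */
    int frequency(String term) {
        return occurrences(term).size();
    }

    /** All occurrences of the term, or an empty list if it never occurs. */
    List<Occurrence> occurrences(String term) {
        List<Occurrence> found = postings.getOrDefault(
                term.toLowerCase(Locale.ROOT), Collections.emptyList());
        return Collections.unmodifiableList(found);
    }

    public static void main(String[] args) {
        PositionalIndex index =
                new PositionalIndex("A banana, another banana, and one more banana.");
        System.out.println("banana occurs " + index.frequency("banana") + " times");
        for (Occurrence o : index.occurrences("banana")) {
            System.out.println("  " + o);
        }
    }
}
```

For the sample sentence, "banana" is recorded three times, at token positions 1, 3 and 7, starting at character offsets 2, 18 and 39.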

The reason I ask is that when a user searches for something, I might like to just display snippets of the original file where the term was found, instead of the whole thing, because some of the files are quite large.
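Given the character offsets of a hit, a snippet is just a window of text around it. The sketch below is plain Java; `Snippets.snippet` and its `context` parameter are hypothetical. It assumes the original file's text is still available and that the offsets come from the index (the demo simply finds them with `String.indexOf`).

```java
// Hypothetical helper, not part of Lucene: cut a snippet around a known hit.
final class Snippets {
    private Snippets() {}

    /**
     * Returns up to {@code context} characters on each side of text[start, end),
     * with the hit wrapped in [brackets] and "..." wherever the text was cut.
     */
    static String snippet(String text, int start, int end, int context) {
        if (start < 0 || start > end || end > text.length() || context < 0) {
            throw new IllegalArgumentException("bad offsets or context");
        }
        // Clamp the window to the document boundaries.
        int from = Math.max(0, start - context);
        int to = Math.min(text.length(), end + context);
        StringBuilder sb = new StringBuilder();
        if (from > 0) sb.append("...");
        sb.append(text, from, start)
          .append('[').append(text, start, end).append(']')
          .append(text, end, to);
        if (to < text.length()) sb.append("...");
        return sb.toString();
    }

    public static void main(String[] args) {
        String doc = "Monkeys like fruit. The banana is their favourite, and they eat it daily.";
        // Demo only: a real application would take the offsets from its index.
        int start = doc.indexOf("banana");
        System.out.println(snippet(doc, start, start + "banana".length(), 10));
    }
}
```

With `context` set to 10, the demo prints `...ruit. The [banana] is their ...`. Because the window is clamped, a hit near the start or end of the file simply gets a shorter snippet with no ellipsis on that side.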

Re: Tokenization

Posted by Peter Carlson <ca...@bookandhammer.com>.

Using a wildcard search might be useful for you.

So searching for "l*" will find "life".
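As a rough illustration of why "l*" works, a trailing-wildcard pattern can be expanded against a sorted list of indexed terms: jump to the first term at or after the prefix, then collect terms until one no longer starts with it. This is a plain-Java sketch that only approximates what a wildcard query does. The `PrefixExpansion` class is hypothetical, and the term list is hand-written rather than produced by a Lucene analyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical sketch: expand a trailing-wildcard pattern such as "l*"
// against a sorted term dictionary. Not Lucene's implementation.
final class PrefixExpansion {
    private PrefixExpansion() {}

    /** Returns every term that starts with {@code prefix}, in sorted order. */
    static List<String> expand(NavigableSet<String> terms, String prefix) {
        List<String> matches = new ArrayList<>();
        // tailSet jumps to the first term >= prefix; because the set is
        // sorted, all terms sharing the prefix form one contiguous run.
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // first term past the run: nothing later can match
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hand-written term list standing in for the words of the sample sentence.
        NavigableSet<String> terms = new TreeSet<>(Arrays.asList(
                "life", "is", "a", "big", "stage", "where", "we", "are", "actors"));
        System.out.println(expand(terms, "l")); // [life]
        System.out.println(expand(terms, "s")); // [stage]
    }
}
```

A trailing wildcard only matches at the start of a word: in this sketch "s" expands to "stage" but not to "is" or "actors".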

I hope this helps.
--Peter


On 7/8/02 11:39 PM, "Pradeep Kumar K" <pr...@robosoftin.com> wrote:

> 
> Hi All
> 
> In Lucene, sentences are tokenized word by word, not letter by letter,
> so when we search for a letter contained in a sentence, Lucene looks
> for a whole word equal to the letter we entered.
> Example: for the sentence "life is a big stage where we are actors",
> searching for "life" or any other word returns correct results, but
> searching for "l" or "s" returns no results.
>
> I am not sure whether there are any methods that tokenize a sentence
> into letters. If anybody knows, please post it to the mailing list.
> 
> Best wishes
> Pradeep


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>

Tokenization

Posted by Pradeep Kumar K <pr...@robosoftin.com>.

Hi All

In Lucene, sentences are tokenized word by word, not letter by letter,
so when we search for a letter contained in a sentence, Lucene looks
for a whole word equal to the letter we entered.
Example: for the sentence "life is a big stage where we are actors",
searching for "life" or any other word returns correct results, but
searching for "l" or "s" returns no results.

I am not sure whether there are any methods that tokenize a sentence
into letters. If anybody knows, please post it to the mailing list.
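One way to make single letters searchable is to index character n-grams alongside (or instead of) whole words. The sketch below is plain Java, not a Lucene analyzer; the `LetterTokenization` class and its `words` and `nGrams` methods are hypothetical. With n = 1 every letter of every word becomes a term, so "l" and "s" both match the sample sentence.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch, not a Lucene analyzer: word-wise vs. letter-wise terms.
final class LetterTokenization {
    private LetterTokenization() {}

    /** Word-wise: lower-cased runs of letters/digits, split on everything else. */
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            // split() can yield an empty first element, e.g. when the text
            // starts with a separator, so skip empty strings.
            if (!w.isEmpty()) {
                out.add(w);
            }
        }
        return out;
    }

    /** Letter-wise: every n-character substring of every word (n = 1 gives single letters). */
    static Set<String> nGrams(String text, int n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        Set<String> out = new LinkedHashSet<>();
        for (String w : words(text)) {
            for (int i = 0; i + n <= w.length(); i++) {
                out.add(w.substring(i, i + n));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String sentence = "life is a big stage where we are actors";
        System.out.println(words(sentence).contains("l"));     // false: "l" is not a word
        System.out.println(nGrams(sentence, 1).contains("l")); // true
        System.out.println(nGrams(sentence, 1).contains("s")); // true
    }
}
```

The trade-off is a much larger set of terms and far less selective matches: a single-letter query will match almost any document that contains that letter anywhere.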

Best wishes
Pradeep



--------------------------------------------------------------
Robosoft Technologies - Partners in Product Development


