Posted to java-user@lucene.apache.org by Chris Sibert <ch...@attbi.com> on 2002/07/09 00:23:17 UTC
Index smarts
Does the Lucene index keep track of where in the original document it found each occurrence of a term?
For example: Lucene is indexing a file, and one of the terms found is "banana", which occurs in the file 3 times. Does Lucene save in the index where it found each occurrence of "banana"? Could I then, for example, go to offset 100 in the file and find "banana" there? For that matter, does Lucene know that the term was found three times, or just that it was found?
The reason I ask is that when a user searches for something, I might like to just display snippets of the original file where the term was found, instead of the whole thing, because some of the files are quite large.
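To make the question concrete, here is a minimal sketch in plain Java of the kind of bookkeeping being asked about: for each term, the character offsets of its occurrences, from which both the frequency and a snippet can be derived. It does not use Lucene's API and says nothing about what Lucene actually stores; the class and method names (PositionalIndexSketch, index, snippet) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PositionalIndexSketch {

    // Maps each lower-cased term to the character offsets where it starts.
    // The term frequency is simply the size of the offset list.
    static Map<String, List<Integer>> index(String text) {
        Map<String, List<Integer>> postings = new HashMap<>();
        int i = 0;
        while (i < text.length()) {
            if (!Character.isLetterOrDigit(text.charAt(i))) {
                i++; // skip separators
                continue;
            }
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            String term = text.substring(start, i).toLowerCase();
            postings.computeIfAbsent(term, k -> new ArrayList<>()).add(start);
        }
        return postings;
    }

    // Returns the term plus up to `context` characters on either side.
    static String snippet(String text, int offset, int termLength, int context) {
        int from = Math.max(0, offset - context);
        int to = Math.min(text.length(), offset + termLength + context);
        return text.substring(from, to);
    }

    public static void main(String[] args) {
        String text = "A banana a day. Banana bread is good; so is banana cake.";
        List<Integer> offsets = index(text).get("banana");
        System.out.println("frequency = " + offsets.size());
        for (int offset : offsets) {
            System.out.println(offset + ": ..." + snippet(text, offset, "banana".length(), 8) + "...");
        }
    }
}
```

With offsets like these available, showing a snippet for a large file only requires reading a small window around each offset rather than the whole file.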
Re: Tokenization
Posted by Peter Carlson <ca...@bookandhammer.com>.
Using a wildcard search might be useful for you.
So searching for "l*" will find "life".
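As a rough illustration of why the wildcard helps, the sketch below treats the index as a sorted dictionary of whole-word terms: an exact lookup for "l" finds nothing, while a prefix lookup for "l*" finds "life". This is plain Java, not Lucene's API, and the names (PrefixQuerySketch, terms, matchesPrefix) are hypothetical.

```java
import java.util.SortedSet;
import java.util.TreeSet;

public class PrefixQuerySketch {

    // Builds a sorted dictionary of lower-cased word terms.
    static SortedSet<String> terms(String sentence) {
        SortedSet<String> dict = new TreeSet<>();
        for (String word : sentence.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                dict.add(word);
            }
        }
        return dict;
    }

    // All terms starting with the prefix: in sorted order they form one
    // contiguous range, from the prefix itself up to prefix + '\uffff'.
    static SortedSet<String> matchesPrefix(SortedSet<String> dict, String prefix) {
        return dict.subSet(prefix, prefix + Character.MAX_VALUE);
    }

    public static void main(String[] args) {
        SortedSet<String> dict = terms("life is a big stage where we are actors");
        System.out.println("exact \"l\":  " + dict.contains("l"));
        System.out.println("prefix \"l*\": " + matchesPrefix(dict, "l"));
    }
}
```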
I hope this helps.
--Peter
On 7/8/02 11:39 PM, "Pradeep Kumar K" <pr...@robosoftin.com> wrote:
>
> Hi All
>
> In Lucene, sentences are tokenized word-wise, not letter-wise. So when
> we search for a letter contained in a sentence, Lucene looks for a
> whole word that matches the letter we entered.
> Example: for the sentence "life is a big stage where we are actors",
> if I search for "life" or any other word, Lucene seems to return
> correct results. But if I search for "l" or "s", it returns no results.
>
> I am not sure whether there is any method that tokenizes a sentence
> into letters. If anybody knows of one, please post it to the mailing list.
>
> Best wishes
> Pradeep
--
To unsubscribe, e-mail: <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>
Tokenization
Posted by Pradeep Kumar K <pr...@robosoftin.com>.
Hi All
In Lucene, sentences are tokenized word-wise, not letter-wise. So when
we search for a letter contained in a sentence, Lucene looks for a
whole word that matches the letter we entered.
Example: for the sentence "life is a big stage where we are actors",
if I search for "life" or any other word, Lucene seems to return
correct results. But if I search for "l" or "s", it returns no results.
I am not sure whether there is any method that tokenizes a sentence
into letters. If anybody knows of one, please post it to the mailing list.
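The difference described above can be sketched in plain Java as follows. This only illustrates word-wise versus letter-wise tokenization; it is not Lucene code, and the names (LetterTokenSketch, wordTokens, letterTokens) are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class LetterTokenSketch {

    // Word-wise: one token per run of word characters.
    static List<String> wordTokens(String sentence) {
        List<String> tokens = new ArrayList<>();
        for (String word : sentence.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                tokens.add(word);
            }
        }
        return tokens;
    }

    // Letter-wise: one token per letter, ignoring everything else.
    static List<String> letterTokens(String sentence) {
        List<String> tokens = new ArrayList<>();
        for (char c : sentence.toLowerCase().toCharArray()) {
            if (Character.isLetter(c)) {
                tokens.add(String.valueOf(c));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sentence = "life is a big stage where we are actors";
        // With word tokens, a search for the single letter "l" has nothing to match.
        System.out.println("word tokens contain \"l\":   " + wordTokens(sentence).contains("l"));
        // With letter tokens, the same search matches.
        System.out.println("letter tokens contain \"l\": " + letterTokens(sentence).contains("l"));
    }
}
```

One caveat with the letter-wise variant: almost every document contains the common letters, so a single-letter search would match nearly everything.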
Best wishes
Pradeep
--------------------------------------------------------------
Robosoft Technologies - Partners in Product Development