Posted to dev@lucene.apache.org by cl...@student.fsa.ucl.ac.be on 2003/07/08 17:58:19 UTC
Detailed information about searching,indexing technique
Hello. I work at a young company called Denali
that is interested in using Lucene. I have looked
through the official website for information, but I
did not find any detailed explanation of how the index
is created and how a search is run against it.
In fact, we would like to add two special queries:
- one that finds the most frequent terms in a
document.
- one that finds the terms most frequently associated
with another term (for example, for the given term
"lucene", we would find "search", "engine", "open
source", ...).
If somebody could point me to detailed information,
not on "how to use Lucene" but on "how does it work in
detail?" (algorithms used, ...), it would be much
appreciated.
Best regards,
Claude Libois
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
Re: Detailed information about searching,indexing technique
Posted by Leo Galambos <Le...@seznam.cz>.
Ricardo Baeza-Yates, Berthier Ribeiro-Neto: Modern Information
Retrieval, ACM Press, ISBN 0-201-39829-X
The searching phase is straightforward if you know the basic vector model.
The indexing phase is described on pp 196-199. It is a classic algorithm.
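
For readers without the book at hand: the classic structure behind this kind of indexing is an inverted index (each term maps to the documents containing it), and the basic vector model ranks documents by per-term weights. Below is a minimal Java sketch under those assumptions. The class name TinyInvertedIndex, the tokenizer, and the tf * idf weighting are illustrative choices only; this is not Lucene's implementation, and not necessarily the algorithm as the book presents it.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Toy in-memory inverted index: term -> (docId -> term frequency). Hypothetical, not Lucene code. */
final class TinyInvertedIndex {
    private final Map<String, Map<Integer, Integer>> postings = new TreeMap<>();
    private int docCount = 0;

    /** Splits text on anything that is not a letter or digit and lowercases it. */
    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+");
    }

    /** Adds a document and returns its id (0, 1, 2, ... in insertion order). */
    int addDocument(String text) {
        int docId = docCount++;
        for (String token : tokenize(text)) {
            if (token.isEmpty()) {
                continue; // split() can yield a leading empty string
            }
            postings.computeIfAbsent(token, t -> new TreeMap<>())
                    .merge(docId, 1, Integer::sum);
        }
        return docId;
    }

    /** Scores each matching document by the sum of tf * idf over the query terms. */
    Map<Integer, Double> search(String query) {
        Map<Integer, Double> scores = new HashMap<>();
        for (String term : tokenize(query)) {
            Map<Integer, Integer> docs = postings.get(term);
            if (docs == null) {
                continue; // term not in the index
            }
            // Rarer terms get a higher weight; this exact formula is an assumption.
            double idf = Math.log(1.0 + (double) docCount / docs.size());
            for (Map.Entry<Integer, Integer> e : docs.entrySet()) {
                scores.merge(e.getKey(), e.getValue() * idf, Double::sum);
            }
        }
        return scores;
    }
}
```

Searching only touches the posting lists of the query terms, which is why the search phase is simple once the index exists.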
Your queries:
1 - see the archive.
2 - AFAIK you cannot solve it. BTW, you would do better to work with
entropy than with raw frequencies.
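
To make the two requested queries concrete, here is a small Java sketch that computes them over raw text held in memory. It deliberately does not use any Lucene API; the class TermStats and its methods topTerms and coOccurring are hypothetical names, and "associated" is assumed to mean "appears in the same document".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Frequency-based term statistics over plain strings. Hypothetical helper, not Lucene code. */
final class TermStats {
    /** Lowercases and splits on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Query 1: the k most frequent terms of a single document. */
    static List<String> topTerms(String text, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : tokenize(text)) {
            counts.merge(t, 1, Integer::sum);
        }
        return topK(counts, k);
    }

    /** Query 2: the k terms found in the most documents that also contain the given term. */
    static List<String> coOccurring(List<String> docs, String term, int k) {
        String wanted = term.toLowerCase(Locale.ROOT);
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : docs) {
            Set<String> unique = new HashSet<>(tokenize(doc));
            if (!unique.remove(wanted)) {
                continue; // document does not contain the term
            }
            for (String t : unique) {
                counts.merge(t, 1, Integer::sum);
            }
        }
        return topK(counts, k);
    }

    /** Sorts by count descending, then alphabetically, and keeps the first k terms. */
    private static List<String> topK(Map<String, Integer> counts, int k) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```

With raw counts like these, very common words tend to dominate the co-occurrence list, which is one reason to look at a weighting such as entropy instead of plain frequencies.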
-g-