
Posted to dev@lucene.apache.org by cl...@student.fsa.ucl.ac.be on 2003/07/08 17:58:19 UTC

Detailed information about searching, indexing technique

Hello. I work for a recently founded company called Denali,
which is interested in using Lucene. I have looked through
the official website for information, but I did not find any
detailed explanation of how the index is created and how a
search is run against it.
In fact, we would like to add two special queries:
- one that finds the most frequent terms in a document;
- one that finds the terms most frequently associated with
  another term (for example, for the term "lucene", we would
  find "search", "engine", "open source", ...).
If somebody could point me to detailed information, not on
"how to use Lucene" but on "how does it work in detail?"
(algorithms used, ...), it would be much appreciated.
Best regards,
Claude Libois
 

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

Re: Detailed information about searching, indexing technique

Posted by Leo Galambos <Le...@seznam.cz>.

Ricardo Baeza-Yates, Berthier Ribeiro-Neto: Modern Information 
Retrieval, ACM Press, ISBN 0-201-39829-X

The searching phase is straightforward once you know the basic vector model.
The indexing phase is described on pp. 196-199. It is a classic algorithm.
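
To make the two phases concrete, here is a minimal sketch in Python of an inverted index and a tf-idf ranked search over it. It is a toy illustration of the general textbook idea, not Lucene's actual index format or scoring formula; the function names (tokenize, build_index, search) and the sample documents are made up for this example, and the score leaves out the length normalization that a full vector model would apply.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Toy analyzer: lowercase, then split on anything non-alphanumeric."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Indexing phase: build an inverted index, term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search(index, num_docs, query):
    """Searching phase: score each document by summing tf * idf over query terms."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur anywhere in the collection
        # Rare terms get a higher weight than common ones.
        idf = math.log(num_docs / len(postings)) + 1.0
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by document id for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample collection.
docs = [
    "Lucene is an open source search engine library",
    "An inverted index maps each term to the documents containing it",
    "Search engines rank documents against a query",
]
index = build_index(docs)
results = search(index, len(docs), "search engine")
```

Only the posting lists of the query terms are visited, which is why the search side stays simple once the index exists.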

Your queries:
1 - see the mailing list archive.
2 - AFAIK you cannot solve it. BTW, you would be better off working
with entropy than with raw frequencies.
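
For illustration only, the two queries could be sketched outside Lucene as follows. The first function counts raw term frequencies in one document. The second ranks co-occurring terms by pointwise mutual information (PMI), which is my own stand-in for an information-theoretic measure and not necessarily what is meant by "entropy" above. All names and the sample documents are hypothetical, and nothing here uses the Lucene API.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Toy analyzer: lowercase, then split on anything non-alphanumeric."""
    return re.findall(r"[a-z0-9]+", text.lower())


def top_terms(text, n=3):
    """Query 1: the n most frequent terms in a single document."""
    counts = Counter(tokenize(text))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]


def associated_terms(docs, target, n=3):
    """Query 2: terms that co-occur with `target`, ranked by document-level PMI."""
    doc_terms = [set(tokenize(d)) for d in docs]
    total = len(doc_terms)
    # Document frequency of every term.
    df = Counter(t for terms in doc_terms for t in terms)
    if df[target] == 0:
        return []
    # Number of documents in which each term appears together with the target.
    co = Counter(
        t for terms in doc_terms if target in terms for t in terms if t != target
    )
    scored = []
    for term, joint in co.items():
        # PMI = log( P(term, target) / (P(term) * P(target)) )
        pmi = math.log((joint * total) / (df[target] * df[term]))
        scored.append((term, pmi))
    return sorted(scored, key=lambda item: (-item[1], item[0]))[:n]


# Hypothetical sample collection.
docs = [
    "lucene search engine",
    "lucene open source search",
    "open source license",
    "the search for a license",
]
frequent = top_terms("the cat and the hat and the bat", n=2)
related = associated_terms(docs, "lucene")
```

In this sample, "search" co-occurs with "lucene" most often, but it is also common elsewhere, so PMI ranks the rarer "engine" above it. That is the kind of difference between raw frequencies and an information-based measure.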

-g-




