
Posted to dev@lucene.apache.org by Hui Fang <hu...@gmail.com> on 2007/02/02 05:42:14 UTC

implementation of the state-of-the-art retrieval models for Lucene?

Dear all,

My primary research interest is information retrieval, with a focus on
developing effective and robust retrieval models. I am happy to send my
first email to the Lucene community.

Lucene and Nutch are really useful IR systems. However, I think that the
retrieval function currently implemented in Lucene is not as effective as
other state-of-the-art retrieval functions. I have implemented several
state-of-the-art models (such as pivoted normalization, Okapi, and axiomatic
retrieval models) on top of Lucene, and evaluated them against Lucene's
default model using standard IR evaluation methodology. Our experiments show
that the state-of-the-art retrieval functions outperform the default one.
This work actually started as an assignment my advisor and I designed for
our IR course.
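
For readers unfamiliar with the models named above, here is a minimal sketch
of how an Okapi BM25 term score is commonly computed. It is not the code from
the assignment and it does not use Lucene's API. The class name Bm25Sketch,
the method names, and the parameter values k1 = 1.2 and b = 0.75 are
assumptions chosen for illustration, and the idf uses the non-negative
"1 +" variant rather than the original Robertson/Sparck Jones form.

```java
/**
 * Illustrative Okapi BM25 term scoring. Hypothetical class, not part of
 * Lucene and not the implementation discussed in this thread.
 */
public final class Bm25Sketch {
    private final double k1; // term-frequency saturation (assumed default 1.2)
    private final double b;  // strength of length normalization (assumed default 0.75)

    public Bm25Sketch(double k1, double b) {
        this.k1 = k1;
        this.b = b;
    }

    /** Non-negative idf variant: ln(1 + (N - df + 0.5) / (df + 0.5)). */
    public static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /**
     * Score contribution of one query term in one document.
     * tf: term frequency in the document; docLen and avgDocLen: lengths in tokens.
     */
    public double termScore(double tf, long docFreq, long docCount,
                            double docLen, double avgDocLen) {
        if (tf <= 0) {
            return 0.0; // a term that does not occur contributes nothing
        }
        // Documents longer than average are penalized, shorter ones boosted.
        double lengthNorm = 1.0 - b + b * (docLen / avgDocLen);
        // Saturating tf component: grows with tf but is bounded by (k1 + 1).
        double tfPart = tf * (k1 + 1.0) / (tf + k1 * lengthNorm);
        return idf(docCount, docFreq) * tfPart;
    }

    public static void main(String[] args) {
        Bm25Sketch bm25 = new Bm25Sketch(1.2, 0.75);
        // Same term in an average-length document and in one twice as long.
        System.out.println(bm25.termScore(3, 10, 1000, 100, 100));
        System.out.println(bm25.termScore(3, 10, 1000, 200, 100));
    }
}
```

A document's score for a query is the sum of termScore over the query terms.
Unlike a plain tf-idf weighting, the tf component saturates and the length
normalization is tunable through b, which is the kind of difference the
evaluation described above would measure.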

After we posted this assignment online, quite a few IR researchers contacted
us and asked for the code of our implementations. So we think it might
benefit everyone in the Lucene community and the IR research community if we
contributed our implementations of the state-of-the-art retrieval functions
to Lucene. I think our contribution could help improve retrieval performance
for both Lucene and Nutch.

What do you think?

Thanks,
-Hui

Re: implementation of the state-of-the-art retrieval models for Lucene?

Posted by Grant Ingersoll <gs...@apache.org>.

Hi Hui,

We love contributions!  Take a look at
http://wiki.apache.org/jakarta-lucene/HowToContribute

Are these changes on top of Lucene or part of the core?  If they are
on top, they could go in as a contrib module, which is much easier to get
accepted. The best way to submit changes is through a patch.  It is
best if you include unit tests and some documentation explaining what
you did, but that is not an absolute requirement.


-Grant

--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ



---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: implementation of the state-of-the-art retrieval models for Lucene?

Posted by José Ramón Pérez Agüera <jo...@fdi.ucm.es>.

Dear Hui,

I am a Ph.D. student at the Complutense University of Madrid (Spain),
where I am also a teaching assistant in the Department of Artificial
Intelligence. I have been working with Lucene for two years, and I am very
interested in re-implementing certain classes (TermQuery, TermScorer,
DefaultSimilarity) to adapt them to state-of-the-art information retrieval
models such as BM25, LM, DFR, etc. I am also working on an evaluation
module that lets Lucene work with TREC collections and similar test
collections.

I think it would be a good idea to create a Lucene subproject to develop
new IR models and other tools aimed at the IR community. I would be very
interested in this, and I think it would be valuable not only for the IR
community but also for the Lucene community.

What do you think about this idea?

Best,

jose

José Ramón Pérez Agüera

Dept. de Ingeniería del Software e Inteligencia Artificial
Office 411, tel. 913947599
Facultad de Informática
Universidad Complutense de Madrid

RE: implementation of the state-of-the-art retrieval models for Lucene?

Posted by "Dalton, Jeffery" <jd...@globalspec.com>.

Hi Hui,

I would love to experiment with your retrieval models.  There have been
various conversations about BM25 and other functions, but little is
publicly available.  

Cheers,

- Jeff 

