Posted to dev@lucene.apache.org by Nikita Zhiltsov <ni...@gmail.com> on 2013/06/15 01:43:34 UTC

Adding a mixture of language models to Lucene 4.0

Hi all,

I've just published a tiny extension to Lucene 4.0, which enables a mixture
of language models using standard FunctionQuery and ValueSource classes:
https://github.com/nzhiltsov/lucene-mlm

I'd like you to assess the possibility of integrating this code into
Lucene. I'd appreciate any comments or fixes.

NB: The implementation avoids using LMSimilarity on a per-field basis,
because that would break the computation of correct Dirichlet priors for
non-matched terms. The standard LMSimilarity class does not include such
terms when calculating term frequencies and treats them as
zero-probability entries.
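
To illustrate the point about non-matched terms, here is a minimal sketch
in plain Java with no Lucene dependency. It is not the lucene-mlm code; the
class name, method names, field statistics, weights and the value of mu are
all assumptions made up for the example. Each field model is
Dirichlet-smoothed, P(w|field) = (tf + mu * P(w|collection)) / (len + mu),
and the fields are mixed with weights lambda, so a term with tf = 0 still
contributes a small non-zero probability.

```java
// Illustrative sketch only: all names and numbers are hypothetical,
// and this is not the lucene-mlm or Lucene implementation.
final class MixtureLmSketch {

    /** Dirichlet-smoothed P(term | field): (tf + mu * P(term | collection)) / (len + mu). */
    static double dirichlet(long tf, long fieldLength, double collectionProb, double mu) {
        return (tf + mu * collectionProb) / (fieldLength + mu);
    }

    /** Mixture over fields: sum over j of lambda[j] * P(term | field j). */
    static double mixture(long[] tf, long[] fieldLength, double[] collectionProb,
                          double[] lambda, double mu) {
        double p = 0.0;
        for (int j = 0; j < lambda.length; j++) {
            p += lambda[j] * dirichlet(tf[j], fieldLength[j], collectionProb[j], mu);
        }
        return p;
    }

    /** Query log-likelihood: sum over query terms of log(mixture probability). */
    static double logScore(long[][] tf, long[] fieldLength, double[][] collectionProb,
                           double[] lambda, double mu) {
        double score = 0.0;
        for (int t = 0; t < tf.length; t++) {
            score += Math.log(mixture(tf[t], fieldLength, collectionProb[t], lambda, mu));
        }
        return score;
    }

    public static void main(String[] args) {
        long[] fieldLength = {5, 100};   // assumed lengths of a "title" and a "body" field
        double[] lambda = {0.3, 0.7};    // assumed field weights, summing to 1
        double mu = 2000.0;              // assumed Dirichlet prior
        // Two query terms (rows) over two fields (columns);
        // the second term does not occur in the document at all.
        long[][] tf = {{0, 3}, {0, 0}};
        double[][] collectionProb = {{0.002, 0.001}, {0.0005, 0.0004}};
        System.out.println(logScore(tf, fieldLength, collectionProb, lambda, mu));
    }
}
```

In this sketch the second query term still yields a finite log-probability.
If it were treated as a zero-probability entry instead, the logarithm would
be negative infinity and the document could not be ranked on that term.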

-- 

Nikita Zhiltsov

Visiting Graduate Student
Emory University
Intelligent Information Access Lab
E500 Emerson Hall, Atlanta, Georgia, USA
Phone: (404) 834-5364
E-mail: znikita@emory.edu


---------------------------------------------------------------------
Graduate Student, Research Fellow
Kazan Federal University
Computational Linguistics Laboratory
Russia, 420008
Kazan, Prof. Nuzhina Str., 1/37 room 117
Skype: nickita.jhiltsov
Personal page: http://cll.niimm.ksu.ru/~nzhiltsov
E-mail: nikita.zhiltsov@gmail.com

---------------------------------------------------------------------

Re: Adding a mixture of language models to Lucene 4.0

Posted by Otis Gospodnetic <ot...@gmail.com>.

Hi Nikita,

Speaking only for myself here... maybe explain more about what this
library does in plain English - what problem does it solve?  I had to
look up the paper (ha! a known item!):
http://www.cs.cmu.edu/~callan/Papers/sigir03-pto.pdf (add to README so
others don't have to search?)

To make it easy to add this to Lucene, you should:
* use and include ASL
* include ASL snippet in each Java class
* switch to Java for tests
* move to org.apache.lucene...

HTH,
Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
