Posted to general@lucene.apache.org by Fredrik Andersson <fi...@gmail.com> on 2005/09/04 23:44:55 UTC

VSM in Lucene, again

Hi folks.

I read in last month's digest of this list, in a post by 
Rajesh Munavalli, that Lucene uses a VSM retrieval method. In my previous 
work with VSM, retrieval meant matching a query vector against the document 
vectors in the term-document space. I have dissected and customized a lot of 
the Lucene indexing and searching classes, but I have yet to discover 
where the actual dot product of the query vector and the document vectors is 
performed, if Lucene does use this method for information retrieval. This 
method also involves a threshold angle below which a document counts as 
"close", a parameter that Lucene would benefit from exposing in its API. I 
have not seen any trace of that, either. To make a long story short, a lot of 
the stuff that I usually associate with VSM and LSI information retrieval is 
missing or cleverly hidden.
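
For illustration, the kind of VSM matching described above can be sketched in plain Java. This is a minimal sketch and not Lucene code: the class name VsmCosineSketch, the sparse term-to-weight maps, and the maxAngleDegrees threshold are all hypothetical names chosen for this example. It computes the dot product of a query vector and a document vector, normalizes it to a cosine, and treats the document as "close" when the angle is within the threshold.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: textbook VSM cosine matching, not part of Lucene. */
final class VsmCosineSketch {

    /** Dot product over the terms that both sparse vectors contain. */
    static double dot(Map<String, Double> a, Map<String, Double> b) {
        double sum = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) {
                sum += e.getValue() * w;
            }
        }
        return sum;
    }

    /** Euclidean length of a sparse vector. */
    static double norm(Map<String, Double> v) {
        return Math.sqrt(dot(v, v));
    }

    /** Cosine of the angle between the vectors; 0 if either one is empty. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double denom = norm(a) * norm(b);
        return denom == 0.0 ? 0.0 : dot(a, b) / denom;
    }

    /** True when the angle between query and document is at most maxAngleDegrees. */
    static boolean isClose(Map<String, Double> query, Map<String, Double> doc,
                           double maxAngleDegrees) {
        return cosine(query, doc) >= Math.cos(Math.toRadians(maxAngleDegrees));
    }

    public static void main(String[] args) {
        Map<String, Double> query = new HashMap<>();
        query.put("lucene", 1.0);
        query.put("scoring", 1.0);

        Map<String, Double> doc = new HashMap<>();
        doc.put("lucene", 2.0);
        doc.put("index", 1.0);

        System.out.println("cosine = " + cosine(query, doc));
        System.out.println("close at 60 degrees? " + isClose(query, doc, 60.0));
    }
}
```

Note that with plain term weights like these, a document sharing no terms with the query always has a cosine of 0; matching such documents requires a transformed space such as the one LSI builds.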

If someone could shed some light on this issue, I would be very thankful. 
It's probably just that we have different notions of the VSM model, but I'd 
like to get this straightened out.

Greetings,
Fredrik

RE: VSM in Lucene, again

Posted by Madhu Satyanarayana Panitini <Ma...@pass-consulting.com>.
Hi Fredrik,

I asked this question before, and Erik Hatcher gave me the link below:

     http://www.lucenebook.com/blog/errata/scoring_formula_omission.html

It shows a formula which was not completely implemented.

Regards
Madhu

-----Original Message-----
From: Fredrik Andersson [mailto:fidde.andersson@gmail.com] 
Sent: Monday, September 05, 2005 1:35 PM
To: general@lucene.apache.org
Subject: Re: VSM in Lucene, again

Hi Otis,

Yes, I have looked through that class thoroughly, but all I see is an 
IDF-map lookup with boost functionality. The only thing that allows a query 
to return a document that does not contain the terms in the query is the 
sloppyFreq function. It's more of a semantic trick based on edit distance, 
so it has nothing to do with the vector angles in a regular vector space 
model. The document terms still have to be semantically similar to the ones 
in the query, which is not the case when matching by vector angles in a VSM 
(though you often boost documents containing words from the query, 
naturally).

Fredrik

On 9/5/05, Otis Gospodnetic <ot...@yahoo.com> wrote:
> 
> Hi Fredrik,
> 
> Are you looking for org.apache.lucene.search.DefaultSimilarity ?
> 
> Otis

Re: VSM in Lucene, again

Posted by Fredrik Andersson <fi...@gmail.com>.
Hi Otis,

Yes, I have looked through that class thoroughly, but all I see is an 
IDF-map lookup with boost functionality. The only thing that allows a query 
to return a document that does not contain the terms in the query is the 
sloppyFreq function. It's more of a semantic trick based on edit distance, 
so it has nothing to do with the vector angles in a regular vector space 
model. The document terms still have to be semantically similar to the ones 
in the query, which is not the case when matching by vector angles in a VSM 
(though you often boost documents containing words from the query, 
naturally).
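
For reference, the per-term factors under discussion can be sketched in plain Java. This is a simplified sketch, not the Lucene source: the class ClassicScoringSketch and its score method are hypothetical, the factor formulas follow what DefaultSimilarity of that era is generally documented to compute (check the source of the version in use), and boosts and the index-time encoding of norms are left out. Under those assumptions, the sum over matching query terms has the shape of a weighted dot product, accumulated one term at a time rather than as an explicit vector operation.

```java
/** Illustrative only: standalone approximations of the classic scoring factors. */
final class ClassicScoringSketch {

    static double tf(int freq) { return Math.sqrt(freq); }

    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    static double lengthNorm(int numTerms) { return 1.0 / Math.sqrt(numTerms); }

    /** Discounts a phrase match by how far its terms are from their exact positions. */
    static double sloppyFreq(int distance) { return 1.0 / (distance + 1); }

    static double coord(int overlap, int maxOverlap) {
        return overlap / (double) maxOverlap;
    }

    static double queryNorm(double sumOfSquaredWeights) {
        return 1.0 / Math.sqrt(sumOfSquaredWeights);
    }

    /**
     * Hypothetical helper. freqs[i] is the number of occurrences of query term i
     * in the document (0 if absent); docFreqs[i] is the number of documents that
     * contain query term i. Both arrays must be non-empty and of equal length.
     */
    static double score(int[] freqs, int[] docFreqs, int numDocs, int docLength) {
        double sumSq = 0.0;
        for (int df : docFreqs) {
            double w = idf(df, numDocs);
            sumSq += w * w;
        }
        double qNorm = queryNorm(sumSq);

        double sum = 0.0;
        int overlap = 0;
        for (int i = 0; i < freqs.length; i++) {
            if (freqs[i] == 0) {
                continue; // term absent from the document: contributes nothing
            }
            overlap++;
            double w = idf(docFreqs[i], numDocs);
            double queryWeight = w * qNorm;                               // query-side weight
            double docWeight = tf(freqs[i]) * w * lengthNorm(docLength);  // document-side weight
            sum += queryWeight * docWeight;                               // one dot-product term
        }
        return coord(overlap, freqs.length) * sum;
    }

    public static void main(String[] args) {
        // Two-term query against a four-term document in a ten-document index.
        System.out.println(score(new int[] {1, 1}, new int[] {9, 9}, 10, 4));
        System.out.println(score(new int[] {1, 0}, new int[] {9, 9}, 10, 4));
    }
}
```

In this sketch a document containing none of the query terms always scores 0, and sloppyFreq only rescales the frequency of a phrase whose terms are all present.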

Fredrik

On 9/5/05, Otis Gospodnetic <ot...@yahoo.com> wrote:
> 
> Hi Fredrik,
> 
> Are you looking for org.apache.lucene.search.DefaultSimilarity ?
> 
> Otis

Re: VSM in Lucene, again

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi Fredrik,

Are you looking for org.apache.lucene.search.DefaultSimilarity ?

Otis
