Posted to solr-user@lucene.apache.org by Eswar K <kj...@gmail.com> on 2007/11/26 14:23:31 UTC
LSA Implementation
All,
Is there any plan to implement Latent Semantic Analysis as part of Solr
anytime in the near future?
Regards,
Eswar
Re: LSA Implementation
Posted by Chris Hostetter <ho...@fucit.org>.
: A more interesting Solr-related question is where a very heavy process like
: SVD would operate. You'd want to run the 'training' half of it separately from
: indexing or querying. It'd almost be like an optimize. Is there any hook right
: now to give Solr a "command" like <updateModels/> and map it to the class in
: the solrconfig? The classify half of the SVD can happen at query or index
: time, very quickly; I imagine that could even be a custom field type.
The EventListener plugin type lets you register arbitrary Java code to be
run after a commit or an optimize (before a new searcher is opened) ...
this is the same hook mechanism that is used to trigger snapshots on
masters and do explicit warming on slaves.
There was talk about creating a request handler that could be used to
trigger arbitrary "events" and execute all of the EventListeners (so you
could create a new "updateModels" event type, independent of commit and
optimize) but no one has ever submitted a patch...
http://issues.apache.org/jira/browse/SOLR-371
-Hoss
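To make the hook concrete: custom listeners are registered in solrconfig.xml under the update handler. This fragment is only a sketch; the listener class name and what it does (kicking off model retraining) are invented for illustration, and only the postCommit/postOptimize hook points Hoss mentions actually exist.

```xml
<!-- Hypothetical solrconfig.xml fragment: run custom code after an optimize.
     com.example.UpdateModelsListener is an invented class for illustration;
     it would implement Solr's EventListener interface. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <listener event="postOptimize" class="com.example.UpdateModelsListener"/>
</updateHandler>
```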
Re: LSA Implementation
Posted by Brian Whitman <br...@variogr.am>.
On Nov 26, 2007 6:58 AM, Grant Ingersoll <gs...@apache.org> wrote:
> LSA (http://en.wikipedia.org/wiki/Latent_semantic_indexing) is
> patented, so it is not likely to happen unless the authors donate the
> patent to the ASF.
>
> -Grant
>
There are many ways to catch a bird... LSA reduces to SVD on the
term-frequency matrix. I have had limited success using JAMA's SVD, which
is PD. It's pure Java; for something serious you'd want to wrap the hard
bits in MKL/Accelerate.
A more interesting Solr-related question is where a very heavy
process like SVD would operate. You'd want to run the 'training' half
of it separately from indexing or querying. It'd almost be like an
optimize. Is there any hook right now to give Solr a "command" like
<updateModels/> and map it to the class in the solrconfig? The
classify half of the SVD can happen at query or index time, very
quickly; I imagine that could even be a custom field type.
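The 'training' half Brian describes can be sketched compactly. This uses numpy rather than JAMA for brevity, and the tiny term-document matrix is invented toy data; real LSA would start from the index's actual term frequencies.

```python
import numpy as np

# Toy term-document matrix: rows = terms, columns = documents.
# Entry [i][j] is the frequency of term i in document j (invented data).
A = np.array([
    [2.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
])

# SVD: A = U * diag(s) * Vt, singular values sorted in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# LSA keeps only the k largest singular values (rank reduction).
k = 2
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Each document is now a k-dimensional vector in a latent "concept" space;
# these vectors are what the cheap classify/query half would compare.
doc_vectors = (np.diag(s_k) @ Vt_k).T   # shape: (num_docs, k)
print(doc_vectors.shape)
```

In JAMA the equivalent calls would be `Matrix.svd()` and the `getU()`/`getSingularValues()`/`getV()` accessors on the resulting decomposition, with the truncation done by slicing.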
Re: LSA Implementation
Posted by Marvin Humphrey <ma...@rectangular.com>.
On Nov 26, 2007, at 6:34 PM, Eswar K wrote:
> Although the algorithm doesn't understand anything about what the
> words *mean*, the patterns it notices can make it seem astonishingly
> intelligent.
>
> When you search such an index, the search engine looks at similarity
> values it has calculated for every content word, and returns the
> documents that it thinks best fit the query. Because two documents
> may be semantically very close even if they do not share a particular
> keyword, this algo will often return relevant documents that don't
> contain the keyword at all, where a plain keyword search would fail
> for lack of an exact match.
Perhaps I should have been less curt. I've read a few papers on LSA,
so I'm familiar at least in passing with everything you describe
above. It would be entertaining to write an implementation, and I've
considered it... but it's a low priority while the patent's in force.
A full term-vector space calculation is... expensive :) ... so LSA
performs reduction. Tuning the algorithm for a threshold effect not
just against "n words in common" but against a rough approximation of
"n words in common" is presumably non-trivial.
If you can either find or write open source software that pulls off
such "astonishingly intelligent" matches despite the many challenges,
kudos. I'd love to see it.
Cheers,
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
Re: LSA Implementation
Posted by Eswar K <kj...@gmail.com>.
Lance,
It does cover European languages, but pretty much nothing on Asian languages
(CJK).
- Eswar
On Nov 28, 2007 1:51 AM, Norskog, Lance <la...@divvio.com> wrote:
> WordNet itself is English-only. There are various ontology projects for
> it.
>
> http://www.globalwordnet.org/ is a separate world language database
> project. I found it at the bottom of the WordNet wikipedia page. Thanks
> for starting me on the search!
>
> Lance
>
> -----Original Message-----
> From: Eswar K [mailto:kja.eswar@gmail.com]
> Sent: Monday, November 26, 2007 6:50 PM
> To: solr-user@lucene.apache.org
> Subject: Re: LSA Implementation
>
> The languages also include CJK :) among others.
>
> - Eswar
>
> On Nov 27, 2007 8:16 AM, Norskog, Lance <la...@divvio.com> wrote:
>
> > The WordNet project at Princeton (USA) is a large database of
> synonyms.
> > If you're only working in English this might be useful instead of
> > running your own analyses.
> >
> > http://en.wikipedia.org/wiki/WordNet
> > http://wordnet.princeton.edu/
> >
> > Lance
> >
> > -----Original Message-----
> > From: Eswar K [mailto:kja.eswar@gmail.com]
> > Sent: Monday, November 26, 2007 6:34 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: LSA Implementation
> >
> > In addition to recording which keywords a document contains, the
> > method examines the document collection as a whole, to see which
> > other documents contain some of those same words. This algo should
> > consider documents that have many words in common to be semantically
> > close, and ones with few words in common to be semantically distant.
> > This simple method correlates surprisingly well with how a human
> > being, looking at content, might classify a document collection.
> > Although the algorithm doesn't understand anything about what the
> > words *mean*, the patterns it notices can make it seem astonishingly
> > intelligent.
> >
> > When you search such an index, the search engine looks at
> > similarity values it has calculated for every content word, and
> > returns the documents that it thinks best fit the query. Because two
> > documents may be semantically very close even if they do not share a
> > particular keyword, this algo will often return relevant documents
> > that don't contain the keyword at all, where a plain keyword search
> > would fail for lack of an exact match.
> >
> > - Eswar
> >
> > On Nov 27, 2007 7:51 AM, Marvin Humphrey <ma...@rectangular.com>
> wrote:
> >
> > >
> > > On Nov 26, 2007, at 6:06 PM, Eswar K wrote:
> > >
> > > > We essentially are looking at having an implementation for doing
> > > > search which can return documents having conceptually similar
> > > > words without necessarily having the original word searched for.
> > >
> > > Very challenging. Say someone searches for "LSA" and hits an
> > > archived
> >
> > > version of the mail you sent to this list. "LSA" is a reasonably
> > > discriminating term. But so is "Eswar".
> > >
> > > If you knew that the original term was "LSA", then you might look
> > > for documents near it in term vector space. But if you don't know
> > > the original term, only the content of the document, how do you know
>
> > > whether you should look for docs near "lsa" or "eswar"?
> > >
> > > Marvin Humphrey
> > > Rectangular Research
> > > http://www.rectangular.com/
> > >
> > >
> > >
> >
>
Re: LSA Implementation
Posted by Grant Ingersoll <gs...@apache.org>.
Using WordNet may require having some type of disambiguation approach;
otherwise you can end up with a lot of "synonyms". I also would look
into how much coverage there is for non-English languages.
If you have the resources, you may be better off developing/finding
your own synonym/concept list based on your genres. You may also look
into other approaches for assigning concepts off line and adding them
to the document.
-Grant
On Nov 27, 2007, at 3:21 PM, Norskog, Lance wrote:
> WordNet itself is English-only. There are various ontology projects
> for
> it.
>
> http://www.globalwordnet.org/ is a separate world language database
> project. I found it at the bottom of the WordNet wikipedia page.
> Thanks
> for starting me on the search!
>
> Lance
>
> -----Original Message-----
> From: Eswar K [mailto:kja.eswar@gmail.com]
> Sent: Monday, November 26, 2007 6:50 PM
> To: solr-user@lucene.apache.org
> Subject: Re: LSA Implementation
>
> The languages also include CJK :) among others.
>
> - Eswar
>
> On Nov 27, 2007 8:16 AM, Norskog, Lance <la...@divvio.com> wrote:
>
>> The WordNet project at Princeton (USA) is a large database of
> synonyms.
>> If you're only working in English this might be useful instead of
>> running your own analyses.
>>
>> http://en.wikipedia.org/wiki/WordNet
>> http://wordnet.princeton.edu/
>>
>> Lance
>>
>> -----Original Message-----
>> From: Eswar K [mailto:kja.eswar@gmail.com]
>> Sent: Monday, November 26, 2007 6:34 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: LSA Implementation
>>
>> In addition to recording which keywords a document contains, the
>> method examines the document collection as a whole, to see which other
>> documents contain some of those same words. This algo should consider
>> documents that have many words in common to be semantically close, and
>> ones with few words in common to be semantically distant. This simple
>> method correlates surprisingly well with how a human being, looking at
>> content, might classify a document collection. Although the algorithm
>> doesn't understand anything about what the words *mean*, the patterns
>> it notices can make it seem astonishingly intelligent.
>>
>> When you search such an index, the search engine looks at
>> similarity values it has calculated for every content word, and
>> returns the documents that it thinks best fit the query. Because two
>> documents may be semantically very close even if they do not share a
>> particular keyword, this algo will often return relevant documents
>> that don't contain the keyword at all, where a plain keyword search
>> would fail for lack of an exact match.
>>
>> - Eswar
>>
>> On Nov 27, 2007 7:51 AM, Marvin Humphrey <ma...@rectangular.com>
> wrote:
>>
>>>
>>> On Nov 26, 2007, at 6:06 PM, Eswar K wrote:
>>>
>>>> We essentially are looking at having an implementation for doing
>>>> search which can return documents having conceptually similar
>>>> words without necessarily having the original word searched for.
>>>
>>> Very challenging. Say someone searches for "LSA" and hits an
>>> archived
>>
>>> version of the mail you sent to this list. "LSA" is a reasonably
>>> discriminating term. But so is "Eswar".
>>>
>>> If you knew that the original term was "LSA", then you might look
>>> for documents near it in term vector space. But if you don't know
>>> the original term, only the content of the document, how do you know
>
>>> whether you should look for docs near "lsa" or "eswar"?
>>>
>>> Marvin Humphrey
>>> Rectangular Research
>>> http://www.rectangular.com/
>>>
>>>
>>>
>>
--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
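If a flat synonym list (whether hand-built, genre-specific, or exported from WordNet) is the route taken, Solr's existing synonym support can consume it directly at analysis time. This schema.xml fragment is a sketch: the field type name and the wordnet_synonyms.txt file name are assumptions, but the tokenizer and filter factories are standard Solr classes.

```xml
<!-- Sketch of a schema.xml field type that expands synonyms at index time.
     "wordnet_synonyms.txt" is a hypothetical file of comma-separated
     synonym groups placed in Solr's conf/ directory. -->
<fieldType name="text_syn" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="wordnet_synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

As Grant notes, without disambiguation an expanded WordNet list will conflate unrelated senses, so a curated subset is usually safer than the whole database.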
RE: LSA Implementation
Posted by "Norskog, Lance" <la...@divvio.com>.
WordNet itself is English-only. There are various ontology projects for
it.
http://www.globalwordnet.org/ is a separate world language database
project. I found it at the bottom of the WordNet wikipedia page. Thanks
for starting me on the search!
Lance
-----Original Message-----
From: Eswar K [mailto:kja.eswar@gmail.com]
Sent: Monday, November 26, 2007 6:50 PM
To: solr-user@lucene.apache.org
Subject: Re: LSA Implementation
The languages also include CJK :) among others.
- Eswar
On Nov 27, 2007 8:16 AM, Norskog, Lance <la...@divvio.com> wrote:
> The WordNet project at Princeton (USA) is a large database of
synonyms.
> If you're only working in English this might be useful instead of
> running your own analyses.
>
> http://en.wikipedia.org/wiki/WordNet
> http://wordnet.princeton.edu/
>
> Lance
>
> -----Original Message-----
> From: Eswar K [mailto:kja.eswar@gmail.com]
> Sent: Monday, November 26, 2007 6:34 PM
> To: solr-user@lucene.apache.org
> Subject: Re: LSA Implementation
>
> In addition to recording which keywords a document contains, the
> method examines the document collection as a whole, to see which other
> documents contain some of those same words. This algo should consider
> documents that have many words in common to be semantically close, and
> ones with few words in common to be semantically distant. This simple
> method correlates surprisingly well with how a human being, looking at
> content, might classify a document collection. Although the algorithm
> doesn't understand anything about what the words *mean*, the patterns
> it notices can make it seem astonishingly intelligent.
>
> When you search such an index, the search engine looks at
> similarity values it has calculated for every content word, and
> returns the documents that it thinks best fit the query. Because two
> documents may be semantically very close even if they do not share a
> particular keyword, this algo will often return relevant documents
> that don't contain the keyword at all, where a plain keyword search
> would fail for lack of an exact match.
>
> - Eswar
>
> On Nov 27, 2007 7:51 AM, Marvin Humphrey <ma...@rectangular.com>
wrote:
>
> >
> > On Nov 26, 2007, at 6:06 PM, Eswar K wrote:
> >
> > > We essentially are looking at having an implementation for doing
> > > search which can return documents having conceptually similar
> > > words without necessarily having the original word searched for.
> >
> > Very challenging. Say someone searches for "LSA" and hits an
> > archived
>
> > version of the mail you sent to this list. "LSA" is a reasonably
> > discriminating term. But so is "Eswar".
> >
> > If you knew that the original term was "LSA", then you might look
> > for documents near it in term vector space. But if you don't know
> > the original term, only the content of the document, how do you know
> > whether you should look for docs near "lsa" or "eswar"?
> >
> > Marvin Humphrey
> > Rectangular Research
> > http://www.rectangular.com/
> >
> >
> >
>
Re: LSA Implementation
Posted by Eswar K <kj...@gmail.com>.
The languages also include CJK :) among others.
- Eswar
On Nov 27, 2007 8:16 AM, Norskog, Lance <la...@divvio.com> wrote:
> The WordNet project at Princeton (USA) is a large database of synonyms.
> If you're only working in English this might be useful instead of
> running your own analyses.
>
> http://en.wikipedia.org/wiki/WordNet
> http://wordnet.princeton.edu/
>
> Lance
>
> -----Original Message-----
> From: Eswar K [mailto:kja.eswar@gmail.com]
> Sent: Monday, November 26, 2007 6:34 PM
> To: solr-user@lucene.apache.org
> Subject: Re: LSA Implementation
>
> In addition to recording which keywords a document contains, the
> method examines the document collection as a whole, to see which other
> documents contain some of those same words. This algo should consider
> documents that have many words in common to be semantically close, and
> ones with few words in common to be semantically distant. This simple
> method correlates surprisingly well with how a human being, looking at
> content, might classify a document collection. Although the algorithm
> doesn't understand anything about what the words *mean*, the patterns
> it notices can make it seem astonishingly intelligent.
>
> When you search such an index, the search engine looks at
> similarity values it has calculated for every content word, and
> returns the documents that it thinks best fit the query. Because two
> documents may be semantically very close even if they do not share a
> particular keyword, this algo will often return relevant documents
> that don't contain the keyword at all, where a plain keyword search
> would fail for lack of an exact match.
>
> - Eswar
>
> On Nov 27, 2007 7:51 AM, Marvin Humphrey <ma...@rectangular.com> wrote:
>
> >
> > On Nov 26, 2007, at 6:06 PM, Eswar K wrote:
> >
> > > We essentially are looking at having an implementation for doing
> > > search which can return documents having conceptually similar words
> > > without necessarily having the original word searched for.
> >
> > Very challenging. Say someone searches for "LSA" and hits an archived
>
> > version of the mail you sent to this list. "LSA" is a reasonably
> > discriminating term. But so is "Eswar".
> >
> > If you knew that the original term was "LSA", then you might look for
> > documents near it in term vector space. But if you don't know the
> > original term, only the content of the document, how do you know
> > whether you should look for docs near "lsa" or "eswar"?
> >
> > Marvin Humphrey
> > Rectangular Research
> > http://www.rectangular.com/
> >
> >
> >
>
RE: LSA Implementation
Posted by "Norskog, Lance" <la...@divvio.com>.
The WordNet project at Princeton (USA) is a large database of synonyms.
If you're only working in English this might be useful instead of
running your own analyses.
http://en.wikipedia.org/wiki/WordNet
http://wordnet.princeton.edu/
Lance
-----Original Message-----
From: Eswar K [mailto:kja.eswar@gmail.com]
Sent: Monday, November 26, 2007 6:34 PM
To: solr-user@lucene.apache.org
Subject: Re: LSA Implementation
In addition to recording which keywords a document contains, the method
examines the document collection as a whole, to see which other
documents contain some of those same words. This algo should consider
documents that have many words in common to be semantically close, and
ones with few words in common to be semantically distant. This simple
method correlates surprisingly well with how a human being, looking at
content, might classify a document collection. Although the algorithm
doesn't understand anything about what the words *mean*, the patterns it
notices can make it seem astonishingly intelligent.
When you search such an index, the search engine looks at similarity
values it has calculated for every content word, and returns the
documents that it thinks best fit the query. Because two documents may
be semantically very close even if they do not share a particular
keyword, this algo will often return relevant documents that don't
contain the keyword at all, where a plain keyword search would fail for
lack of an exact match.
- Eswar
On Nov 27, 2007 7:51 AM, Marvin Humphrey <ma...@rectangular.com> wrote:
>
> On Nov 26, 2007, at 6:06 PM, Eswar K wrote:
>
> > We essentially are looking at having an implementation for doing
> > search which can return documents having conceptually similar words
> > without necessarily having the original word searched for.
>
> Very challenging. Say someone searches for "LSA" and hits an archived
> version of the mail you sent to this list. "LSA" is a reasonably
> discriminating term. But so is "Eswar".
>
> If you knew that the original term was "LSA", then you might look for
> documents near it in term vector space. But if you don't know the
> original term, only the content of the document, how do you know
> whether you should look for docs near "lsa" or "eswar"?
>
> Marvin Humphrey
> Rectangular Research
> http://www.rectangular.com/
>
>
>
Re: LSA Implementation
Posted by Eswar K <kj...@gmail.com>.
In addition to recording which keywords a document contains, the method
examines the document collection as a whole, to see which other documents
contain some of those same words. This algo should consider documents that
have many words in common to be semantically close, and ones with few words
in common to be semantically distant. This simple method correlates
surprisingly well with how a human being, looking at content, might classify
a document collection. Although the algorithm doesn't understand anything
about what the words *mean*, the patterns it notices can make it seem
astonishingly intelligent.
When you search such an index, the search engine looks at similarity
values it has calculated for every content word, and returns the documents
that it thinks best fit the query. Because two documents may be semantically
very close even if they do not share a particular keyword, this algo will
often return relevant documents that don't contain the keyword at all, where
a plain keyword search would fail for lack of an exact match.
- Eswar
On Nov 27, 2007 7:51 AM, Marvin Humphrey <ma...@rectangular.com> wrote:
>
> On Nov 26, 2007, at 6:06 PM, Eswar K wrote:
>
> > We essentially are looking at having an implementation for doing
> > search
> > which can return documents having conceptually similar words without
> > necessarily having the original word searched for.
>
> Very challenging. Say someone searches for "LSA" and hits an
> archived version of the mail you sent to this list. "LSA" is a
> reasonably discriminating term. But so is "Eswar".
>
> If you knew that the original term was "LSA", then you might look for
> documents near it in term vector space. But if you don't know the
> original term, only the content of the document, how do you know
> whether you should look for docs near "lsa" or "eswar"?
>
> Marvin Humphrey
> Rectangular Research
> http://www.rectangular.com/
>
>
>
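The query-time half of what Eswar describes (returning best-fit documents by calculated similarity rather than exact keyword match) amounts to a nearest-neighbor lookup in the reduced concept space. A minimal sketch, where the 2-D document and query vectors are invented stand-ins for what an SVD reduction would actually produce:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Invented "concept space" vectors for three documents and a query.
# In real LSA these come from the rank-reduced SVD, not from raw keywords,
# which is why documents sharing no keyword can still rank highly.
docs = {
    "doc_cars":  np.array([0.9, 0.1]),
    "doc_autos": np.array([0.8, 0.3]),  # no shared keyword, same concept
    "doc_fruit": np.array([0.1, 0.9]),
}
query = np.array([0.85, 0.2])

# Rank documents by similarity to the query vector, best fit first.
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked[0])  # the conceptually closest document
```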
Re: LSA Implementation
Posted by Marvin Humphrey <ma...@rectangular.com>.
On Nov 26, 2007, at 6:06 PM, Eswar K wrote:
> We essentially are looking at having an implementation for doing
> search
> which can return documents having conceptually similar words without
> necessarily having the original word searched for.
Very challenging. Say someone searches for "LSA" and hits an
archived version of the mail you sent to this list. "LSA" is a
reasonably discriminating term. But so is "Eswar".
If you knew that the original term was "LSA", then you might look for
documents near it in term vector space. But if you don't know the
original term, only the content of the document, how do you know
whether you should look for docs near "lsa" or "eswar"?
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
Re: LSA Implementation
Posted by Eswar K <kj...@gmail.com>.
We essentially are looking at having an implementation for doing search
which can return documents having conceptually similar words without
necessarily having the original word searched for.
- Eswar
On Nov 27, 2007 12:06 AM, Grant Ingersoll <gs...@apache.org> wrote:
> Interesting. I am not a lawyer, but my understanding has always been
> that this is not something we could do.
>
> The question has come up from time to time on the Lucene mailing list:
>
> http://www.gossamer-threads.com/lists/engine?list=lucene&do=search_results&search_forum=forum_3&search_string=Latent+Semantic&search_type=AND
>
> That being said, there may be other approaches that do similar things
> that aren't covered by a patent, I don't know.
>
> Is there something specific you want to do, or are you just going by
> the promise of better results using LSI?
>
> I suppose if someone said they had a patch for Lucene/Solr that
> implemented it, we could ask on legal-discuss for advice.
>
> -Grant
>
> On Nov 26, 2007, at 1:13 PM, Eswar K wrote:
>
> > I was just searching for info on LSA and came across Semantic Indexing
> > project under GNU license...which of course is still under
> > development in C++
> > though.
> >
> > - Eswar
> >
> > On Nov 26, 2007 9:56 PM, Jack <jl...@gmail.com> wrote:
> >
> >> Interesting. Patents are valid for 20 years so it expires next
> >> year? :)
> >> PLSA does not seem to have been patented, at least not mentioned in
> >> http://en.wikipedia.org/wiki/Probabilistic_latent_semantic_analysis
> >>
> >> On Nov 26, 2007 6:58 AM, Grant Ingersoll <gs...@apache.org> wrote:
> >>> LSA (http://en.wikipedia.org/wiki/Latent_semantic_indexing) is
> >>> patented, so it is not likely to happen unless the authors donate
> >>> the
> >>> patent to the ASF.
> >>>
> >>> -Grant
> >>>
> >>>
> >>>
> >>> On Nov 26, 2007, at 8:23 AM, Eswar K wrote:
> >>>
> >>>> All,
> >>>>
> >>>> Is there any plan to implement Latent Semantic Analysis as part of
> >>>> Solr
> >>>> anytime in the near future?
> >>>>
> >>>> Regards,
> >>>> Eswar
> >>>
> >>> --------------------------
> >>> Grant Ingersoll
> >>> http://lucene.grantingersoll.com
> >>>
> >>> Lucene Helpful Hints:
> >>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> >>> http://wiki.apache.org/lucene-java/LuceneFAQ
> >>>
> >>>
> >>>
> >>>
> >>
>
> --------------------------
> Grant Ingersoll
> http://lucene.grantingersoll.com
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>
>
>
>
Re: LSA Implementation
Posted by Renaud Delbru <re...@deri.org>.
LDA (Latent Dirichlet Allocation) is a similar technique that extends pLSI.
You can find some implementations in C++ and Java on the Web.
Grant Ingersoll wrote:
> Interesting. I am not a lawyer, but my understanding has always been
> that this is not something we could do.
>
> The question has come up from time to time on the Lucene mailing list:
> http://www.gossamer-threads.com/lists/engine?list=lucene&do=search_results&search_forum=forum_3&search_string=Latent+Semantic&search_type=AND
>
>
> That being said, there may be other approaches that do similar things
> that aren't covered by a patent, I don't know.
>
> Is there something specific you want to do, or are you just going by
> the promise of better results using LSI?
>
> I suppose if someone said they had a patch for Lucene/Solr that
> implemented it, we could ask on legal-discuss for advice.
>
> -Grant
>
> On Nov 26, 2007, at 1:13 PM, Eswar K wrote:
>
>> I was just searching for info on LSA and came across Semantic Indexing
>> project under GNU license...which of course is still under development
>> in C++
>> though.
>>
>> - Eswar
>>
>> On Nov 26, 2007 9:56 PM, Jack <jl...@gmail.com> wrote:
>>
>>> Interesting. Patents are valid for 20 years so it expires next year? :)
>>> PLSA does not seem to have been patented, at least not mentioned in
>>> http://en.wikipedia.org/wiki/Probabilistic_latent_semantic_analysis
>>>
>>> On Nov 26, 2007 6:58 AM, Grant Ingersoll <gs...@apache.org> wrote:
>>>> LSA (http://en.wikipedia.org/wiki/Latent_semantic_indexing) is
>>>> patented, so it is not likely to happen unless the authors donate the
>>>> patent to the ASF.
>>>>
>>>> -Grant
>>>>
>>>>
>>>>
>>>> On Nov 26, 2007, at 8:23 AM, Eswar K wrote:
>>>>
>>>>> All,
>>>>>
>>>>> Is there any plan to implement Latent Semantic Analysis as part of
>>>>> Solr
>>>>> anytime in the near future?
>>>>>
>>>>> Regards,
>>>>> Eswar
>>>>
>>>> --------------------------
>>>> Grant Ingersoll
>>>> http://lucene.grantingersoll.com
>>>>
>>>> Lucene Helpful Hints:
>>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>>> http://wiki.apache.org/lucene-java/LuceneFAQ
>>>>
>>>>
>>>>
>>>>
>>>
>
> --------------------------
> Grant Ingersoll
> http://lucene.grantingersoll.com
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>
>
--
Renaud Delbru,
E.C.S., M.Sc. Student,
Semantic Information Systems and
Language Engineering Group (SmILE),
Digital Enterprise Research Institute,
National University of Ireland, Galway.
http://smile.deri.ie/
Re: LSA Implementation
Posted by Grant Ingersoll <gs...@apache.org>.
Interesting. I am not a lawyer, but my understanding has always been
that this is not something we could do.
The question has come up from time to time on the Lucene mailing list:
http://www.gossamer-threads.com/lists/engine?list=lucene&do=search_results&search_forum=forum_3&search_string=Latent+Semantic&search_type=AND
That being said, there may be other approaches that do similar things
that aren't covered by a patent, I don't know.
Is there something specific you want to do, or are you just going by
the promise of better results using LSI?
I suppose if someone said they had a patch for Lucene/Solr that
implemented it, we could ask on legal-discuss for advice.
-Grant
On Nov 26, 2007, at 1:13 PM, Eswar K wrote:
> I was just searching for info on LSA and came across Semantic Indexing
> project under GNU license...which of course is still under
> development in C++
> though.
>
> - Eswar
>
> On Nov 26, 2007 9:56 PM, Jack <jl...@gmail.com> wrote:
>
>> Interesting. Patents are valid for 20 years so it expires next
>> year? :)
>> PLSA does not seem to have been patented, at least not mentioned in
>> http://en.wikipedia.org/wiki/Probabilistic_latent_semantic_analysis
>>
>> On Nov 26, 2007 6:58 AM, Grant Ingersoll <gs...@apache.org> wrote:
>>> LSA (http://en.wikipedia.org/wiki/Latent_semantic_indexing) is
>>> patented, so it is not likely to happen unless the authors donate
>>> the
>>> patent to the ASF.
>>>
>>> -Grant
>>>
>>>
>>>
>>> On Nov 26, 2007, at 8:23 AM, Eswar K wrote:
>>>
>>>> All,
>>>>
>>>> Is there any plan to implement Latent Semantic Analysis as part of
>>>> Solr
>>>> anytime in the near future?
>>>>
>>>> Regards,
>>>> Eswar
>>>
>>> --------------------------
>>> Grant Ingersoll
>>> http://lucene.grantingersoll.com
>>>
>>> Lucene Helpful Hints:
>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>> http://wiki.apache.org/lucene-java/LuceneFAQ
>>>
>>>
>>>
>>>
>>
--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
Re: LSA Implementation
Posted by Eswar K <kj...@gmail.com>.
I was just searching for info on LSA and came across Semantic Indexing
project under GNU license...which of course is still under development in C++
though.
- Eswar
On Nov 26, 2007 9:56 PM, Jack <jl...@gmail.com> wrote:
> Interesting. Patents are valid for 20 years so it expires next year? :)
> PLSA does not seem to have been patented, at least not mentioned in
> http://en.wikipedia.org/wiki/Probabilistic_latent_semantic_analysis
>
> On Nov 26, 2007 6:58 AM, Grant Ingersoll <gs...@apache.org> wrote:
> > LSA (http://en.wikipedia.org/wiki/Latent_semantic_indexing) is
> > patented, so it is not likely to happen unless the authors donate the
> > patent to the ASF.
> >
> > -Grant
> >
> >
> >
> > On Nov 26, 2007, at 8:23 AM, Eswar K wrote:
> >
> > > All,
> > >
> > > Is there any plan to implement Latent Semantic Analysis as part of
> > > Solr
> > > anytime in the near future?
> > >
> > > Regards,
> > > Eswar
> >
> > --------------------------
> > Grant Ingersoll
> > http://lucene.grantingersoll.com
> >
> > Lucene Helpful Hints:
> > http://wiki.apache.org/lucene-java/BasicsOfPerformance
> > http://wiki.apache.org/lucene-java/LuceneFAQ
> >
> >
> >
> >
>
Re: LSA Implementation
Posted by Jack <jl...@gmail.com>.
Interesting. Patents are valid for 20 years so it expires next year? :)
PLSA does not seem to have been patented, at least not mentioned in
http://en.wikipedia.org/wiki/Probabilistic_latent_semantic_analysis
On Nov 26, 2007 6:58 AM, Grant Ingersoll <gs...@apache.org> wrote:
> LSA (http://en.wikipedia.org/wiki/Latent_semantic_indexing) is
> patented, so it is not likely to happen unless the authors donate the
> patent to the ASF.
>
> -Grant
>
>
>
> On Nov 26, 2007, at 8:23 AM, Eswar K wrote:
>
> > All,
> >
> > Is there any plan to implement Latent Semantic Analysis as part of
> > Solr
> > anytime in the near future?
> >
> > Regards,
> > Eswar
>
> --------------------------
> Grant Ingersoll
> http://lucene.grantingersoll.com
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>
>
>
>
Re: LSA Implementation
Posted by Grant Ingersoll <gs...@apache.org>.
LSA (http://en.wikipedia.org/wiki/Latent_semantic_indexing) is
patented, so it is not likely to happen unless the authors donate the
patent to the ASF.
-Grant
On Nov 26, 2007, at 8:23 AM, Eswar K wrote:
> All,
>
> Is there any plan to implement Latent Semantic Analysis as part of
> Solr
> anytime in the near future?
>
> Regards,
> Eswar
--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ