Posted to java-user@lucene.apache.org by nitin gopi <ni...@gmail.com> on 2009/03/18 07:09:01 UTC

lsi as indexing algorithm with lucene

Hi all, has anybody tried to use LSI (latent semantic indexing) for
indexing in Lucene?

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: lsi as indexing algorithm with lucene

Posted by Marvin Humphrey <ma...@rectangular.com>.
On Wed, Mar 18, 2009 at 08:09:33AM +0100, Paul Libbrecht wrote:

> LSI is patented, so there has not been a flurry of implementation attempts.

Hasn't the original patent expired?

http://mail.python.org/pipermail/python-list/2007-July/621547.html

Marvin Humphrey




Re: lsi as indexing algorithm with lucene

Posted by adasal <ad...@gmail.com>.
Hello all,
Following the link to SemanticVectors, under related research there is this
link:
Magnus Sahlgren. An introduction to random indexing.
<http://www.sics.se/%7Emange/papers/RI_intro.pdf>
I would like to point out that Magnus Sahlgren has completed a PhD thesis in
this area, which is both very readable and very informative.
There are also other Java implementations of Random Indexing using sparse
arrays, one of them referenced from the SICS site. It is sometimes helpful to
look at alternatives, as the examples they offer can be illuminating.
Adam Saltiel


Re: lsi as indexing algorithm with lucene

Posted by Dominik Jednoralski <tw...@googlemail.com>.
Hi,

I'm the guy who wrote the bachelor thesis on this. Sorry it took a while to
publish it to the community, but I had to improve it before publishing. The
topic of the thesis was to augment the Lucene-driven search facility of the
Intelligent Tutoring System ActiveMath with latent semantics. The semantic
results came from the SemanticVectors software package by Dominic Widdows

http://code.google.com/p/semanticvectors/

and were used to 'blow up' an index query in the way Mr Libbrecht, my
supervisor, described earlier in this thread.

I am not sure whether my blog is an appropriate location for my thesis, so
please feel free to redistribute it.

http://www.twelve02.de/publications/jednoralski_bachelorthesis_latent_semantics_for_activemath.pdf


Best

Dominik Jednoralski



Re: lsi as indexing algorithm with lucene

Posted by Simon Willnauer <si...@googlemail.com>.
Hi,

On Wed, Mar 18, 2009 at 8:59 AM, Paul Libbrecht <pa...@activemath.org> wrote:
> Depending on your corpus, a search engine enhanced with semantic vectors
> is definitely more semantic than one without.
>
> The general approach I have with these is:
>
> - get a query
> - expand each term of the query with the fuzzification of semantic vectors
> (e.g. if termA is requested, add termB and termC with their
> semantic-distance as a boost factor)
> - run the query and get results ranked higher for termA if found, then for
> termB and termC
>
> My student Dominik Jednoralski has written a bachelor thesis on that.
> I'll forward the request to send you this.
If possible, could you post a link where everybody can reach your
student's thesis?
I guess it would be of interest to a number of people on this list,
and a benefit for your student as well.

simon



Re: lsi as indexing algorithm with lucene

Posted by Paul Libbrecht <pa...@activemath.org>.
Depending on your corpus, a search engine enhanced with semantic vectors
is definitely more semantic than one without.

The general approach I have with these is:

- get a query
- expand each term of the query with the fuzzification of semantic
  vectors (e.g. if termA is requested, add termB and termC with their
  semantic-distance as a boost factor)
- run the query and get results ranked higher for termA if found, then
  for termB and termC
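
To illustrate the steps above, here is a minimal sketch of the expansion
step. It is not code from this thread or from SemanticVectors: the class
name, the expand() helper and the neighbour scores are all hypothetical,
and it assumes the semantic-vector model supplies a similarity in (0, 1]
for each related term (higher meaning closer). It only builds a query
string using the term^boost notation of Lucene's classic query parser
syntax; it does not call Lucene itself.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class QueryExpansionSketch {

    /**
     * Expands one query term into a boosted disjunction such as
     * "termA OR termB^0.80". The neighbours map (related term ->
     * similarity) is hypothetical input standing in for whatever a
     * semantic-vector model would return. Terms are assumed to contain
     * no query-syntax special characters, so nothing is escaped here.
     */
    static String expand(String term, Map<String, Double> neighbours) {
        // The original term carries no explicit boost, i.e. the default of 1.0.
        StringBuilder query = new StringBuilder(term);
        for (Map.Entry<String, Double> e : neighbours.entrySet()) {
            query.append(" OR ")
                 .append(e.getKey())
                 .append('^')
                 // Locale.ROOT keeps the decimal separator a dot.
                 .append(String.format(Locale.ROOT, "%.2f", e.getValue()));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // Made-up similarity scores, for illustration only.
        Map<String, Double> neighbours = new LinkedHashMap<>();
        neighbours.put("termB", 0.8);
        neighbours.put("termC", 0.5);
        System.out.println(expand("termA", neighbours));
        // prints: termA OR termB^0.80 OR termC^0.50
    }
}
```

The resulting string could then be handed to a query parser. The boosts
only bias the ranking towards termA, then termB, then termC; the final
order still depends on the engine's normal scoring.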

My student Dominik Jednoralski has written a bachelor thesis on that.
I'll forward him your request so that he can send it to you.

Consider joining the SemanticVectors mailing list, where the original
author also posts.

paul




Re: lsi as indexing algorithm with lucene

Posted by nitin gopi <ni...@gmail.com>.
Hi Paul, I am new to the field of search engines. My aim is to develop
a semantic search engine. Initially I was trying to develop it using
LSI, but since it is patented there are not many implementation
attempts. I want to ask: is it possible to create a search engine using
Lucene and semantic vectors that is semantically better than plain
Lucene?




Re: lsi as indexing algorithm with lucene

Posted by Paul Libbrecht <pa...@activemath.org>.
Nitin,

LSI is patented, so there has not been a flurry of implementation attempts.
However, SemanticVectors is a library that takes approaches similar to
LSA/LSI for indexing and is based on Lucene's term vectors.

paul


On 18 March 2009 at 07:09, nitin gopi wrote:

> Hi all, has anybody tried to use LSI (latent semantic indexing) for
> indexing in Lucene?