Posted to dev@lucene.apache.org by Christoph Pächter <Pa...@htwg-konstanz.de> on 2007/01/29 16:46:14 UTC

LSI, Latent Semantic Indexing

Is there any work or project under way to include LSI in Lucene?
There are some questions about it in the mailing lists, but they are more than a year old.
Has anything happened since then?

Cheers,
Christoph

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: LSI, Latent Semantic Indexing

Posted by Christoph Pächter <Pa...@htwg-konstanz.de>.
Thank you for your answers.

I have no particular need.
I just thought that, since there is fuzzy search for syntactic proximity,
there might be some implementation (in Lucene, or external and easy to include)
for semantic proximity search (LSI or something similar).
I just wanted to test it on some PDFs.
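
To make the contrast concrete: syntactic proximity of the fuzzy-search kind is
usually measured with an edit distance between terms. The following is only a
minimal sketch of Levenshtein distance in plain Python, not Lucene's actual
implementation; the function name and the sample words are made up for
illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

# A misspelling is close to the intended term ...
typo_distance = edit_distance("lucene", "lucine")
# ... but a synonym is far away, which is why a semantic measure is needed.
synonym_distance = edit_distance("car", "automobile")
```

The misspelling is one edit away from the intended term, while the two synonyms
are many edits apart even though they mean the same thing.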

Kind regards
Christoph Pächter



Re: LSI, Latent Semantic Indexing

Posted by "J. Delgado" <jo...@gmail.com>.
It all depends on what you need it for. BTW, Latent Semantic Analysis
(LSA) is a superset of LSI. LSI concentrates just on how to index and
search documents in a reduced-dimensional (latent) space, whereas LSA
includes a range of possible analyses that can be done on
representations in this space. There are other equivalent techniques
(e.g. probabilistic LSI) that can be much more efficient.
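
As a rough illustration of what "index and search in a reduced-dimensional
space" means, here is a minimal sketch in plain Python. It is not taken from
Lucene or any LSI library. It builds a raw term-document count matrix, finds
the top k left singular vectors by power iteration on A * A^T (a simple
stand-in for a real SVD routine), and compares the query and the documents by
cosine similarity after projecting both onto those vectors. The toy corpus,
the function names and k=2 are all assumptions made for the example.

```python
import math
import random

DOCS = [
    "car engine wheel",
    "automobile engine wheel",
    "banana apple fruit",
    "apple fruit juice",
]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def term_document_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] counts term i in doc j."""
    vocab = sorted({t for d in docs for t in d.split()})
    row = {t: i for i, t in enumerate(vocab)}
    matrix = [[0.0] * len(docs) for _ in vocab]
    for j, d in enumerate(docs):
        for t in d.split():
            matrix[row[t]][j] += 1.0
    return vocab, matrix

def top_left_singular_vectors(a, k, iters=200, seed=0):
    """Top-k left singular vectors of a, via power iteration with
    deflation on the Gram matrix A * A^T (fine for toy sizes only)."""
    rng = random.Random(seed)
    m = len(a)
    gram = [[dot(a[i], a[j]) for j in range(m)] for i in range(m)]
    basis = []
    for _ in range(k):
        v = [rng.uniform(0.5, 1.5) for _ in range(m)]
        for _ in range(iters):
            w = [dot(row, v) for row in gram]
            for u in basis:  # remove directions that were already found
                c = dot(w, u)
                w = [x - c * y for x, y in zip(w, u)]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:
                raise ValueError("k exceeds the rank of the matrix")
            v = [x / norm for x in w]
        basis.append(v)
    return basis

def cosine(x, y):
    nx, ny = math.sqrt(dot(x, x)), math.sqrt(dot(y, y))
    return dot(x, y) / (nx * ny) if nx > 0 and ny > 0 else 0.0

def lsi_search(docs, query, k=2):
    """Score every document against the query in the k-dim latent space."""
    vocab, a = term_document_matrix(docs)
    basis = top_left_singular_vectors(a, k)
    q = [float(query.split().count(t)) for t in vocab]
    q_latent = [dot(u, q) for u in basis]
    scores = []
    for j in range(len(docs)):
        column = [a[i][j] for i in range(len(vocab))]
        scores.append(cosine(q_latent, [dot(u, column) for u in basis]))
    return scores

scores = lsi_search(DOCS, "car")
```

Here the query "car" shares no term with the second document, yet in the
two-dimensional latent space that document scores close to 1, because "car"
and "automobile" occur in the same contexts. The two fruit documents score
close to 0. A real system would use a sparse SVD and some term weighting
rather than raw counts.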

Perhaps the original requester could give us more information about
how he intends to use LSI. For example, is this for plain "concept"
search, or for document classification, clustering, automatic query
expansion/suggestion, link/topology analysis, or something else?

J.D.






Re: LSI, Latent Semantic Indexing

Posted by "Mario Alejandro M." <ma...@gmail.com>.
I am also researching the use of LSA.

My interest is simply to cluster the information. I found that LSA is one way
to do it, but I'm not convinced it is the best (it is also very demanding in
CPU and RAM).

-- 
Mario Alejandro Montoya
MCP
www.paradondevamos.com
The best site for restaurants and entertainment in Colombia!

Re: LSI, Latent Semantic Indexing

Posted by karl wettin <ka...@gmail.com>.
On 29 Jan 2007, at 16:46, Christoph Pächter wrote:

> Is there any work/project to include LSI in Lucene?
> There are some questions in the mailing lists, but they are older than a year.
> Something happened since then?

As far as I know, no. But there are many other projects that do it
for you. Carrot Search uses a number of algorithms (some source is
open, some is proprietary) that cluster things on the fly, based on
tf-idf calculated from the results. Weka also has some algorithms that
can be used. They are, however, not very optimized for text mining. I
have seen some references to sparse matrix implementations, but I
don't think they are an official part of the distribution.
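
To show what "clustering based on tf-idf calculated from the results" can look
like, here is a minimal sketch in plain Python. It is not the algorithm that
Carrot or Weka actually uses: it weights each term by tf * log(N / df) over the
result list only, then does a naive single-pass grouping by cosine similarity
to the first member of each cluster. The sample results, the function names and
the 0.1 threshold are assumptions chosen for the example.

```python
import math
from collections import Counter

RESULTS = [
    "lucene index search engine",
    "lucene search query parser",
    "banana bread recipe",
    "banana smoothie recipe",
]

def tfidf_vectors(docs):
    """One sparse {term: weight} dict per document, weight = tf * idf."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Document frequency: in how many results does each term occur?
    df = Counter(t for tokens in tokenized for t in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors

def cosine(u, v):
    num = sum(w * v.get(t, 0.0) for t, w in u.items())
    den = (math.sqrt(sum(w * w for w in u.values()))
           * math.sqrt(sum(w * w for w in v.values())))
    return num / den if den else 0.0

def greedy_clusters(docs, threshold=0.1):
    """Assign each document to the first cluster whose seed (first
    member) is similar enough, otherwise start a new cluster."""
    vectors = tfidf_vectors(docs)
    clusters = []  # lists of document indexes
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

clusters = greedy_clusters(RESULTS)
```

With these four results the two Lucene hits end up in one group and the two
recipe hits in another. Because the weights are computed only from the result
list, the grouping can be redone for every query.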

I know nothing about how you plan to use it, but looking from the
perspective of the applications I use Lucene for, a corpus does not
often contain all the data people are searching for. Or, because the
data is created by users who don't know the correct terms to describe
the information, there are associations between documents that cannot
be detected by analyzing terms. Therefore I think that finding
associations in the content is not as interesting as finding
associations by analyzing session behaviour. I would use LSI as
something secondary, on top of analyzed behaviour. Know what I mean?
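
One simple way to picture "associations by analyzing session behaviour" is to
count how often two documents are opened in the same session. The following is
only a sketch under that assumption, in plain Python; the session data, the
document ids and the function names are invented for illustration and do not
describe any existing Lucene feature.

```python
from collections import Counter
from itertools import combinations

# Hypothetical log: the documents each user opened during one session.
SESSIONS = [
    ["doc1", "doc7"],
    ["doc1", "doc7", "doc3"],
    ["doc3", "doc9"],
    ["doc7", "doc1"],
]

def co_view_counts(sessions):
    """Count, for every unordered pair of documents, the number of
    sessions in which both were viewed."""
    counts = Counter()
    for viewed in sessions:
        for a, b in combinations(sorted(set(viewed)), 2):
            counts[(a, b)] += 1
    return counts

def related(doc, counts, min_count=2):
    """Documents co-viewed with doc at least min_count times,
    most frequent first."""
    scored = []
    for (a, b), n in counts.items():
        if n >= min_count and doc in (a, b):
            scored.append((n, b if a == doc else a))
    return [d for n, d in sorted(scored, key=lambda s: (-s[0], s[1]))]

counts = co_view_counts(SESSIONS)
related_to_doc1 = related("doc1", counts)
```

Here doc1 and doc7 become associated because three sessions contain both, even
if the two documents share no terms at all. That is exactly the kind of link
term analysis alone cannot find.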


