You are viewing a plain text version of this content. The canonical link for it is here.

Posted to java-user@lucene.apache.org by nitin gopi <ni...@gmail.com> on 2009/05/01 19:17:02 UTC

problem with the output of using SVD with lucene

Hi all,
I had implemented something, which I am going to describe in following steps
1. I took the input as 2 text files
2. I removed stop words from them
3. I did stemming over them
4. I formed the term document matrix using lucene. In the matrix values were
the number of times the term has appeared in the document.
5. I calculated the term frequency and the inverse document frequency. I
then multiplied them and formed the weight of each term.
6. I then calculated the SVD of the resultant matrix.As a result I got 3
matrices U(term vector matrix), S(singular values) and V(right singular
values)
                        My question is how are the 3 matrices formed in step
6 are going to be useful for me. How they prove that they have solved the
problem of synonemy and polysemy.Last time you have mentioned that if we
take only the first k values of the matrix then it proves to be useful. But
how?? Please reply as soon as possible.

Thanking You,
Yours Sincerely
Nitin Gopi