Posted to java-user@lucene.apache.org by hesh jay <he...@gmail.com> on 2015/04/01 04:44:46 UTC

for check similarity of two sentences

Hi,
I am a second-year undergraduate at the University of Moratuwa, Sri Lanka. For
my second-year project I am building a question answering system (knowledge
base). In this project I have to suggest similar questions previously asked by
other users, so I need to measure the similarity of two sentences in my
application. Can I do that using Apache Lucene?
Thank You!
regards,
Heshan jayasinghe

Re: for check similarity of two sentences

Posted by Robust Links <pe...@robustlinks.com>.
Hi Heshan

one approach could be something like this:

1- Vectorize each n-gram of each sentence. One vectorization strategy is to
use word2vec (the deep learning package). I believe someone has ported
word2vec (originally in C) to Lucene; a Google search should find it.
2- Aggregate the word vectors (i.e. some clustering). This reduces the N
vectors (one per n-gram) to a single vector for the whole sentence.
3- Given a new sentence to match, first apply steps 1 and 2 to it, then
compare the resulting vector with the other sentence vectors. The comparison
is usually something like a dot product or, more generally, some kernel
function. Use Apache Commons Math's vector classes rather than writing loops,
since computing dot products can be expensive and numerical packages use
vectorization (linear algebra) to speed up the computation.
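
The three steps above can be sketched roughly as follows. This is only a
minimal illustration under several assumptions: the class name
SentenceSimilarity, the helper toyEmbeddings, and the tiny hand-written
3-dimensional vectors are all hypothetical stand-ins for a real word2vec
model; averaging is used as one simple way to aggregate word vectors; and
cosine similarity (a normalized dot product) is used for the comparison.
Plain loops are used so the sketch has no dependencies; a numerical library
could replace them.

```java
import java.util.HashMap;
import java.util.Map;

final class SentenceSimilarity {

    // Step 2 (assumed aggregation: averaging). Collapses the word vectors of
    // one sentence into a single vector. Words without a vector are skipped.
    static double[] sentenceVector(String sentence, Map<String, double[]> embeddings, int dim) {
        double[] sum = new double[dim];
        int known = 0;
        for (String token : sentence.toLowerCase().split("\\W+")) {
            double[] v = embeddings.get(token);
            if (v == null) {
                continue;
            }
            for (int i = 0; i < dim; i++) {
                sum[i] += v[i];
            }
            known++;
        }
        if (known > 0) {
            for (int i = 0; i < dim; i++) {
                sum[i] /= known;
            }
        }
        return sum;
    }

    // Step 3: cosine similarity, i.e. the dot product divided by the product
    // of the vector lengths. Returns 0 if either vector is all zeros.
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Step 1 stand-in: hypothetical toy vectors, NOT real word2vec output.
    static Map<String, double[]> toyEmbeddings() {
        Map<String, double[]> e = new HashMap<>();
        e.put("index", new double[] {1.0, 0.0, 0.0});
        e.put("search", new double[] {0.9, 0.1, 0.0});
        e.put("document", new double[] {0.8, 0.2, 0.1});
        e.put("weather", new double[] {0.0, 0.0, 1.0});
        e.put("rain", new double[] {0.0, 0.1, 0.9});
        return e;
    }

    public static void main(String[] args) {
        Map<String, double[]> e = toyEmbeddings();
        double[] asked = sentenceVector("How do I search a document?", e, 3);
        double[] stored1 = sentenceVector("How to index a document", e, 3);
        double[] stored2 = sentenceVector("Will it rain? What is the weather?", e, 3);
        System.out.println("similar question score:   " + cosine(asked, stored1));
        System.out.println("unrelated question score: " + cosine(asked, stored2));
    }
}
```

In an application, each stored question's vector would be computed once and
kept, and only the new question would be vectorized at query time.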

good luck with your project

Peyman

On Thu, Apr 2, 2015 at 4:04 AM, Gimantha Bandara <gi...@wso2.com> wrote:

> Hi Heshan,
> I think you can achieve what you are looking for. You may want to read
> "Lucene in Action, 2nd edition" on the Lucene scoring system and FuzzyQuery.
> Hope this helps. Maybe someone can suggest a much better approach.
>
> --
> Gimantha Bandara
> Software Engineer
> WSO2. Inc : http://wso2.com
> Mobile : +94714961919
>

Re: for check similarity of two sentences

Posted by Gimantha Bandara <gi...@wso2.com>.
Hi Heshan,
I think you can achieve what you are looking for. You may want to read
"Lucene in Action, 2nd edition" on the Lucene scoring system and FuzzyQuery.
Hope this helps. Maybe someone can suggest a much better approach.
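
For background on the FuzzyQuery suggestion: fuzzy matching works on
individual terms and accepts terms that lie within a small edit distance of
the query term. The sketch below is not Lucene's implementation; it is a
plain-Java illustration of the basic Levenshtein edit distance, and the class
name EditDistance and the helper withinEdits are hypothetical names chosen
for this example.

```java
final class EditDistance {

    // Classic dynamic-programming Levenshtein distance: the minimum number of
    // single-character insertions, deletions, and substitutions needed to
    // turn string a into string b. Only two rows of the table are kept.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // True if the two terms are at most maxEdits edits apart.
    static boolean withinEdits(String a, String b, int maxEdits) {
        return levenshtein(a, b) <= maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("lucene", "lucine"));
        System.out.println(withinEdits("question", "questions", 1));
    }
}
```

Because this compares single terms, it helps with misspellings inside a
question rather than measuring the similarity of whole sentences.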




-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919