Posted to dev@lucenenet.apache.org by Wen Gao <sa...@gmail.com> on 2011/02/16 19:55:05 UTC

how can I get the similarity in fuzzy query

Hi,
I am using FuzzyQuery to get fuzzy-matched results. I want to get the
similarity as a percentage for every matched record.
For example, if I search for "databasd", it will return results such as
"database", "database1", and "database11". I want to get the similarity as a
percentage for every record, such as 87.5%, 75%, and 62.5%.

How can I do this?

Any ideas?

Wen Gao

RE: how can I get the similarity in fuzzy query

Posted by Digy <di...@gmail.com>.
http://wiki.apache.org/lucene-java/ScoresAsPercentages

DIGY


RE: how can I get the similarity in fuzzy query

Posted by Digy <di...@gmail.com>.
Whether *fuzzy* or not, all queries end up as simple term queries, and
Lucene does not expose anything like a *similarity* value, only scores.

DIGY


Re: how can I get the similarity in fuzzy query

Posted by Wyatt Barnett <wy...@gmail.com>.
If you are running in VS 2010, I'd advise saving yourself some trouble
and just grabbing the 2.9.2 package off NuGet.


Re: how can I get the similarity in fuzzy query

Posted by Wen Gao <sa...@gmail.com>.
Thank you.

Wen

RE: how can I get the similarity in fuzzy query

Posted by Digy <di...@gmail.com>.
Download the source from
https://svn.apache.org/repos/asf/incubator/lucene.net/tags/Lucene.Net_2_9_2
using an SVN client (like TortoiseSVN), and open the project file with VS20XX.

DIGY


Re: how can I get the similarity in fuzzy query

Posted by Wen Gao <sa...@gmail.com>.
OK, I get it. How can I recompile Lucene_src on Windows?

Thanks.
Wen

Re: how can I get the similarity in fuzzy query

Posted by Christopher Currens <cu...@gmail.com>.
As far as I know, you'll need to calculate that manually.  FuzzyQuery
searches don't return any results like that.


Re: how can I get the similarity in fuzzy query

Posted by Wen Gao <sa...@gmail.com>.
Hi,
I think my situation is just comparing the similarity of strings: I want to
calculate the similarity between the typed term and the results returned by
*FuzzyQuery*. I have set the minimumSimilarity of FuzzyQuery to 0.5f;
what I want is to get the "similarity" instead of the "score" for every
result that is returned.

Thanks for your time.

Wen


Re: how can I get the similarity in fuzzy query

Posted by Christopher Currens <cu...@gmail.com>.
I was going to post the link that Digy posted, which suggests not to
determine a match that way.  If my understanding is correct, the scores
returned for a query are relative to which documents were retrieved by the
search, in that if a document is deleted from the index, the scores will
change even though the query did not, because the number of returned
documents is different.
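
A toy illustration of the point above, using made-up scores rather than real
Lucene output. It assumes a "percentage" is derived by dividing each raw score
by the top score of the result set (one common way people attempt this; the
function name and document ids are hypothetical). The same document then gets
a different percentage as soon as the set of hits changes:

```python
def as_percent_of_top(scores):
    """Express each raw score as a percentage of the best score in the set."""
    top = max(scores.values())
    return {doc: 100.0 * score / top for doc, score in scores.items()}


# Hypothetical raw scores for one query (invented values, not Lucene output).
before = {"doc_a": 2.0, "doc_b": 1.0, "doc_c": 0.5}

# The same query after doc_a has been deleted from the index. The remaining
# raw scores are held fixed here purely to keep the illustration simple.
after = {doc: score for doc, score in before.items() if doc != "doc_a"}

percent_before = as_percent_of_top(before)
percent_after = as_percent_of_top(after)

print(percent_before["doc_b"])  # 50.0
print(percent_after["doc_b"])   # 100.0
```

Nothing about doc_b or the query changed, yet its "percentage" doubled, which
is why a score-derived percentage is a poor stand-in for string similarity.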

If the only thing you wanted to do was to calculate how close a resulting
string was to a search string, I suggest the Levenshtein Distance algorithm
(http://en.wikipedia.org/wiki/Levenshtein_distance), but it doesn't seem like
that's quite what you want to accomplish based on your question.

Christopher
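
A minimal sketch of the manual calculation suggested above, written in Python
for brevity rather than against the Lucene.Net API. The helper names
(levenshtein, similarity_percent) are hypothetical, and normalising the edit
distance by the length of the shorter string is an assumption, chosen because
it reproduces the percentages given in the original question. Other
normalisations (for example dividing by the longer length) are equally
defensible and give different numbers.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity_percent(query: str, candidate: str) -> float:
    """Similarity in percent; assumes normalisation by the shorter length."""
    shortest = min(len(query), len(candidate))
    if shortest == 0:
        return 100.0 if query == candidate else 0.0
    ratio = 1.0 - levenshtein(query, candidate) / shortest
    return max(0.0, 100.0 * ratio)  # clamp very dissimilar pairs to 0


for term in ("database", "database1", "database11"):
    print(term, similarity_percent("databasd", term))
# database 87.5
# database1 75.0
# database11 62.5
```

The function would be applied to each term or field value returned by the
search, after the search has run, since the hit score itself does not carry
this value.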
