
Posted to java-user@lucene.apache.org by Patrick Diviacco <pa...@gmail.com> on 2011/03/03 16:25:11 UTC

Lucene nightly build: similarity score per field

I've downloaded a Lucene nightly build because I need to customize the
similarity *per field*.

However, I don't see a field parameter passed to the methods that compute
the score, such as "tf" and "idf".

How can I implement a different similarity score for each document field, then?

thanks

Re: Lucene nightly build: similarity score per field

Posted by Patrick Diviacco <pa...@gmail.com>.

Never mind, I've finally solved it.
Now I just need to figure out how to retrieve the per-field scores in my
results.

I need to know how similar each field is. I know I can use explain(),
but it slows down the computation...

thanks

On 4 March 2011 21:21, Patrick Diviacco <pa...@gmail.com> wrote:

> OK, thanks. One last thing: in my TimeSimilarity class, I just need to use
> this formula:
>
> (queryTimeValue - DocTimeValue) / normalizationFactor
>
> to compute the similarity score of a time/date field.
> How would you suggest implementing this? Which methods do I need to
> override?
>
> thanks

Re: Lucene nightly build: similarity score per field

Posted by Patrick Diviacco <pa...@gmail.com>.

OK, thanks. One last thing: in my TimeSimilarity class, I just need to use
this formula:

(queryTimeValue - DocTimeValue) / normalizationFactor

to compute the similarity score of a time/date field.
How would you suggest implementing this? Which methods do I need to
override?
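
A minimal sketch of the arithmetic only, reading the formula as (queryTimeValue - DocTimeValue) / normalizationFactor. The class name TimeSimilarity comes from the post; treating the time values as epoch milliseconds and passing normalizationFactor to the constructor are assumptions. The sketch uses no Lucene classes and does not say which Similarity methods to override, since the thread leaves that open.

```java
// Hypothetical helper computing the post's time-field score.
// Assumes both time values are epoch milliseconds (an assumption, not
// something the post specifies). No Lucene classes are involved.
final class TimeSimilarity {
    private final double normalizationFactor;

    TimeSimilarity(double normalizationFactor) {
        // Guard against dividing by zero or silently flipping the sign.
        if (normalizationFactor <= 0) {
            throw new IllegalArgumentException("normalizationFactor must be > 0");
        }
        this.normalizationFactor = normalizationFactor;
    }

    // (queryTimeValue - docTimeValue) / normalizationFactor
    double score(long queryTimeValue, long docTimeValue) {
        return (queryTimeValue - docTimeValue) / normalizationFactor;
    }
}
```

As written, the result is signed: with a factor of 200, score(1000, 400) gives 3.0 while score(400, 1000) gives -3.0, so a document whose time is later than the query time scores negatively. Whether to use the absolute difference instead is a design choice the post does not settle.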

thanks

On 4 March 2011 20:39, Robert Muir <rc...@gmail.com> wrote:

> Yes, the SimilarityProvider is a factory interface that returns a
> Similarity for a specified field.
>
> So you have to implement this interface, and in your get(String field)
> method return the appropriate Similarity for the field.

Re: Lucene nightly build: similarity score per field

Posted by Robert Muir <rc...@gmail.com>.

On Fri, Mar 4, 2011 at 2:12 PM, Patrick Diviacco
<pa...@gmail.com> wrote:
> hey Robert,
>
> I know the documentation exists; sorry, I confused setSimilarity
> with setSimilarityProvider.
>
> However, my question was about the "Similarity get(String field)" method
> (I can't work it out from the documentation, sorry).
>
> Should I create a customSimilarity class implementing SimilarityProvider
> and then implement the get method?
>
> Also, inside the get method, should I check the passed field name and
> return different custom Similarity classes?

Yes, the SimilarityProvider is a factory interface that returns a
Similarity for a specified field.

So you have to implement this interface, and in your get(String field)
method return the appropriate Similarity for the field.
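
To illustrate the answer above, here is a minimal sketch of a provider whose get(String field) checks the field name and returns a different similarity per field. It is written against a hypothetical stand-in interface, FieldSimilarity, rather than the nightly's actual Similarity and SimilarityProvider types, so it compiles on its own. SimilarityA and SimilarityB are the class names from the question; their tf() formulas and the field names "title", "body" and "date" are illustrative assumptions.

```java
// Hypothetical stand-in for Lucene's Similarity (not the real class).
interface FieldSimilarity {
    float tf(float freq);
}

// Illustrative per-field implementations, named after the question.
final class SimilarityA implements FieldSimilarity {
    @Override
    public float tf(float freq) { return (float) Math.sqrt(freq); }
}

final class SimilarityB implements FieldSimilarity {
    @Override
    public float tf(float freq) { return freq > 0 ? 1.0f : 0.0f; }
}

// Plays the role of a SimilarityProvider: get(String field) picks the
// similarity for a field. Each instance is created once and reused.
final class PerFieldProvider {
    private final FieldSimilarity a = new SimilarityA();
    private final FieldSimilarity b = new SimilarityB();

    FieldSimilarity get(String field) {
        switch (field) {
            case "title":
            case "body":
                return a;
            case "date":
                return b;
            default:
                return a; // fallback for any field not listed above
        }
    }
}
```

Returning shared instances keeps get() cheap; the provider itself would then be set on both the IndexWriter and the IndexSearcher, as described elsewhere in this thread.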

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Lucene nightly build: similarity score per field

Posted by Patrick Diviacco <pa...@gmail.com>.

hey Robert,

I know the documentation exists; sorry, I confused setSimilarity
with setSimilarityProvider.

However, my question was about the "Similarity get(String field)" method
(I can't work it out from the documentation, sorry).

Should I create a customSimilarity class implementing SimilarityProvider
and then implement the get method?

Also, inside the get method, should I check the passed field name and
return different custom Similarity classes?

thanks
Patrick

On 4 March 2011 19:57, Robert Muir <rc...@gmail.com> wrote:

> This is not correct. Have you read the javadocs?
>
> IndexSearcher no longer has setSimilarity(); it has
> setSimilarityProvider() instead.
> I recommend reading CHANGES.txt, MIGRATE.txt, and the javadocs, where
> this is all documented.

Re: Lucene nightly build: similarity score per field

Posted by Robert Muir <rc...@gmail.com>.

On Fri, Mar 4, 2011 at 1:18 PM, Patrick Diviacco
<pa...@gmail.com> wrote:
> So far, I know I can customize the similarity class for the searcher:
> searcher.setSimilarity(new BoostingSimilarity());
>

This is not correct. Have you read the javadocs?

IndexSearcher no longer has setSimilarity(); it has
setSimilarityProvider() instead.
I recommend reading CHANGES.txt, MIGRATE.txt, and the javadocs, where
this is all documented.

Re: Lucene nightly build: similarity score per field

Posted by Patrick Diviacco <pa...@gmail.com>.

All right.

So it is still not clear to me exactly how to implement it.

I have SimilarityA and SimilarityB subclasses.
So far, I know I can customize the similarity class for the searcher:
searcher.setSimilarity(new BoostingSimilarity());

When and how should I use the get method?
Similarity get(String field)

thanks

On 3 March 2011 16:34, Robert Muir <rc...@gmail.com> wrote:

> Hi, the way you set this up is to use SimilarityProvider to configure
> Similarities per field: for example, fields A, B, and C might use
> Similarity1 while field D uses Similarity2.
> So you just set your SimilarityProvider on the IndexWriter and
> IndexSearcher, and it must implement this factory method:
>
>  Similarity get(String field)
>
> Here are the reasons for this factory design (versus simply adding a
> field parameter to every method):
> 1. performance: up-front we ask the SimilarityProvider for the
> per-field Similarity. So you probably use a hashmap or something here
> to return the correct one. If you had to do this on every single call
> to tf(), this would slow down queries significantly.
> 2. flexibility: we are working to generalize Similarity, and maybe the
> existing stuff you see becomes TFIDFSimilarity. So in the future you
> might have field1 that uses TFIDF and field2 that uses something else
> (e.g. BM25), with a totally different API and scoring system.

Re: Lucene nightly build: similarity score per field

Posted by Robert Muir <rc...@gmail.com>.

On Thu, Mar 3, 2011 at 10:25 AM, Patrick Diviacco
<pa...@gmail.com> wrote:
> I've downloaded a Lucene nightly build because I need to customize the
> similarity *per field*.
>
> However, I don't see a field parameter passed to the methods that compute
> the score, such as "tf" and "idf".
>
> How can I implement a different similarity score for each document field, then?
>

Hi, the way you set this up is to use SimilarityProvider to configure
Similarities per field: for example, fields A, B, and C might use
Similarity1 while field D uses Similarity2.
So you just set your SimilarityProvider on the IndexWriter and
IndexSearcher, and it must implement this factory method:

  Similarity get(String field)

Here are the reasons for this factory design (versus simply adding a
field parameter to every method):
1. performance: up-front we ask the SimilarityProvider for the
per-field Similarity. So you probably use a hashmap or something here
to return the correct one. If you had to do this on every single call
to tf(), this would slow down queries significantly.
2. flexibility: we are working to generalize Similarity, and maybe the
existing stuff you see becomes TFIDFSimilarity. So in the future you
might have field1 that uses TFIDF and field2 that uses something else
(e.g. BM25), with a totally different API and scoring system.
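
A rough sketch of design reason 1, using a hypothetical FieldSimilarity stand-in instead of the real Lucene types: a HashMap-backed provider maps fields A, B and C to Similarity1 and field D to Similarity2, as in the example above, and counts its get() calls. Resolving the Similarity once per field costs one lookup, whereas looking it up on every tf() call costs one lookup per call. The names CountingProvider and FieldScorer and the two tf() formulas are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Lucene's Similarity (not the real class).
interface FieldSimilarity {
    float tf(float freq);
}

// HashMap-backed provider that counts get() calls so the cost of
// per-call lookups is visible. Hypothetical, not Lucene API.
final class CountingProvider {
    private final Map<String, FieldSimilarity> byField = new HashMap<>();
    private final FieldSimilarity fallback;
    int lookups = 0;

    CountingProvider(FieldSimilarity fallback) {
        this.fallback = fallback;
    }

    void put(String field, FieldSimilarity sim) {
        byField.put(field, sim);
    }

    FieldSimilarity get(String field) {
        lookups++;
        return byField.getOrDefault(field, fallback);
    }

    // The mapping from the post: A, B, C -> Similarity1, D -> Similarity2.
    static CountingProvider postExample() {
        FieldSimilarity similarity1 = freq -> freq;  // illustrative: raw frequency
        FieldSimilarity similarity2 = freq -> 1.0f;  // illustrative: constant score
        CountingProvider p = new CountingProvider(similarity1);
        p.put("A", similarity1);
        p.put("B", similarity1);
        p.put("C", similarity1);
        p.put("D", similarity2);
        return p;
    }
}

final class FieldScorer {
    // Factory design: resolve the field's Similarity once, up front,
    // then call tf() for every frequency without further lookups.
    static float scoreUpFront(CountingProvider provider, String field, float[] freqs) {
        FieldSimilarity sim = provider.get(field);
        float sum = 0f;
        for (float f : freqs) {
            sum += sim.tf(f);
        }
        return sum;
    }

    // The alternative the post argues against: one lookup per tf() call.
    static float scorePerCall(CountingProvider provider, String field, float[] freqs) {
        float sum = 0f;
        for (float f : freqs) {
            sum += provider.get(field).tf(f);
        }
        return sum;
    }
}
```

With three frequencies, the up-front version performs one lookup and the per-call version three; in a real query the per-call count would grow with every tf() call, which is the slowdown described in reason 1.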
