Posted to dev@lucene.apache.org by Yonik Seeley <yo...@apache.org> on 2008/06/24 22:28:35 UTC

per-field similarity

Something to consider for Lucene 3 is to have something to retrieve
Similarity per-field rather than passing the field name into some
functions...

benefits:
- Would allow customizing most Similarity functions per-field
- Performance: Similarity for a field could be looked up once at the
beginning of a query and reused, eliminating hash lookups for every
Similarity function called that needs to be different depending on the
field name.

Might also consider passing in more optional context when retrieving
the similarity for a field (such as a Query, if searching).
Something like Similarity.getSimilarity(String field, Query q).
Multi-field queries (boolean query) could pass null for the field.
Perhaps it could even be back compatible...
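A minimal sketch of the factory-method idea, assuming simplified stand-ins rather than Lucene's real classes: Similarity, DefaultSimilarity, FlatLengthSimilarity and PerFieldSimilarity below are hypothetical, lengthNorm is only loosely modeled on Lucene's, and the Query context is typed as Object so the sketch compiles on its own. The default getSimilarity returns this, and a per-field subclass looks the field up once, so a caller can reuse the returned instance instead of hashing the field name on every call:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for Lucene's Similarity (NOT the real class).
abstract class Similarity {
    // Proposed factory method. The default has no per-field behavior,
    // so it returns this; calling it again on a resolved instance is harmless.
    Similarity getSimilarity(String field, Object query) {
        return this;
    }

    abstract float lengthNorm(String field, int numTerms);
}

class DefaultSimilarity extends Similarity {
    @Override
    float lengthNorm(String field, int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }
}

// Hypothetical: ignores field length, e.g. for short title fields.
class FlatLengthSimilarity extends DefaultSimilarity {
    @Override
    float lengthNorm(String field, int numTerms) {
        return 1.0f;
    }
}

// Hypothetical Similarity that hands out a different Similarity per field.
class PerFieldSimilarity extends Similarity {
    private final Map<String, Similarity> byField = new HashMap<>();
    private final Similarity fallback;

    PerFieldSimilarity(Similarity fallback) {
        this.fallback = fallback;
    }

    PerFieldSimilarity put(String field, Similarity sim) {
        byField.put(field, sim);
        return this;
    }

    private Similarity forField(String field) {
        return byField.getOrDefault(field, fallback);
    }

    @Override
    Similarity getSimilarity(String field, Object query) {
        // A multi-field query passes null: keep this instance, whose
        // methods still dispatch on the field name at each call.
        return field == null ? this : forField(field);
    }

    @Override
    float lengthNorm(String field, int numTerms) {
        return forField(field).lengthNorm(field, numTerms);
    }
}

class PerFieldDemo {
    public static void main(String[] args) {
        Similarity sim = new PerFieldSimilarity(new DefaultSimilarity())
                .put("title", new FlatLengthSimilarity());
        // Resolve once per field, then call the returned instance directly.
        Similarity titleSim = sim.getSimilarity("title", null);
        Similarity bodySim = sim.getSimilarity("body", null);
        System.out.println(titleSim.lengthNorm("title", 4)); // 1.0
        System.out.println(bodySim.lengthNorm("body", 4));   // 0.5
    }
}
```

Passing null for the field, as a multi-field query would, returns the PerFieldSimilarity itself rather than any single field's Similarity.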

-Yonik

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: per-field similarity

Posted by Chris Hostetter <ho...@fucit.org>.
: That would require a user to subclass both IndexWriter and Searcher.

I was assuming there would be corresponding setters and that Searcher and
IndexWriter would maintain Maps, but that wouldn't really help in cases of
dynamically named fields ... you would certainly want some sort of Factory
(or "SimilaritySelector" like FieldSelector) so you can plug in custom
behavior.
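One way to picture the "SimilaritySelector" idea, sketched under assumptions: SimilaritySelector and SuffixSimilaritySelector are hypothetical names, and Similarity is reduced to a one-method stand-in interface. Matching on a field-name suffix is just one possible policy; it shows how a pluggable selector can cover dynamically named fields that a fixed Map of field names cannot enumerate in advance:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal stand-in for Lucene's Similarity (hypothetical).
interface Similarity {
    float lengthNorm(String field, int numTerms);
}

// Hypothetical plug-in point, analogous in spirit to FieldSelector:
// Searcher and IndexWriter would hold one of these.
interface SimilaritySelector {
    Similarity select(String field);
}

// Chooses a Similarity by field-name suffix, so fields created at
// runtime (e.g. "book_title") need no registration by exact name.
class SuffixSimilaritySelector implements SimilaritySelector {
    private final Map<String, Similarity> bySuffix = new LinkedHashMap<>();
    private final Similarity fallback;

    SuffixSimilaritySelector(Similarity fallback) {
        this.fallback = fallback;
    }

    SuffixSimilaritySelector add(String suffix, Similarity sim) {
        bySuffix.put(suffix, sim);
        return this;
    }

    @Override
    public Similarity select(String field) {
        // First matching suffix wins, in the order the rules were added.
        for (Map.Entry<String, Similarity> e : bySuffix.entrySet()) {
            if (field.endsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return fallback;
    }
}

class SelectorDemo {
    public static void main(String[] args) {
        Similarity standard = (field, n) -> (float) (1.0 / Math.sqrt(n));
        Similarity flat = (field, n) -> 1.0f;
        SimilaritySelector selector =
                new SuffixSimilaritySelector(standard).add("_title", flat);
        System.out.println(selector.select("book_title").lengthNorm("book_title", 4)); // 1.0
        System.out.println(selector.select("book_body").lengthNorm("book_body", 4));   // 0.5
    }
}
```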

: Since Similarity is already passed around, adding a factory method
: there seems like the easiest approach.  It's also a class, so we could
: easily add a method.

Searcher and IndexWriter are classes too, so it's just as easy to add 
setters and getters for a new SimilarityFactory on each of them as it is 
to retrofit Similarity into being a factory for itself ... and it would 
seem less confusing.

Compatibility could even be maintained by making the current setSimilarity 
method create a SimpleSimilarityFactory that always returns the same
Similarity.  The getSimilarity() methods on both classes could delegate to 
the factory, newer more complex Queries can use 
searcher.getSimilarityFactory().getSimilarity(complexCriteria)
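A sketch of that compatibility path, assuming hypothetical SimilarityFactory and SimpleSimilarityFactory types and a SearcherSketch class standing in for Searcher (IndexWriter would get the same accessors). The complexCriteria are modeled here as a field name plus an untyped context object; none of this is existing Lucene API:

```java
// Stand-in for Lucene's Similarity (hypothetical; carries only a name).
class Similarity {
    private final String name;

    Similarity(String name) {
        this.name = name;
    }

    @Override
    public String toString() {
        return name;
    }
}

// Hypothetical factory; "complexCriteria" modeled as field + context.
interface SimilarityFactory {
    Similarity getSimilarity(String field, Object context);
}

// Always returns the same Similarity, i.e. today's single-Similarity behavior.
class SimpleSimilarityFactory implements SimilarityFactory {
    private final Similarity similarity;

    SimpleSimilarityFactory(Similarity similarity) {
        this.similarity = similarity;
    }

    @Override
    public Similarity getSimilarity(String field, Object context) {
        return similarity;
    }
}

// Stand-in for Searcher; IndexWriter would get the same accessors.
class SearcherSketch {
    private SimilarityFactory factory =
            new SimpleSimilarityFactory(new Similarity("default"));

    // Existing setter, kept for compatibility: wraps the instance.
    void setSimilarity(Similarity similarity) {
        this.factory = new SimpleSimilarityFactory(similarity);
    }

    // Existing getter delegates to the factory with no field context.
    Similarity getSimilarity() {
        return factory.getSimilarity(null, null);
    }

    void setSimilarityFactory(SimilarityFactory factory) {
        this.factory = factory;
    }

    SimilarityFactory getSimilarityFactory() {
        return factory;
    }
}

class FactoryDemo {
    public static void main(String[] args) {
        SearcherSketch searcher = new SearcherSketch();

        // Old-style caller: unchanged code keeps working.
        searcher.setSimilarity(new Similarity("legacy"));
        System.out.println(searcher.getSimilarity()); // legacy

        // New-style caller: plug in a factory that varies by field.
        Similarity title = new Similarity("title");
        Similarity other = new Similarity("other");
        searcher.setSimilarityFactory((field, ctx) -> "title".equals(field) ? title : other);
        System.out.println(searcher.getSimilarityFactory().getSimilarity("title", null)); // title
        System.out.println(searcher.getSimilarity()); // other
    }
}
```

With a custom factory installed, the legacy no-argument getSimilarity() passes a null field, so a factory in this design would need to handle that case.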

: An optional Query param or other context (or more than one factory
: method) was just a quick idea... may or may not ultimately make sense.

More than one method on the factory would be great ... but all the more
reason to create a new class instead of adding a lot of new factory methods
on the existing Similarity API.

: It might be a little cleaner to pass around a SimilarityFactory, but
: that ship has sailed IMO (along with many others :-)

It's not too late ... it seems really straightforward to me (but talk is
cheap -- I haven't actually sat down and thought of potential use cases
and how well they would work with this type of approach).

-Hoss




Re: per-field similarity

Posted by Yonik Seeley <yo...@apache.org>.
On Wed, Jun 25, 2008 at 5:06 PM, Chris Hostetter
<ho...@fucit.org> wrote:
> Hmmm... that seems like it would be confusing: particularly since in the
> IndexWriter case the "Query" param would never make sense.  Changing
> IndexWriter.getSimilarity to take a "String fieldName" and changing
> Searcher.getSimilarity to take "String fieldName, Query q" seem like they
> would be more straightforward.

That would require a user to subclass both IndexWriter and Searcher.
Since Similarity is already passed around, adding a factory method
there seems like the easiest approach.  It's also a class, so we could
easily add a method.

An optional Query param or other context (or more than one factory
method) was just a quick idea... may or may not ultimately make sense.

> (There's also the potential ambiguity of "how many times do I call
> Similarity.getSimilarity() before i stop?" ... it may seem silly, but if
> you're working in a Query or Scorer or Weight you may not be sure if it's
> been done yet)

Once per level?  When creating the Weight I would think.  If you call
again, the default impl would return "this".
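To illustrate resolving "once per level", here is a sketch assuming stand-in classes (Similarity, PerFieldSimilarity and TermWeightSketch are hypothetical, and only a square-root tf() is modeled). The Weight resolves its Similarity in its constructor and reuses it for every document, so the per-field lookup happens once rather than on every call, and re-resolving an already-resolved instance just returns this:

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for Lucene's Similarity (hypothetical); only tf() is modeled.
class Similarity {
    // Default factory method: nothing to specialize, so return this.
    Similarity getSimilarity(String field, Object query) {
        return this;
    }

    float tf(float freq) {
        return (float) Math.sqrt(freq);
    }
}

// Hypothetical per-field Similarity that counts its map lookups.
class PerFieldSimilarity extends Similarity {
    private final Map<String, Similarity> byField = new HashMap<>();
    private final Similarity fallback = new Similarity();
    int lookups = 0;

    PerFieldSimilarity put(String field, Similarity sim) {
        byField.put(field, sim);
        return this;
    }

    @Override
    Similarity getSimilarity(String field, Object query) {
        lookups++;
        return byField.getOrDefault(field, fallback);
    }
}

// Hypothetical Weight for a single-field query: resolves the Similarity
// once in its constructor and reuses it for every document it scores.
class TermWeightSketch {
    private final Similarity similarity;

    TermWeightSketch(Similarity searcherSimilarity, String field) {
        this.similarity = searcherSimilarity.getSimilarity(field, null);
    }

    float score(int freq) {
        return similarity.tf(freq); // no per-document field lookup
    }

    Similarity similarity() {
        return similarity;
    }
}

class WeightDemo {
    public static void main(String[] args) {
        // Hypothetical: a field where repeated terms should not add weight.
        Similarity flatTf = new Similarity() {
            @Override
            float tf(float freq) {
                return freq > 0 ? 1.0f : 0.0f;
            }
        };
        PerFieldSimilarity sim = new PerFieldSimilarity().put("tags", flatTf);
        TermWeightSketch weight = new TermWeightSketch(sim, "tags");
        float total = 0;
        for (int doc = 0; doc < 1000; doc++) {
            total += weight.score(3);
        }
        System.out.println(sim.lookups); // 1
        System.out.println(total);       // 1000.0
        // Re-resolving an already-resolved Similarity just returns it.
        System.out.println(weight.similarity().getSimilarity("tags", null) == flatTf); // true
    }
}
```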

It might be a little cleaner to pass around a SimilarityFactory, but
that ship has sailed IMO (along with many others :-)

-Yonik



Re: per-field similarity

Posted by Chris Hostetter <ho...@fucit.org>.
: > I assume you mean "Searcher.getSimilarity(String fieldName, Query q)" to
: > replace the current Searcher.getSimilarity() right?
: 
: No, I meant Similarity (it's more like a factory method on the
: Similarity class).
: The Searcher.getSimilarity() could remain unchanged.
: A Similarity is what is passed into the IndexWriter, and you would
: want the same per-field flexibility there.

Hmmm... that seems like it would be confusing: particularly since in the 
IndexWriter case the "Query" param would never make sense.  Changing
IndexWriter.getSimilarity to take a "String fieldName" and changing
Searcher.getSimilarity to take "String fieldName, Query q" seem like they
would be more straightforward.
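For comparison, a sketch of this alternative using hypothetical IndexWriterSketch and SearcherSketch stand-ins (not the real classes, and the Query parameter is typed as Object). It also makes visible the objection raised elsewhere in the thread: a user who wants per-field behavior ends up subclassing both classes and repeating the same rule:

```java
// Stand-ins for the real classes (hypothetical, heavily simplified).
class Similarity {
    final String name;

    Similarity(String name) {
        this.name = name;
    }
}

class IndexWriterSketch {
    protected final Similarity defaultSimilarity = new Similarity("default");

    // Index time: a field name only; a Query would never make sense here.
    Similarity getSimilarity(String fieldName) {
        return defaultSimilarity;
    }
}

class SearcherSketch {
    protected final Similarity defaultSimilarity = new Similarity("default");

    // Search time: field name plus optional query context (may be null).
    Similarity getSimilarity(String fieldName, Object query) {
        return defaultSimilarity;
    }
}

// Per-field behavior needs a subclass of each class, repeating the same rule.
class TitleAwareWriter extends IndexWriterSketch {
    static final Similarity TITLE = new Similarity("title");

    @Override
    Similarity getSimilarity(String fieldName) {
        return "title".equals(fieldName) ? TITLE : super.getSimilarity(fieldName);
    }
}

class TitleAwareSearcher extends SearcherSketch {
    @Override
    Similarity getSimilarity(String fieldName, Object query) {
        return "title".equals(fieldName)
                ? TitleAwareWriter.TITLE
                : super.getSimilarity(fieldName, query);
    }
}

class SubclassDemo {
    public static void main(String[] args) {
        System.out.println(new TitleAwareWriter().getSimilarity("title").name);        // title
        System.out.println(new TitleAwareSearcher().getSimilarity("body", null).name); // default
    }
}
```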

(There's also the potential ambiguity of "how many times do I call
Similarity.getSimilarity() before i stop?" ... it may seem silly, but if 
you're working in a Query or Scorer or Weight you may not be sure if it's 
been done yet)



-Hoss




Re: per-field similarity

Posted by Yonik Seeley <yo...@apache.org>.
On Wed, Jun 25, 2008 at 2:19 PM, Chris Hostetter
<ho...@fucit.org> wrote:
> : Might also consider passing in more optional context when retrieving
> : the similarity for a field (such as a Query, if searching).
> : Something like Similarity.getSimilarity(String field, Query q).
>
> I assume you mean "Searcher.getSimilarity(String fieldName, Query q)" to
> replace the current Searcher.getSimilarity() right?

No, I meant Similarity (it's more like a factory method on the
Similarity class).
The Searcher.getSimilarity() could remain unchanged.
A Similarity is what is passed into the IndexWriter, and you would
want the same per-field flexibility there.

>  (where in both cases
> we are talking about an instance method and not a static method)

Right.

-Yonik



Re: per-field similarity

Posted by Chris Hostetter <ho...@fucit.org>.
: Might also consider passing in more optional context when retrieving
: the similarity for a field (such as a Query, if searching).
: Something like Similarity.getSimilarity(String field, Query q).

I assume you mean "Searcher.getSimilarity(String fieldName, Query q)" to
replace the current Searcher.getSimilarity() right?  (where in both cases 
we are talking about an instance method and not a static method)

There's been some discussions about this in the past, I think at one point 
Doug suggested almost the exact same thing in this thread... 

http://www.nabble.com/-jira--Created%3A-%28LUCENE-577%29-SweetSpotSimiliarity-to4533741.html#a4536312

...it could probably be done in a completely backwards compatible way.


-Hoss




Re: per-field similarity

Posted by Karl Wettin <ka...@gmail.com>.
+1

On 24 Jun 2008, at 22:28, Yonik Seeley wrote:

> Something to consider for Lucene 3 is to have something to retrieve
> Similarity per-field rather than passing the field name into some
> functions...
>
> benefits:
> - Would allow customizing most Similarity functions per-field
> - Performance: Similarity for a field could be looked up once at the
> beginning of a query and reused, eliminating hash lookups for every
> Similarity function called that needs to be different depending on the
> field name.
>
> Might also consider passing in more optional context when retrieving
> the similarity for a field (such as a Query, if searching).
> Something like Similarity.getSimilarity(String field, Query q).
> Multi-field queries (boolean query) could pass null for the field.
> Perhaps it could even be back compatible...
>
> -Yonik




Re: per-field similarity

Posted by Mike Klaas <mi...@gmail.com>.
On 24-Jun-08, at 1:28 PM, Yonik Seeley wrote:

> Something to consider for Lucene 3 is to have something to retrieve
> Similarity per-field rather than passing the field name into some
> functions...

+1

I've felt that this was the "proper" (and more useful) way to do  
things for a long time.

(http://markmail.org/message/56bk6wrbwallyjvr)

-Mike
