Posted to java-user@lucene.apache.org by Tsvika Rabkin <ts...@gmail.com> on 2011/02/01 14:27:14 UTC

Using different field when overriding computeNorm

Hi,

I would like to override default similarity's computeNorm to work with
a different field, other than the query field.

Here is the DefaultSimilarity implementation:

  @Override
  public float computeNorm(String field, FieldInvertState state) {
    final int numTerms;
    if (discountOverlaps)
      numTerms = state.getLength() - state.getNumOverlap();
    else
      numTerms = state.getLength();
    return state.getBoost() * ((float) (1.0 / Math.sqrt(numTerms)));
  }

any ideas how to do that?

Thanks,

Tsvika

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Using different field when overriding computeNorm

Posted by Robert Muir <rc...@gmail.com>.
On Thu, Feb 3, 2011 at 3:27 PM, Ryan Aylward <ry...@glassdoor.com> wrote:
> This is great. Is there a target of when 4.0 will be released?
>

Unfortunately I think it's quite a ways away: there are branches for
major features such as per-document payloads, realtime search, modern
index compression algorithms, and a variety of other exciting things
in the works. As far as releases go, currently we are working towards
release 3.1, which is the next stable minor release upgrade from 3.0.

It might be technically possible to backport this feature (per-field
similarity) to the 3.x codebase while still keeping backwards
compatibility, but I'm worried about breaking backwards compatibility
in subtle ways due to some gremlins in the code... we fixed most of
these gremlins in trunk, but they are still available (though deprecated) in
3.1 (example: https://issues.apache.org/jira/browse/LUCENE-2828).

So, at the moment having this feature be something that has to wait
until 4.0 is the safest option in my opinion... but I feel your pain
here when trying to customize the scoring system...



RE: Using different field when overriding computeNorm

Posted by Ryan Aylward <ry...@glassdoor.com>.
This is great. Is there a target of when 4.0 will be released?

-----Original Message-----
From: Robert Muir [mailto:rcmuir@gmail.com] 
Sent: Tuesday, February 01, 2011 11:10 AM
To: java-user@lucene.apache.org
Subject: Re: Using different field when overriding computeNorm

On Tue, Feb 1, 2011 at 1:51 PM, Ryan Aylward <ry...@glassdoor.com> wrote:
> I have had to do similar things to other methods of Similarity. In my example, I wanted to have different behavior for the tf() method for each field. The tf method does not include a field parameter as an input to it. The only solution I could come up

in Lucene's trunk, Similarity can now be controlled on a per-field
basis, see https://issues.apache.org/jira/browse/LUCENE-2236

The only exceptions are things like coord() which apply to e.g.
BooleanQuery (which might span multiple fields) and remain top-level
in the new SimilarityProvider.



Re: Using different field when overriding computeNorm

Posted by Robert Muir <rc...@gmail.com>.
On Tue, Feb 1, 2011 at 1:51 PM, Ryan Aylward <ry...@glassdoor.com> wrote:
> I have had to do similar things to other methods of Similarity. In my example, I wanted to have different behavior for the tf() method for each field. The tf method does not include a field parameter as an input to it. The only solution I could come up

in Lucene's trunk, Similarity can now be controlled on a per-field
basis, see https://issues.apache.org/jira/browse/LUCENE-2236

The only exceptions are things like coord() which apply to e.g.
BooleanQuery (which might span multiple fields) and remain top-level
in the new SimilarityProvider.
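
For readers following along, here is a rough sketch of the shape described above: field-level factors such as tf() are looked up per field, while coord() stays on the top-level object because a BooleanQuery may span several fields. The types FieldSim and SimProviderSketch and the field name "fieldA" are hypothetical stand-ins so the example compiles without Lucene; they are not the trunk API, so check LUCENE-2236 for the real interfaces.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the per-field part of a similarity.
interface FieldSim {
    float tf(float freq);
}

// Hypothetical stand-in for a top-level provider: it hands out per-field
// similarities and keeps the cross-field factor coord() for itself.
final class SimProviderSketch {
    private final Map<String, FieldSim> perField = new HashMap<String, FieldSim>();
    private final FieldSim fallback;

    SimProviderSketch(FieldSim fallback) {
        this.fallback = fallback;
    }

    void register(String field, FieldSim sim) {
        perField.put(field, sim);
    }

    // Per-field lookup; unknown fields fall back to the default behavior.
    FieldSim get(String field) {
        FieldSim sim = perField.get(field);
        return sim != null ? sim : fallback;
    }

    // Same formula as DefaultSimilarity.coord(): fraction of query clauses matched.
    float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }
}

final class SimProviderDemo {
    private SimProviderDemo() {}

    static SimProviderSketch build() {
        // Default tf mirrors DefaultSimilarity: sqrt(freq).
        SimProviderSketch provider =
            new SimProviderSketch(freq -> (float) Math.sqrt(freq));
        // "fieldA" ignores term frequency entirely.
        provider.register("fieldA", freq -> 1f);
        return provider;
    }
}
```

With this layout no thread local is needed: the scorer for a given field simply asks the provider for that field's similarity.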



RE: Using different field when overriding computeNorm

Posted by Ryan Aylward <ry...@glassdoor.com>.
I have had to do similar things to other methods of Similarity. In my example, I wanted to have different behavior for the tf() method for each field. The tf method does not include a field parameter as an input to it. The only solution I could come up with was to add a thread local to set the field and then check the thread local within the tf function. Here's the tf function...

	@Override
	public float tf(float freq) {
		// Get the field name that the custom scorer stored in the thread local...
		String field = FieldThreadLocal.getField();

		if ("fieldA".equals(field)) {
			// always return 1 for field A
			return 1;
		} else {
			// otherwise, use the normal tf function
			return super.tf(freq);
		}
	}

tf() is used during scoring so I had to override the TermQuery (and TermWeight and TermScorer) to be able to set and clear the thread local at the appropriate times. This is a pretty ugly hack, but I couldn't find another way to make this work.
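
A minimal sketch of how the set-and-clear part might look. The post only shows FieldThreadLocal.getField(); the setField() and clear() methods, the BaseSimilarity stand-in for DefaultSimilarity, and the FieldScopedScoring helper are all assumptions made so the example runs without Lucene. In the real hack, the set/clear calls would live in the custom TermScorer.

```java
// Hypothetical holder; only getField() appears in the original post.
final class FieldThreadLocal {
    private static final ThreadLocal<String> FIELD = new ThreadLocal<String>();

    private FieldThreadLocal() {}

    static void setField(String field) { FIELD.set(field); }

    static String getField() { return FIELD.get(); }

    static void clear() { FIELD.remove(); }
}

// Stand-in for DefaultSimilarity, whose tf() is sqrt(freq).
class BaseSimilarity {
    public float tf(float freq) {
        return (float) Math.sqrt(freq);
    }
}

// Same logic as the tf() override shown above.
class FieldAwareSimilarity extends BaseSimilarity {
    @Override
    public float tf(float freq) {
        String field = FieldThreadLocal.getField();
        if ("fieldA".equals(field)) {
            return 1f; // field A ignores term frequency
        }
        return super.tf(freq);
    }
}

// Stand-in for the custom scorer: publish the field, score, always clean up.
final class FieldScopedScoring {
    private FieldScopedScoring() {}

    static float tfFor(BaseSimilarity sim, String field, float freq) {
        FieldThreadLocal.setField(field);
        try {
            return sim.tf(freq);
        } finally {
            // Clear even if scoring throws, so a pooled thread never
            // carries a stale field name into the next query.
            FieldThreadLocal.clear();
        }
    }
}
```

The try/finally is the important part: search threads are usually pooled, and a leftover value would silently change scoring for an unrelated field.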

computeNorm() is calculated at index creation time, but you could try to do something similar.
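
Note that computeNorm() already receives the field name (see the signature quoted in the original question), so simple per-field branching may not need a thread local at all. Below is a sketch under assumptions: InvertState is a hypothetical stand-in for FieldInvertState that exposes only the three getters the quoted formula reads, and the field name "title" is made up. Whether this covers the original question depends on what "a different field" was meant to be.

```java
// Hypothetical stand-in for FieldInvertState (length, overlap count, boost).
final class InvertState {
    private final int length;
    private final int numOverlap;
    private final float boost;

    InvertState(int length, int numOverlap, float boost) {
        this.length = length;
        this.numOverlap = numOverlap;
        this.boost = boost;
    }

    int getLength() { return length; }

    int getNumOverlap() { return numOverlap; }

    float getBoost() { return boost; }
}

final class PerFieldNorm {
    private final boolean discountOverlaps;

    PerFieldNorm(boolean discountOverlaps) {
        this.discountOverlaps = discountOverlaps;
    }

    float computeNorm(String field, InvertState state) {
        if ("title".equals(field)) {
            // Hypothetical rule: no length normalization for this field.
            return state.getBoost();
        }
        // Otherwise, the DefaultSimilarity formula quoted in the question.
        final int numTerms = discountOverlaps
            ? state.getLength() - state.getNumOverlap()
            : state.getLength();
        return state.getBoost() * ((float) (1.0 / Math.sqrt(numTerms)));
    }
}
```

For example, with discountOverlaps enabled, a field of length 18 with 2 overlaps and boost 2 gives 2 * 1/sqrt(16) = 0.5, while the same state on "title" returns the boost unchanged.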

Would be curious if other people had a better suggestion as to how to do this.
