Posted to dev@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2006/02/14 23:04:53 UTC

updating fieldNorms in mass

I just noticed the IndexReader.setNorm method(s) today and was extremely
stoked -- after rebuilding my dev index from scratch three times last week
because I wanted to try out tweaks to Similarity.lengthNorm, the idea of
being able to directly change the norms without rebuilding from scratch is
looking *really* good.

in the case where doc boosts and field boosts aren't used, it seems like
it would be very easy to write a maintenance app that did something
like...

   get instance of similarity based on input
   foreach fieldName in input {
       int[] termCounts = new int[maxDoc];
       foreach Term in TermEnum for field {
          foreach TermDoc on that Term {
              termCounts[td.doc()]+=td.freq()
          }
       }
       foreach doc {
          IndexReader.setNorm(doc, fieldName, similarity.encodeNorm
                  (similarity.lengthNorm(fieldName, termCounts[doc])))
       }
   }


...does anyone see anything wrong with the overall approach?

has anyone implemented this already that they'd like to share?  (or any
gotchas they ran into that I should be wary of?)
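
A rough Java sketch of the counting and norm calculation described above,
for illustration only. It does not touch Lucene: a plain Map of
term -> (doc -> freq) stands in for the TermEnum/TermDocs iteration,
lengthNorm here is a hypothetical stand-in using the 1/sqrt(numTerms)
formula of the default similarity, and the class and method names are made
up. Skipping docs with no terms in the field is an assumption of the
sketch, and the final encodeNorm/setNorm step against a real IndexReader
is left out.

```java
import java.util.HashMap;
import java.util.Map;

final class NormRecalcSketch {

    // Hypothetical stand-in for Similarity.lengthNorm: 1/sqrt(numTerms),
    // the formula used by Lucene's default similarity.
    static float lengthNorm(String fieldName, int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Sum the term frequencies per document for one field.
    // postings maps term text -> (doc id -> freq of that term in the doc).
    static int[] countTerms(Map<String, Map<Integer, Integer>> postings, int maxDoc) {
        int[] termCounts = new int[maxDoc];
        for (Map<Integer, Integer> termDocs : postings.values()) {
            for (Map.Entry<Integer, Integer> td : termDocs.entrySet()) {
                termCounts[td.getKey()] += td.getValue();
            }
        }
        return termCounts;
    }

    // Recompute the (unencoded) norm of every doc for one field.
    static float[] recomputeNorms(String fieldName,
                                  Map<String, Map<Integer, Integer>> postings,
                                  int maxDoc) {
        int[] termCounts = countTerms(postings, maxDoc);
        float[] norms = new float[maxDoc];
        for (int doc = 0; doc < maxDoc; doc++) {
            // Assumption: docs with no terms in this field are skipped
            // (their entry stays 0) to avoid 1/sqrt(0).
            if (termCounts[doc] > 0) {
                norms[doc] = lengthNorm(fieldName, termCounts[doc]);
            }
        }
        return norms;
    }

    public static void main(String[] args) {
        Map<String, Map<Integer, Integer>> postings = new HashMap<>();
        Map<Integer, Integer> lucene = new HashMap<>();
        lucene.put(0, 2);
        lucene.put(1, 1);
        Map<Integer, Integer> index = new HashMap<>();
        index.put(0, 2);
        index.put(2, 1);
        postings.put("lucene", lucene);
        postings.put("index", index);

        float[] norms = recomputeNorms("body", postings, 4);
        for (int doc = 0; doc < norms.length; doc++) {
            System.out.println("doc " + doc + " norm " + norms[doc]);
        }
    }
}
```

In a real tool the float from lengthNorm would go through encodeNorm and
setNorm as in the pseudocode, so the stored value is only as precise as the
one-byte norm encoding allows.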


-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: updating fieldNorms in mass

Posted by Chris Hostetter <ho...@fucit.org>.

: > in the case where doc boosts and field boosts aren't used, it seems like
: > it would be very easy to write a maintenance app that did something
: > like...

: > ...does anyone see anything wrong with the overall approach?
:
: Looks good to me.

Implemented and submitted in LUCENE-496.  So far it seems like it
works great (accomplishing in 80 seconds what used to take me at
least an hour of reindexing) but if anyone spots anything that looks
hinky, give a shout...

   https://issues.apache.org/jira/browse/LUCENE-496

-Hoss


Re: updating fieldNorms in mass

Posted by Doug Cutting <cu...@apache.org>.

Chris Hostetter wrote:
> in the case where doc boosts and field boosts aren't used, it seems like
> it would be very easy to write a maintenance app that did something
> like...
> 
>    get instance of similarity based on input
>    foreach fieldName in input {
>        int[] termCounts = new int[maxDoc];
>        foreach Term in TermEnum for field {
>           foreach TermDoc on that Term {
>               termCounts[td.doc()]+=td.freq()
>           }
>        }
>        foreach doc {
>           IndexReader.setNorm(doc, fieldName, similarity.encodeNorm
>                   (similarity.lengthNorm(fieldName, termCounts[doc])))
>        }
>    }
> 
> 
> ...does anyone see anything wrong with the overall approach?

Looks good to me.

Doug