Posted to dev@lucene.apache.org by "Yonik Seeley (JIRA)" <ji...@apache.org> on 2005/10/08 17:58:47 UTC

[jira] Created: (LUCENE-448) optional norms

optional norms
--------------

         Key: LUCENE-448
         URL: http://issues.apache.org/jira/browse/LUCENE-448
     Project: Lucene - Java
        Type: New Feature
  Components: Index  
    Versions: CVS Nightly - Specify date in submission    
    Reporter: Yonik Seeley
 Attachments: omitNorms.txt

For applications with many indexed fields, the norms cause memory problems both during indexing and querying.
This patch makes norms optional on a per-field basis, in the same way that term vectors are optional per-field.
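
To give a feel for the memory cost, here is a rough back-of-the-envelope sketch. It assumes one byte of norm data per document for every indexed field that keeps norms; the document and field counts are hypothetical and not taken from this issue.

```java
// Rough estimate of the memory held by norms, assuming one byte per
// document for every indexed field that keeps norms. All numbers below
// are hypothetical and only illustrate how the cost scales.
final class NormsMemoryEstimate {

    /** Bytes of norm data for the given index shape. */
    static long normBytes(long numDocs, int fieldsWithNorms) {
        return numDocs * fieldsWithNorms;
    }

    public static void main(String[] args) {
        long numDocs = 10_000_000L;   // hypothetical index size
        int indexedFields = 100;      // hypothetical number of indexed fields
        int fieldsNeedingNorms = 5;   // e.g. only the tokenized full-text fields

        System.out.println("norms on every field:      "
                + normBytes(numDocs, indexedFields) + " bytes");
        System.out.println("norms omitted on the rest: "
                + normBytes(numDocs, fieldsNeedingNorms) + " bytes");
    }
}
```

The cost grows with documents times fields, which is why an index with many small keyword-style fields benefits most from omitting norms.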

Overview of changes:
 - Field.omitNorms that defaults to false
 - backward-compatible Lucene file format change: FieldInfos.FieldBits has a bit for omitNorms
 - IndexReader.hasNorms() method
 - During merging, if any segment includes norms, then norms are included.
 - when norms are omitted, the methods that get norms return an array equivalent to a norm of 1.0f for every document, for backward compatibility
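
The flag bit and the merge rule from the list above can be sketched as follows. This is a minimal illustration, not the patch itself: the class name, the helper methods, and the bit values are assumptions chosen for the example, and the actual values in FieldInfos may differ.

```java
// Sketch of a per-field flag byte with an omitNorms bit, plus the merge
// rule "if any segment includes norms, then norms are included".
// Class name, helpers and bit values are illustrative assumptions.
final class FieldBitsSketch {

    static final byte IS_INDEXED = 0x1;        // assumed bit position
    static final byte STORE_TERMVECTOR = 0x2;  // assumed bit position
    static final byte OMIT_NORMS = 0x10;       // assumed bit position

    /** Packs the per-field options into one flag byte. */
    static byte encode(boolean indexed, boolean termVector, boolean omitNorms) {
        byte bits = 0;
        if (indexed) bits |= IS_INDEXED;
        if (termVector) bits |= STORE_TERMVECTOR;
        if (omitNorms) bits |= OMIT_NORMS;
        return bits;
    }

    /** A byte written before the new bit existed has it unset, so it reads as omitNorms=false. */
    static boolean omitNorms(byte bits) {
        return (bits & OMIT_NORMS) != 0;
    }

    /** The merged field omits norms only if every segment omits them. */
    static boolean mergedOmitNorms(boolean... perSegmentOmitNorms) {
        for (boolean omit : perSegmentOmitNorms) {
            if (!omit) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte oldIndexByte = encode(true, false, false);
        System.out.println("old byte omits norms: " + omitNorms(oldIndexByte));
        System.out.println("merge(omit, keep) omits norms: " + mergedOmitNorms(true, false));
    }
}
```

Because an unset bit means "norms present", flag bytes written by older versions keep their meaning, which is what makes the format change backward compatible.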

The patch was designed for backward compatibility:
 - all current unit tests pass without modification
 - compatible with old indexes since the default is omitNorms=false
 - compatible with older/custom subclasses of IndexReader since a default hasNorms() is provided
 - compatible with older/custom users of IndexReader such as Weight/Scorer/explain since a norm array is produced on demand, even if norms were not stored

If this patch is accepted (or if the direction is acceptable), scoring performance could be improved by assuming a norm of 1.0f when hasNorms(field)==false.
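
A minimal sketch of the two ideas above: producing a norm array on demand for callers that do not know about omitNorms, and skipping the array entirely when hasNorms(field) is false. ToyReader is a hypothetical stand-in, not the Lucene IndexReader, and it uses float[] for readability rather than whatever encoding the index stores.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Toy model of "fake norms on demand" and of the proposed scoring
// shortcut. ToyReader and score() are hypothetical names for this sketch.
final class FakeNormsSketch {

    /** Hypothetical stand-in for a reader over maxDoc documents. */
    static final class ToyReader {
        private final int maxDoc;
        private final Map<String, float[]> storedNorms = new HashMap<>();

        ToyReader(int maxDoc) {
            this.maxDoc = maxDoc;
        }

        void storeNorms(String field, float[] norms) {
            storedNorms.put(field, norms);
        }

        boolean hasNorms(String field) {
            return storedNorms.containsKey(field);
        }

        /** Always returns an array, so existing callers keep working. */
        float[] norms(String field) {
            float[] norms = storedNorms.get(field);
            if (norms != null) {
                return norms;
            }
            float[] fake = new float[maxDoc];
            Arrays.fill(fake, 1.0f);  // equivalent of "no normalization, no boost"
            return fake;
        }
    }

    /** Scoring shortcut: skip the norm lookup when no norms were stored. */
    static float score(ToyReader reader, String field, int doc, float rawScore) {
        if (!reader.hasNorms(field)) {
            return rawScore;  // multiplying by 1.0f would change nothing
        }
        return rawScore * reader.norms(field)[doc];
    }

    public static void main(String[] args) {
        ToyReader reader = new ToyReader(3);
        reader.storeNorms("body", new float[] {0.5f, 1.0f, 2.0f});

        System.out.println("body, doc 0: " + score(reader, "body", 0, 2.0f));
        System.out.println("id,   doc 0: " + score(reader, "id", 0, 2.0f));
    }
}
```

The shortcut avoids allocating the filled array at all, so a field without norms costs no per-document memory on the scoring path.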


-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-448) optional norms

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/LUCENE-448?page=comments#action_12356100 ] 

Doug Cutting commented on LUCENE-448:
-------------------------------------

Un-tokenized fields don't need a lengthNorm, but they can be boosted.  So it should be well documented that disabling norms disables boosting.
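
The reason boosting goes away can be sketched in a few lines, assuming the norm is the product of the field boost and a length normalization factor. The 1/sqrt(numTerms) formula is used here only as a common default, and the class and method names are hypothetical.

```java
// Sketch of why omitting norms also drops index-time boosts: the boost
// is folded into the same per-document value as the length norm.
// Names and the 1/sqrt(numTerms) formula are assumptions for this example.
final class BoostNormSketch {

    /** Assumed length normalization: shorter fields score higher. */
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /** Per-document norm for a field; a constant 1.0f when norms are omitted. */
    static float norm(float boost, int numTerms, boolean omitNorms) {
        if (omitNorms) {
            return 1.0f;  // both the boost and the length factor are lost
        }
        return boost * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // An un-tokenized field has a single term, so lengthNorm is 1.0f
        // and the boost is the only information the norm carries.
        System.out.println("boost kept:    " + norm(2.0f, 1, false));
        System.out.println("norms omitted: " + norm(2.0f, 1, true));
    }
}
```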

I'd hide fakeNorms().  If user code shouldn't call it, then it shouldn't appear in the javadoc.  You could make it package-private.  Or, can you not make MultiReader.norms() rely on SegmentReader.norms() to create fake norms as needed?

As for naming setter/getters: I don't feel strongly about this.  I sometimes use get/set, even when I might prefer omitting them, simply because it is the fashion and the style police hassle me when I don't.



[jira] Commented: (LUCENE-448) optional norms

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/LUCENE-448?page=comments#action_12356034 ] 

Yonik Seeley commented on LUCENE-448:
-------------------------------------

> It might be nice to add something like a Field.Index.NO_NORMS, that assumes un-tokenized...

Good idea... un-tokenized fields don't need a lengthNorm anyway.

Minor Q: Should fakeNorms() exist on IndexReader (as it does now), or simply be private to both SegmentReader and MultiReader (the only two that need to generate fake norm arrays)?

Very minor Q: Should the getter/setter currently named isOmitNorms()/setOmitNorms() be renamed? I followed the example of isStoreOffsetWithTermVector(), but omitNorms()/omitNorms(boolean) reads nicer in code.




[jira] Resolved: (LUCENE-448) optional norms

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/LUCENE-448?page=all ]
     
Yonik Seeley resolved LUCENE-448:
---------------------------------

    Fix Version: CVS Nightly - Specify date in submission
     Resolution: Fixed
      Assign To: Yonik Seeley



[jira] Commented: (LUCENE-448) optional norms

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/LUCENE-448?page=comments#action_12356017 ] 

Doug Cutting commented on LUCENE-448:
-------------------------------------

+1

This can greatly reduce the amount of memory used by indexes with lots of fields.

It might be nice to add something like a Field.Index.NO_NORMS, that assumes un-tokenized...



[jira] Updated: (LUCENE-448) optional norms

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/LUCENE-448?page=all ]

Yonik Seeley updated LUCENE-448:
--------------------------------

    Attachment: omitNorms.txt
