Posted to dev@lucene.apache.org by kai loofi <ka...@gmail.com> on 2019/05/31 23:28:40 UTC

[Issue] omitNorms ignored in DefaultIndexingChain.getOrAddField method

Hello,

I am a Lucene user and have been building a search platform on top of
Lucene 6.6.1. I ran into some unexpected behavior and would like the
community's opinion. I noticed that norms were being created even when I
set *omitNorms=true* in the field types. I traced the issue and found that
*getOrAddField* creates a *FieldInfo* object on the first pass, the first
time it sees a field. By default this object has omitNorms set to false.
The method copies the *indexOptions* from the fieldType onto the newly
created object but does not do the same for *omitNorms*. This effectively
overrides the flag, which creates issues down the line.

Here's the relevant part of the method, including the *fieldInfos.getOrAdd* call:

private PerField getOrAddField(String name, IndexableFieldType fieldType, boolean invert) {

  // Make sure we have a PerField allocated
  final int hashPos = name.hashCode() & hashMask;
  PerField fp = fieldHash[hashPos];
  while (fp != null && !fp.fieldInfo.name.equals(name)) {
    fp = fp.next;
  }

  if (fp == null) {
    // First time we are seeing this field in this segment

    FieldInfo fi = fieldInfos.getOrAdd(name);
    // Messy: must set this here because e.g. FreqProxTermsWriterPerField looks at the initial
    // IndexOptions to decide what arrays it must create).  Then, we also must set it in
    // PerField.invert to allow for later downgrading of the index options:
    fi.setIndexOptions(fieldType.indexOptions());

    fp = new PerField(fi, invert);

    ...

The *getOrAdd* method below instantiates a new FieldInfo, passing false for
omitNorms (the 4th constructor argument):

/** Create a new field, or return existing one. */
public FieldInfo getOrAdd(String name) {
  FieldInfo fi = fieldInfo(name);
  if (fi == null) {
    // This field wasn't yet added to this in-RAM
    // segment's FieldInfo, so now we get a global
    // number for this field.  If the field was seen
    // before then we'll get the same name and number,
    // else we'll allocate a new one:
    final int fieldNumber = globalFieldNumbers.addOrGet(name, -1, DocValuesType.NONE, 0, 0);
    fi = new FieldInfo(name, fieldNumber, false, false, false, IndexOptions.NONE, DocValuesType.NONE, -1, new HashMap<>(), 0, 0);
    assert !byName.containsKey(fi.name);
    globalFieldNumbers.verifyConsistent(Integer.valueOf(fi.number), fi.name, DocValuesType.NONE);
    byName.put(fi.name, fi);
  }

  return fi;
}

I was thinking of opening this as a bug against Lucene, but would first like
to get some feedback and make sure I am not missing anything. Thanks in advance.

Regards,
Ishan

Re: [Issue] omitNorms ignored in DefaultIndexingChain.getOrAddField method

Posted by Adrien Grand <jp...@gmail.com>.
The field is indeed initially created with omitNorms=false, but this
gets fixed later in PerField#invert. Additionally I can't reproduce
the bug you are describing: if I index documents with norms disabled,
then I don't have norms in my index.
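
As a rough illustration of the two-step flow described above, here is a minimal, self-contained sketch. It does not use Lucene: OmitNormsSketch, SketchFieldInfo and omitsNormsAfterIndexing are hypothetical stand-ins, and the body of invert is an assumption about what PerField#invert does with the flag, not a copy of the real code. It only shows that a default of omitNorms=false at creation time is harmless if a later step copies the flag from the field type.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for illustration only; these are NOT Lucene's classes.
final class OmitNormsSketch {

  /** Hypothetical, stripped-down analogue of a per-field info record. */
  static final class SketchFieldInfo {
    final String name;
    // Starts out false, mirroring the default passed in the quoted getOrAdd call.
    private boolean omitNorms;

    SketchFieldInfo(String name) {
      this.name = name;
    }

    void setOmitsNorms() {
      omitNorms = true;
    }

    boolean omitsNorms() {
      return omitNorms;
    }
  }

  private final Map<String, SketchFieldInfo> byName = new HashMap<>();

  /** Step 1: create the record with defaults, or return the existing one. */
  SketchFieldInfo getOrAdd(String name) {
    return byName.computeIfAbsent(name, SketchFieldInfo::new);
  }

  /** Step 2 (assumed behavior): the invert step copies the flag from the field type. */
  void invert(SketchFieldInfo fi, boolean fieldTypeOmitsNorms) {
    if (fieldTypeOmitsNorms) {
      fi.setOmitsNorms();
    }
  }

  /** Runs both steps for one field and reports the final state of the flag. */
  static boolean omitsNormsAfterIndexing(boolean fieldTypeOmitsNorms) {
    OmitNormsSketch chain = new OmitNormsSketch();
    SketchFieldInfo fi = chain.getOrAdd("body");
    chain.invert(fi, fieldTypeOmitsNorms);
    return fi.omitsNorms();
  }

  public static void main(String[] args) {
    System.out.println("field type omits norms: " + omitsNormsAfterIndexing(true));
    System.out.println("field type keeps norms: " + omitsNormsAfterIndexing(false));
  }
}
```

If norms still show up in a real index, a sketch like this suggests looking at what happens after field creation (for example, whether another document in the same index supplies the same field with norms enabled) rather than at the default used when the field is first created.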

On Sat, Jun 1, 2019 at 4:55 AM kai loofi <ka...@gmail.com> wrote:
>
> Hello,
>
> I am a Lucene user and have been trying to write a search platform on top of lucene using version 6.6.1. I ran into some weird behavior and wanted to seek opinions from the community. I noticed that norms were being created even when I set omitNorms=true in the fieldTypes. I chased the issue and found that the method getOrAddField tries to create a FieldInfo object in the 1st pass. By default this object has omitNorms to false. The method sets the indexOptions as specified in the fieldType on this newly created object but doesn't do the same for omitNorms. This effectively overrides this flag which creates issues down the line.
>
> Here's the code snippet for the method with the fieldInfos.getOrAdd call
>
> private PerField getOrAddField(String name, IndexableFieldType fieldType, boolean invert) {
>
>   // Make sure we have a PerField allocated
>   final int hashPos = name.hashCode() & hashMask;
>   PerField fp = fieldHash[hashPos];
>   while (fp != null && !fp.fieldInfo.name.equals(name)) {
>     fp = fp.next;
>   }
>
>   if (fp == null) {
>     // First time we are seeing this field in this segment
>
>     FieldInfo fi = fieldInfos.getOrAdd(name);
>     // Messy: must set this here because e.g. FreqProxTermsWriterPerField looks at the initial
>     // IndexOptions to decide what arrays it must create).  Then, we also must set it in
>     // PerField.invert to allow for later downgrading of the index options:
>     fi.setIndexOptions(fieldType.indexOptions());
>
>     fp = new PerField(fi, invert);
>
>     ...
>
>
>
> The getOrAdd method below instantiates a new object with omitNorms set to false as the 4th parameter.
>
> /** Create a new field, or return existing one. */
> public FieldInfo getOrAdd(String name) {
>   FieldInfo fi = fieldInfo(name);
>   if (fi == null) {
>     // This field wasn't yet added to this in-RAM
>     // segment's FieldInfo, so now we get a global
>     // number for this field.  If the field was seen
>     // before then we'll get the same name and number,
>     // else we'll allocate a new one:
>     final int fieldNumber = globalFieldNumbers.addOrGet(name, -1, DocValuesType.NONE, 0, 0);
>     fi = new FieldInfo(name, fieldNumber, false, false, false, IndexOptions.NONE, DocValuesType.NONE, -1, new HashMap<>(), 0, 0);
>     assert !byName.containsKey(fi.name);
>     globalFieldNumbers.verifyConsistent(Integer.valueOf(fi.number), fi.name, DocValuesType.NONE);
>     byName.put(fi.name, fi);
>   }
>
>   return fi;
> }
>
> I was thinking of opening this as a bug on lucene but would like get some feedback and make sure if I am not missing anything. Thanks in advance.
>
> Regards,
> Ishan
>


-- 
Adrien

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

