Posted to dev@lucene.apache.org by "Jason Rutherglen (JIRA)" <ji...@apache.org> on 2008/06/22 02:27:44 UTC

[jira] Created: (LUCENE-1312) InstantiatedIndexReader does not implement getFieldNames properly

InstantiatedIndexReader does not implement getFieldNames properly
-----------------------------------------------------------------

                 Key: LUCENE-1312
                 URL: https://issues.apache.org/jira/browse/LUCENE-1312
             Project: Lucene - Java
          Issue Type: Bug
          Components: contrib/*
            Reporter: Jason Rutherglen


Causes error in org.apache.lucene.index.SegmentMerger.mergeFields

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-1312) InstantiatedIndexReader does not implement getFieldNames properly

Posted by "Karl Wettin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wettin updated LUCENE-1312:
--------------------------------

    Attachment: lucene-1312.patch

Thanks for the updated patch Jason!

I worked a bit on it:

* Factored out the writer-specific field settings (omitNorms, binary, etc.) and introduced an extension in the writer.
* Added test cases for deleting documents and comparing field options in TestIndicesEquals.
* Fixed a bug in the index writer that did not create field settings for non-indexed, non-stored fields, identified by the new test in the previous point.
* Introduced a new class, FieldNames, that actually merges the field options. The previous patch just copied the most recent field setting from the writer, thus changing a setting from true to false in some cases.

I think that was all. I'll be committing this soon pending any comments.
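
The difference between merging field options and copying the most recent setting can be sketched roughly as follows. This is only an illustration, not the code in the patch: the class names FieldSettingSketch and FieldSettingsSketch are hypothetical, and it assumes that "merging" means a flag stays true once any instance of the field has set it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for per-field settings; not the class from the patch.
final class FieldSettingSketch {
    final String name;
    boolean indexed;
    boolean stored;
    boolean storeTermVector;

    FieldSettingSketch(String name, boolean indexed, boolean stored, boolean storeTermVector) {
        this.name = name;
        this.indexed = indexed;
        this.stored = stored;
        this.storeTermVector = storeTermVector;
    }

    // Merge by OR (an assumption): a flag that was ever true stays true.
    void mergeFrom(FieldSettingSketch other) {
        indexed |= other.indexed;
        stored |= other.stored;
        storeTermVector |= other.storeTermVector;
    }
}

// Hypothetical registry that merges settings per field name.
final class FieldSettingsSketch {
    private final Map<String, FieldSettingSketch> byName = new LinkedHashMap<>();

    void add(FieldSettingSketch incoming) {
        FieldSettingSketch existing = byName.get(incoming.name);
        if (existing == null) {
            // Store a copy so later changes to the caller's object do not leak in.
            byName.put(incoming.name, new FieldSettingSketch(
                    incoming.name, incoming.indexed, incoming.stored, incoming.storeTermVector));
        } else {
            // Overwriting here instead would let a later "false" erase an earlier "true".
            existing.mergeFrom(incoming);
        }
    }

    FieldSettingSketch get(String name) {
        return byName.get(name);
    }
}
```

With this shape, adding a "title" setting that is indexed and then one that is not leaves the field marked as indexed.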




[jira] Updated: (LUCENE-1312) InstantiatedIndexReader does not implement getFieldNames properly

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1312:
-------------------------------------

    Attachment: lucene-1312.patch

lucene-1312.patch

A few additional updates related to deleted docs in InstantiatedIndexReader




[jira] Commented: (LUCENE-1312) InstantiatedIndexReader does not implement getFieldNames properly

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607833#action_12607833 ] 

Jason Rutherglen commented on LUCENE-1312:
------------------------------------------

In order to simulate a different IndexReader per update using InstantiatedIndexReader, I wrote the following code. There must be some flaws in it, as it keeps causing errors in SegmentMerger. I am overriding deleted docs and max doc. The latest error is included below; I am sure it is fixable somehow.

{code}
public class OceanInstantiatedIndexReader extends InstantiatedIndexReader {
  private int maxDoc;
  private Set<Integer> deletedDocs;
  
  public OceanInstantiatedIndexReader(int maxDoc, InstantiatedIndex index, Set<Integer> deletedDocs) {
    super(index);
    this.maxDoc = maxDoc;
    this.deletedDocs = deletedDocs;
  }
  
  public int maxDoc() {
    return maxDoc;
  }
  
  public int numDocs() {
    return maxDoc() - deletedDocs.size();
  }
  
  public boolean isDeleted(int n) {
    if (n >= maxDoc) return true;
    if (deletedDocs != null && deletedDocs.contains(n)) return true;
    return false;
  }
  
  public boolean hasDeletions() {
    return true;
  }
}
{code}

{noformat}
java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(Native Method)
at org.apache.lucene.store.instantiated.InstantiatedIndexReader.norms(InstantiatedIndexReader.java:276)
at org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:693)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:136)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:111)
at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:3045)
{noformat}
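
One way such an exception could come out of System.arraycopy, stated here as an assumption rather than a diagnosis: the caller sizes its buffer from the overridden maxDoc(), while the norms array still has the length of the underlying index. The sketch below uses a hypothetical helper, copyNorms, that bounds the copy by maxDoc; it is not part of Lucene.

```java
import java.util.Arrays;

final class NormsCopySketch {
    // Hypothetical helper: copies at most maxDoc norm bytes into dest,
    // starting at destOffset. Assumes dest has room for destOffset + maxDoc bytes.
    static void copyNorms(byte[] norms, int maxDoc, byte[] dest, int destOffset) {
        if (norms == null) {
            return; // the field has no norms
        }
        int length = Math.min(norms.length, maxDoc);
        // arraycopy argument order: src, srcPos, dest, destPos, length
        System.arraycopy(norms, 0, dest, destOffset, length);
    }

    public static void main(String[] args) {
        byte[] norms = {1, 2, 3, 4, 5}; // the index holds five documents
        byte[] buffer = new byte[3];    // the caller believes maxDoc is 3
        copyNorms(norms, 3, buffer, 0);
        System.out.println(Arrays.toString(buffer));
    }
}
```

Copying norms.length bytes into the three-byte buffer without the bound would fail inside System.arraycopy, which matches the top frames of the trace above.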




[jira] Commented: (LUCENE-1312) InstantiatedIndexReader does not implement getFieldNames properly

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607838#action_12607838 ] 

Jason Rutherglen commented on LUCENE-1312:
------------------------------------------

Because the patch looks like a mess with the various changes, I copied and pasted the alterations to InstantiatedIndexReader to allow the following code, which works. Basically, saving a copy of normsByFieldNameAndDocumentNumber into the OceanInstantiatedIndexReader fixes the problem.

{code}
protected Map<String,List<NormUpdate>> updatedNormsByFieldNameAndDocumentNumber = null;

  public static class NormUpdate {
    public int doc;
    public byte value;

    public NormUpdate(int doc, byte value) {
      this.doc = doc;
      this.value = value;
    }
  }
{code}

{code}
public class OceanInstantiatedIndexReader extends InstantiatedIndexReader {
  private int maxDoc;
  private Set<Integer> deletedDocs;
  private Map<String,byte[]> normsByFieldNameAndDocumentNumber;

  public OceanInstantiatedIndexReader(int maxDoc, InstantiatedIndex index, Set<Integer> deletedDocs) {
    super(index);
    this.maxDoc = maxDoc;
    this.deletedDocs = deletedDocs;
    normsByFieldNameAndDocumentNumber = new HashMap<String,byte[]>(index.getNormsByFieldNameAndDocumentNumber());
  }

  public int maxDoc() {
    return maxDoc;
  }

  protected void doSetNorm(int doc, String field, byte value) throws IOException {
    if (updatedNormsByFieldNameAndDocumentNumber == null) {
      updatedNormsByFieldNameAndDocumentNumber = new HashMap<String,List<NormUpdate>>(normsByFieldNameAndDocumentNumber.size());
    }
    List<NormUpdate> list = updatedNormsByFieldNameAndDocumentNumber.get(field);
    if (list == null) {
      list = new LinkedList<NormUpdate>();
      updatedNormsByFieldNameAndDocumentNumber.put(field, list);
    }
    list.add(new NormUpdate(doc, value));
  }

  public byte[] norms(String field) throws IOException {
    byte[] norms = normsByFieldNameAndDocumentNumber.get(field);
    if (updatedNormsByFieldNameAndDocumentNumber != null) {
      norms = norms.clone();
      List<NormUpdate> updated = updatedNormsByFieldNameAndDocumentNumber.get(field);
      if (updated != null) {
        for (NormUpdate normUpdate : updated) {
          norms[normUpdate.doc] = normUpdate.value;
        }
      }
    }
    return norms;
  }

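  public void norms(String field, byte[] bytes, int offset) throws IOException {
    byte[] norms = normsByFieldNameAndDocumentNumber.get(field);
    // Copy the whole norms array into the caller's buffer, starting at offset
    // (arraycopy takes src, srcPos, dest, destPos, length).
    System.arraycopy(norms, 0, bytes, offset, norms.length);
  }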

  public int numDocs() {
    return maxDoc() - deletedDocs.size();
  }

  public boolean isDeleted(int n) {
    if (n >= maxDoc)
      return true;
    if (deletedDocs != null && deletedDocs.contains(n))
      return true;
    return false;
  }

  public boolean hasDeletions() {
    return true;
  }
}
{code}




[jira] Commented: (LUCENE-1312) InstantiatedIndexReader does not implement getFieldNames properly

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607445#action_12607445 ] 

Jason Rutherglen commented on LUCENE-1312:
------------------------------------------

Will be interesting to try out your new patch.  The previous patch (may not be related to InstantiatedIndex though) is still yielding some errors in SegmentMerger.  It would be good to have a test using IndexWriter.addIndexes(IndexReader[] readers) simply passing in one InstantiatedIndexReader.  




[jira] Updated: (LUCENE-1312) InstantiatedIndexReader does not implement getFieldNames properly

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1312:
-------------------------------------

    Attachment: lucene-1312.patch

lucene-1312.patch

Added the code to InstantiatedIndex(IndexReader sourceIndexReader, Set<String> fields) to create the field setting map. Separated FieldSetting into its own class. TestIndicesEquals worked.




[jira] Commented: (LUCENE-1312) InstantiatedIndexReader does not implement getFieldNames properly

Posted by "Karl Wettin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607683#action_12607683 ] 

Karl Wettin commented on LUCENE-1312:
-------------------------------------

bq. Will be interesting to try out your new patch.

It's right there in the issue, as the top patch. ; )

bq. The previous patch (may not be related to InstantiatedIndex though) is still yielding some errors in SegmentMerger. It would be good to have a test using IndexWriter.addIndexes(IndexReader[] readers) simply passing in one InstantiatedIndexReader.

Feel free to add such a test to the patch. If not, I'll look into it next time I sit down with it.






[jira] Commented: (LUCENE-1312) InstantiatedIndexReader does not implement getFieldNames properly

Posted by "Karl Wettin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607813#action_12607813 ] 

Karl Wettin commented on LUCENE-1312:
-------------------------------------

bq. The error I was seeing in SegmentMerger was related to how InstantiatedIndex keeps Document in memory, so any editing of the Document after the Document is returned by InstantiatedIndexReader later messes up SegmentMerger.

Do you still have the exception? Can you paste it here?

It's a bit dangerous to go modifying the Document instances returned by InstantiatedIndexReader unless you are really sure of what you are doing. The documents are the actual index document instances and not deserialized clones as when using the Directory implementations.

If you delete a field of such a document, it's gone from the index. If you add a new field not previously known in the index, it will not be in sync with the field options and you'll probably see strange side effects when merging with other indices, etc.

This could also be seen as a great feature that was lost in documentation. One of my favorites is to store the domain object(s) as Fieldable in the document(s) representing them.

I'll add an appropriate comment about this in the javadocs.
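
The difference between handing out the live instance and handing out a clone can be illustrated with a deliberately simplified model. Everything here is hypothetical: a "document" is just a mutable list of field names, and LiveDocumentSketch stands in for an in-memory index; none of it is Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory store; a "document" is a mutable list of field names.
final class LiveDocumentSketch {
    private final List<List<String>> documents = new ArrayList<>();

    // Stores a private copy of the given fields and returns the document number.
    int add(List<String> fields) {
        documents.add(new ArrayList<>(fields));
        return documents.size() - 1;
    }

    // Returns the stored instance itself: edits by the caller change the index.
    List<String> documentLive(int n) {
        return documents.get(n);
    }

    // Returns a defensive copy: edits by the caller never reach the index.
    List<String> documentCopy(int n) {
        return new ArrayList<>(documents.get(n));
    }
}
```

Removing a field from the list returned by documentLive removes it from the stored document as well, while the same edit on the result of documentCopy leaves the store untouched.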






[jira] Commented: (LUCENE-1312) InstantiatedIndexReader does not implement getFieldNames properly

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607828#action_12607828 ] 

Jason Rutherglen commented on LUCENE-1312:
------------------------------------------

I have an exception, but it's different and I am not sure what it's related to. I need more debug code to see the details, if I can reproduce it. I am assuming it is related to InstantiatedIndexReader, given that it would be difficult to make happen with the regular IndexReader code. It fails on the last line of the code given below. It probably has something to do with InstantiatedIndexReader and deleted docs differing somehow from other data SegmentMerger is obtaining.

Exception:
{noformat}
org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _0: fieldsReader shows 5 but segmentInfo shows 20
	at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:322)
	at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:267)
	at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:235)
	at org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:90)
	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:649)
	at org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:97)
	at org.apache.lucene.index.IndexReader.open(IndexReader.java:213)
	at org.apache.lucene.index.IndexReader.open(IndexReader.java:209)
{noformat}
Code:
{code}
RAMDirectory ramDirectory = new RAMDirectory();
IndexWriter indexWriter = new IndexWriter(ramDirectory, false, system.getDefaultAnalyzer(), true);
indexWriter.setMergeScheduler(new SerialMergeScheduler());
indexWriter.setUseCompoundFile(true);
indexWriter.addIndexes(indexReaders);
indexWriter.close();
Directory.copy(ramDirectory, directory, true);
initialIndexReader = IndexReader.open(directory, indexDeletionPolicy);
{code}

		




[jira] Updated: (LUCENE-1312) InstantiatedIndexReader does not implement getFieldNames properly

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1312:
-------------------------------------

    Attachment: lucene-1312.patch

lucene-1312.patch

The problem with TermEnum was that term() needed a null check:

if (term == null) return null;

That fixed the issue, and the next() call has been removed from InstantiatedTermEnum(InstantiatedIndexReader reader).

Will work on the rest when I have time.
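
The null guard in term() can be shown on a small stand-alone model. This is a sketch under assumptions, not the InstantiatedTermEnum source: TermEnumSketch and its Entry class are hypothetical, and the enumeration is assumed to start positioned before the first term, so the constructor does not advance it.

```java
import java.util.Iterator;
import java.util.List;

final class TermEnumSketch {
    // Hypothetical stand-in for an in-memory term entry.
    private static final class Entry {
        final String text;

        Entry(String text) {
            this.text = text;
        }
    }

    private final Iterator<String> source;
    private Entry current; // null before the first next() and after the last term

    TermEnumSketch(List<String> sortedTerms) {
        // No next() call here: the caller positions the enumeration itself.
        this.source = sortedTerms.iterator();
    }

    // Advances to the next term; returns false once the terms are exhausted.
    boolean next() {
        if (source.hasNext()) {
            current = new Entry(source.next());
            return true;
        }
        current = null;
        return false;
    }

    // Returns the current term text, or null if there is no current term.
    String term() {
        if (current == null) {
            return null; // without this guard, current.text would throw
        }
        return current.text;
    }
}
```

On an empty term list, term() returns null both before and after next(), so a caller that checks for null never dereferences a missing term.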




[jira] Commented: (LUCENE-1312) InstantiatedIndexReader does not implement getFieldNames properly

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607686#action_12607686 ] 

Jason Rutherglen commented on LUCENE-1312:
------------------------------------------

The error I was seeing in SegmentMerger was related to how InstantiatedIndex keeps Document in memory, so any editing of the Document after the Document is returned by InstantiatedIndexReader later messes up SegmentMerger. Tried out the patch and it worked for me.




[jira] Updated: (LUCENE-1312) InstantiatedIndexReader does not implement getFieldNames properly

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-1312:
-------------------------------------

    Attachment: lucene-1312.patch

lucene-1312.patch

Fixed this bug and one related to termenum with no term.  These made SegmentMerger fail.  




[jira] Commented: (LUCENE-1312) InstantiatedIndexReader does not implement getFieldNames properly

Posted by "Karl Wettin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608466#action_12608466 ] 

Karl Wettin commented on LUCENE-1312:
-------------------------------------

bq. Because the patch looks like a mess with the various changes I copied and pasted the alterations to InstantiatedIndexReader to allow the following code which works. Basically saving a copy of normsByFieldNameAndDocumentNumber into the OceanInstantiatedIndexReader fixes the problem.

I haven't had time to look into why you need to extend the code, but notice that InstantiatedIndexWriter creates a new byte[] on commit, so your copy of the norms will be out of sync with the index unless you also update it at that point. There might be more to this than I can think of right now.

Any other problems with the patch? I'm ready to commit it.
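
The out-of-sync concern can be reproduced in miniature. In this sketch, StaleNormsSketch is a hypothetical stand-in for the index, and its commit method only imitates the described behaviour of replacing the norms array with a newly allocated one; the real writer does more than this.

```java
import java.util.HashMap;
import java.util.Map;

final class StaleNormsSketch {
    // Stand-in for the index's map of field name to norms array.
    final Map<String, byte[]> indexNorms = new HashMap<>();

    // Imitates a commit that adds one document: a new, longer array
    // replaces the old one instead of being modified in place.
    void commit(String field, byte newDocNorm) {
        byte[] old = indexNorms.get(field);
        int oldLength = (old == null) ? 0 : old.length;
        byte[] grown = new byte[oldLength + 1];
        if (old != null) {
            System.arraycopy(old, 0, grown, 0, oldLength);
        }
        grown[oldLength] = newDocNorm;
        indexNorms.put(field, grown);
    }

    // Shallow copy, as with new HashMap<String,byte[]>(...): the copy keeps
    // pointing at whichever arrays existed when it was taken.
    Map<String, byte[]> snapshot() {
        return new HashMap<>(indexNorms);
    }
}
```

A snapshot taken after the first commit still holds the one-element array after a second commit, while the index itself now holds a two-element array.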




[jira] Assigned: (LUCENE-1312) InstantiatedIndexReader does not implement getFieldNames properly

Posted by "Karl Wettin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wettin reassigned LUCENE-1312:
-----------------------------------

    Assignee: Karl Wettin




[jira] Commented: (LUCENE-1312) InstantiatedIndexReader does not implement getFieldNames properly

Posted by "Karl Wettin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608479#action_12608479 ] 

Karl Wettin commented on LUCENE-1312:
-------------------------------------

bq. Is there an easy way to add an InstantiatedIndexWriter.addIndexes(IndexReader[] readers) method? Seems doable with the InstantiatedIndex(IndexReader sourceIndexReader) constructor, however I want to be able to merge another IndexReader in.

It's doable. The simplest solution I can think of is to reconstruct all the documents in one single enumeration of the source index and then add them to the writer. I'm however not certain this is the best way nor if InstantiatedIndexWriter is the place for the code. 

I think it should be discussed in a new issue.
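
The idea of reconstructing documents in a single enumeration of the source index can be sketched on a simplified model. The assumptions are significant: postings are reduced to a map from term to document numbers, and positions, stored fields, norms and deletions are ignored. ReconstructDocumentsSketch is a hypothetical name, not a Lucene class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ReconstructDocumentsSketch {
    // postings: term -> document numbers containing that term.
    // Returns: document number -> terms, in the iteration order of the postings map.
    static Map<Integer, List<String>> reconstruct(Map<String, List<Integer>> postings) {
        Map<Integer, List<String>> docs = new TreeMap<>();
        // One pass over all terms; each posting appends the term to its document.
        for (Map.Entry<String, List<Integer>> entry : postings.entrySet()) {
            for (int doc : entry.getValue()) {
                docs.computeIfAbsent(doc, k -> new ArrayList<>()).add(entry.getKey());
            }
        }
        return docs;
    }
}
```

Each reconstructed document could then be handed to the writer one at a time, which is the "add them to the writer" step mentioned above.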




[jira] Commented: (LUCENE-1312) InstantiatedIndexReader does not implement getFieldNames properly

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608477#action_12608477 ] 

Jason Rutherglen commented on LUCENE-1312:
------------------------------------------

The byte[] stuff seems ok.  

Is there an easy way to add an InstantiatedIndexWriter.addIndexes(IndexReader[] readers) method? Seems doable with the InstantiatedIndex(IndexReader sourceIndexReader) constructor, however I want to be able to merge another IndexReader in.




[jira] Commented: (LUCENE-1312) InstantiatedIndexReader does not implement getFieldNames properly

Posted by "Karl Wettin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607117#action_12607117 ] 

Karl Wettin commented on LUCENE-1312:
-------------------------------------

Hi Jason!

bq. Fixed this bug and one related to termenum with no term. These made SegmentMerger fail.

Can you please supply a test case that demonstrates SegmentMerger failing? Your next() in InstantiatedTermEnum() changes the behaviour of InstantiatedIndexReader#terms() compared to IndexReader#terms() and makes the index comparison test fail:

{code}
junit.framework.AssertionFailedError: expected:<a:0> but was:<a:1>
	at org.apache.lucene.store.instantiated.TestIndicesEquals.testEquals(TestIndicesEquals.java:244)
{code}

InstantiatedIndex#fieldSettingsByFieldName, which getFieldNames(FieldOption) relies on, seems to be updated only by InstantiatedIndexWriter and not when the index is populated by InstantiatedIndex(IndexReader).

Can you please supply test cases that demonstrate getFieldNames(FieldOption) works with both index population strategies? 

I think you can factor out the FieldSetting class from InstantiatedIndexWriter, as it is now used by InstantiatedIndex and InstantiatedIndexReader too.
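
To make the concern concrete, here is a minimal sketch of a standalone settings holder that every population path would have to go through. It is a toy model in plain Java, not the patch: FieldFlagRegistry, FieldFlags and record are hypothetical names, only two flags are modelled, and OR-ing the flags on merge is an assumption about the desired semantics.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model only: FieldFlagRegistry and FieldFlags are hypothetical, not Lucene classes. */
final class FieldFlagRegistry {

    /** Per-field flags; a flag stays true once any document has set it (assumed semantics). */
    static final class FieldFlags {
        boolean indexed;
        boolean storeTermVector;

        void merge(boolean indexed, boolean storeTermVector) {
            this.indexed |= indexed;
            this.storeTermVector |= storeTermVector;
        }
    }

    private final Map<String, FieldFlags> flagsByFieldName = new HashMap<>();

    /** Single entry point, meant to be called by every code path that adds a field. */
    void record(String fieldName, boolean indexed, boolean storeTermVector) {
        flagsByFieldName
                .computeIfAbsent(fieldName, name -> new FieldFlags())
                .merge(indexed, storeTermVector);
    }

    /** Every field ever recorded, including fields that are neither indexed nor vectored. */
    Set<String> allFieldNames() {
        return new TreeSet<>(flagsByFieldName.keySet());
    }

    /** Fields for which at least one recorded occurrence was indexed. */
    Set<String> indexedFieldNames() {
        Set<String> names = new TreeSet<>();
        for (Map.Entry<String, FieldFlags> entry : flagsByFieldName.entrySet()) {
            if (entry.getValue().indexed) {
                names.add(entry.getKey());
            }
        }
        return names;
    }

    /** Fields for which at least one recorded occurrence stored a term vector. */
    Set<String> termVectorFieldNames() {
        Set<String> names = new TreeSet<>();
        for (Map.Entry<String, FieldFlags> entry : flagsByFieldName.entrySet()) {
            if (entry.getValue().storeTermVector) {
                names.add(entry.getKey());
            }
        }
        return names;
    }
}
```

With a holder like this, a test for both population strategies would only need to populate the same documents through each path and compare the resulting name sets.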

bq. A few additional updates related to deleted docs in InstantiatedIndexReader

This looks good. I noticed that TestIndicesEquals does not actually delete any documents and then verify that the indices are still equal. I can fix that.


Also, please try not to reformat the code; it makes it harder to see the important changes.

Thanks!




[jira] Closed: (LUCENE-1312) InstantiatedIndexReader does not implement getFieldNames properly

Posted by "Karl Wettin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wettin closed LUCENE-1312.
-------------------------------

       Resolution: Fixed
    Fix Version/s: 2.4
    Lucene Fields: [New, Patch Available]  (was: [New])

Committed in revision 672556.

Thanks Jason!

> InstantiatedIndexReader does not implement getFieldNames properly
> -----------------------------------------------------------------
>
>                 Key: LUCENE-1312
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1312
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/*
>            Reporter: Jason Rutherglen
>            Assignee: Karl Wettin
>             Fix For: 2.4
>
>         Attachments: lucene-1312.patch, lucene-1312.patch, lucene-1312.patch, lucene-1312.patch, lucene-1312.patch
>
>
> Causes error in org.apache.lucene.index.SegmentMerger.mergeFields
