Posted to dev@lucene.apache.org by s1monw <gi...@git.apache.org> on 2018/12/06 10:36:16 UTC

[GitHub] lucene-solr pull request #516: LUCENE-8594: DV update are broken for updates...

GitHub user s1monw opened a pull request:

    https://github.com/apache/lucene-solr/pull/516

    LUCENE-8594: DV update are broken for updates on new field

    A segment written with Lucene70Codec fails if it tries to update
    a DV field that didn't exist in the index before it was upgraded to
    Lucene80Codec. We bake the DV format into the FieldInfo when it's used
    for the first time and therefore never go back to the codec when we need
    to apply an update. Yet for a field that didn't exist before and was added
    during an indexing operation we have to consult the codec, which throws an
    exception. This change fixes the issue and adds the relevant bwc tests.
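
    A minimal sketch of the shape of the fix, mirroring the Lucene70Codec diff
    quoted further down this thread. This is a class fragment, not the full patch;
    the defaultDVFormat initialization shown here is assumed for illustration and
    is not quoted from the change itself:

        import org.apache.lucene.codecs.DocValuesFormat;
        import org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat;

        // Inside the backward-compatibility Lucene70Codec: the per-field
        // DocValues hook used to reject all writes. Returning the segment's
        // original DV format instead lets a DV update on a field that is new
        // to an upgraded segment be written with the old format.
        private final DocValuesFormat defaultDVFormat = DocValuesFormat.forName("Lucene70");

        private final DocValuesFormat docValuesFormat = new PerFieldDocValuesFormat() {
          @Override
          public DocValuesFormat getDocValuesFormatForField(String field) {
            // was: throw new IllegalStateException("This codec should only be used for reading, not writing");
            return defaultDVFormat;
          }
        };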

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/s1monw/lucene-solr fix_dv_update_on_old_index

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/lucene-solr/pull/516.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #516
    
----
commit 45bb0938105cae6aaaa62d04f757e1cfe70ecb76
Author: Simon Willnauer <si...@...>
Date:   2018-12-06T10:31:02Z

    LUCENE-8594: DV update are broken for updates on new field
    
    A segment written with Lucene70Codec fails if it tries to update
    a DV field that didn't exist in the index before it was upgraded to
    Lucene80Codec. We bake the DV format into the FieldInfo when it's used
    for the first time and therefore never go back to the codec when we need
    to apply an update. Yet for a field that didn't exist before and was added
    during an indexing operation we have to consult the codec, which throws an
    exception. This change fixes the issue and adds the relevant bwc tests.

----


---



[GitHub] lucene-solr pull request #516: LUCENE-8594: DV update are broken for updates...

Posted by jpountz <gi...@git.apache.org>.
Github user jpountz commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/516#discussion_r239448053
  
    --- Diff: lucene/backward-codecs/src/java/org/apache/lucene/codecs/lucene70/Lucene70Codec.java ---
    @@ -66,7 +67,7 @@ public PostingsFormat getPostingsFormatForField(String field) {
       private final DocValuesFormat docValuesFormat = new PerFieldDocValuesFormat() {
         @Override
         public DocValuesFormat getDocValuesFormatForField(String field) {
    -      throw new IllegalStateException("This codec should only be used for reading, not writing");
    +      return defaultDVFormat;
    --- End diff --
    
    I don't think it's a requirement, but I like using the old format better:
     - it's probably easier to debug, as all fields would be using the same format
     - given that PFDVFormat creates different files per format, forcing the new format would increase the number of files for old segments


---



[GitHub] lucene-solr issue #516: LUCENE-8594: DV update are broken for updates on new...

Posted by s1monw <gi...@git.apache.org>.
Github user s1monw commented on the issue:

    https://github.com/apache/lucene-solr/pull/516
  
    @jpountz I pushed changes


---



[GitHub] lucene-solr issue #516: LUCENE-8594: DV update are broken for updates on new...

Posted by s1monw <gi...@git.apache.org>.
Github user s1monw commented on the issue:

    https://github.com/apache/lucene-solr/pull/516
  
    this was found in https://github.com/elastic/elasticsearch/pull/36286


---



[GitHub] lucene-solr pull request #516: LUCENE-8594: DV update are broken for updates...

Posted by jpountz <gi...@git.apache.org>.
Github user jpountz commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/516#discussion_r239408003
  
    --- Diff: lucene/backward-codecs/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java ---
    @@ -1587,6 +1590,76 @@ public void testDocValuesUpdates() throws Exception {
         writer.close();
         dir.close();
       }
    +
    +  public void testSoftDeletes() throws Exception {
    +    Path oldIndexDir = createTempDir("dvupdates");
    +    TestUtil.unzip(getDataInputStream(dvUpdatesIndex), oldIndexDir);
    +    Directory dir = newFSDirectory(oldIndexDir);
    +    verifyUsesDefaultCodec(dir, dvUpdatesIndex);
    +    IndexWriterConfig conf = new IndexWriterConfig(new MockAnalyzer(random())).setSoftDeletesField("__soft_delete");
    +    IndexWriter writer = new IndexWriter(dir, conf);
    +    int maxDoc = writer.maxDoc();
    +    writer.updateDocValues(new Term("id", "1"),new NumericDocValuesField("__soft_delete", 1));
    +
    +    if (random().nextBoolean()) {
    +      writer.commit();
    +    }
    +    writer.forceMerge(1);
    +    writer.commit();
    +    assertEquals(maxDoc-1, writer.maxDoc());
    +    writer.close();
    +    dir.close();
    +  }
    +
    +  public void testDocValuesUpdatesWithNewField() throws Exception {
    +    Path oldIndexDir = createTempDir("dvupdates");
    +    TestUtil.unzip(getDataInputStream(dvUpdatesIndex), oldIndexDir);
    +    Directory dir = newFSDirectory(oldIndexDir);
    +    verifyUsesDefaultCodec(dir, dvUpdatesIndex);
    +
    +    // update fields and verify index
    +    IndexWriterConfig conf = new IndexWriterConfig(new MockAnalyzer(random()));
    +    IndexWriter writer = new IndexWriter(dir, conf);
    +    // introduce a new field that we later update
    +    writer.addDocument(Arrays.asList(new StringField("id", "" + Integer.MAX_VALUE, Field.Store.NO),
    +        new NumericDocValuesField("new_numeric", 1),
    +        new BinaryDocValuesField("new_binary", toBytes(1))));
    +    writer.updateNumericDocValue(new Term("id", "1"), "new_numeric", 1);
    +    writer.updateBinaryDocValue(new Term("id", "1"), "new_binary", toBytes(1));
    +
    +    writer.commit();
    +    Callable assertDV = () -> {
    --- End diff --
    
    s/Callable/Runnable/ since you don't care about the return value


---



[GitHub] lucene-solr pull request #516: LUCENE-8594: DV update are broken for updates...

Posted by s1monw <gi...@git.apache.org>.
Github user s1monw commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/516#discussion_r239408275
  
    --- Diff: lucene/backward-codecs/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java ---
    @@ -1587,6 +1590,76 @@ public void testDocValuesUpdates() throws Exception {
         writer.close();
         dir.close();
       }
    +
    +  public void testSoftDeletes() throws Exception {
    +    Path oldIndexDir = createTempDir("dvupdates");
    +    TestUtil.unzip(getDataInputStream(dvUpdatesIndex), oldIndexDir);
    +    Directory dir = newFSDirectory(oldIndexDir);
    +    verifyUsesDefaultCodec(dir, dvUpdatesIndex);
    +    IndexWriterConfig conf = new IndexWriterConfig(new MockAnalyzer(random())).setSoftDeletesField("__soft_delete");
    +    IndexWriter writer = new IndexWriter(dir, conf);
    +    int maxDoc = writer.maxDoc();
    +    writer.updateDocValues(new Term("id", "1"),new NumericDocValuesField("__soft_delete", 1));
    +
    +    if (random().nextBoolean()) {
    +      writer.commit();
    +    }
    +    writer.forceMerge(1);
    +    writer.commit();
    +    assertEquals(maxDoc-1, writer.maxDoc());
    +    writer.close();
    +    dir.close();
    +  }
    +
    +  public void testDocValuesUpdatesWithNewField() throws Exception {
    +    Path oldIndexDir = createTempDir("dvupdates");
    +    TestUtil.unzip(getDataInputStream(dvUpdatesIndex), oldIndexDir);
    +    Directory dir = newFSDirectory(oldIndexDir);
    +    verifyUsesDefaultCodec(dir, dvUpdatesIndex);
    +
    +    // update fields and verify index
    +    IndexWriterConfig conf = new IndexWriterConfig(new MockAnalyzer(random()));
    +    IndexWriter writer = new IndexWriter(dir, conf);
    +    // introduce a new field that we later update
    +    writer.addDocument(Arrays.asList(new StringField("id", "" + Integer.MAX_VALUE, Field.Store.NO),
    +        new NumericDocValuesField("new_numeric", 1),
    +        new BinaryDocValuesField("new_binary", toBytes(1))));
    +    writer.updateNumericDocValue(new Term("id", "1"), "new_numeric", 1);
    +    writer.updateBinaryDocValue(new Term("id", "1"), "new_binary", toBytes(1));
    +
    +    writer.commit();
    +    Callable assertDV = () -> {
    --- End diff --
    
    it throws an exception, hence the Callable
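
    For context, the distinction comes down to the two signatures: Runnable.run()
    declares no checked exceptions, while Callable.call() is declared as
    "V call() throws Exception", so a lambda whose body throws a checked exception
    only fits the Callable shape. A minimal, self-contained sketch:

        import java.io.IOException;
        import java.util.concurrent.Callable;

        public class ThrowingLambdaDemo {
          public static void main(String[] args) throws Exception {
            // Would not compile: Runnable.run() declares no checked exceptions.
            // Runnable broken = () -> { throw new IOException("checked"); };

            // Compiles: Callable.call() may throw Exception, so the lambda body
            // is allowed to throw checked exceptions.
            Callable<Void> check = () -> {
              if (args.length > 0) {
                throw new IOException("checked");
              }
              return null;
            };
            check.call();
          }
        }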


---



[GitHub] lucene-solr pull request #516: LUCENE-8594: DV update are broken for updates...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/lucene-solr/pull/516


---



[GitHub] lucene-solr pull request #516: LUCENE-8594: DV update are broken for updates...

Posted by s1monw <gi...@git.apache.org>.
Github user s1monw commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/516#discussion_r239409024
  
    --- Diff: lucene/backward-codecs/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java ---
    @@ -1587,6 +1590,76 @@ public void testDocValuesUpdates() throws Exception {
         writer.close();
         dir.close();
       }
    +
    +  public void testSoftDeletes() throws Exception {
    +    Path oldIndexDir = createTempDir("dvupdates");
    +    TestUtil.unzip(getDataInputStream(dvUpdatesIndex), oldIndexDir);
    +    Directory dir = newFSDirectory(oldIndexDir);
    +    verifyUsesDefaultCodec(dir, dvUpdatesIndex);
    +    IndexWriterConfig conf = new IndexWriterConfig(new MockAnalyzer(random())).setSoftDeletesField("__soft_delete");
    +    IndexWriter writer = new IndexWriter(dir, conf);
    +    int maxDoc = writer.maxDoc();
    +    writer.updateDocValues(new Term("id", "1"),new NumericDocValuesField("__soft_delete", 1));
    +
    +    if (random().nextBoolean()) {
    +      writer.commit();
    +    }
    +    writer.forceMerge(1);
    +    writer.commit();
    +    assertEquals(maxDoc-1, writer.maxDoc());
    +    writer.close();
    +    dir.close();
    +  }
    +
    +  public void testDocValuesUpdatesWithNewField() throws Exception {
    +    Path oldIndexDir = createTempDir("dvupdates");
    +    TestUtil.unzip(getDataInputStream(dvUpdatesIndex), oldIndexDir);
    +    Directory dir = newFSDirectory(oldIndexDir);
    +    verifyUsesDefaultCodec(dir, dvUpdatesIndex);
    +
    +    // update fields and verify index
    +    IndexWriterConfig conf = new IndexWriterConfig(new MockAnalyzer(random()));
    +    IndexWriter writer = new IndexWriter(dir, conf);
    +    // introduce a new field that we later update
    +    writer.addDocument(Arrays.asList(new StringField("id", "" + Integer.MAX_VALUE, Field.Store.NO),
    +        new NumericDocValuesField("new_numeric", 1),
    +        new BinaryDocValuesField("new_binary", toBytes(1))));
    +    writer.updateNumericDocValue(new Term("id", "1"), "new_numeric", 1);
    +    writer.updateBinaryDocValue(new Term("id", "1"), "new_binary", toBytes(1));
    +
    +    writer.commit();
    +    Callable assertDV = () -> {
    +      boolean found = false;
    +      try (DirectoryReader reader = DirectoryReader.open(dir)) {
    +        for (LeafReaderContext ctx : reader.leaves()) {
    +          LeafReader leafReader = ctx.reader();
    +          TermsEnum id = leafReader.terms("id").iterator();
    +          if (id.seekExact(new BytesRef("1"))) {
    +            PostingsEnum postings = id.postings(null, PostingsEnum.NONE);
    +            NumericDocValues numericDocValues = leafReader.getNumericDocValues("new_numeric");
    +            BinaryDocValues binaryDocValues = leafReader.getBinaryDocValues("new_binary");
    +            int doc;
    +            while ((doc = postings.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
    +              found = true;
    +              assertEquals(doc, binaryDocValues.nextDoc());
    +              assertEquals(doc, numericDocValues.nextDoc());
    +              assertEquals(1, numericDocValues.longValue());
    +              assertEquals(toBytes(1), binaryDocValues.binaryValue());
    +            }
    +          }
    --- End diff --
    
    well, this doesn't work since we have another doc with a value in this field, I guess, and it might get merged into a single segment, which we also force


---



[GitHub] lucene-solr pull request #516: LUCENE-8594: DV update are broken for updates...

Posted by s1monw <gi...@git.apache.org>.
Github user s1monw commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/516#discussion_r239408501
  
    --- Diff: lucene/backward-codecs/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java ---
    @@ -1587,6 +1590,76 @@ public void testDocValuesUpdates() throws Exception {
         writer.close();
         dir.close();
       }
    +
    +  public void testSoftDeletes() throws Exception {
    +    Path oldIndexDir = createTempDir("dvupdates");
    +    TestUtil.unzip(getDataInputStream(dvUpdatesIndex), oldIndexDir);
    +    Directory dir = newFSDirectory(oldIndexDir);
    +    verifyUsesDefaultCodec(dir, dvUpdatesIndex);
    +    IndexWriterConfig conf = new IndexWriterConfig(new MockAnalyzer(random())).setSoftDeletesField("__soft_delete");
    +    IndexWriter writer = new IndexWriter(dir, conf);
    +    int maxDoc = writer.maxDoc();
    +    writer.updateDocValues(new Term("id", "1"),new NumericDocValuesField("__soft_delete", 1));
    +
    +    if (random().nextBoolean()) {
    +      writer.commit();
    +    }
    +    writer.forceMerge(1);
    +    writer.commit();
    +    assertEquals(maxDoc-1, writer.maxDoc());
    +    writer.close();
    +    dir.close();
    +  }
    +
    +  public void testDocValuesUpdatesWithNewField() throws Exception {
    +    Path oldIndexDir = createTempDir("dvupdates");
    +    TestUtil.unzip(getDataInputStream(dvUpdatesIndex), oldIndexDir);
    +    Directory dir = newFSDirectory(oldIndexDir);
    +    verifyUsesDefaultCodec(dir, dvUpdatesIndex);
    +
    +    // update fields and verify index
    +    IndexWriterConfig conf = new IndexWriterConfig(new MockAnalyzer(random()));
    +    IndexWriter writer = new IndexWriter(dir, conf);
    +    // introduce a new field that we later update
    +    writer.addDocument(Arrays.asList(new StringField("id", "" + Integer.MAX_VALUE, Field.Store.NO),
    +        new NumericDocValuesField("new_numeric", 1),
    +        new BinaryDocValuesField("new_binary", toBytes(1))));
    +    writer.updateNumericDocValue(new Term("id", "1"), "new_numeric", 1);
    +    writer.updateBinaryDocValue(new Term("id", "1"), "new_binary", toBytes(1));
    +
    +    writer.commit();
    +    Callable assertDV = () -> {
    +      boolean found = false;
    +      try (DirectoryReader reader = DirectoryReader.open(dir)) {
    +        for (LeafReaderContext ctx : reader.leaves()) {
    +          LeafReader leafReader = ctx.reader();
    +          TermsEnum id = leafReader.terms("id").iterator();
    +          if (id.seekExact(new BytesRef("1"))) {
    +            PostingsEnum postings = id.postings(null, PostingsEnum.NONE);
    +            NumericDocValues numericDocValues = leafReader.getNumericDocValues("new_numeric");
    +            BinaryDocValues binaryDocValues = leafReader.getBinaryDocValues("new_binary");
    +            int doc;
    +            while ((doc = postings.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
    +              found = true;
    +              assertEquals(doc, binaryDocValues.nextDoc());
    +              assertEquals(doc, numericDocValues.nextDoc());
    +              assertEquals(1, numericDocValues.longValue());
    +              assertEquals(toBytes(1), binaryDocValues.binaryValue());
    +            }
    +          }
    --- End diff --
    
    👍 


---



[GitHub] lucene-solr pull request #516: LUCENE-8594: DV update are broken for updates...

Posted by jpountz <gi...@git.apache.org>.
Github user jpountz commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/516#discussion_r239407845
  
    --- Diff: lucene/backward-codecs/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java ---
    @@ -1587,6 +1590,76 @@ public void testDocValuesUpdates() throws Exception {
         writer.close();
         dir.close();
       }
    +
    +  public void testSoftDeletes() throws Exception {
    +    Path oldIndexDir = createTempDir("dvupdates");
    +    TestUtil.unzip(getDataInputStream(dvUpdatesIndex), oldIndexDir);
    +    Directory dir = newFSDirectory(oldIndexDir);
    +    verifyUsesDefaultCodec(dir, dvUpdatesIndex);
    +    IndexWriterConfig conf = new IndexWriterConfig(new MockAnalyzer(random())).setSoftDeletesField("__soft_delete");
    +    IndexWriter writer = new IndexWriter(dir, conf);
    +    int maxDoc = writer.maxDoc();
    +    writer.updateDocValues(new Term("id", "1"),new NumericDocValuesField("__soft_delete", 1));
    +
    +    if (random().nextBoolean()) {
    +      writer.commit();
    +    }
    +    writer.forceMerge(1);
    +    writer.commit();
    +    assertEquals(maxDoc-1, writer.maxDoc());
    +    writer.close();
    +    dir.close();
    +  }
    +
    +  public void testDocValuesUpdatesWithNewField() throws Exception {
    +    Path oldIndexDir = createTempDir("dvupdates");
    +    TestUtil.unzip(getDataInputStream(dvUpdatesIndex), oldIndexDir);
    +    Directory dir = newFSDirectory(oldIndexDir);
    +    verifyUsesDefaultCodec(dir, dvUpdatesIndex);
    +
    +    // update fields and verify index
    +    IndexWriterConfig conf = new IndexWriterConfig(new MockAnalyzer(random()));
    +    IndexWriter writer = new IndexWriter(dir, conf);
    +    // introduce a new field that we later update
    +    writer.addDocument(Arrays.asList(new StringField("id", "" + Integer.MAX_VALUE, Field.Store.NO),
    +        new NumericDocValuesField("new_numeric", 1),
    +        new BinaryDocValuesField("new_binary", toBytes(1))));
    +    writer.updateNumericDocValue(new Term("id", "1"), "new_numeric", 1);
    +    writer.updateBinaryDocValue(new Term("id", "1"), "new_binary", toBytes(1));
    +
    +    writer.commit();
    +    Callable assertDV = () -> {
    +      boolean found = false;
    +      try (DirectoryReader reader = DirectoryReader.open(dir)) {
    +        for (LeafReaderContext ctx : reader.leaves()) {
    +          LeafReader leafReader = ctx.reader();
    +          TermsEnum id = leafReader.terms("id").iterator();
    +          if (id.seekExact(new BytesRef("1"))) {
    +            PostingsEnum postings = id.postings(null, PostingsEnum.NONE);
    +            NumericDocValues numericDocValues = leafReader.getNumericDocValues("new_numeric");
    +            BinaryDocValues binaryDocValues = leafReader.getBinaryDocValues("new_binary");
    +            int doc;
    +            while ((doc = postings.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
    +              found = true;
    +              assertEquals(doc, binaryDocValues.nextDoc());
    +              assertEquals(doc, numericDocValues.nextDoc());
    +              assertEquals(1, numericDocValues.longValue());
    +              assertEquals(toBytes(1), binaryDocValues.binaryValue());
    +            }
    +          }
    --- End diff --
    
    for completeness, add `assertEquals(DocIdSetIterator.NO_MORE_DOCS, binaryDocValues.nextDoc()); assertEquals(DocIdSetIterator.NO_MORE_DOCS, numericDocValues.nextDoc());`?


---



[GitHub] lucene-solr pull request #516: LUCENE-8594: DV update are broken for updates...

Posted by mikemccand <gi...@git.apache.org>.
Github user mikemccand commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/516#discussion_r239426275
  
    --- Diff: lucene/backward-codecs/src/java/org/apache/lucene/codecs/lucene70/Lucene70Codec.java ---
    @@ -66,7 +67,7 @@ public PostingsFormat getPostingsFormatForField(String field) {
       private final DocValuesFormat docValuesFormat = new PerFieldDocValuesFormat() {
         @Override
         public DocValuesFormat getDocValuesFormatForField(String field) {
    -      throw new IllegalStateException("This codec should only be used for reading, not writing");
    +      return defaultDVFormat;
    --- End diff --
    
    Ahh this is because the DV updates must also be written with the old codec?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org