You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/07/24 16:13:52 UTC

[GitHub] [lucene] wuda0112 opened a new pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

wuda0112 opened a new pull request #224:
URL: https://github.com/apache/lucene/pull/224


   Simple text codec add skip list data( include impact) to help understand index format,For debugging, curiosity, transparency only!!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mikemccand commented on pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
mikemccand commented on pull request #224:
URL: https://github.com/apache/lucene/pull/224#issuecomment-907786967


   > > So awesome that `SimpleText` is finally getting a skipping implementation! Thank you @wuda0112!
   > > Have you tried stressing out the new code by running all Lucene unit-tests with `SimpleText`? Something like `./gradlew -Dtests.codec=SimpleText`.
   > 
   > @mikemccand Thanks for your reply! I have execute `./gradlew check` `./gradlew test` `./gradlew -p lucene/core/ test` `./gradlew -p lucene/codecs/ test`, and all passed!
   > But when i execute `./gradlew test -Dtests.codec=SimpleText`, there has one FAILED, the exception is:
   > 
   > ```
   > org.apache.lucene.demo.TestDemo > testKnnVectorSearch FAILED
   >     java.lang.UnsupportedOperationException
   >         at __randomizedtesting.SeedInfo.seed([2DAF064A760C3554:7F0D5D18D85E7028]:0)
   >         at org.apache.lucene.codecs.simpletext.SimpleTextKnnVectorsReader.search(SimpleTextKnnVectorsReader.java:143)
   > ```
   > 
   > when i look into SimpleTextKnnVectorsReader#search, it `throw new UnsupportedOperationException`
   
   Ahh it looks like `SimpleText` is missing its KNN implementation but looks like it's coming soon!  https://issues.apache.org/jira/browse/LUCENE-10063.  In the meantime that failing test should add an `assume` that the current codec is not `SimpleText`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wuda0112 commented on a change in pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
wuda0112 commented on a change in pull request #224:
URL: https://github.com/apache/lucene/pull/224#discussion_r683273514



##########
File path: lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/SimpleTextFieldsWriter.java
##########
@@ -78,6 +87,12 @@ public void write(FieldInfos fieldInfos, Fields fields) throws IOException {
       boolean hasFreqs = terms.hasFreqs();
       boolean hasPayloads = fieldInfo.hasPayloads();

Review comment:
       it may not happen, so i have modified nothing, but it passed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
jpountz commented on pull request #224:
URL: https://github.com/apache/lucene/pull/224#issuecomment-904414377


   I reviewed the change more deeply today and found a few minor issues that I pushed fixes for, feel free to let me know if they don't work for you! One major issue that I found however is that binary content gets written to the file because of `MultiLevelSkipListWriter#writeSkip` which uses `writeVLong`. Can you look into writing data in a text format instead?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wuda0112 commented on a change in pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
wuda0112 commented on a change in pull request #224:
URL: https://github.com/apache/lucene/pull/224#discussion_r683081295



##########
File path: lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/SimpleTextFieldsWriter.java
##########
@@ -78,6 +87,12 @@ public void write(FieldInfos fieldInfos, Fields fields) throws IOException {
       boolean hasFreqs = terms.hasFreqs();
       boolean hasPayloads = fieldInfo.hasPayloads();

Review comment:
       @sonatype-lift ignore




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
jpountz commented on pull request #224:
URL: https://github.com/apache/lucene/pull/224#issuecomment-905487573


   This looks good to me. Since we don't need to encode these numbers differently maybe we could have a single abstract method, e.g.
   
   MultiLevelSkipListWriter
   
   ```java
    /**
     * Write a small positive long on a variable number of bytes.
     *
     * @param l the long to write
     * @param out the output to write to
     */
    protected void writeVLong(long l, DataOutput out) throws IOException{
      out.writeVLong(l);
    }
   ```
   
   MultiLevelSkipListReader
   
   ```java
     /**
      * Read a long written via {@link MultiLevelSkipListWriter#writeVLong}.
      *
      * @param in the IndexInput to read from
      */
     protected long readVLong(IndexInput in) throws IOException {
       return in.readVLong();
     }
   ```
   
   This is a small enough change to an internal class that I don't feel the need to open a separate issue, we could do it on this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wuda0112 commented on pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
wuda0112 commented on pull request #224:
URL: https://github.com/apache/lucene/pull/224#issuecomment-891702010


   > > I have run ./gradlew -p lucene/codecs/ test
   > 
   > We have codec randomization in place that may use the SimpleText codec in any test. Can you confirm that tests pass if you run on the entire source folder?
   
   this time i have run  ./gradlew -p lucene/codecs/ test on the entire source folder and successed


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wuda0112 commented on pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
wuda0112 commented on pull request #224:
URL: https://github.com/apache/lucene/pull/224#issuecomment-907810690


   > > > So awesome that `SimpleText` is finally getting a skipping implementation! Thank you @wuda0112!
   > > > Have you tried stressing out the new code by running all Lucene unit-tests with `SimpleText`? Something like `./gradlew -Dtests.codec=SimpleText`.
   > > 
   > > 
   > > @mikemccand Thanks for your reply! I have execute `./gradlew check` `./gradlew test` `./gradlew -p lucene/core/ test` `./gradlew -p lucene/codecs/ test`, and all passed!
   > > But when i execute `./gradlew test -Dtests.codec=SimpleText`, there has one FAILED, the exception is:
   > > ```
   > > org.apache.lucene.demo.TestDemo > testKnnVectorSearch FAILED
   > >     java.lang.UnsupportedOperationException
   > >         at __randomizedtesting.SeedInfo.seed([2DAF064A760C3554:7F0D5D18D85E7028]:0)
   > >         at org.apache.lucene.codecs.simpletext.SimpleTextKnnVectorsReader.search(SimpleTextKnnVectorsReader.java:143)
   > > ```
   > > 
   > > 
   > >     
   > >       
   > >     
   > > 
   > >       
   > >     
   > > 
   > >     
   > >   
   > > when i look into SimpleTextKnnVectorsReader#search, it `throw new UnsupportedOperationException`
   > 
   > Ahh it looks like `SimpleText` is missing its KNN implementation but looks like it's coming soon! https://issues.apache.org/jira/browse/LUCENE-10063. In the meantime that failing test should add an `assume` that the current codec is not `SimpleText`.
   
   Ok, got it, thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
jpountz commented on pull request #224:
URL: https://github.com/apache/lucene/pull/224#issuecomment-908326788


   > In the meantime that failing test should add an assume that the current codec is not SimpleText.
   
   Or let's just merge https://github.com/apache/lucene/pull/262? :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wuda0112 commented on pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
wuda0112 commented on pull request #224:
URL: https://github.com/apache/lucene/pull/224#issuecomment-906451844


   Ok , got it. I will keep work on it.  Thanks for your patience


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz merged pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
jpountz merged pull request #224:
URL: https://github.com/apache/lucene/pull/224


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
jpountz commented on pull request #224:
URL: https://github.com/apache/lucene/pull/224#issuecomment-906442903


   Ah, I see what you mean. Let's ignore my suggestion then and go with your initial suggestion.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wuda0112 commented on a change in pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
wuda0112 commented on a change in pull request #224:
URL: https://github.com/apache/lucene/pull/224#discussion_r683081295



##########
File path: lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/SimpleTextFieldsWriter.java
##########
@@ -78,6 +87,12 @@ public void write(FieldInfos fieldInfos, Fields fields) throws IOException {
       boolean hasFreqs = terms.hasFreqs();
       boolean hasPayloads = fieldInfo.hasPayloads();

Review comment:
       @sonatype-lift ignore




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wuda0112 commented on a change in pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
wuda0112 commented on a change in pull request #224:
URL: https://github.com/apache/lucene/pull/224#discussion_r694488864



##########
File path: lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/SimpleTextFieldsReader.java
##########
@@ -357,19 +380,55 @@ public int nextDoc() throws IOException {
       }
     }
 
+    private int advanceTarget(int target) throws IOException {
+      if (seekTo > 0) {
+        in.seek(seekTo);
+        seekTo = -1;
+      }
+      assert docID() < target;
+      int doc;
+      do {
+        doc = readDoc();
+      } while (doc < target);
+      return doc;
+    }
+
     @Override
     public int advance(int target) throws IOException {
-      // Naive -- better to index skip data
-      return slowAdvance(target);
+      advanceShallow(target);
+      return advanceTarget(target);
     }
 
     @Override
     public long cost() {
       return cost;
     }
+
+    @Override
+    public void advanceShallow(int target) throws IOException {
+      if (skipReader.hasSkipList()) {
+        if (target > nextSkipDoc) {
+          int numSkipped = skipReader.skipTo(target) + 1;
+          if (numSkipped > lastNumSkipped) {

Review comment:
       i just want to express "this time we have skipped docs", but it is redundant,unnecessary




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
jpountz commented on pull request #224:
URL: https://github.com/apache/lucene/pull/224#issuecomment-904729746


   Let's give this a try.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wuda0112 commented on pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
wuda0112 commented on pull request #224:
URL: https://github.com/apache/lucene/pull/224#issuecomment-907748573


   > So awesome that `SimpleText` is finally getting a skipping implementation! Thank you @wuda0112!
   > 
   > Have you tried stressing out the new code by running all Lucene unit-tests with `SimpleText`? Something like `./gradlew -Dtests.codec=SimpleText`.
   
   @mikemccand  Thanks for your reply! I have execute `./gradlew check` `./gradlew test` `./gradlew -p lucene/core/ test` `./gradlew -p lucene/codecs/ test`, and all passed!
   But when i execute `./gradlew test -Dtests.codec=SimpleText`, there has one FAILED, the exception is:
   ```
   org.apache.lucene.demo.TestDemo > testKnnVectorSearch FAILED
       java.lang.UnsupportedOperationException
           at __randomizedtesting.SeedInfo.seed([2DAF064A760C3554:7F0D5D18D85E7028]:0)
           at org.apache.lucene.codecs.simpletext.SimpleTextKnnVectorsReader.search(SimpleTextKnnVectorsReader.java:143)
   ```
   when i look into SimpleTextKnnVectorsReader#search, it `throw new UnsupportedOperationException`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on a change in pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
jpountz commented on a change in pull request #224:
URL: https://github.com/apache/lucene/pull/224#discussion_r694170097



##########
File path: lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/SimpleTextFieldsReader.java
##########
@@ -357,19 +380,55 @@ public int nextDoc() throws IOException {
       }
     }
 
+    private int advanceTarget(int target) throws IOException {
+      if (seekTo > 0) {
+        in.seek(seekTo);
+        seekTo = -1;
+      }
+      assert docID() < target;
+      int doc;
+      do {
+        doc = readDoc();
+      } while (doc < target);
+      return doc;
+    }
+
     @Override
     public int advance(int target) throws IOException {
-      // Naive -- better to index skip data
-      return slowAdvance(target);
+      advanceShallow(target);
+      return advanceTarget(target);
     }
 
     @Override
     public long cost() {
       return cost;
     }
+
+    @Override
+    public void advanceShallow(int target) throws IOException {
+      if (skipReader.hasSkipList()) {

Review comment:
       Do we actually need to do this, since we seem to be already protecting against short postings lists that don't have skip data above?
   
   ```java
         if (docFreq <= SimpleTextSkipWriter.BLOCK_SIZE) {
           // no skip data
           return new SlowImpactsEnum(postings(null, flags));
         }
   ```

##########
File path: lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/SimpleTextFieldsReader.java
##########
@@ -357,19 +380,55 @@ public int nextDoc() throws IOException {
       }
     }
 
+    private int advanceTarget(int target) throws IOException {
+      if (seekTo > 0) {
+        in.seek(seekTo);
+        seekTo = -1;
+      }
+      assert docID() < target;
+      int doc;
+      do {
+        doc = readDoc();
+      } while (doc < target);
+      return doc;
+    }
+
     @Override
     public int advance(int target) throws IOException {
-      // Naive -- better to index skip data
-      return slowAdvance(target);
+      advanceShallow(target);
+      return advanceTarget(target);
     }
 
     @Override
     public long cost() {
       return cost;
     }
+
+    @Override
+    public void advanceShallow(int target) throws IOException {
+      if (skipReader.hasSkipList()) {
+        if (target > nextSkipDoc) {
+          int numSkipped = skipReader.skipTo(target) + 1;
+          if (numSkipped > lastNumSkipped) {

Review comment:
       I don't understand this part, why do we need to track lastNumSkipped?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wuda0112 commented on a change in pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
wuda0112 commented on a change in pull request #224:
URL: https://github.com/apache/lucene/pull/224#discussion_r694488494



##########
File path: lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/SimpleTextFieldsReader.java
##########
@@ -357,19 +380,55 @@ public int nextDoc() throws IOException {
       }
     }
 
+    private int advanceTarget(int target) throws IOException {
+      if (seekTo > 0) {
+        in.seek(seekTo);
+        seekTo = -1;
+      }
+      assert docID() < target;
+      int doc;
+      do {
+        doc = readDoc();
+      } while (doc < target);
+      return doc;
+    }
+
     @Override
     public int advance(int target) throws IOException {
-      // Naive -- better to index skip data
-      return slowAdvance(target);
+      advanceShallow(target);
+      return advanceTarget(target);
     }
 
     @Override
     public long cost() {
       return cost;
     }
+
+    @Override
+    public void advanceShallow(int target) throws IOException {
+      if (skipReader.hasSkipList()) {

Review comment:
       Yes, it is unnecessary. I just want to keep safety invoke, and it seems a little bit readable(just my thought)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wuda0112 commented on pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
wuda0112 commented on pull request #224:
URL: https://github.com/apache/lucene/pull/224#issuecomment-908356737


   @jpountz Thank you, you helped me a lot,  and thanks for your patience to review !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on a change in pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
jpountz commented on a change in pull request #224:
URL: https://github.com/apache/lucene/pull/224#discussion_r678249370



##########
File path: lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/SimpleTextSkipReader.java
##########
@@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.codecs.simpletext;
+
+import static org.apache.lucene.codecs.simpletext.SimpleTextSkipWriter.FREQ;
+import static org.apache.lucene.codecs.simpletext.SimpleTextSkipWriter.IMPACT;
+import static org.apache.lucene.codecs.simpletext.SimpleTextSkipWriter.IMPACTS;
+import static org.apache.lucene.codecs.simpletext.SimpleTextSkipWriter.IMPACTS_END;
+import static org.apache.lucene.codecs.simpletext.SimpleTextSkipWriter.NORM;
+import static org.apache.lucene.codecs.simpletext.SimpleTextSkipWriter.SKIP_DOC;
+import static org.apache.lucene.codecs.simpletext.SimpleTextSkipWriter.SKIP_DOC_FP;
+import static org.apache.lucene.codecs.simpletext.SimpleTextSkipWriter.SKIP_LIST;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import org.apache.lucene.codecs.MultiLevelSkipListReader;
+import org.apache.lucene.index.Impact;
+import org.apache.lucene.index.Impacts;
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.store.BufferedChecksumIndexInput;
+import org.apache.lucene.store.ChecksumIndexInput;
+import org.apache.lucene.store.IndexInput;
+import org.apache.lucene.util.ArrayUtil;
+import org.apache.lucene.util.BytesRefBuilder;
+import org.apache.lucene.util.CharsRefBuilder;
+import org.apache.lucene.util.StringHelper;
+
+/**
+ * This class reads skip lists with multiple levels.
+ *
+ * <p>See {@link SimpleTextFieldsWriter} for the information about the encoding of the multi level
+ * skip lists.
+ *
+ * @lucene.experimental
+ */
+public class SimpleTextSkipReader extends MultiLevelSkipListReader {

Review comment:
       can it be made pkg-private?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz merged pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
jpountz merged pull request #224:
URL: https://github.com/apache/lucene/pull/224


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wuda0112 commented on pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
wuda0112 commented on pull request #224:
URL: https://github.com/apache/lucene/pull/224#issuecomment-906934607


   Now, there has no binary content gets written to the file !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wuda0112 edited a comment on pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
wuda0112 edited a comment on pull request #224:
URL: https://github.com/apache/lucene/pull/224#issuecomment-905393867


   > Let's give this a try.
   
   I tried to define the methods, but i am not sure is it fit,  if this correct, i will create a new issuse, and if this is your concern issuse, could you please defind it, because you are more professional!
   
   MultiLevelSkipListWriter
   
    ```
   /**
      * Writes the length of a level to the given output.
      *
      * @param levelLength the length of a level
      * @param output the IndexOutput the length shall be written to
      */
     protected void writeLevelLength(long levelLength,IndexOutput output) throws IOException{
       output.writeVLong(levelLength);
     }
   
     /**
      * Writes the child pointer of a block to the given output.
      *
      * @param childPointer block of higher level point to the lower level
      * @param skipBuffer the skip buffer to write to
      */
     protected void writeChildPointer(long childPointer, DataOutput skipBuffer) throws IOException{
       skipBuffer.writeVLong(childPointer);
     }
   ```
   
   
   MultiLevelSkipListReader
   
   ```
   /**
      * read the length of the current level.
      *
      * @param skipStream the IndexInput the length shall be read from
      * @return level length
      */
     protected long readLevelLength(IndexInput skipStream) throws IOException {
       return skipStream.readVLong();
     }
   
     /**
      * read the child pointer.
      *
      * @param skipStream the IndexInput the child pointer shall be read from
      * @return child pointer
      */
     protected long readChildPointer(IndexInput skipStream) throws IOException {
       return skipStream.readVLong();
     }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wuda0112 commented on pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
wuda0112 commented on pull request #224:
URL: https://github.com/apache/lucene/pull/224#issuecomment-905393867


   > Let's give this a try.
   
   I tried to define the methods, but i am not sure is it fit,  if this correct, i will create a new issuse, and if this is your concern issuse, could you please defind it, because you are more professional!
   
   MultiLevelSkipListWriter
   `/**
      * Writes the length of a level to the given output.
      *
      * @param levelLength the length of a level
      * @param output the IndexOutput the length shall be written to
      */
     protected void writeLevelLength(long levelLength,IndexOutput output) throws IOException{
       output.writeVLong(levelLength);
     }
   
     /**
      * Writes the child pointer of a block to the given output.
      *
      * @param childPointer block of higher level point to the lower level
      * @param skipBuffer the skip buffer to write to
      */
     protected void writeChildPointer(long childPointer, DataOutput skipBuffer) throws IOException{
       skipBuffer.writeVLong(childPointer);
     }`
   
   MultiLevelSkipListReader
   `/**
      * read the length of the current level.
      *
      * @param skipStream the IndexInput the length shall be read from
      * @return level length
      */
     protected long readLevelLength(IndexInput skipStream) throws IOException {
       return skipStream.readVLong();
     }
   
     /**
      * read the child pointer.
      *
      * @param skipStream the IndexInput the child pointer shall be read from
      * @return child pointer
      */
     protected long readChildPointer(IndexInput skipStream) throws IOException {
       return skipStream.readVLong();
     }`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] mikemccand commented on pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
mikemccand commented on pull request #224:
URL: https://github.com/apache/lucene/pull/224#issuecomment-907641298


   So awesome that `SimpleText` is finally getting a skipping implementation!  Thank you @wuda0112!
   
   Have you tried stressing out the new code by running all Lucene unit-tests with `SimpleText`?  Something like `./gradlew -Dtests.codec=SimpleText`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wuda0112 commented on pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
wuda0112 commented on pull request #224:
URL: https://github.com/apache/lucene/pull/224#issuecomment-895833494


   > @wuda0112 FYI I'm currently away from computers so I most likely won't have the opportunity to merge this PR until 2 weeks from now. If someone else feels good about merging it before, please go ahead but otherwise I'll come back to it in two weeks.
   
   @jpountz  Thanks for your reply, no problem, you can take a look when you have time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] sonatype-lift[bot] commented on a change in pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
sonatype-lift[bot] commented on a change in pull request #224:
URL: https://github.com/apache/lucene/pull/224#discussion_r682733021



##########
File path: lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/SimpleTextFieldsWriter.java
##########
@@ -78,6 +87,12 @@ public void write(FieldInfos fieldInfos, Fields fields) throws IOException {
       boolean hasFreqs = terms.hasFreqs();
       boolean hasPayloads = fieldInfo.hasPayloads();

Review comment:
       *NULL_DEREFERENCE:*  object `fieldInfo` last assigned on line 82 could be null and is dereferenced at line 88.
   (at-me [in a reply](https://help.sonatype.com/lift) with `help` or `ignore`)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on a change in pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
jpountz commented on a change in pull request #224:
URL: https://github.com/apache/lucene/pull/224#discussion_r695896655



##########
File path: lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/SimpleTextSkipReader.java
##########
@@ -147,28 +147,6 @@ protected int readSkipData(int level, IndexInput skipStream) throws IOException
     return skipDoc;
   }
 
-  // NOTE: This method scans the entire postings list to find skip lists.

Review comment:
       Fantastic!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wuda0112 commented on pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
wuda0112 commented on pull request #224:
URL: https://github.com/apache/lucene/pull/224#issuecomment-906430050


   > This looks good to me. Since we don't need to encode these numbers differently maybe we could have a single abstract method, e.g.
   > 
   > MultiLevelSkipListWriter
   > 
   > ```java
   >  /**
   >   * Write a small positive long on a variable number of bytes.
   >   *
   >   * @param l the long to write
   >   * @param out the output to write to
   >   */
   >  protected void writeVLong(long l, DataOutput out) throws IOException{
   >    out.writeVLong(l);
   >  }
   > ```
   > 
   > MultiLevelSkipListReader
   > 
   > ```java
   >   /**
   >    * Read a long written via {@link MultiLevelSkipListWriter#writeVLong}.
   >    *
   >    * @param in the IndexInput to read from
   >    */
   >   protected long readVLong(IndexInput in) throws IOException {
   >     return in.readVLong();
   >   }
   > ```
   > 
   > This is a small enough change to an internal class that I don't feel the need to open a separate issue, we could do it on this PR.
   
   There has one thing i want to confirm, since subclass do not know the numbers is writing stands for(eg. for level length or child pointer), so subclass can not add modifier and hierarchy befor the numbers, so the file may be looks like case 1:
   
   Case 1: no hierarchy and modifier
   ```
   field title
   	term lucene
   		doc 499
   		  freq 1
   		  pos 3
   		skipList 
   1024
   		  level 2
   			skipDoc 218
   			skipDocFP 15469
   			impacts 
   			  impact 
   				freq 1
   				norm 1
   			impactsEnd 
   64
   ```
   
   Case 2: we may expect
   ```
   field title
   	term lucene
   		doc 499
   		  freq 1
   		  pos 3
   		skipList 
   		  1024
   		  level 2
   			skipDoc 218
   			skipDocFP 15469
   			impacts 
   			  impact 
   				freq 1
   				norm 1
   			impactsEnd 
                           childPointer 64
   ```
   Of course, **it is completely no need to change API to unsuitable state and it may be just only for this PR**, so for me both case 1 and case 2 are ok! What is your opinion, and case 1 is ok?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wuda0112 commented on a change in pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
wuda0112 commented on a change in pull request #224:
URL: https://github.com/apache/lucene/pull/224#discussion_r695867514



##########
File path: lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/SimpleTextSkipReader.java
##########
@@ -147,28 +147,6 @@ protected int readSkipData(int level, IndexInput skipStream) throws IOException
     return skipDoc;
   }
 
-  // NOTE: This method scans the entire postings list to find skip lists.

Review comment:
       I have use FST stores skipPointer, so when locating skip lists there is no need to scan over postings




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
jpountz commented on pull request #224:
URL: https://github.com/apache/lucene/pull/224#issuecomment-895817986


   @wuda0112 FYI I'm currently away from computers so I most likely won't have the opportunity to merge this PR until 2 weeks from now. If someone else feels good about merging it before, please go ahead but otherwise I'll come back to it in two weeks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
jpountz commented on pull request #224:
URL: https://github.com/apache/lucene/pull/224#issuecomment-908326788


   > In the meantime that failing test should add an assume that the current codec is not SimpleText.
   
   Or let's just merge https://github.com/apache/lucene/pull/262? :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wuda0112 commented on a change in pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
wuda0112 commented on a change in pull request #224:
URL: https://github.com/apache/lucene/pull/224#discussion_r683081295



##########
File path: lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/SimpleTextFieldsWriter.java
##########
@@ -78,6 +87,12 @@ public void write(FieldInfos fieldInfos, Fields fields) throws IOException {
       boolean hasFreqs = terms.hasFreqs();
       boolean hasPayloads = fieldInfo.hasPayloads();

Review comment:
       @sonatype-lift ignore

##########
File path: lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/SimpleTextFieldsWriter.java
##########
@@ -78,6 +87,12 @@ public void write(FieldInfos fieldInfos, Fields fields) throws IOException {
       boolean hasFreqs = terms.hasFreqs();
       boolean hasPayloads = fieldInfo.hasPayloads();

Review comment:
       @sonatype-lift ignore

##########
File path: lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/SimpleTextFieldsWriter.java
##########
@@ -78,6 +87,12 @@ public void write(FieldInfos fieldInfos, Fields fields) throws IOException {
       boolean hasFreqs = terms.hasFreqs();
       boolean hasPayloads = fieldInfo.hasPayloads();

Review comment:
       it may not happen, so i have modified nothing, but it passed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wuda0112 commented on a change in pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
wuda0112 commented on a change in pull request #224:
URL: https://github.com/apache/lucene/pull/224#discussion_r678424267



##########
File path: lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/SimpleTextSkipReader.java
##########
@@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.codecs.simpletext;
+
+import static org.apache.lucene.codecs.simpletext.SimpleTextSkipWriter.FREQ;
+import static org.apache.lucene.codecs.simpletext.SimpleTextSkipWriter.IMPACT;
+import static org.apache.lucene.codecs.simpletext.SimpleTextSkipWriter.IMPACTS;
+import static org.apache.lucene.codecs.simpletext.SimpleTextSkipWriter.IMPACTS_END;
+import static org.apache.lucene.codecs.simpletext.SimpleTextSkipWriter.NORM;
+import static org.apache.lucene.codecs.simpletext.SimpleTextSkipWriter.SKIP_DOC;
+import static org.apache.lucene.codecs.simpletext.SimpleTextSkipWriter.SKIP_DOC_FP;
+import static org.apache.lucene.codecs.simpletext.SimpleTextSkipWriter.SKIP_LIST;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import org.apache.lucene.codecs.MultiLevelSkipListReader;
+import org.apache.lucene.index.Impact;
+import org.apache.lucene.index.Impacts;
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.store.BufferedChecksumIndexInput;
+import org.apache.lucene.store.ChecksumIndexInput;
+import org.apache.lucene.store.IndexInput;
+import org.apache.lucene.util.ArrayUtil;
+import org.apache.lucene.util.BytesRefBuilder;
+import org.apache.lucene.util.CharsRefBuilder;
+import org.apache.lucene.util.StringHelper;
+
+/**
+ * This class reads skip lists with multiple levels.
+ *
+ * <p>See {@link SimpleTextFieldsWriter} for the information about the encoding of the multi level
+ * skip lists.
+ *
+ * @lucene.experimental
+ */
+public class SimpleTextSkipReader extends MultiLevelSkipListReader {

Review comment:
       sorry i have ran at wrong git branch, so it passed, when i realized that, i convert to draft, i will test it carefully again until satisfy all unit test. 
   
   and yes , it should be pkg-private. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] sonatype-lift[bot] commented on a change in pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
sonatype-lift[bot] commented on a change in pull request #224:
URL: https://github.com/apache/lucene/pull/224#discussion_r682733021



##########
File path: lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/SimpleTextFieldsWriter.java
##########
@@ -78,6 +87,12 @@ public void write(FieldInfos fieldInfos, Fields fields) throws IOException {
       boolean hasFreqs = terms.hasFreqs();
       boolean hasPayloads = fieldInfo.hasPayloads();

Review comment:
       *NULL_DEREFERENCE:*  object `fieldInfo` last assigned on line 82 could be null and is dereferenced at line 88.
   (at-me [in a reply](https://help.sonatype.com/lift) with `help` or `ignore`)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wuda0112 commented on pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
wuda0112 commented on pull request #224:
URL: https://github.com/apache/lucene/pull/224#issuecomment-904685835


   > I reviewed the change more deeply today and found a few minor issues that I pushed fixes for, feel free to let me know if they don't work for you! One major issue that I found however is that binary content gets written to the file because of `MultiLevelSkipListWriter#writeSkip` which uses `writeVLong`. Can you look into writing data in a text format instead?
   
   
   MultiLevelSkipListWriter will write length uses writeVLong for every level, and will write childPointer for all blocks except the blocks in the lowest level uses writeVLong, can we define two methods to write these values, so that subclass can overwrite, but the default implement is writeVlong. And at the same time, MultiLevelSkipListReader also define two corresponding methods to read that values. is that ok?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wuda0112 commented on pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
wuda0112 commented on pull request #224:
URL: https://github.com/apache/lucene/pull/224#issuecomment-908356737


   @jpountz Thank you, you helped me a lot,  and thanks for your patience to review !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] wuda0112 commented on pull request #224: LUCENE-10035: Simple text codec add multi level skip list data

Posted by GitBox <gi...@apache.org>.
wuda0112 commented on pull request #224:
URL: https://github.com/apache/lucene/pull/224#issuecomment-904431190


   > I reviewed the change more deeply today and found a few minor issues that I pushed fixes for, feel free to let me know if they don't work for you! One major issue that I found however is that binary content gets written to the file because of `MultiLevelSkipListWriter#writeSkip` which uses `writeVLong`. Can you look into writing data in a text format instead?
   
   Ok, i will keep tracking, and if there has some issuse, i will tell you. I have thought about Write text content instead of binary content before, for some reason, i had given up, but if it is necessary, i will look into and try to  fix it !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org