You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/02/25 12:14:46 UTC
[GitHub] [lucene-solr] mikemccand commented on a change in pull request #2345: Benchmark custom

mikemccand commented on a change in pull request #2345:
URL: https://github.com/apache/lucene-solr/pull/2345#discussion_r581398351



##########
File path: lucene/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/ReutersContentSource.java
##########
@@ -102,19 +104,43 @@ public void close() throws IOException {
   public DocData getNextDocData(DocData docData) throws NoMoreDataException, IOException {
     Path f = null;
     String name = null;
-    synchronized (this) {
-      if (nextFile >= inputFiles.size()) {
-        // exhausted files, start a new round, unless forever set to false.
-        if (!forever) {
-          throw new NoMoreDataException();
-        }
-        nextFile = 0;
-        iteration++;
-      }
-      f = inputFiles.get(nextFile++);
-      name = f.toRealPath() + "_" + iteration;
+    int inputFilesSize = inputFiles.size();
+
+    /*
+     * synchronized (this) {
+     * if (nextFile >= inputFiles.size()) { // exhausted files, start a new round, unless forever set to false.
+     * if (!forever) {
+     *    throw new NoMoreDataException();
+     * }
+     * nextFile = 0;
+     * iteration++;
+     * }
+     * f = inputFiles.get(nextFile++);
+     * name = f.toRealPath() + "_" +iteration;
+     * }
+     */
+    if (!threadIndexCreated) {

Review comment:
       `if (threadIndexCreated == false) {` instead (to reduce chance of accidental future refactoring bugs)?  This likely won't pass our code style checker (`gradle precommit`).

##########
File path: lucene/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/ReutersContentSource.java
##########
@@ -102,19 +104,43 @@ public void close() throws IOException {
   public DocData getNextDocData(DocData docData) throws NoMoreDataException, IOException {
     Path f = null;
     String name = null;
-    synchronized (this) {
-      if (nextFile >= inputFiles.size()) {
-        // exhausted files, start a new round, unless forever set to false.
-        if (!forever) {
-          throw new NoMoreDataException();
-        }
-        nextFile = 0;
-        iteration++;
-      }
-      f = inputFiles.get(nextFile++);
-      name = f.toRealPath() + "_" + iteration;
+    int inputFilesSize = inputFiles.size();
+
+    /*
+     * synchronized (this) {
+     * if (nextFile >= inputFiles.size()) { // exhausted files, start a new round, unless forever set to false.
+     * if (!forever) {
+     *    throw new NoMoreDataException();
+     * }
+     * nextFile = 0;
+     * iteration++;
+     * }
+     * f = inputFiles.get(nextFile++);
+     * name = f.toRealPath() + "_" +iteration;
+     * }
+     */
+    if (!threadIndexCreated) {
+      createThreadIndex();
+    }
+
+    int index = (int) Thread.currentThread().getId() % threadIndex.length;
+    int fIndex = index + threadIndex[index] * threadIndex.length;
+    threadIndex[index]++;
+
+    // Sanity check, if # threads is greater than # input files, wrap index
+    if (index >= inputFilesSize) index %= inputFilesSize;

Review comment:
       Can you move the `index %= inputFilesSize` to newline and inside `{ ... }` body?

##########
File path: lucene/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/ReutersContentSource.java
##########
@@ -102,19 +104,43 @@ public void close() throws IOException {
   public DocData getNextDocData(DocData docData) throws NoMoreDataException, IOException {
     Path f = null;
     String name = null;
-    synchronized (this) {
-      if (nextFile >= inputFiles.size()) {
-        // exhausted files, start a new round, unless forever set to false.
-        if (!forever) {
-          throw new NoMoreDataException();
-        }
-        nextFile = 0;
-        iteration++;
-      }
-      f = inputFiles.get(nextFile++);
-      name = f.toRealPath() + "_" + iteration;
+    int inputFilesSize = inputFiles.size();
+
+    /*
+     * synchronized (this) {

Review comment:
       Just delete this old code?  You are replacing it with a more concurrent version, yay!

##########
File path: lucene/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/ReutersContentSource.java
##########
@@ -146,4 +172,11 @@ public synchronized void resetInputs() throws IOException {
     nextFile = 0;
     iteration = 0;
   }
+
+  private synchronized void createThreadIndex() {
+    if (!threadIndexCreated) {

Review comment:
       `== false` instead?  Or maybe change to `assert threadIndexCreated == false` since you also check this up above with a real `if` already?

##########
File path: lucene/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/ReutersContentSource.java
##########
@@ -102,19 +104,43 @@ public void close() throws IOException {
   public DocData getNextDocData(DocData docData) throws NoMoreDataException, IOException {
     Path f = null;
     String name = null;
-    synchronized (this) {
-      if (nextFile >= inputFiles.size()) {
-        // exhausted files, start a new round, unless forever set to false.
-        if (!forever) {
-          throw new NoMoreDataException();
-        }
-        nextFile = 0;
-        iteration++;
-      }
-      f = inputFiles.get(nextFile++);
-      name = f.toRealPath() + "_" + iteration;
+    int inputFilesSize = inputFiles.size();
+
+    /*
+     * synchronized (this) {
+     * if (nextFile >= inputFiles.size()) { // exhausted files, start a new round, unless forever set to false.
+     * if (!forever) {
+     *    throw new NoMoreDataException();
+     * }
+     * nextFile = 0;
+     * iteration++;
+     * }
+     * f = inputFiles.get(nextFile++);
+     * name = f.toRealPath() + "_" +iteration;
+     * }
+     */
+    if (!threadIndexCreated) {
+      createThreadIndex();
+    }
+
+    int index = (int) Thread.currentThread().getId() % threadIndex.length;
+    int fIndex = index + threadIndex[index] * threadIndex.length;
+    threadIndex[index]++;

Review comment:
       I'm confused how this approach ensures that we will indeed index every document in the `inputFiles`?
   
   `Thread.currentThread().getId() % threadIndex.length` is not guaranteed to reach every possible int from `0 .. threadIndex.length`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org