You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2020/09/29 14:00:27 UTC

[GitHub] [lucene-solr] mikemccand commented on a change in pull request #1925: Cleanup DWPT state handling

mikemccand commented on a change in pull request #1925:
URL: https://github.com/apache/lucene-solr/pull/1925#discussion_r496729477



##########
File path: lucene/core/src/java/org/apache/lucene/index/DocumentsWriterFlushControl.java
##########
@@ -157,49 +157,42 @@ private boolean updatePeaks(long delta) {
   }
 
   DocumentsWriterPerThread doAfterDocument(DocumentsWriterPerThread perThread, boolean isUpdate) {
-    final long delta = perThread.getCommitLastBytesUsedDelta();
+    final long delta = perThread.commitLastBytesUsed();
     synchronized (this) {
-      // we need to commit this under lock but calculate it outside of the lock to minimize the time this lock is held
-      // per document. The reason we update this under lock is that we mark DWPTs as pending without acquiring it's
-      // lock in #setFlushPending and this also reads the committed bytes and modifies the flush/activeBytes.
-      // In the future we can clean this up to be more intuitive.

Review comment:
       ;)

##########
File path: lucene/core/src/java/org/apache/lucene/index/DocumentsWriter.java
##########
@@ -446,7 +446,7 @@ long updateDocuments(final Iterable<? extends Iterable<? extends IndexableField>
   private boolean doFlush(DocumentsWriterPerThread flushingDWPT) throws IOException {
     boolean hasEvents = false;
     while (flushingDWPT != null) {
-      assert flushingDWPT.hasFlushed() == false;
+      assert flushingDWPT.getState() == DocumentsWriterPerThread.State.FLUSHING : "expected FLUSHING but was: " + flushingDWPT.getState();

Review comment:
       Maybe add a `boolean assertState(State)` to DWPT that does the above `assert` and then returns `true` so we can call it under `assert`?  Only downside is then we have double looking `assert`, i.e. `assert flushingDWPT.assertState(State.FLUSHING)`.  Looks like we are doing this in at least three places here ...

##########
File path: lucene/core/src/java/org/apache/lucene/index/DocumentsWriterFlushControl.java
##########
@@ -157,49 +157,42 @@ private boolean updatePeaks(long delta) {
   }
 
   DocumentsWriterPerThread doAfterDocument(DocumentsWriterPerThread perThread, boolean isUpdate) {
-    final long delta = perThread.getCommitLastBytesUsedDelta();
+    final long delta = perThread.commitLastBytesUsed();
     synchronized (this) {
-      // we need to commit this under lock but calculate it outside of the lock to minimize the time this lock is held
-      // per document. The reason we update this under lock is that we mark DWPTs as pending without acquiring it's
-      // lock in #setFlushPending and this also reads the committed bytes and modifies the flush/activeBytes.
-      // In the future we can clean this up to be more intuitive.
-      perThread.commitLastBytesUsed(delta);
       try {
         /*
          * We need to differentiate here if we are pending since setFlushPending
          * moves the perThread memory to the flushBytes and we could be set to
          * pending during a delete
          */
-        if (perThread.isFlushPending()) {
-          flushBytes += delta;
-          assert updatePeaks(delta);
-        } else {
-          activeBytes += delta;
-          assert updatePeaks(delta);
+        activeBytes += delta;

Review comment:
       Hmm, why are we removing the conditional on "flush pending" here (and always updating `activeBytes`, never `flushBytes`)?  I guess `setFlushPending` will move all `activeBytes` over to `flushBytes`, so it is fine for us to always increment `activeBytes` here?
   
   Edit: hmm it looks like we are also removing `setFlushPending`'s moving of `activeBytes` to `flushBytes` ;)  So now I don't quite understand how we are changing the RAM accounting here.
   
   Edit again!: OK I see, we moved the RAM shifting into `checkoutFlushableWriter`, OK good!
   
   Should we fix (or maybe just remove) the above comment now?

##########
File path: lucene/core/src/java/org/apache/lucene/index/DocumentsWriterFlushControl.java
##########
@@ -295,27 +289,32 @@ public synchronized void waitForFlush() {
    * {@link DocumentsWriterPerThread} must have indexed at least on Document and must not be
    * already pending.
    */
-  public synchronized void setFlushPending(DocumentsWriterPerThread perThread) {
-    assert !perThread.isFlushPending();
+  synchronized void setFlushPending(DocumentsWriterPerThread perThread) {
+    assert !perThread.isFlushPending() : "state: " + perThread.getState();
     if (perThread.getNumDocsInRAM() > 0) {
-      perThread.setFlushPending(); // write access synced
-      final long bytes = perThread.getLastCommittedBytesUsed();
-      flushBytes += bytes;
-      activeBytes -= bytes;

Review comment:
       Hmm we are no longer moving `bytes` from `activeBytes` to `flushedBytes` here?  Are we (the caller here?) doing that elsewhere?

##########
File path: lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThread.java
##########
@@ -513,48 +515,21 @@ public String toString() {
         + numDocsInRAM + ", deleteQueue=" + deleteQueue + ", " + numDeletedDocIds + " deleted docIds" + "]";
   }
 
-
   /**
    * Returns true iff this DWPT is marked as flush pending
    */
   boolean isFlushPending() {
-    return flushPending.get() == Boolean.TRUE;
-  }
-
-  /**
-   * Sets this DWPT as flush pending. This can only be set once.
-   */
-  void setFlushPending() {
-    flushPending.set(Boolean.TRUE);
-  }
-
-
-  /**
-   * Returns the last committed bytes for this DWPT. This method can be called
-   * without acquiring the DWPTs lock.
-   */
-  long getLastCommittedBytesUsed() {
-    return lastCommittedBytesUsed;
-  }
-
-  /**
-   * Commits the current {@link #ramBytesUsed()} and stores it's value for later reuse.
-   * The last committed bytes used can be retrieved via {@link #getLastCommittedBytesUsed()}
-   */
-  void commitLastBytesUsed(long delta) {
-    assert isHeldByCurrentThread();
-    assert getCommitLastBytesUsedDelta() == delta : "delta has changed";
-    lastCommittedBytesUsed += delta;
+    return state == State.FLUSH_PENDING;

Review comment:
       Maybe we should remove `isFlushPending` method entirely and expect callers to check `state == State.FLUSH_PENDING` themselves?

##########
File path: lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThread.java
##########
@@ -593,9 +568,61 @@ void unlock() {
   }
 
   /**
-   * Returns <code>true</code> iff this DWPT has been flushed
+   * Returns the DWPTs current state.
    */
-  boolean hasFlushed() {
-    return hasFlushed.get() == Boolean.TRUE;
+  State getState() {
+    return state;
   }
+
+  /**
+   * Transitions the DWPT to the given state of fails if the transition is invalid.

Review comment:
       s/`of`/`or`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org