Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/09/28 17:39:09 UTC

[GitHub] [lucene] zhaih commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

zhaih commented on code in PR #11796:
URL: https://github.com/apache/lucene/pull/11796#discussion_r982692228


##########
lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java:
##########
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.misc.store;
+
+import java.io.IOException;
+import java.util.concurrent.atomic.AtomicLong;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.FilterDirectory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.store.IndexOutput;
+
+/** {@link FilterDirectory} that tracks write amplification factor */
+public final class ByteWritesTrackingDirectoryWrapper extends FilterDirectory {
+
+  private final AtomicLong flushedBytes = new AtomicLong();
+  private final AtomicLong mergedBytes = new AtomicLong();
+  private final AtomicLong realTimeFlushedBytes = new AtomicLong();

Review Comment:
   This real-time counter could hurt performance, since it is incremented for every byte written, by every writing thread.
   
   Since `IndexOutput` is already not thread-safe, we could probably let `ByteTrackingIndexOutput` keep a private (volatile?) byte counter and expose a public getter. The directory wrapper would then keep a list of the tracking outputs, so that when we need the numbers we can just call the getters and sum them up. To avoid tracking too many index outputs, we can deregister them when they're closed.
   
   It's just another idea and could be a follow-up, but I think we need to be careful about using atomics here.
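
A minimal sketch of the idea above, to make the shape concrete. It deliberately uses hypothetical stand-in classes (`TrackingOutput`, `TrackingRegistry`) rather than Lucene's real `IndexOutput` and `FilterDirectory`, so none of these names or methods come from the PR. It assumes each output is written by a single thread, and the summed total is only approximate while outputs are being closed concurrently.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a byte-tracking output; not Lucene's IndexOutput. */
final class TrackingOutput implements AutoCloseable {
  private final TrackingRegistry registry;
  // Assumption: only one thread writes to a given output. The field is volatile
  // so that a different thread reading the total sees a recent value.
  private volatile long bytesWritten;

  TrackingOutput(TrackingRegistry registry) {
    this.registry = registry;
  }

  void writeByte(byte b) {
    // Single-writer read-then-write; no shared atomic is touched on the write path.
    bytesWritten = bytesWritten + 1;
  }

  void writeBytes(byte[] b, int offset, int length) {
    bytesWritten = bytesWritten + length;
  }

  long getBytesWritten() {
    return bytesWritten;
  }

  @Override
  public void close() {
    registry.onClose(this);
  }
}

/** Hypothetical stand-in for the bookkeeping the directory wrapper would hold. */
final class TrackingRegistry {
  private final Set<TrackingOutput> openOutputs = ConcurrentHashMap.newKeySet();
  // Touched once per closed output, not once per byte written.
  private final AtomicLong closedBytes = new AtomicLong();

  TrackingOutput newOutput() {
    TrackingOutput out = new TrackingOutput(this);
    openOutputs.add(out);
    return out;
  }

  void onClose(TrackingOutput out) {
    // Deregister first; remove() returns false on a repeated close,
    // so the final count is folded into the total only once.
    if (openOutputs.remove(out)) {
      closedBytes.addAndGet(out.getBytesWritten());
    }
  }

  /** Sums closed and still-open outputs; approximate if outputs close concurrently. */
  long totalBytes() {
    long sum = closedBytes.get();
    for (TrackingOutput out : openOutputs) {
      sum += out.getBytesWritten();
    }
    return sum;
  }
}
```

In this sketch the shared `AtomicLong` is updated once per closed output, and the per-byte work stays on a field owned by the writing thread. Whether a volatile field is cheap enough in practice would still need to be measured.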



##########
lucene/misc/src/java/org/apache/lucene/misc/store/ByteWritesTrackingDirectoryWrapper.java:
##########
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.misc.store;
+
+import java.io.IOException;
+import java.util.concurrent.atomic.AtomicLong;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.store.FilterDirectory;
+import org.apache.lucene.store.IOContext;
+import org.apache.lucene.store.IndexOutput;
+
+/** {@link FilterDirectory} that tracks write amplification factor */
+public final class ByteWritesTrackingDirectoryWrapper extends FilterDirectory {
+
+  private final AtomicLong flushedBytes = new AtomicLong();
+  private final AtomicLong mergedBytes = new AtomicLong();
+  private final AtomicLong realTimeFlushedBytes = new AtomicLong();
+  private final AtomicLong realTimeMergedBytes = new AtomicLong();
+
+  public final boolean trackTempOutput;
+
+  /**
+   * Constructor defaults to not tracking temp outputs
+   *
+   * @param in input Directory
+   */
+  public ByteWritesTrackingDirectoryWrapper(Directory in) {
+    this(in, false);
+  }
+
+  /**
+   * Constructor with option to track tempOutput
+   *
+   * @param in input Directory
+   * @param trackTempOutput if true, will also track temporary outputs created by this directory
+   */
+  public ByteWritesTrackingDirectoryWrapper(Directory in, boolean trackTempOutput) {
+    super(in);
+    this.trackTempOutput = trackTempOutput;
+  }
+
+  @Override
+  public IndexOutput createOutput(String name, IOContext ioContext) throws IOException {
+    IndexOutput output = in.createOutput(name, ioContext);
+    IndexOutput byteTrackingIndexOutput;
+    if (ioContext.context.equals(IOContext.Context.FLUSH)) {
+      byteTrackingIndexOutput =
+          new ByteTrackingIndexOutput(output, flushedBytes, realTimeFlushedBytes);
+    } else if (ioContext.context.equals(IOContext.Context.MERGE)) {
+      byteTrackingIndexOutput =
+          new ByteTrackingIndexOutput(output, mergedBytes, realTimeMergedBytes);
+    } else {
+      return output;
+    }
+    return byteTrackingIndexOutput;
+  }
+
+  @Override
+  public IndexOutput createTempOutput(String prefix, String suffix, IOContext ioContext)
+      throws IOException {
+    IndexOutput output = in.createTempOutput(prefix, suffix, ioContext);
+    if (trackTempOutput) {
+      IndexOutput byteTrackingIndexOutput;
+      if (ioContext.context.equals(IOContext.Context.FLUSH)) {
+        byteTrackingIndexOutput =
+            new ByteTrackingIndexOutput(output, flushedBytes, realTimeFlushedBytes);
+      } else if (ioContext.context.equals(IOContext.Context.MERGE)) {
+        byteTrackingIndexOutput =
+            new ByteTrackingIndexOutput(output, mergedBytes, realTimeMergedBytes);
+      } else {
+        return output;
+      }
+      return byteTrackingIndexOutput;
+    }
+    return output;
+  }
+
+  public double getApproximateWriteAmplificationFactor() {
+    double flushedBytes = (double) this.flushedBytes.get();
+    if (flushedBytes == 0.0) {
+      return 1.0;
+    }
+    double mergedBytes = (double) this.mergedBytes.get();
+    return (flushedBytes + mergedBytes) / flushedBytes;
+  }
+
+  /** Gets a more up-to-date but less accurate write amplification factor */

Review Comment:
   Why is this less accurate? It seems to me that it is updated more frequently and should therefore be more accurate.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

