Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/09/22 08:11:59 UTC

[GitHub] [lucene] vigyasharma commented on a diff in pull request #11796: GITHUB#11795: Add FilterDirectory to track write amplification factor

vigyasharma commented on code in PR #11796:
URL: https://github.com/apache/lucene/pull/11796#discussion_r977305054


##########
lucene/core/src/java/org/apache/lucene/store/ByteTrackingIndexOutput.java:
##########
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.store;
+
+import java.io.IOException;
+import java.util.concurrent.atomic.AtomicLong;
+
+/** An {@link IndexOutput} that wraps another instance and tracks the number of bytes written */
+public class ByteTrackingIndexOutput extends IndexOutput {
+
+  private final IndexOutput output;
+  private final AtomicLong byteTracker;

Review Comment:
   I suppose you're using `AtomicLong` because `IndexOutput` instances are not thread safe. However, I do see multiple `IndexOutput` implementations that track bytes written in unsynchronized variables, like `RateLimitedIndexOutput` or `OutputStreamIndexOutput`.
   
   Perhaps it's okay to do the same here? We could work with approximate values and avoid the synchronization hit. I guess that, while `IndexOutput` doesn't provide thread-safety guarantees, its consumers try to avoid conflicts.
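
To make the suggestion concrete, here is a minimal sketch of byte counting in a plain field. `PlainCounterSketch` and its methods are hypothetical names used only for illustration; this is not a Lucene class and not the PR's code, and it assumes a single thread writes at a time.

```java
/** Hypothetical sketch, not a Lucene class: byte counting in a plain, unsynchronized field. */
final class PlainCounterSketch {

  // No atomics or locks; assumes only one thread writes at a time.
  private long bytesWritten;

  void writeByte(byte b) {
    bytesWritten++;
  }

  void writeBytes(byte[] b, int offset, int length) {
    bytesWritten += length;
  }

  long getBytesWritten() {
    return bytesWritten;
  }

  public static void main(String[] args) {
    PlainCounterSketch out = new PlainCounterSketch();
    out.writeByte((byte) 1);
    out.writeBytes(new byte[16], 0, 16);
    System.out.println(out.getBytesWritten());
  }
}
```

In the PR the counter is passed into the wrapper's constructor, so one counter may be shared by several outputs. With a plain field, updates from different threads could be lost, which is the "approximate values" trade-off mentioned above.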



##########
lucene/core/src/test/org/apache/lucene/store/TestWriteAmplificationTrackingDirectoryWrapper.java:
##########
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.store;
+
+import java.io.IOException;
+import java.nio.file.Path;
+import org.apache.lucene.tests.store.BaseDirectoryTestCase;
+
+public class TestWriteAmplificationTrackingDirectoryWrapper extends BaseDirectoryTestCase {
+
+  public void testEmptyDir() throws Exception {
+    WriteAmplificationTrackingDirectoryWrapper dir =
+        new WriteAmplificationTrackingDirectoryWrapper(new ByteBuffersDirectory());
+    assertEquals(1.0, dir.getApproximateWriteAmplificationFactor(), 0.0);
+  }
+
+  public void testRandom() throws Exception {
+    WriteAmplificationTrackingDirectoryWrapper dir =
+        new WriteAmplificationTrackingDirectoryWrapper(new ByteBuffersDirectory());
+
+    int flushBytes = random().nextInt(100);
+    int mergeBytes = random().nextInt(100);
+    double expectedBytes = ((double) flushBytes + (double) mergeBytes) / (double) flushBytes;

Review Comment:
   Rename to `expectedWriteAmplification`?
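
For reference, the value computed on the reviewed line is (flushed + merged) / flushed. A minimal sketch of that arithmetic follows. The class and method names are hypothetical, and returning 1.0 when nothing has been flushed is an assumption taken from what `testEmptyDir` expects, not from the PR's implementation.

```java
/** Hypothetical helper, not part of the PR: computes a write amplification factor. */
final class WriteAmplificationSketch {

  /**
   * Returns (flushedBytes + mergedBytes) / flushedBytes.
   * Assumption: with zero flushed bytes we return 1.0, mirroring testEmptyDir.
   */
  static double factor(long flushedBytes, long mergedBytes) {
    if (flushedBytes == 0) {
      return 1.0; // avoids dividing by zero
    }
    return ((double) flushedBytes + (double) mergedBytes) / (double) flushedBytes;
  }

  public static void main(String[] args) {
    // 40 bytes flushed, then 60 bytes rewritten by merges: 100 / 40 = 2.5
    System.out.println(factor(40, 60));
    System.out.println(factor(0, 0));
  }
}
```

Note that `random().nextInt(100)` can return 0, in which case the expression on the reviewed line divides by zero and yields `NaN` or `Infinity`.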



##########
lucene/core/src/java/org/apache/lucene/store/ByteTrackingIndexOutput.java:
##########
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.store;
+
+import java.io.IOException;
+import java.util.concurrent.atomic.AtomicLong;
+
+/** An {@link IndexOutput} that wraps another instance and tracks the number of bytes written */
+public class ByteTrackingIndexOutput extends IndexOutput {
+
+  private final IndexOutput output;
+  private final AtomicLong byteTracker;
+
+  protected ByteTrackingIndexOutput(IndexOutput output, AtomicLong byteTracker) {
+    super(
+        "Byte tracking wrapper for: " + output.getName(),
+        "ByteTrackingIndexOutput{" + output.getName() + "}");
+    this.output = output;
+    this.byteTracker = byteTracker;
+  }
+
+  @Override
+  public void writeByte(byte b) throws IOException {
+    byteTracker.incrementAndGet();
+    output.writeByte(b);
+  }
+
+  @Override
+  public void writeBytes(byte[] b, int offset, int length) throws IOException {
+    byteTracker.addAndGet(length);

Review Comment:
   Let's increment this counter after `output.writeBytes(...)`, so that we don't count failed calls?
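
A minimal sketch of the ordering being asked for, using hypothetical stand-ins (`Sink`, `CountingSink`) rather than Lucene's `IndexOutput`, so that it compiles on its own:

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-ins, not Lucene classes: counting only after a successful write. */
final class CountAfterWriteSketch {

  /** Minimal sink interface assumed for this sketch. */
  interface Sink {
    void writeBytes(byte[] b, int offset, int length) throws IOException;
  }

  /** Wrapper that adds to the shared counter only once the delegate has returned normally. */
  static final class CountingSink implements Sink {
    private final Sink delegate;
    private final AtomicLong byteTracker;

    CountingSink(Sink delegate, AtomicLong byteTracker) {
      this.delegate = delegate;
      this.byteTracker = byteTracker;
    }

    @Override
    public void writeBytes(byte[] b, int offset, int length) throws IOException {
      delegate.writeBytes(b, offset, length); // may throw; nothing is counted in that case
      byteTracker.addAndGet(length);
    }
  }

  /** Returns the counter value after attempting one write against the given delegate. */
  static long countAfterOneWrite(Sink delegate, int length) {
    AtomicLong counter = new AtomicLong();
    try {
      new CountingSink(delegate, counter).writeBytes(new byte[length], 0, length);
    } catch (IOException e) {
      // the failed write is intentionally left uncounted
    }
    return counter.get();
  }

  public static void main(String[] args) {
    Sink ok = (b, off, len) -> {};
    Sink failing = (b, off, len) -> { throw new IOException("simulated failure"); };
    System.out.println(countAfterOneWrite(ok, 8));
    System.out.println(countAfterOneWrite(failing, 8));
  }
}
```

One caveat of this ordering: if a delegate writes part of the buffer before throwing, those bytes go uncounted.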



##########
lucene/core/src/java/org/apache/lucene/store/ByteTrackingIndexOutput.java:
##########
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.store;
+
+import java.io.IOException;
+import java.util.concurrent.atomic.AtomicLong;
+
+/** An {@link IndexOutput} that wraps another instance and tracks the number of bytes written */
+public class ByteTrackingIndexOutput extends IndexOutput {
+
+  private final IndexOutput output;
+  private final AtomicLong byteTracker;
+
+  protected ByteTrackingIndexOutput(IndexOutput output, AtomicLong byteTracker) {
+    super(
+        "Byte tracking wrapper for: " + output.getName(),
+        "ByteTrackingIndexOutput{" + output.getName() + "}");
+    this.output = output;
+    this.byteTracker = byteTracker;
+  }
+
+  @Override
+  public void writeByte(byte b) throws IOException {
+    byteTracker.incrementAndGet();

Review Comment:
   I don't think `AtomicLong` has an `increment()`-only method, does it?



##########
lucene/core/src/java/org/apache/lucene/store/ByteTrackingIndexOutput.java:
##########
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.store;
+
+import java.io.IOException;
+import java.util.concurrent.atomic.AtomicLong;
+
+/** An {@link IndexOutput} that wraps another instance and tracks the number of bytes written */
+public class ByteTrackingIndexOutput extends IndexOutput {
+
+  private final IndexOutput output;
+  private final AtomicLong byteTracker;
+
+  protected ByteTrackingIndexOutput(IndexOutput output, AtomicLong byteTracker) {
+    super(
+        "Byte tracking wrapper for: " + output.getName(),
+        "ByteTrackingIndexOutput{" + output.getName() + "}");
+    this.output = output;
+    this.byteTracker = byteTracker;
+  }
+
+  @Override
+  public void writeByte(byte b) throws IOException {
+    byteTracker.incrementAndGet();
+    output.writeByte(b);
+  }
+
+  @Override
+  public void writeBytes(byte[] b, int offset, int length) throws IOException {
+    byteTracker.addAndGet(length);
+    output.writeBytes(b, offset, length);
+  }
+
+  @Override
+  public void close() throws IOException {
+    output.close();
+  }
+
+  @Override
+  public long getFilePointer() {
+    return output.getFilePointer();
+  }
+
+  @Override
+  public long getChecksum() throws IOException {
+    return output.getChecksum();
+  }
+
+  public String getWrappedName() {
+    return output.getName();
+  }
+
+  public String getWrappedToString() {
+    return output.toString();
+  }
+}

Review Comment:
   We should also override `DataOutput` methods like `writeShort()`, `writeLong()`, etc. The wrapped `IndexOutput` may override them without invoking the base class implementation (as `OutputStreamIndexOutput` does).
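
A minimal sketch of the idea, with hypothetical stand-ins (`Out`, `FastOut`, `TrackingOut`) in place of `DataOutput` and `IndexOutput`. The wrapper forwards the multi-byte call as a single call, so the delegate's own implementation is used, and it counts the bytes itself:

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-ins, not Lucene classes: delegating multi-byte writes in a wrapper. */
final class DelegateMultiByteSketch {

  /** Base class whose writeShort falls back to two writeByte calls. */
  abstract static class Out {
    abstract void writeByte(byte b);

    void writeShort(short s) {
      writeByte((byte) s);
      writeByte((byte) (s >> 8));
    }
  }

  /** Delegate with its own writeShort that never goes through writeByte. */
  static final class FastOut extends Out {
    int byteCalls;
    int shortCalls;

    @Override
    void writeByte(byte b) {
      byteCalls++;
    }

    @Override
    void writeShort(short s) {
      shortCalls++;
    }
  }

  /** Wrapper that forwards writeShort as one call and counts its two bytes. */
  static final class TrackingOut extends Out {
    private final Out delegate;
    private final AtomicLong byteTracker;

    TrackingOut(Out delegate, AtomicLong byteTracker) {
      this.delegate = delegate;
      this.byteTracker = byteTracker;
    }

    @Override
    void writeByte(byte b) {
      delegate.writeByte(b);
      byteTracker.incrementAndGet();
    }

    @Override
    void writeShort(short s) {
      delegate.writeShort(s); // reaches the delegate's specialised implementation
      byteTracker.addAndGet(Short.BYTES);
    }
  }

  public static void main(String[] args) {
    FastOut fast = new FastOut();
    AtomicLong counter = new AtomicLong();
    new TrackingOut(fast, counter).writeShort((short) 7);
    System.out.println(fast.shortCalls + " short call(s), " + fast.byteCalls
        + " byte call(s), " + counter.get() + " bytes counted");
  }
}
```

Without the `writeShort` override in the wrapper, the inherited fallback would call the wrapper's `writeByte` twice, and the delegate's specialised `writeShort` would never run.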



##########
lucene/core/src/java/org/apache/lucene/store/WriteAmplificationTrackingDirectoryWrapper.java:
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.store;
+
+import java.io.IOException;
+import java.util.concurrent.atomic.AtomicLong;
+
+/** {@link FilterDirectory} that tracks write amplification factor */
+public final class WriteAmplificationTrackingDirectoryWrapper extends FilterDirectory {

Review Comment:
   This class basically lets us track bytes written per `IOContext` (flush/merge). Write amplification is one application of it. Perhaps we could make the class name a bit more generic.
   
   Along similar lines, let's also add getters for the flush and merge byte counters.
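
A rough sketch of what such getters could look like. All names here (`ContextByteCountersSketch`, `Context`, `record`, `getFlushedBytes`, `getMergedBytes`) are hypothetical and chosen for illustration; the enum stands in for `IOContext.Context` so the example compiles without Lucene.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch, not the PR's class: per-context byte counters with plain getters. */
final class ContextByteCountersSketch {

  /** Stand-in for the write contexts of interest. */
  enum Context { FLUSH, MERGE, OTHER }

  private final AtomicLong flushedBytes = new AtomicLong();
  private final AtomicLong mergedBytes = new AtomicLong();

  /** Records bytes written under the given context; other contexts are ignored. */
  void record(Context context, long bytes) {
    switch (context) {
      case FLUSH:
        flushedBytes.addAndGet(bytes);
        break;
      case MERGE:
        mergedBytes.addAndGet(bytes);
        break;
      default:
        break;
    }
  }

  long getFlushedBytes() {
    return flushedBytes.get();
  }

  long getMergedBytes() {
    return mergedBytes.get();
  }

  public static void main(String[] args) {
    ContextByteCountersSketch counters = new ContextByteCountersSketch();
    counters.record(Context.FLUSH, 40);
    counters.record(Context.MERGE, 60);
    counters.record(Context.OTHER, 5);
    System.out.println(counters.getFlushedBytes() + " flushed, "
        + counters.getMergedBytes() + " merged");
  }
}
```

With the raw counters exposed, callers can derive write amplification or any other ratio themselves.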



##########
lucene/core/src/java/org/apache/lucene/store/WriteAmplificationTrackingDirectoryWrapper.java:
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.store;
+
+import java.io.IOException;
+import java.util.concurrent.atomic.AtomicLong;
+
+/** {@link FilterDirectory} that tracks write amplification factor */
+public final class WriteAmplificationTrackingDirectoryWrapper extends FilterDirectory {
+
+  private final AtomicLong flushedBytes = new AtomicLong();
+  private final AtomicLong mergedBytes = new AtomicLong();
+
+  /**
+   * Sole constructor, typically called from sub-classes.
+   *
+   * @param in input Directory
+   */
+  public WriteAmplificationTrackingDirectoryWrapper(Directory in) {
+    super(in);
+  }
+
+  @Override
+  public IndexOutput createOutput(String name, IOContext context) throws IOException {
+    IndexOutput output = in.createOutput(name, context);
+    IndexOutput byteTrackingIndexOutput;
+    if (context.context.equals(IOContext.Context.FLUSH)) {
+      byteTrackingIndexOutput = new ByteTrackingIndexOutput(output, flushedBytes);
+    } else if (context.context.equals(IOContext.Context.MERGE)) {
+      byteTrackingIndexOutput = new ByteTrackingIndexOutput(output, mergedBytes);
+    } else {
+      return output;
+    }
+    return byteTrackingIndexOutput;
+  }
+
+  public double getApproximateWriteAmplificationFactor() {

Review Comment:
   I think we want to measure the write amplification of a specific directory instance, in which case not making it static seems like the right call.



##########
lucene/core/src/java/org/apache/lucene/store/ByteTrackingIndexOutput.java:
##########
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.store;
+
+import java.io.IOException;
+import java.util.concurrent.atomic.AtomicLong;
+
+/** An {@link IndexOutput} that wraps another instance and tracks the number of bytes written */
+public class ByteTrackingIndexOutput extends IndexOutput {
+
+  private final IndexOutput output;
+  private final AtomicLong byteTracker;
+
+  protected ByteTrackingIndexOutput(IndexOutput output, AtomicLong byteTracker) {
+    super(
+        "Byte tracking wrapper for: " + output.getName(),
+        "ByteTrackingIndexOutput{" + output.getName() + "}");
+    this.output = output;
+    this.byteTracker = byteTracker;
+  }
+
+  @Override
+  public void writeByte(byte b) throws IOException {
+    byteTracker.incrementAndGet();
+    output.writeByte(b);
+  }
+
+  @Override
+  public void writeBytes(byte[] b, int offset, int length) throws IOException {
+    byteTracker.addAndGet(length);
+    output.writeBytes(b, offset, length);
+  }
+
+  @Override
+  public void close() throws IOException {
+    output.close();
+  }
+
+  @Override
+  public long getFilePointer() {
+    return output.getFilePointer();
+  }
+
+  @Override
+  public long getChecksum() throws IOException {
+    return output.getChecksum();
+  }
+
+  public String getWrappedName() {

Review Comment:
   The parent class's `getName()` and `toString()` return the `ByteTrackingIndexOutput` values that we pass to `super()` in the constructor. I think these methods were added to provide a way to get details about the `IndexOutput` that is being wrapped.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
