Posted to dev@avro.apache.org by "Fokko (via GitHub)" <gi...@apache.org> on 2023/09/25 13:57:45 UTC

[GitHub] [avro] Fokko opened a new pull request, #2521: Java: Add blocking direct binary encoder

Fokko opened a new pull request, #2521:
URL: https://github.com/apache/avro/pull/2521

   <!--
   
   *Thank you very much for contributing to Apache Avro - we are happy that you want to help us improve Avro. To help the community review your contribution in the best possible way, please go through the checklist below, which will get the contribution into a shape in which it can be best reviewed.*
   
   *Please understand that we do not do this to make contributions to Avro a hassle. In order to uphold a high standard of quality for code contributions, while at the same time managing a large number of contributions, we need contributors to prepare the contributions well, and give reviewers enough contextual information for the review. Please also understand that contributions that do not follow this guide will take longer to review and thus typically be picked up with lower priority by the community.*
   
   ## Contribution Checklist
   
     - Make sure that the pull request corresponds to a [JIRA issue](https://issues.apache.org/jira/projects/AVRO/issues). Exceptions are made for typos in JavaDoc or documentation files, which need no JIRA issue.
     
     - Name the pull request in the form "AVRO-XXXX: [component] Title of the pull request", where *AVRO-XXXX* should be replaced by the actual issue number. 
       The *component* is optional, but can help identify the correct reviewers faster: either the language ("java", "python") or subsystem such as "build" or "doc" are good candidates.  
   
     - Fill out the template below to describe the changes contributed by the pull request. That will give reviewers the context they need to do the review.
     
     - Make sure that the change passes the automated tests. You can [build the entire project](https://github.com/apache/avro/blob/master/BUILD.md) or just the [language-specific SDK](https://avro.apache.org/project/how-to-contribute/#unit-tests).
   
     - Each pull request should address only one issue, not mix up code from multiple issues.
     
  - Each commit in the pull request has a meaningful commit message (including the JIRA id).
   
  - Every commit message references the Jira issue in its subject line. In addition, commits follow the guidelines from [How to write a good git commit message](https://chris.beams.io/posts/git-commit/)
       1. Subject is separated from body by a blank line
       1. Subject is limited to 50 characters (not including Jira issue reference)
       1. Subject does not end with a period
       1. Subject uses the imperative mood ("add", not "adding")
       1. Body wraps at 72 characters
       1. Body explains "what" and "why", not "how"
   
   -->
   
   ## What is the purpose of the change
   
   *(For example: This pull request improves file read performance by buffering data, fixing AVRO-XXXX.)*
   
   
   ## Verifying this change
   
   *(Please pick one of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   - *Extended interop tests to verify consistent valid schema names between SDKs*
   - *Added test that validates that Java throws an AvroRuntimeException on invalid binary data*
   - *Manually verified the change by building the website and checking the new redirect*
   
   
   ## Documentation
   
   - Does this pull request introduce a new feature? (yes / no)
   - If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] AVRO-3871: Add blocking direct binary encoder [avro]

Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko merged PR #2521:
URL: https://github.com/apache/avro/pull/2521




Re: [PR] AVRO-3871: Add blocking direct binary encoder [avro]

Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko commented on code in PR #2521:
URL: https://github.com/apache/avro/pull/2521#discussion_r1362021113


##########
lang/java/avro/src/test/java/org/apache/avro/io/TestBinaryEncoderFidelity.java:
##########
@@ -181,6 +181,50 @@ void directBinaryEncoder() throws IOException {
     assertArrayEquals(complexdata, result2);
   }
 
+  @Test
+  void blockingDirectBinaryEncoder() throws IOException {
+    ByteArrayOutputStream baos = new ByteArrayOutputStream();
+    BinaryEncoder e = factory.blockingDirectBinaryEncoder(baos, null);
+    generateData(e, true);
+
+    byte[] result = baos.toByteArray();
+    assertEquals(legacydata.length, result.length);
+    assertArrayEquals(legacydata, result);
+    baos.reset();
+
+    generateComplexData(e);
+    byte[] result2 = baos.toByteArray();
+    // blocking encoding adds block byte sizes, so the output should be two bytes larger
+    assertEquals(complexdata.length + 2, result2.length);
+    // the first byte is the array start, with the count of items negative
+    assertEquals(complexdata[0] >>> 1, result2[0]);
+    baos.reset();
+
+    e.writeArrayStart();
+    e.setItemCount(1);
+    e.startItem();
+    e.writeInt(1);
+    e.writeArrayEnd();
+
+    // byte 1: block count of 1 element in the array
+    // byte 2: block size of 1 byte for the int
+    // byte 3: the zig-zag encoded int itself
+    // byte 4: 0 elements in the next block, ending the array
+    assertArrayEquals(baos.toByteArray(), new byte[] { 1, 2, 2, 0 });
+    baos.reset();
+
+    e.writeArrayStart();
+    e.setItemCount(0);
+    e.writeArrayEnd();
+
+    // This is correct:
+    // byte 1: 0 elements in the block, i.e. an empty array
+    assertArrayEquals(baos.toByteArray(), new byte[] { 0 });
+    baos.reset();

Review Comment:
   Hmm, it seems that the byte count is ignored in the Java implementation: https://github.com/apache/avro/blob/main/lang/java/avro/src/main/java/org/apache/avro/io/BinaryDecoder.java#L389-L397
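
   The block framing asserted in the test above can be reproduced with plain zig-zag/varint arithmetic. The sketch below is illustrative and deliberately independent of the Avro classes (the class and method names are hypothetical); the framing follows the Avro specification's blocked array encoding, where a negated block count signals that a byte size follows.

   ```java
   import java.io.ByteArrayOutputStream;
   import java.util.Arrays;

   public class BlockingArraySketch {
     // Zig-zag plus varint encoding of a long, as in the Avro binary format.
     static void writeLong(ByteArrayOutputStream out, long n) {
       long z = (n << 1) ^ (n >> 63); // zig-zag: 0 -> 0, -1 -> 1, 1 -> 2, ...
       while ((z & ~0x7FL) != 0) {    // emit 7 bits at a time, high bit = "more follows"
         out.write((int) ((z & 0x7F) | 0x80));
         z >>>= 7;
       }
       out.write((int) z);
     }

     public static void main(String[] args) {
       // Blocking form of the array [1], matching the bytes asserted in the test:
       ByteArrayOutputStream out = new ByteArrayOutputStream();
       writeLong(out, -1); // block count 1, negated to signal that a byte size follows
       writeLong(out, 1);  // block size: the single zig-zag encoded int occupies 1 byte
       writeLong(out, 1);  // the item itself: int 1
       writeLong(out, 0);  // empty block terminates the array
       System.out.println(Arrays.toString(out.toByteArray())); // [1, 2, 2, 0]
     }
   }
   ```

   A decoder that honors the byte size can skip a whole block without decoding its items, which is the point of the review comment above: the linked Java decoder reads the byte count but does not use it for skipping.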





Re: [PR] AVRO-3871: Add blocking direct binary encoder [avro]

Posted by "github-code-scanning[bot] (via GitHub)" <gi...@apache.org>.
github-code-scanning[bot] commented on code in PR #2521:
URL: https://github.com/apache/avro/pull/2521#discussion_r1362123041


##########
lang/java/avro/src/test/java/org/apache/avro/specific/TestRecordWithMapsAndArrays.java:
##########
@@ -0,0 +1,523 @@
+/**
+ * Autogenerated by Avro
+ *
+ * DO NOT EDIT DIRECTLY
+ */
+package org.apache.avro.specific;
+
+import org.apache.avro.generic.GenericArray;
+import org.apache.avro.specific.SpecificData;
+import org.apache.avro.util.Utf8;
+import org.apache.avro.message.BinaryMessageEncoder;
+import org.apache.avro.message.BinaryMessageDecoder;
+import org.apache.avro.message.SchemaStore;
+
+@org.apache.avro.specific.AvroGenerated
+public class TestRecordWithMapsAndArrays extends org.apache.avro.specific.SpecificRecordBase
+    implements org.apache.avro.specific.SpecificRecord {
+  private static final long serialVersionUID = 3113266652594662627L;
+
+  public static final org.apache.avro.Schema SCHEMA$ = new org.apache.avro.Schema.Parser().parse(
+      "{\"type\":\"record\",\"name\":\"TestRecordWithMapsAndArrays\",\"namespace\":\"org.apache.avro.specific\",\"fields\":[{\"name\":\"arr\",\"type\":{\"type\":\"array\",\"items\":{\"type\":\"string\",\"avro.java.string\":\"String\"},\"default\":[]}},{\"name\":\"map\",\"type\":{\"type\":\"map\",\"values\":\"long\",\"avro.java.string\":\"String\",\"default\":{}}}]}");
+
+  public static org.apache.avro.Schema getClassSchema() {
+    return SCHEMA$;
+  }
+
+  private static final SpecificData MODEL$ = new SpecificData();
+
+  private static final BinaryMessageEncoder<TestRecordWithMapsAndArrays> ENCODER = new BinaryMessageEncoder<>(MODEL$,
+      SCHEMA$);
+
+  private static final BinaryMessageDecoder<TestRecordWithMapsAndArrays> DECODER = new BinaryMessageDecoder<>(MODEL$,
+      SCHEMA$);
+
+  /**
+   * Return the BinaryMessageEncoder instance used by this class.
+   *
+   * @return the message encoder used by this class
+   */
+  public static BinaryMessageEncoder<TestRecordWithMapsAndArrays> getEncoder() {
+    return ENCODER;
+  }
+
+  /**
+   * Return the BinaryMessageDecoder instance used by this class.
+   *
+   * @return the message decoder used by this class
+   */
+  public static BinaryMessageDecoder<TestRecordWithMapsAndArrays> getDecoder() {
+    return DECODER;
+  }
+
+  /**
+   * Create a new BinaryMessageDecoder instance for this class that uses the
+   * specified {@link SchemaStore}.
+   *
+   * @param resolver a {@link SchemaStore} used to find schemas by fingerprint
+   * @return a BinaryMessageDecoder instance for this class backed by the given
+   *         SchemaStore
+   */
+  public static BinaryMessageDecoder<TestRecordWithMapsAndArrays> createDecoder(SchemaStore resolver) {
+    return new BinaryMessageDecoder<>(MODEL$, SCHEMA$, resolver);
+  }
+
+  /**
+   * Serializes this TestRecordWithMapsAndArrays to a ByteBuffer.
+   *
+   * @return a buffer holding the serialized data for this instance
+   * @throws java.io.IOException if this instance could not be serialized
+   */
+  public java.nio.ByteBuffer toByteBuffer() throws java.io.IOException {
+    return ENCODER.encode(this);
+  }
+
+  /**
+   * Deserializes a TestRecordWithMapsAndArrays from a ByteBuffer.
+   *
+   * @param b a byte buffer holding serialized data for an instance of this class
+   * @return a TestRecordWithMapsAndArrays instance decoded from the given buffer
+   * @throws java.io.IOException if the given bytes could not be deserialized into
+   *                             an instance of this class
+   */
+  public static TestRecordWithMapsAndArrays fromByteBuffer(java.nio.ByteBuffer b) throws java.io.IOException {
+    return DECODER.decode(b);
+  }
+
+  private java.util.List<java.lang.String> arr;
+  private java.util.Map<java.lang.String, java.lang.Long> map;
+
+  /**
+   * Default constructor. Note that this does not initialize fields to their
+   * default values from the schema. If that is desired then one should use
+   * <code>newBuilder()</code>.
+   */
+  public TestRecordWithMapsAndArrays() {
+  }
+
+  /**
+   * All-args constructor.
+   *
+   * @param arr The new value for arr
+   * @param map The new value for map
+   */
+  public TestRecordWithMapsAndArrays(java.util.List<java.lang.String> arr,
+      java.util.Map<java.lang.String, java.lang.Long> map) {
+    this.arr = arr;
+    this.map = map;
+  }
+
+  @Override
+  public org.apache.avro.specific.SpecificData getSpecificData() {
+    return MODEL$;
+  }
+
+  @Override
+  public org.apache.avro.Schema getSchema() {
+    return SCHEMA$;
+  }
+
+  // Used by DatumWriter. Applications should not call.
+  @Override
+  public java.lang.Object get(int field$) {
+    switch (field$) {
+    case 0:
+      return arr;
+    case 1:
+      return map;
+    default:
+      throw new IndexOutOfBoundsException("Invalid index: " + field$);
+    }
+  }
+
+  // Used by DatumReader. Applications should not call.
+  @Override
+  @SuppressWarnings(value = "unchecked")
+  public void put(int field$, java.lang.Object value$) {
+    switch (field$) {
+    case 0:
+      arr = (java.util.List<java.lang.String>) value$;
+      break;
+    case 1:
+      map = (java.util.Map<java.lang.String, java.lang.Long>) value$;
+      break;
+    default:
+      throw new IndexOutOfBoundsException("Invalid index: " + field$);
+    }
+  }
+
+  /**
+   * Gets the value of the 'arr' field.
+   *
+   * @return The value of the 'arr' field.
+   */
+  public java.util.List<java.lang.String> getArr() {
+    return arr;
+  }
+
+  /**
+   * Sets the value of the 'arr' field.
+   *
+   * @param value the value to set.
+   */
+  public void setArr(java.util.List<java.lang.String> value) {
+    this.arr = value;
+  }
+
+  /**
+   * Gets the value of the 'map' field.
+   *
+   * @return The value of the 'map' field.
+   */
+  public java.util.Map<java.lang.String, java.lang.Long> getMap() {
+    return map;
+  }
+
+  /**
+   * Sets the value of the 'map' field.
+   *
+   * @param value the value to set.
+   */
+  public void setMap(java.util.Map<java.lang.String, java.lang.Long> value) {
+    this.map = value;
+  }
+
+  /**
+   * Creates a new TestRecordWithMapsAndArrays RecordBuilder.
+   *
+   * @return A new TestRecordWithMapsAndArrays RecordBuilder
+   */
+  public static org.apache.avro.specific.TestRecordWithMapsAndArrays.Builder newBuilder() {
+    return new org.apache.avro.specific.TestRecordWithMapsAndArrays.Builder();
+  }
+
+  /**
+   * Creates a new TestRecordWithMapsAndArrays RecordBuilder by copying an
+   * existing Builder.
+   *
+   * @param other The existing builder to copy.
+   * @return A new TestRecordWithMapsAndArrays RecordBuilder
+   */
+  public static org.apache.avro.specific.TestRecordWithMapsAndArrays.Builder newBuilder(
+      org.apache.avro.specific.TestRecordWithMapsAndArrays.Builder other) {
+    if (other == null) {
+      return new org.apache.avro.specific.TestRecordWithMapsAndArrays.Builder();
+    } else {
+      return new org.apache.avro.specific.TestRecordWithMapsAndArrays.Builder(other);
+    }
+  }
+
+  /**
+   * Creates a new TestRecordWithMapsAndArrays RecordBuilder by copying an
+   * existing TestRecordWithMapsAndArrays instance.
+   *
+   * @param other The existing instance to copy.
+   * @return A new TestRecordWithMapsAndArrays RecordBuilder
+   */
+  public static org.apache.avro.specific.TestRecordWithMapsAndArrays.Builder newBuilder(
+      org.apache.avro.specific.TestRecordWithMapsAndArrays other) {
+    if (other == null) {
+      return new org.apache.avro.specific.TestRecordWithMapsAndArrays.Builder();
+    } else {
+      return new org.apache.avro.specific.TestRecordWithMapsAndArrays.Builder(other);
+    }
+  }
+
+  /**
+   * RecordBuilder for TestRecordWithMapsAndArrays instances.
+   */
+  @org.apache.avro.specific.AvroGenerated
+  public static class Builder extends org.apache.avro.specific.SpecificRecordBuilderBase<TestRecordWithMapsAndArrays>
+      implements org.apache.avro.data.RecordBuilder<TestRecordWithMapsAndArrays> {
+
+    private java.util.List<java.lang.String> arr;
+    private java.util.Map<java.lang.String, java.lang.Long> map;
+
+    /** Creates a new Builder */
+    private Builder() {
+      super(SCHEMA$, MODEL$);
+    }
+
+    /**
+     * Creates a Builder by copying an existing Builder.
+     *
+     * @param other The existing Builder to copy.
+     */
+    private Builder(org.apache.avro.specific.TestRecordWithMapsAndArrays.Builder other) {
+      super(other);
+      if (isValidValue(fields()[0], other.arr)) {
+        this.arr = data().deepCopy(fields()[0].schema(), other.arr);
+        fieldSetFlags()[0] = other.fieldSetFlags()[0];
+      }
+      if (isValidValue(fields()[1], other.map)) {
+        this.map = data().deepCopy(fields()[1].schema(), other.map);
+        fieldSetFlags()[1] = other.fieldSetFlags()[1];
+      }
+    }
+
+    /**
+     * Creates a Builder by copying an existing TestRecordWithMapsAndArrays instance
+     *
+     * @param other The existing instance to copy.
+     */
+    private Builder(org.apache.avro.specific.TestRecordWithMapsAndArrays other) {
+      super(SCHEMA$, MODEL$);
+      if (isValidValue(fields()[0], other.arr)) {
+        this.arr = data().deepCopy(fields()[0].schema(), other.arr);
+        fieldSetFlags()[0] = true;
+      }
+      if (isValidValue(fields()[1], other.map)) {
+        this.map = data().deepCopy(fields()[1].schema(), other.map);
+        fieldSetFlags()[1] = true;
+      }
+    }
+
+    /**
+     * Gets the value of the 'arr' field.
+     *
+     * @return The value.
+     */
+    public java.util.List<java.lang.String> getArr() {
+      return arr;
+    }
+
+    /**
+     * Sets the value of the 'arr' field.
+     *
+     * @param value The value of 'arr'.
+     * @return This builder.
+     */
+    public org.apache.avro.specific.TestRecordWithMapsAndArrays.Builder setArr(java.util.List<java.lang.String> value) {
+      validate(fields()[0], value);
+      this.arr = value;
+      fieldSetFlags()[0] = true;
+      return this;
+    }
+
+    /**
+     * Checks whether the 'arr' field has been set.
+     *
+     * @return True if the 'arr' field has been set, false otherwise.
+     */
+    public boolean hasArr() {
+      return fieldSetFlags()[0];
+    }
+
+    /**
+     * Clears the value of the 'arr' field.
+     *
+     * @return This builder.
+     */
+    public org.apache.avro.specific.TestRecordWithMapsAndArrays.Builder clearArr() {
+      arr = null;
+      fieldSetFlags()[0] = false;
+      return this;
+    }
+
+    /**
+     * Gets the value of the 'map' field.
+     *
+     * @return The value.
+     */
+    public java.util.Map<java.lang.String, java.lang.Long> getMap() {
+      return map;
+    }
+
+    /**
+     * Sets the value of the 'map' field.
+     *
+     * @param value The value of 'map'.
+     * @return This builder.
+     */
+    public org.apache.avro.specific.TestRecordWithMapsAndArrays.Builder setMap(
+        java.util.Map<java.lang.String, java.lang.Long> value) {
+      validate(fields()[1], value);
+      this.map = value;
+      fieldSetFlags()[1] = true;
+      return this;
+    }
+
+    /**
+     * Checks whether the 'map' field has been set.
+     *
+     * @return True if the 'map' field has been set, false otherwise.
+     */
+    public boolean hasMap() {
+      return fieldSetFlags()[1];
+    }
+
+    /**
+     * Clears the value of the 'map' field.
+     *
+     * @return This builder.
+     */
+    public org.apache.avro.specific.TestRecordWithMapsAndArrays.Builder clearMap() {
+      map = null;
+      fieldSetFlags()[1] = false;
+      return this;
+    }
+
+    @Override
+    @SuppressWarnings("unchecked")
+    public TestRecordWithMapsAndArrays build() {
+      try {
+        TestRecordWithMapsAndArrays record = new TestRecordWithMapsAndArrays();
+        record.arr = fieldSetFlags()[0] ? this.arr : (java.util.List<java.lang.String>) defaultValue(fields()[0]);
+        record.map = fieldSetFlags()[1] ? this.map
+            : (java.util.Map<java.lang.String, java.lang.Long>) defaultValue(fields()[1]);
+        return record;
+      } catch (org.apache.avro.AvroMissingFieldException e) {
+        throw e;
+      } catch (java.lang.Exception e) {
+        throw new org.apache.avro.AvroRuntimeException(e);
+      }
+    }
+  }
+
+  @SuppressWarnings("unchecked")
+  private static final org.apache.avro.io.DatumWriter<TestRecordWithMapsAndArrays> WRITER$ = (org.apache.avro.io.DatumWriter<TestRecordWithMapsAndArrays>) MODEL$
+      .createDatumWriter(SCHEMA$);
+
+  @Override
+  public void writeExternal(java.io.ObjectOutput out) throws java.io.IOException {
+    WRITER$.write(this, SpecificData.getEncoder(out));
+  }
+
+  @SuppressWarnings("unchecked")
+  private static final org.apache.avro.io.DatumReader<TestRecordWithMapsAndArrays> READER$ = (org.apache.avro.io.DatumReader<TestRecordWithMapsAndArrays>) MODEL$
+      .createDatumReader(SCHEMA$);
+
+  @Override
+  public void readExternal(java.io.ObjectInput in) throws java.io.IOException {
+    READER$.read(this, SpecificData.getDecoder(in));
+  }
+
+  @Override
+  protected boolean hasCustomCoders() {
+    return true;
+  }
+
+  @Override
+  public void customEncode(org.apache.avro.io.Encoder out) throws java.io.IOException {
+    long size0 = this.arr.size();
+    out.writeArrayStart();
+    out.setItemCount(size0);
+    long actualSize0 = 0;
+    for (java.lang.String e0 : this.arr) {
+      actualSize0++;
+      out.startItem();
+      out.writeString(e0);
+    }
+    out.writeArrayEnd();
+    if (actualSize0 != size0)
+      throw new java.util.ConcurrentModificationException(
+          "Array-size written was " + size0 + ", but element count was " + actualSize0 + ".");
+
+    long size1 = this.map.size();
+    out.writeMapStart();
+    out.setItemCount(size1);
+    long actualSize1 = 0;
+    for (java.util.Map.Entry<java.lang.String, java.lang.Long> e1 : this.map.entrySet()) {
+      actualSize1++;
+      out.startItem();
+      out.writeString(e1.getKey());
+      java.lang.Long v1 = e1.getValue();
+      out.writeLong(v1);
+    }
+    out.writeMapEnd();
+    if (actualSize1 != size1)
+      throw new java.util.ConcurrentModificationException(
+          "Map-size written was " + size1 + ", but element count was " + actualSize1 + ".");
+
+  }
+
+  @Override
+  public void customDecode(org.apache.avro.io.ResolvingDecoder in) throws java.io.IOException {
+    org.apache.avro.Schema.Field[] fieldOrder = in.readFieldOrderIfDiff();
+    if (fieldOrder == null) {
+      long size0 = in.readArrayStart();
+      java.util.List<java.lang.String> a0 = this.arr;
+      if (a0 == null) {
+        a0 = new SpecificData.Array<java.lang.String>((int) size0, SCHEMA$.getField("arr").schema());
+        this.arr = a0;
+      } else
+        a0.clear();
+      SpecificData.Array<java.lang.String> ga0 = (a0 instanceof SpecificData.Array
+          ? (SpecificData.Array<java.lang.String>) a0
+          : null);
+      for (; 0 < size0; size0 = in.arrayNext()) {
+        for (; size0 != 0; size0--) {
+          java.lang.String e0 = (ga0 != null ? ga0.peek() : null);
+          e0 = in.readString();
+          a0.add(e0);
+        }
+      }
+
+      long size1 = in.readMapStart();
+      java.util.Map<java.lang.String, java.lang.Long> m1 = this.map; // Need fresh name due to limitation of macro
+                                                                     // system
+      if (m1 == null) {
+        m1 = new java.util.HashMap<java.lang.String, java.lang.Long>((int) size1);
+        this.map = m1;
+      } else
+        m1.clear();
+      for (; 0 < size1; size1 = in.mapNext()) {
+        for (; size1 != 0; size1--) {
+          java.lang.String k1 = null;
+          k1 = in.readString();
+          java.lang.Long v1 = null;
+          v1 = in.readLong();
+          m1.put(k1, v1);
+        }
+      }
+
+    } else {
+      for (int i = 0; i < 2; i++) {
+        switch (fieldOrder[i].pos()) {
+        case 0:
+          long size0 = in.readArrayStart();
+          java.util.List<java.lang.String> a0 = this.arr;
+          if (a0 == null) {
+            a0 = new SpecificData.Array<java.lang.String>((int) size0, SCHEMA$.getField("arr").schema());
+            this.arr = a0;
+          } else
+            a0.clear();
+          SpecificData.Array<java.lang.String> ga0 = (a0 instanceof SpecificData.Array
+              ? (SpecificData.Array<java.lang.String>) a0

Review Comment:
   ## Cast from abstract to concrete collection
   
   [List<String>](1) is cast to the concrete type [Array<String>](2), losing abstraction.
   
   [Show more details](https://github.com/apache/avro/security/code-scanning/3136)
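
   For context, the flagged pattern is a guarded downcast from the abstract List to a concrete reusable subtype. A minimal stand-in sketch (the class and method names here are hypothetical, not the actual Avro types):

   ```java
   import java.util.ArrayList;
   import java.util.List;

   public class GuardedCastSketch {
     // A concrete list subtype with an extra operation, standing in for SpecificData.Array.
     static class ReusableList<T> extends ArrayList<T> {
       T peek() { return isEmpty() ? null : get(size() - 1); }
     }

     // The generated decoder narrows List<T> to the concrete type only when the
     // instance actually is one, so the cast is guarded at runtime; the scanner
     // still flags the loss of abstraction.
     @SuppressWarnings("unchecked")
     static <T> ReusableList<T> asReusable(List<T> list) {
       return list instanceof ReusableList ? (ReusableList<T>) list : null;
     }

     public static void main(String[] args) {
       System.out.println(asReusable(new ReusableList<String>()) != null); // true
       System.out.println(asReusable(new ArrayList<String>()) != null);    // false
     }
   }
   ```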



+     */
+    public boolean hasArr() {
+      return fieldSetFlags()[0];
+    }
+
+    /**
+     * Clears the value of the 'arr' field.
+     *
+     * @return This builder.
+     */
+    public org.apache.avro.specific.TestRecordWithMapsAndArrays.Builder clearArr() {
+      arr = null;
+      fieldSetFlags()[0] = false;
+      return this;
+    }
+
+    /**
+     * Gets the value of the 'map' field.
+     *
+     * @return The value.
+     */
+    public java.util.Map<java.lang.String, java.lang.Long> getMap() {
+      return map;
+    }
+
+    /**
+     * Sets the value of the 'map' field.
+     *
+     * @param value The value of 'map'.
+     * @return This builder.
+     */
+    public org.apache.avro.specific.TestRecordWithMapsAndArrays.Builder setMap(
+        java.util.Map<java.lang.String, java.lang.Long> value) {
+      validate(fields()[1], value);
+      this.map = value;
+      fieldSetFlags()[1] = true;
+      return this;
+    }
+
+    /**
+     * Checks whether the 'map' field has been set.
+     *
+     * @return True if the 'map' field has been set, false otherwise.
+     */
+    public boolean hasMap() {
+      return fieldSetFlags()[1];
+    }
+
+    /**
+     * Clears the value of the 'map' field.
+     *
+     * @return This builder.
+     */
+    public org.apache.avro.specific.TestRecordWithMapsAndArrays.Builder clearMap() {
+      map = null;
+      fieldSetFlags()[1] = false;
+      return this;
+    }
+
+    @Override
+    @SuppressWarnings("unchecked")
+    public TestRecordWithMapsAndArrays build() {
+      try {
+        TestRecordWithMapsAndArrays record = new TestRecordWithMapsAndArrays();
+        record.arr = fieldSetFlags()[0] ? this.arr : (java.util.List<java.lang.String>) defaultValue(fields()[0]);
+        record.map = fieldSetFlags()[1] ? this.map
+            : (java.util.Map<java.lang.String, java.lang.Long>) defaultValue(fields()[1]);
+        return record;
+      } catch (org.apache.avro.AvroMissingFieldException e) {
+        throw e;
+      } catch (java.lang.Exception e) {
+        throw new org.apache.avro.AvroRuntimeException(e);
+      }
+    }
+  }
+
+  @SuppressWarnings("unchecked")
+  private static final org.apache.avro.io.DatumWriter<TestRecordWithMapsAndArrays> WRITER$ = (org.apache.avro.io.DatumWriter<TestRecordWithMapsAndArrays>) MODEL$
+      .createDatumWriter(SCHEMA$);
+
+  @Override
+  public void writeExternal(java.io.ObjectOutput out) throws java.io.IOException {
+    WRITER$.write(this, SpecificData.getEncoder(out));
+  }
+
+  @SuppressWarnings("unchecked")
+  private static final org.apache.avro.io.DatumReader<TestRecordWithMapsAndArrays> READER$ = (org.apache.avro.io.DatumReader<TestRecordWithMapsAndArrays>) MODEL$
+      .createDatumReader(SCHEMA$);
+
+  @Override
+  public void readExternal(java.io.ObjectInput in) throws java.io.IOException {
+    READER$.read(this, SpecificData.getDecoder(in));
+  }
+
+  @Override
+  protected boolean hasCustomCoders() {
+    return true;
+  }
+
+  @Override
+  public void customEncode(org.apache.avro.io.Encoder out) throws java.io.IOException {
+    long size0 = this.arr.size();
+    out.writeArrayStart();
+    out.setItemCount(size0);
+    long actualSize0 = 0;
+    for (java.lang.String e0 : this.arr) {
+      actualSize0++;
+      out.startItem();
+      out.writeString(e0);
+    }
+    out.writeArrayEnd();
+    if (actualSize0 != size0)
+      throw new java.util.ConcurrentModificationException(
+          "Array-size written was " + size0 + ", but element count was " + actualSize0 + ".");
+
+    long size1 = this.map.size();
+    out.writeMapStart();
+    out.setItemCount(size1);
+    long actualSize1 = 0;
+    for (java.util.Map.Entry<java.lang.String, java.lang.Long> e1 : this.map.entrySet()) {
+      actualSize1++;
+      out.startItem();
+      out.writeString(e1.getKey());
+      java.lang.Long v1 = e1.getValue();
+      out.writeLong(v1);
+    }
+    out.writeMapEnd();
+    if (actualSize1 != size1)
+      throw new java.util.ConcurrentModificationException(
+          "Map-size written was " + size1 + ", but element count was " + actualSize1 + ".");
+
+  }
+
+  @Override
+  public void customDecode(org.apache.avro.io.ResolvingDecoder in) throws java.io.IOException {
+    org.apache.avro.Schema.Field[] fieldOrder = in.readFieldOrderIfDiff();
+    if (fieldOrder == null) {
+      long size0 = in.readArrayStart();
+      java.util.List<java.lang.String> a0 = this.arr;
+      if (a0 == null) {
+        a0 = new SpecificData.Array<java.lang.String>((int) size0, SCHEMA$.getField("arr").schema());
+        this.arr = a0;
+      } else
+        a0.clear();
+      SpecificData.Array<java.lang.String> ga0 = (a0 instanceof SpecificData.Array
+          ? (SpecificData.Array<java.lang.String>) a0

Review Comment:
   ## Cast from abstract to concrete collection
   
   [List<String>](1) is cast to the concrete type [Array<String>](2), losing abstraction.
   
   [Show more details](https://github.com/apache/avro/security/code-scanning/3135)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] AVRO-3871: Add blocking direct binary encoder [avro]

Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko commented on code in PR #2521:
URL: https://github.com/apache/avro/pull/2521#discussion_r1361872190


##########
lang/java/avro/src/main/java/org/apache/avro/io/BlockingDirectBinaryEncoder.java:
##########
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.avro.io;
+
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.ByteBuffer;
+
+/**
+ * An {@link Encoder} for Avro's binary encoding that does not buffer output.
+ * <p/>
+ * This encoder does not buffer writes, and as a result is slower than
+ * {@link BufferedBinaryEncoder}. However, it is lighter-weight and useful when
+ * the buffering in BufferedBinaryEncoder is not desired and/or the Encoder is
+ * very short-lived.
+ * <p/>
+ * To construct, use
+ * {@link EncoderFactory#blockingDirectBinaryEncoder(OutputStream, BinaryEncoder)}
+ * <p/>
+ * BlockingDirectBinaryEncoder is not thread-safe
+ *
+ * @see BinaryEncoder
+ * @see EncoderFactory
+ * @see Encoder
+ * @see Decoder
+ */
+public class BlockingDirectBinaryEncoder extends DirectBinaryEncoder {
+  private static final ThreadLocal<BufferOutputStream> BUFFER = ThreadLocal.withInitial(BufferOutputStream::new);
+
+  private OutputStream originalStream;
+
+  private boolean inBlock = false;
+
+  private long blockItemCount;
+
+  /**
+   * Create a writer that sends its output to the underlying stream
+   * <code>out</code>.
+   *
+   * @param out The Outputstream to write to
+   */
+  public BlockingDirectBinaryEncoder(OutputStream out) {
+    super(out);
+  }
+
+  private void startBlock() {
+    if (inBlock) {
+      throw new RuntimeException("Nested Maps/Arrays are not supported by the BlockingDirectBinaryEncoder");

Review Comment:
   Yes, I was thinking of that as well, but it would potentially allocate quite a few buffers. For Apache Iceberg we don't use any nested structures, so I went with the simplest approach, but if you think this should be in, I'm happy to add it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] AVRO-3871: Add blocking direct binary encoder [avro]

Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko commented on code in PR #2521:
URL: https://github.com/apache/avro/pull/2521#discussion_r1362009347


##########
lang/java/avro/src/test/java/org/apache/avro/io/TestBinaryEncoderFidelity.java:
##########
@@ -181,6 +181,50 @@ void directBinaryEncoder() throws IOException {
     assertArrayEquals(complexdata, result2);
   }
 
+  @Test
+  void blockingDirectBinaryEncoder() throws IOException {
+    ByteArrayOutputStream baos = new ByteArrayOutputStream();
+    BinaryEncoder e = factory.blockingDirectBinaryEncoder(baos, null);
+    generateData(e, true);
+
+    byte[] result = baos.toByteArray();
+    assertEquals(legacydata.length, result.length);
+    assertArrayEquals(legacydata, result);
+    baos.reset();
+
+    generateComplexData(e);
+    byte[] result2 = baos.toByteArray();
+    // blocking will cause different length, should be two bytes larger
+    assertEquals(complexdata.length + 2, result2.length);
+    // the first byte is the array start, with the count of items negative
+    assertEquals(complexdata[0] >>> 1, result2[0]);
+    baos.reset();
+
+    e.writeArrayStart();
+    e.setItemCount(1);
+    e.startItem();
+    e.writeInt(1);
+    e.writeArrayEnd();
+
+    // 1: 1 element in the array
+    // 2: 1 byte for the int
+    // 3: zigzag encoded int
+    // 4: 0 elements in the next block
+    assertArrayEquals(baos.toByteArray(), new byte[] { 1, 2, 2, 0 });
+    baos.reset();
+
+    e.writeArrayStart();
+    e.setItemCount(0);
+    e.writeArrayEnd();
+
+    // This is correct
+    // 0: 0 elements in the block
+    assertArrayEquals(baos.toByteArray(), new byte[] { 0 });
+    baos.reset();

Review Comment:
   It is used at read time, for skipping over arrays/maps when they are not part of your read schema. Instead of reading all the elements, you can skip over the whole array/map at once, since its byte length is encoded in the Avro file.
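   To illustrate the point (a minimal, dependency-free sketch with assumed names — the real logic lives in Avro's `BinaryDecoder`, not in this helper): when a block count is written as a negative number, the block's byte size follows it, so a reader can seek past the whole block without decoding a single item.

   ```java
   import java.io.ByteArrayInputStream;
   import java.io.IOException;
   import java.io.InputStream;

   public class SkipSketch {
     // Reads one zigzag-encoded varint, as Avro encodes longs.
     public static long readLong(InputStream in) throws IOException {
       long n = 0;
       int shift = 0;
       int b;
       do {
         b = in.read();
         n |= (long) (b & 0x7f) << shift;
         shift += 7;
       } while ((b & 0x80) != 0);
       return (n >>> 1) ^ -(n & 1); // zigzag decode
     }

     // Skips a blocked array without decoding its items.
     public static void skipBlockedArray(InputStream in) throws IOException {
       for (long count = readLong(in); count != 0; count = readLong(in)) {
         if (count < 0) {
           long bytes = readLong(in); // negative count: the byte size follows
           in.skip(bytes);            // jump over the whole block at once
         } else {
           throw new IOException("positive block count: items must be decoded individually");
         }
       }
     }
   }
   ```

   With the non-blocking encoders there is no byte size in the stream, so a reader has no choice but to decode every element just to move past the array.
   
   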





Re: [PR] AVRO-3871: Add blocking direct binary encoder [avro]

Posted by "clesaec (via GitHub)" <gi...@apache.org>.
clesaec commented on PR #2521:
URL: https://github.com/apache/avro/pull/2521#issuecomment-1767800516

   @opwvhk: Could you have a look at this PR, please? To have a second pair of eyes on it before I merge it.




Re: [PR] AVRO-3871: Add blocking direct binary encoder [avro]

Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko commented on code in PR #2521:
URL: https://github.com/apache/avro/pull/2521#discussion_r1361868247


##########
lang/java/avro/src/main/java/org/apache/avro/io/BlockingDirectBinaryEncoder.java:
##########
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.avro.io;
+
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.ByteBuffer;
+
+/**
+ * An {@link Encoder} for Avro's binary encoding that does not buffer output.
+ * <p/>
+ * This encoder does not buffer writes, and as a result is slower than
+ * {@link BufferedBinaryEncoder}. However, it is lighter-weight and useful when
+ * the buffering in BufferedBinaryEncoder is not desired and/or the Encoder is
+ * very short-lived.
+ * <p/>
+ * To construct, use
+ * {@link EncoderFactory#blockingDirectBinaryEncoder(OutputStream, BinaryEncoder)}
+ * <p/>
+ * BlockingDirectBinaryEncoder is not thread-safe
+ *
+ * @see BinaryEncoder
+ * @see EncoderFactory
+ * @see Encoder
+ * @see Decoder
+ */
+public class BlockingDirectBinaryEncoder extends DirectBinaryEncoder {
+  private static final ThreadLocal<BufferOutputStream> BUFFER = ThreadLocal.withInitial(BufferOutputStream::new);

Review Comment:
   Oh that's an excellent question. The idea was to re-use the buffer since it can grow quite a bit (in the case of the Apache Iceberg metadata), but a local variable is indeed a better plan to avoid race conditions.





Re: [PR] AVRO-3871: Add blocking direct binary encoder [avro]

Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko commented on code in PR #2521:
URL: https://github.com/apache/avro/pull/2521#discussion_r1362002460


##########
lang/java/avro/src/test/java/org/apache/avro/io/TestBinaryEncoderFidelity.java:
##########
@@ -181,6 +181,50 @@ void directBinaryEncoder() throws IOException {
     assertArrayEquals(complexdata, result2);
   }
 
+  @Test
+  void blockingDirectBinaryEncoder() throws IOException {
+    ByteArrayOutputStream baos = new ByteArrayOutputStream();
+    BinaryEncoder e = factory.blockingDirectBinaryEncoder(baos, null);
+    generateData(e, true);
+
+    byte[] result = baos.toByteArray();
+    assertEquals(legacydata.length, result.length);
+    assertArrayEquals(legacydata, result);
+    baos.reset();
+
+    generateComplexData(e);
+    byte[] result2 = baos.toByteArray();
+    // blocking will cause different length, should be two bytes larger
+    assertEquals(complexdata.length + 2, result2.length);
+    // the first byte is the array start, with the count of items negative
+    assertEquals(complexdata[0] >>> 1, result2[0]);
+    baos.reset();
+
+    e.writeArrayStart();
+    e.setItemCount(1);
+    e.startItem();
+    e.writeInt(1);
+    e.writeArrayEnd();
+
+    // 1: 1 element in the array
+    // 2: 1 byte for the int
+    // 3: zigzag encoded int
+    // 4: 0 elements in the next block
+    assertArrayEquals(baos.toByteArray(), new byte[] { 1, 2, 2, 0 });
+    baos.reset();
+
+    e.writeArrayStart();
+    e.setItemCount(0);
+    e.writeArrayEnd();

Review Comment:
   I've added this test case in `TestBlockingDirectBinaryEncoder`.
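   For the record, the `{ 1, 2, 2, 0 }` bytes asserted in the quoted test follow directly from the blocking framing. A dependency-free sketch (assumed names, not the PR's code): the items are first encoded into a scratch buffer, then the negative item count and the buffer's byte size are written ahead of the buffer, and a zero count terminates the array.

   ```java
   import java.io.ByteArrayOutputStream;
   import java.io.IOException;
   import java.io.OutputStream;

   public class BlockFramingSketch {
     // Writes a long using Avro's zigzag varint encoding.
     public static void writeLong(OutputStream out, long n) throws IOException {
       n = (n << 1) ^ (n >> 63); // zigzag encode
       while ((n & ~0x7fL) != 0) {
         out.write((int) ((n & 0x7f) | 0x80));
         n >>>= 7;
       }
       out.write((int) n);
     }

     // Frames one block of ints the way a blocking encoder would.
     public static byte[] frameBlock(int... items) throws IOException {
       ByteArrayOutputStream block = new ByteArrayOutputStream();
       for (int item : items) {
         writeLong(block, item); // buffer the items first, to learn the byte size
       }
       ByteArrayOutputStream out = new ByteArrayOutputStream();
       if (items.length > 0) {
         writeLong(out, -items.length); // negative count signals a byte size follows
         writeLong(out, block.size()); // byte size of the buffered block
         block.writeTo(out);
       }
       writeLong(out, 0); // zero count terminates the array
       return out.toByteArray();
     }
   }
   ```

   For a single-element array of the int 1 this yields `{ 1, 2, 2, 0 }` (zigzag of -1, of 1, of 1, then the terminator), and for an empty array just `{ 0 }`, matching both assertions in the test.
   
   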





Re: [PR] AVRO-3871: Add blocking direct binary encoder [avro]

Posted by "opwvhk (via GitHub)" <gi...@apache.org>.
opwvhk commented on code in PR #2521:
URL: https://github.com/apache/avro/pull/2521#discussion_r1367694057


##########
lang/java/avro/src/main/java/org/apache/avro/io/BlockingDirectBinaryEncoder.java:
##########
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.avro.io;
+
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.ByteBuffer;
+
+/**
+ * An {@link Encoder} for Avro's binary encoding that does not buffer output.
+ * <p/>
+ * This encoder does not buffer writes, and as a result is slower than
+ * {@link BufferedBinaryEncoder}. However, it is lighter-weight and useful when
+ * the buffering in BufferedBinaryEncoder is not desired and/or the Encoder is
+ * very short-lived.
+ * <p/>
+ * To construct, use
+ * {@link EncoderFactory#blockingDirectBinaryEncoder(OutputStream, BinaryEncoder)}
+ * <p/>
+ * BlockingDirectBinaryEncoder is not thread-safe
+ *
+ * @see BinaryEncoder
+ * @see EncoderFactory
+ * @see Encoder
+ * @see Decoder
+ */
+public class BlockingDirectBinaryEncoder extends DirectBinaryEncoder {
+  private static final ThreadLocal<BufferOutputStream> BUFFER = ThreadLocal.withInitial(BufferOutputStream::new);
+
+  private OutputStream originalStream;
+
+  private boolean inBlock = false;
+
+  private long blockItemCount;
+
+  /**
+   * Create a writer that sends its output to the underlying stream
+   * <code>out</code>.
+   *
+   * @param out The Outputstream to write to
+   */
+  public BlockingDirectBinaryEncoder(OutputStream out) {
+    super(out);
+  }
+
+  private void startBlock() {
+    if (inBlock) {
+      throw new RuntimeException("Nested Maps/Arrays are not supported by the BlockingDirectBinaryEncoder");

Review Comment:
   What I do like about this is the exception thrown: it's in the right place to start looking if you want to lift this limitation.





Re: [PR] AVRO-3871: Add blocking direct binary encoder [avro]

Posted by "clesaec (via GitHub)" <gi...@apache.org>.
clesaec commented on code in PR #2521:
URL: https://github.com/apache/avro/pull/2521#discussion_r1361760317


##########
lang/java/avro/src/main/java/org/apache/avro/io/BlockingDirectBinaryEncoder.java:
##########
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.avro.io;
+
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.ByteBuffer;
+
+/**
+ * An {@link Encoder} for Avro's binary encoding that does not buffer output.
+ * <p/>
+ * This encoder does not buffer writes, and as a result is slower than
+ * {@link BufferedBinaryEncoder}. However, it is lighter-weight and useful when
+ * the buffering in BufferedBinaryEncoder is not desired and/or the Encoder is
+ * very short-lived.
+ * <p/>
+ * To construct, use
+ * {@link EncoderFactory#blockingDirectBinaryEncoder(OutputStream, BinaryEncoder)}
+ * <p/>
+ * BlockingDirectBinaryEncoder is not thread-safe
+ *
+ * @see BinaryEncoder
+ * @see EncoderFactory
+ * @see Encoder
+ * @see Decoder
+ */
+public class BlockingDirectBinaryEncoder extends DirectBinaryEncoder {
+  private static final ThreadLocal<BufferOutputStream> BUFFER = ThreadLocal.withInitial(BufferOutputStream::new);

Review Comment:
   Why a static ThreadLocal here, and not a simple field member?
   BinaryEncoders are not robust to multithreaded use, but you can have two encoders on one thread that write data alternately.



##########
lang/java/avro/src/main/java/org/apache/avro/io/BlockingDirectBinaryEncoder.java:
##########
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.avro.io;
+
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.ByteBuffer;
+
+/**
+ * An {@link Encoder} for Avro's binary encoding that does not buffer output.
+ * <p/>
+ * This encoder does not buffer writes, and as a result is slower than
+ * {@link BufferedBinaryEncoder}. However, it is lighter-weight and useful when
+ * the buffering in BufferedBinaryEncoder is not desired and/or the Encoder is
+ * very short-lived.
+ * <p/>
+ * To construct, use
+ * {@link EncoderFactory#blockingDirectBinaryEncoder(OutputStream, BinaryEncoder)}
+ * <p/>
+ * BlockingDirectBinaryEncoder is not thread-safe
+ *
+ * @see BinaryEncoder
+ * @see EncoderFactory
+ * @see Encoder
+ * @see Decoder
+ */
+public class BlockingDirectBinaryEncoder extends DirectBinaryEncoder {
+  private static final ThreadLocal<BufferOutputStream> BUFFER = ThreadLocal.withInitial(BufferOutputStream::new);
+
+  private OutputStream originalStream;
+
+  private boolean inBlock = false;
+
+  private long blockItemCount;
+
+  /**
+   * Create a writer that sends its output to the underlying stream
+   * <code>out</code>.
+   *
+   * @param out The Outputstream to write to
+   */
+  public BlockingDirectBinaryEncoder(OutputStream out) {
+    super(out);
+  }
+
+  private void startBlock() {
+    if (inBlock) {
+      throw new RuntimeException("Nested Maps/Arrays are not supported by the BlockingDirectBinaryEncoder");

Review Comment:
   Just make BUFFER a stack of OutputStreams, and it would become possible; but not mandatory :).
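   A rough sketch of that suggestion (assumed names, not the PR's code): each nested array/map pushes its own scratch buffer, writes go to the innermost open block, and closing a level flushes its buffer into the enclosing target. The real encoder would also prefix the flushed bytes with the negative count and the byte size.

   ```java
   import java.io.ByteArrayOutputStream;
   import java.io.IOException;
   import java.io.OutputStream;
   import java.util.ArrayDeque;
   import java.util.Deque;

   public class NestedBufferSketch {
     private final OutputStream sink;
     private final Deque<ByteArrayOutputStream> stack = new ArrayDeque<>();

     public NestedBufferSketch(OutputStream sink) {
       this.sink = sink;
     }

     // Where writes currently go: the innermost open block, or the real stream.
     public OutputStream current() {
       return stack.isEmpty() ? sink : stack.peek();
     }

     // Entering a nested array/map pushes a fresh scratch buffer.
     public void startBlock() {
       stack.push(new ByteArrayOutputStream());
     }

     // Leaving a level flushes the finished block into the enclosing target
     // (the real encoder would first write the negative count and byte size).
     public void endBlock() throws IOException {
       ByteArrayOutputStream finished = stack.pop();
       finished.writeTo(current());
     }
   }
   ```

   The trade-off mentioned earlier in the thread applies: every nesting level holds its own buffer alive until the level closes, so deeply nested or large blocks cost memory.
   
   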



##########
lang/java/avro/src/test/java/org/apache/avro/io/TestBinaryEncoderFidelity.java:
##########
@@ -181,6 +181,50 @@ void directBinaryEncoder() throws IOException {
     assertArrayEquals(complexdata, result2);
   }
 
+  @Test
+  void blockingDirectBinaryEncoder() throws IOException {
+    ByteArrayOutputStream baos = new ByteArrayOutputStream();
+    BinaryEncoder e = factory.blockingDirectBinaryEncoder(baos, null);
+    generateData(e, true);
+
+    byte[] result = baos.toByteArray();
+    assertEquals(legacydata.length, result.length);
+    assertArrayEquals(legacydata, result);
+    baos.reset();
+
+    generateComplexData(e);
+    byte[] result2 = baos.toByteArray();
+    // blocking will cause different length, should be two bytes larger
+    assertEquals(complexdata.length + 2, result2.length);
+    // the first byte is the array start, with the count of items negative
+    assertEquals(complexdata[0] >>> 1, result2[0]);
+    baos.reset();
+
+    e.writeArrayStart();
+    e.setItemCount(1);
+    e.startItem();
+    e.writeInt(1);
+    e.writeArrayEnd();
+
+    // 1: 1 element in the array
+    // 2: 1 byte for the int
+    // 3: zigzag encoded int
+    // 4: 0 elements in the next block
+    assertArrayEquals(baos.toByteArray(), new byte[] { 1, 2, 2, 0 });
+    baos.reset();
+
+    e.writeArrayStart();
+    e.setItemCount(0);
+    e.writeArrayEnd();

Review Comment:
   Could you test these last two byte arrays with a binary decoder, to ensure they work with this new encoder? (If they can't be read, create a specific decoder class.)



##########
lang/java/avro/src/test/java/org/apache/avro/io/TestBinaryEncoderFidelity.java:
##########
@@ -181,6 +181,50 @@ void directBinaryEncoder() throws IOException {
     assertArrayEquals(complexdata, result2);
   }
 
+  @Test
+  void blockingDirectBinaryEncoder() throws IOException {
+    ByteArrayOutputStream baos = new ByteArrayOutputStream();
+    BinaryEncoder e = factory.blockingDirectBinaryEncoder(baos, null);
+    generateData(e, true);
+
+    byte[] result = baos.toByteArray();
+    assertEquals(legacydata.length, result.length);
+    assertArrayEquals(legacydata, result);
+    baos.reset();
+
+    generateComplexData(e);
+    byte[] result2 = baos.toByteArray();
+    // blocking will cause different length, should be two bytes larger
+    assertEquals(complexdata.length + 2, result2.length);
+    // the first byte is the array start, with the count of items negative
+    assertEquals(complexdata[0] >>> 1, result2[0]);
+    baos.reset();
+
+    e.writeArrayStart();
+    e.setItemCount(1);
+    e.startItem();
+    e.writeInt(1);
+    e.writeArrayEnd();
+
+    // 1: 1 element in the array
+    // 2: 1 byte for the int
+    // 3: zigzag encoded int
+    // 4: 0 elements in the next block
+    assertArrayEquals(baos.toByteArray(), new byte[] { 1, 2, 2, 0 });
+    baos.reset();
+
+    e.writeArrayStart();
+    e.setItemCount(0);
+    e.writeArrayEnd();
+
+    // This is correct
+    // 0: 0 elements in the block
+    assertArrayEquals(baos.toByteArray(), new byte[] { 0 });
+    baos.reset();

Review Comment:
   Could you add a test where an array (or map) is skipped? If I understood well, by calling setItemCount(0) before the array end? (Or did I miss the purpose of this?)
   ```java
       e.writeArrayStart();
       e.setItemCount(1);
       e.startItem();
       e.writeInt(1);
       e.setItemCount(0);  // here, to skip the array ??
       e.writeArrayEnd();
   ```
   





Re: [PR] AVRO-3871: Add blocking direct binary encoder [avro]

Posted by "clesaec (via GitHub)" <gi...@apache.org>.
clesaec commented on code in PR #2521:
URL: https://github.com/apache/avro/pull/2521#discussion_r1362297033


##########
lang/java/avro/src/main/java/org/apache/avro/io/BlockingDirectBinaryEncoder.java:
##########
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.avro.io;
+
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.ByteBuffer;
+
+/**
+ * An {@link Encoder} for Avro's binary encoding that does not buffer output.
+ * <p/>
+ * This encoder does not buffer writes, and as a result is slower than
+ * {@link BufferedBinaryEncoder}. However, it is lighter-weight and useful when
+ * the buffering in BufferedBinaryEncoder is not desired and/or the Encoder is
+ * very short-lived.
+ * <p/>
+ * To construct, use
+ * {@link EncoderFactory#blockingDirectBinaryEncoder(OutputStream, BinaryEncoder)}
+ * <p/>
+ * BlockingDirectBinaryEncoder is not thread-safe
+ *
+ * @see BinaryEncoder
+ * @see EncoderFactory
+ * @see Encoder
+ * @see Decoder
+ */
+public class BlockingDirectBinaryEncoder extends DirectBinaryEncoder {
+  private static final ThreadLocal<BufferOutputStream> BUFFER = ThreadLocal.withInitial(BufferOutputStream::new);
+
+  private OutputStream originalStream;
+
+  private boolean inBlock = false;
+
+  private long blockItemCount;
+
+  /**
+   * Create a writer that sends its output to the underlying stream
+   * <code>out</code>.
+   *
+   * @param out The Outputstream to write to
+   */
+  public BlockingDirectBinaryEncoder(OutputStream out) {
+    super(out);
+  }
+
+  private void startBlock() {
+    if (inBlock) {
+      throw new RuntimeException("Nested Maps/Arrays are not supported by the BlockingDirectBinaryEncoder");

Review Comment:
   Ok, let's go step by step; this is good for a first version, especially if it aims to be used by specific software like Iceberg :) 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] AVRO-3871: Add blocking direct binary encoder [avro]

Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko commented on code in PR #2521:
URL: https://github.com/apache/avro/pull/2521#discussion_r1362012426


##########
lang/java/avro/src/test/java/org/apache/avro/io/TestBinaryEncoderFidelity.java:
##########
@@ -181,6 +181,50 @@ void directBinaryEncoder() throws IOException {
     assertArrayEquals(complexdata, result2);
   }
 
+  @Test
+  void blockingDirectBinaryEncoder() throws IOException {
+    ByteArrayOutputStream baos = new ByteArrayOutputStream();
+    BinaryEncoder e = factory.blockingDirectBinaryEncoder(baos, null);
+    generateData(e, true);
+
+    byte[] result = baos.toByteArray();
+    assertEquals(legacydata.length, result.length);
+    assertArrayEquals(legacydata, result);
+    baos.reset();
+
+    generateComplexData(e);
+    byte[] result2 = baos.toByteArray();
+    // blocking will cause different length, should be two bytes larger
+    assertEquals(complexdata.length + 2, result2.length);
+    // the first byte is the array start, with the count of items negative
+    assertEquals(complexdata[0] >>> 1, result2[0]);
+    baos.reset();
+
+    e.writeArrayStart();
+    e.setItemCount(1);
+    e.startItem();
+    e.writeInt(1);
+    e.writeArrayEnd();
+
+    // 1: 1 element in the array
+    // 2: 1 byte for the int
+    // 3: zigzag encoded int
+    // 4: 0 elements in the next block
+    assertArrayEquals(baos.toByteArray(), new byte[] { 1, 2, 2, 0 });
+    baos.reset();
+
+    e.writeArrayStart();
+    e.setItemCount(0);
+    e.writeArrayEnd();
+
+    // This is correct
+    // 0: 0 elements in the block
+    assertArrayEquals(baos.toByteArray(), new byte[] { 0 });
+    baos.reset();

Review Comment:
   Let me check if I can add a test for it 👍 





Re: [PR] AVRO-3871: Add blocking direct binary encoder [avro]

Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko commented on code in PR #2521:
URL: https://github.com/apache/avro/pull/2521#discussion_r1362021113


##########
lang/java/avro/src/test/java/org/apache/avro/io/TestBinaryEncoderFidelity.java:
##########
@@ -181,6 +181,50 @@ void directBinaryEncoder() throws IOException {
     assertArrayEquals(complexdata, result2);
   }
 
+  @Test
+  void blockingDirectBinaryEncoder() throws IOException {
+    ByteArrayOutputStream baos = new ByteArrayOutputStream();
+    BinaryEncoder e = factory.blockingDirectBinaryEncoder(baos, null);
+    generateData(e, true);
+
+    byte[] result = baos.toByteArray();
+    assertEquals(legacydata.length, result.length);
+    assertArrayEquals(legacydata, result);
+    baos.reset();
+
+    generateComplexData(e);
+    byte[] result2 = baos.toByteArray();
+    // blocking will cause different length, should be two bytes larger
+    assertEquals(complexdata.length + 2, result2.length);
+    // the first byte is the array start, with the count of items negative
+    assertEquals(complexdata[0] >>> 1, result2[0]);
+    baos.reset();
+
+    e.writeArrayStart();
+    e.setItemCount(1);
+    e.startItem();
+    e.writeInt(1);
+    e.writeArrayEnd();
+
+    // 1: 1 element in the array
+    // 2: 1 byte for the int
+    // 3: zigzag encoded int
+    // 4: 0 elements in the next block
+    assertArrayEquals(baos.toByteArray(), new byte[] { 1, 2, 2, 0 });
+    baos.reset();
+
+    e.writeArrayStart();
+    e.setItemCount(0);
+    e.writeArrayEnd();
+
+    // This is correct
+    // 0: 0 elements in the block
+    assertArrayEquals(baos.toByteArray(), new byte[] { 0 });
+    baos.reset();

Review Comment:
   Hmm, it seems that the byte count is ignored in the Java implementation: https://github.com/apache/avro/blob/main/lang/java/avro/src/main/java/org/apache/avro/io/BinaryDecoder.java#L389-L397
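
   The `{1, 2, 2, 0}` byte layout asserted in the test above can be reproduced with plain zigzag/varint arithmetic. The sketch below is illustrative only — `BlockedArrayLayout` and `writeVarLong` are hypothetical names, not Avro API — and assumes the layout the test describes: a negated item count as the block header, the block's byte size, the items, then a terminating zero-item block.

   ```java
   import java.io.ByteArrayOutputStream;
   import java.util.Arrays;

   // Hypothetical illustration of the blocked-array layout discussed in this
   // thread; NOT Avro's implementation, just the zigzag/varint math.
   public class BlockedArrayLayout {
     // Zigzag-encode a long, then emit it as a variable-length integer.
     static void writeVarLong(ByteArrayOutputStream out, long n) {
       long z = (n << 1) ^ (n >> 63); // zigzag: small magnitudes stay small
       while ((z & ~0x7FL) != 0) {
         out.write((int) ((z & 0x7F) | 0x80));
         z >>>= 7;
       }
       out.write((int) z);
     }

     public static void main(String[] args) {
       ByteArrayOutputStream out = new ByteArrayOutputStream();
       writeVarLong(out, -1); // block header: negated item count (1 item)
       writeVarLong(out, 1);  // block size in bytes (the int below is 1 byte)
       writeVarLong(out, 1);  // the array item itself: int 1, zigzag -> 2
       writeVarLong(out, 0);  // terminating block with 0 items
       System.out.println(Arrays.toString(out.toByteArray())); // [1, 2, 2, 0]
     }
   }
   ```

   Note that the negative count is what lets a reader distinguish "count followed by byte size" from the plain "count only" form — which is also why a decoder that ignores the byte size (as linked above) still reads both forms correctly.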





Re: [PR] AVRO-3871: Add blocking direct binary encoder [avro]

Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko commented on code in PR #2521:
URL: https://github.com/apache/avro/pull/2521#discussion_r1362094356


##########
lang/java/avro/src/test/java/org/apache/avro/io/TestBinaryEncoderFidelity.java:
##########
@@ -181,6 +181,50 @@ void directBinaryEncoder() throws IOException {
     assertArrayEquals(complexdata, result2);
   }
 
+  @Test
+  void blockingDirectBinaryEncoder() throws IOException {
+    ByteArrayOutputStream baos = new ByteArrayOutputStream();
+    BinaryEncoder e = factory.blockingDirectBinaryEncoder(baos, null);
+    generateData(e, true);
+
+    byte[] result = baos.toByteArray();
+    assertEquals(legacydata.length, result.length);
+    assertArrayEquals(legacydata, result);
+    baos.reset();
+
+    generateComplexData(e);
+    byte[] result2 = baos.toByteArray();
+    // blocking will cause different length, should be two bytes larger
+    assertEquals(complexdata.length + 2, result2.length);
+    // the first byte is the array start, with the count of items negative
+    assertEquals(complexdata[0] >>> 1, result2[0]);
+    baos.reset();
+
+    e.writeArrayStart();
+    e.setItemCount(1);
+    e.startItem();
+    e.writeInt(1);
+    e.writeArrayEnd();
+
+    // 1: 1 element in the array
+    // 2: 1 byte for the int
+    // 3: zigzag encoded int
+    // 4: 0 elements in the next block
+    assertArrayEquals(baos.toByteArray(), new byte[] { 1, 2, 2, 0 });
+    baos.reset();
+
+    e.writeArrayStart();
+    e.setItemCount(0);
+    e.writeArrayEnd();
+
+    // This is correct
+    // 0: 0 elements in the block
+    assertArrayEquals(baos.toByteArray(), new byte[] { 0 });
+    baos.reset();

Review Comment:
   Added a test for it. LMKWYT





Re: [PR] AVRO-3871: Add blocking direct binary encoder [avro]

Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko commented on PR #2521:
URL: https://github.com/apache/avro/pull/2521#issuecomment-1773911851

   Thanks @clesaec and @opwvhk for the review, appreciate it! 👍 

