You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@thrift.apache.org by GitBox <gi...@apache.org> on 2021/10/02 16:09:26 UTC
[GitHub] [thrift] fishy commented on a change in pull request #2439: THRIFT-5443: add support for partial Thrift deserialization

fishy commented on a change in pull request #2439:
URL: https://github.com/apache/thrift/pull/2439#discussion_r720696130



##########
File path: lib/java/src/org/apache/thrift/TDeserializer.java
##########
@@ -353,4 +418,265 @@ private TField locateField(byte[] bytes, TFieldIdEnum fieldIdPathFirst, TFieldId
   public void fromString(TBase base, String data) throws TException {
     deserialize(base, data.getBytes());
   }
+
+  // ----------------------------------------------------------------------
+  // Methods related to partial deserialization.
+
+  /**
+   * Deserializes the given serialized blob.
+   *
+   * @param bytes the serialized blob.
+   * @return deserialized instance.
+   * @throws TException if an error is encountered during deserialization.
+   */
+  public Object partialDeserializeObject(byte[] bytes) throws TException {
+    return this.deserialize(bytes, 0, bytes.length);
+  }
+
+  /**
+   * Deserializes the given serialized blob.
+   *
+   * @param bytes the serialized blob.
+   * @param offset the blob is read starting at this offset.
+   * @param length the size of blob read (in number of bytes).
+   * @return deserialized instance.
+   * @throws TException if an error is encountered during deserialization.
+   */
+  public Object deserialize(byte[] bytes, int offset, int length) throws TException {
+    ensurePartialDeserializationMode();

Review comment:
       We already have a `deserialize` function with very similar signature:
   ```
   public void deserialize(TBase base, byte[] bytes, int offset, int length) throws TException
   ```
   adding this function will make it really confusing to users on what to call. my suggestions are:
   1. Maybe rename this one to `deserializeObject` to emphasize that it returns `Object` instead of writes to `TBase`.
   2. Why does this only support partial mode (it throws a runtime exception when the `TDeserializer` was not consutrcted with partial mode), while there's nothing in its name or even javadoc mentioning that? that sounds like a huge footgun. Is there even a reason this should call `ensurePartialDeserializationMode`?

##########
File path: lib/java/src/org/apache/thrift/protocol/TProtocol.java
##########
@@ -180,4 +181,145 @@ public void reset() {}
   public Class<? extends IScheme> getScheme() {
     return StandardScheme.class;
   }
+
+  // -----------------------------------------------------------------
+  // Additional methods to improve performance.
+
+  public int readFieldBeginData() throws TException {
+    // Derived classes should provide a more efficient version of this
+    // method if allowed by the encoding used by that protocol.
+    TField tfield = this.readFieldBegin();
+    return TFieldData.encode(tfield.type, tfield.id);
+  }
+
+  public void skip(byte fieldType) throws TException {
+    this.skip(fieldType, Integer.MAX_VALUE);
+  }
+
+  public void skip(byte fieldType, int maxDepth) throws TException {
+    if (maxDepth <= 0) {
+      throw new TException("Maximum skip depth exceeded");
+    }
+
+    switch (fieldType) {
+      case TType.BOOL:
+        this.skipBool();
+        break;
+
+      case TType.BYTE:
+        this.skipByte();
+        break;
+
+      case TType.I16:
+        this.skipI16();
+        break;
+
+      case TType.I32:
+        this.skipI32();
+        break;
+
+      case TType.I64:
+        this.skipI64();
+        break;
+
+      case TType.DOUBLE:
+        this.skipDouble();
+        break;
+
+      case TType.STRING:
+        this.skipBinary();
+        break;
+
+      case TType.STRUCT:
+        this.readStructBegin();
+        while (true) {
+          int tfieldData = this.readFieldBeginData();
+          byte tfieldType = TFieldData.getType(tfieldData);
+          if (tfieldType == TType.STOP) {
+            break;
+          }
+          this.skip(tfieldType, maxDepth - 1);
+          this.readFieldEnd();
+        }
+        this.readStructEnd();
+        break;
+
+      case TType.MAP:
+        TMap map = this.readMapBegin();
+        for (int i = 0; i < map.size; i++) {
+          this.skip(map.keyType, maxDepth - 1);
+          this.skip(map.valueType, maxDepth - 1);
+        }
+        this.readMapEnd();
+        break;
+
+      case TType.SET:
+        TSet set = this.readSetBegin();
+        for (int i = 0; i < set.size; i++) {
+          this.skip(set.elemType, maxDepth - 1);
+        }
+        this.readSetEnd();
+        break;
+
+      case TType.LIST:
+        TList list = this.readListBegin();
+        for (int i = 0; i < list.size; i++) {
+          this.skip(list.elemType, maxDepth - 1);
+        }
+        this.readListEnd();
+        break;
+
+      default:
+        throw new TProtocolException(
+            TProtocolException.INVALID_DATA, "Unrecognized type " + fieldType);
+    }
+  }
+
+  protected void skipBool() throws TException {
+    this.skipBytes(1);
+  }
+
+  protected void skipByte() throws TException {
+    this.skipBytes(1);
+  }
+
+  protected void skipI16() throws TException {
+    this.skipBytes(2);
+  }
+
+  protected void skipI32() throws TException {
+    this.skipBytes(4);
+  }
+
+  protected void skipI64() throws TException {
+    this.skipBytes(8);
+  }
+
+  protected void skipDouble() throws TException {
+    this.skipBytes(8);
+  }

Review comment:
       I don't think the default implementations should use hardcoded number of bytes. the default implementations should just call `read*` and discard the result. this guarantees correctness, and leaves individual protocols the room to override the default implementation to improve performance. Using hardcoded number of bytes cannot guarantee the correctness.

##########
File path: lib/java/src/org/apache/thrift/TDeserializer.java
##########
@@ -61,6 +78,54 @@ public TDeserializer(TProtocolFactory protocolFactory) throws TTransportExceptio
     protocol_ = protocolFactory.getProtocol(trans_);
   }
 
+  /**
+   * Construct a new TDeserializer that supports partial deserialization
+   * that outputs instances of type controlled by the given {@code processor}.
+   *
+   * @param thriftClass a TBase derived class.
+   * @param fieldNames list of fields to deserialize.
+   * @param processor the Processor that handles deserialized field values.
+   * @param protocolFactory the Factory to create a protocol.
+   */
+  public TDeserializer(

Review comment:
       (maybe it just works, I'm not familiar with the java codebase and this is not obvious to me)
   
   I don't see you changing any of the existing public functions of `TDeserializer`, for example:
   ```
   public void deserialize(TBase base, byte[] bytes) throws TException
   public void deserialize(TBase base, byte[] bytes, int offset, int length) throws TException
   public void deserialize(TBase base, java.lang.String data, java.lang.String charset) throws TException
   ```
   Since you are adding new constructors to `TDeserializer`, you need to ensure that those existing public functions still work as intended. When someone uses this constructor to create a `TDeserializer`, then calls `deserialize(TBase base, byte[] bytes)`, they would expect that `deserialize` function does partial deserialization, not the full one.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@thrift.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org