Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/01/16 19:32:10 UTC

[GitHub] [hudi] n3nash commented on a change in pull request #2453: [HUDI-1533] Make SerializableSchema work for large schemas

n3nash commented on a change in pull request #2453:
URL: https://github.com/apache/hudi/pull/2453#discussion_r559015240



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/config/SerializableSchema.java
##########
@@ -62,12 +62,17 @@ private void readObject(ObjectInputStream in) throws IOException {
   
   // create a public write method for unit test
   public void writeObjectTo(ObjectOutputStream out) throws IOException {
-    out.writeUTF(schema.toString());
+    // Note: writeUTF cannot support string length > 64K. So use writeObject which has small overhead (relatively).
+    out.writeObject(schema.toString());
   }
 
   // create a public read method for unit test
   public void readObjectFrom(ObjectInputStream in) throws IOException {
-    schema = new Schema.Parser().parse(in.readUTF());
+    try {
+      schema = new Schema.Parser().parse(in.readObject().toString());

Review comment:
       Can we do the following:
   
   int length = in.readInt();
   byte[] value = new byte[length];
   in.readFully(value);
   String schemaStr = new String(value, "UTF-8");
   
   This ensures the schema string is decoded as UTF-8, provided the write side emits the byte length followed by the bytes.

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/config/SerializableSchema.java
##########
@@ -62,12 +62,17 @@ private void readObject(ObjectInputStream in) throws IOException {
   
   // create a public write method for unit test
   public void writeObjectTo(ObjectOutputStream out) throws IOException {
-    out.writeUTF(schema.toString());
+    // Note: writeUTF cannot support string length > 64K. So use writeObject which has small overhead (relatively).
+    out.writeObject(schema.toString());

Review comment:
       Instead of this, can we do:
   byte[] data = schema.toString().getBytes("UTF-8");
   out.writeInt(data.length);
   out.write(data);
   to ensure we don't lose the UTF-8 encoding?
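
For reference, the length-prefixed approach suggested in the two comments can be sketched as a standalone round trip. This is only an illustration under stated assumptions, not the code in the PR: the class name SchemaStringCodec and its helper methods are hypothetical, and a plain String stands in for schema.toString() so the snippet does not depend on Avro.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Hudi: shows a length-prefixed UTF-8 round trip.
class SchemaStringCodec {

  // Write the UTF-8 byte length as an int, then the raw bytes.
  // Unlike writeUTF, this is not limited to 65535 encoded bytes.
  static void writeString(ObjectOutputStream out, String s) throws IOException {
    byte[] data = s.getBytes(StandardCharsets.UTF_8);
    out.writeInt(data.length);
    out.write(data);
  }

  // Read the length prefix, then exactly that many bytes, and decode as UTF-8.
  static String readString(ObjectInputStream in) throws IOException {
    int length = in.readInt();
    byte[] value = new byte[length];
    in.readFully(value);
    return new String(value, StandardCharsets.UTF_8);
  }

  // Serialize to an in-memory buffer and read the string back.
  static String roundTrip(String s) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
      writeString(out, s);
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
      return readString(in);
    }
  }

  public static void main(String[] args) throws IOException {
    // Build a string longer than the writeUTF limit (stand-in for a large schema).
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 70_000; i++) {
      sb.append('a');
    }
    String large = sb.toString();
    System.out.println(roundTrip(large).equals(large));
  }
}
```

The write and read sides must agree on the format: if the writer emits only the bytes without the int length, the reader's readInt() will consume part of the payload.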



