Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2019/12/04 08:39:46 UTC

[GitHub] [flink] shuttie commented on a change in pull request #10358: [FLINK-14346] [serialization] faster implementation of StringValue writeString and readString

shuttie commented on a change in pull request #10358: [FLINK-14346] [serialization] faster implementation of StringValue writeString and readString
URL: https://github.com/apache/flink/pull/10358#discussion_r353607833
 
 

 ##########
 File path: flink-core/src/main/java/org/apache/flink/types/StringValue.java
 ##########
 @@ -759,56 +761,142 @@ public static String readString(DataInput in) throws IOException {
 			}
 			len |= curr << shift;
 		}
-		
+
 		// subtract one for the null length
 		len -= 1;
-		
-		final char[] data = new char[len];
 
-		for (int i = 0; i < len; i++) {
-			int c = in.readUnsignedByte();
-			if (c < HIGH_BIT) {
-				data[i] = (char) c;
-			} else {
+		/* As we do not know the byte length of the serialized string, we cannot read it
+		 * fully into a memory buffer up front. But we can do it optimistically:
+		 * 1. In the happy case, when the string is US-ASCII only, byte_len == char_len.
+		 * 2. If we spot at least one character with code > 127, we reallocate the buffer
+		 * to accommodate the following characters.
+		 */
+
+		// optimistically assume that the string is 7-bit US-ASCII
+		byte[] buf = new byte[len];
+		in.readFully(buf);
+
+		final char[] data = new char[len];
+		int charPosition = 0;
+		int bufSize = len;
+		int bytePosition = 0;
+
+		while (charPosition < len) {
+			// there are at least `char count - char position` bytes left; the bound is
+			// exact when all the remaining characters are 7-bit.
+			int remainingBytesEstimation = len - charPosition;
 
 Review comment:
   A nice catch. This variable is only used in the buffer refill operation, so it's reasonable not to compute it for every character.
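
To illustrate the point, here is a simplified sketch (not the code from the pull request). The estimate `len - charPosition` is only needed when the buffer runs dry, so it can be computed inside the refill branch rather than once per character. The class name `LazyRefillSketch`, the helpers `encode`, `decode` and `roundTrip`, and the 7-bits-per-byte encoding with a continuation bit are assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class LazyRefillSketch {
    private static final int HIGH_BIT = 0x80;

    // Assumed encoding for this sketch: each char is written as 7-bit groups,
    // low bits first, with the high bit set on every byte except the last.
    static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            int c = s.charAt(i);
            while (c >= HIGH_BIT) {
                out.write((c & 0x7f) | HIGH_BIT);
                c >>>= 7;
            }
            out.write(c);
        }
        return out.toByteArray();
    }

    // Decodes exactly `len` chars. The first read optimistically assumes one
    // byte per char; further bytes are fetched only when the buffer is empty.
    static String decode(DataInput in, int len) throws IOException {
        byte[] buf = new byte[len];
        in.readFully(buf);
        int bufSize = len;
        int bytePosition = 0;

        char[] data = new char[len];
        int charPosition = 0;

        while (charPosition < len) {
            int c = 0;
            int shift = 0;
            int curr;
            do {
                if (bytePosition == bufSize) {
                    // The estimate is computed only here, at refill time.
                    // Every char still to be completed needs at least one
                    // more byte, so reading this many bytes never over-reads.
                    bufSize = len - charPosition;
                    in.readFully(buf, 0, bufSize);
                    bytePosition = 0;
                }
                curr = buf[bytePosition++] & 0xff;
                c |= (curr & 0x7f) << shift;
                shift += 7;
            } while (curr >= HIGH_BIT);
            data[charPosition++] = (char) c;
        }
        return new String(data);
    }

    // Convenience wrapper: encode, then decode from an in-memory stream.
    static String roundTrip(String s) {
        byte[] bytes = encode(s);
        try {
            return decode(new DataInputStream(new ByteArrayInputStream(bytes)), s.length());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("plain ascii"));
    }
}
```

For a pure ASCII string the refill branch is never taken, so the subtraction is never executed. For mixed input it runs once per refill instead of once per character.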
