You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2021/02/22 20:43:03 UTC

[GitHub] [kafka] rhauch edited a comment on pull request #9950: KAFKA-12170: Fix for Connect Cast SMT to correctly transform a Byte array into a string

rhauch edited a comment on pull request #9950:
URL: https://github.com/apache/kafka/pull/9950#issuecomment-783634110


   Before this fix, the changes made in #4820 ([KAFKA-6684](https://issues.apache.org/jira/browse/KAFKA-6684)) resulted in `toString()` being called on `byte[]` and `ByteBuffer`. As highlighted in this PR and issue that `ByteBuffer.toString()` is not useful, but the `toString()` on `byte[]` still works. This PR seems to change that behavior, which would not be backward compatible.
   
   The discussion on PR #4820 also talked about making this compatible with `Values.convertToString(...)`, which for `byte[]` and `ByteBuffer` results in a base 64 encoded string (with `ISO-8859-1` encoding). See the [Values code for details](https://github.com/apache/kafka/blob/95f51539c8d0b88bd7f285011d42e2d1117107de/connect/api/src/main/java/org/apache/kafka/connect/data/Values.java#L673-L682).
   
   ~Because of this, it seems much more sensible to also use base64 here for `ByteBuffer` so it matches the existing behavior with `byte[]`.~
   
   ~I agree that it maybe doesn't suffice in all user situations, so if we also want to support other encodings we'd need other config changes that will require using the KIP mechanism to propose such enhancements.~
   
   Actually, the existing code simply outputs the `toString()` result of a `byte[]` object, which is of the form `[B@22ef9844`, which is the `Object.toString()` implementation that includes the class alias (e.g., `[B` is a byte array) and the hex representation of the object's `hashCode()`.
   
   However, given that none of the cast forms of `byte[]` or `ByteBuffer[]` are useful, then perhaps it's worth adding the previous discussion on #4820 mentioning that:
   > As @ewencp suggested, for consistency perhaps we can use the same string formats as SimpleHeaderConverter: https://github.com/apache/kafka/blob/trunk/connect/api/src/main/java/org/apache/kafka/connect/data/Values.java,
   
   As mentioned above, the `Values.convertToString(...)` uses a base 64 encoding. And this aligns with @kkonstantine's suggestion:
   > Why are we selecting hex here and not base64 for example? Hex of course is less efficient.
   
   WDYT?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org