Posted to commits@pinot.apache.org by "intr3p1d (via GitHub)" <gi...@apache.org> on 2024/02/02 02:45:45 UTC

[I] `ISO_8859_1` breaking Hangul in CLP `_logtype` column [pinot]

intr3p1d opened a new issue, #12352:
URL: https://github.com/apache/pinot/issues/12352

   # Cause
   
   `clp-ffi-java` [internally uses](https://github.com/y-scope/clp-ffi-java/blob/c4a74dbdeb09bd4e7e3d119826dddbe5005ccf53/src/main/java/com/yscope/clp/compressorfrontend/EncodedMessage.java#L30-L36) `StandardCharsets.ISO_8859_1` in `EncodedMessage.getLogTypeAsString()`.
   ![image](https://github.com/apache/pinot/assets/37623810/c7e02040-a714-4e54-be2a-7b36e9341003)
   (`getDictionaryVarsAsStrings` does the same.)
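
   To illustrate the general mechanism, here is a minimal, standalone sketch that does not call `clp-ffi-java` at all. It assumes the underlying bytes are UTF-8, and the class and method names are made up for the demo. The Hangul text is written with Unicode escapes (`\uC774\uC0C1`) so the file compiles regardless of source encoding. Decoding UTF-8 bytes as ISO-8859-1 turns every byte into its own character, so each 3-byte Hangul syllable becomes three unrelated characters.

   ```java
   import java.nio.charset.StandardCharsets;

   // Hypothetical demo class; not part of clp-ffi-java or Pinot.
   public class CharsetMismatchDemo {
       // ISO-8859-1 maps each byte to exactly one char, so multi-byte
       // UTF-8 sequences are split into several unrelated chars.
       public static String decodeLatin1(byte[] bytes) {
           return new String(bytes, StandardCharsets.ISO_8859_1);
       }

       // Decoding with the charset the bytes were written in keeps the text intact.
       public static String decodeUtf8(byte[] bytes) {
           return new String(bytes, StandardCharsets.UTF_8);
       }

       public static void main(String[] args) {
           // "from: 0 " followed by two Hangul syllables (U+C774, U+C0C1).
           String original = "from: 0 \uC774\uC0C1";
           byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

           String asLatin1 = decodeLatin1(utf8);
           String asUtf8 = decodeUtf8(utf8);

           System.out.println("original length:   " + original.length());
           System.out.println("ISO-8859-1 length: " + asLatin1.length());
           System.out.println("ISO-8859-1 equals original: " + asLatin1.equals(original));
           System.out.println("UTF-8 equals original:      " + asUtf8.equals(original));
       }
   }
   ```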
   
   # Effect
   https://github.com/apache/pinot/blob/0a4398634be81cdbbe891b3da249134ef98743e7/pinot-plugins/pinot-input-format/pinot-clp-log/src/main/java/org/apache/pinot/plugin/inputformat/clplog/CLPLogRecordExtractor.java#L151-L154
   
   This corrupts some characters in `column_logtype`. For example, it turns
   `Request processing failed: jakarta.validation.ConstraintViolationException: getAgentsList.from: 0 이상이어야 합니다`
   into
   `Request processing failed: jakarta.validation.ConstraintViolationException: getAgentsList.from:  이상이어야 합니다`
   
   The message is restored correctly after going through the `CLPDECODE` function, but when the `_logtype` column is queried on its own (`LIKE` searches, etc.), these corrupted strings are not suitable.
   
   The `clp-ffi-java` library makes all `EncodedMessage` member variables public, so it would be nice if Pinot's `CLPLogMessageDecoder` could decode them itself (or at least match the encoding used elsewhere in Pinot).
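
   A rough sketch of what that could look like, under stated assumptions: that the logtype is exposed as a raw `byte[]`, and that those bytes are the UTF-8 encoding of the original text. The class and method names below are hypothetical and are not existing Pinot or `clp-ffi-java` APIs. The second helper relies on the fact that ISO-8859-1 maps every byte value to exactly one character, so a string that was decoded with it can be converted back to the original bytes without loss and then decoded again as UTF-8.

   ```java
   import java.nio.charset.StandardCharsets;

   // Hypothetical helper; names are illustrative only.
   public class LogtypeCharsetRepair {
       // Option 1: decode the raw bytes directly as UTF-8
       // (assumes the byte array holds UTF-8 encoded text).
       public static String decodeLogtypeBytes(byte[] logtypeBytes) {
           return new String(logtypeBytes, StandardCharsets.UTF_8);
       }

       // Option 2: repair a string that was already decoded as ISO-8859-1.
       // Re-encoding with ISO-8859-1 recovers the original bytes exactly,
       // which can then be decoded as UTF-8.
       public static String repairLatin1Decoded(String latin1Decoded) {
           byte[] rawBytes = latin1Decoded.getBytes(StandardCharsets.ISO_8859_1);
           return new String(rawBytes, StandardCharsets.UTF_8);
       }
   }
   ```

   Whether this is safe for every byte that can appear in a logtype (for example, any placeholder bytes the library inserts) would need to be checked against the library itself.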
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org