Posted to commits@hudi.apache.org by "jonvex (via GitHub)" <gi...@apache.org> on 2023/02/21 20:54:45 UTC

[GitHub] [hudi] jonvex commented on a diff in pull request #8010: [HUDI-4442] [HUDI-5001] Sanitize JsonConversion and RowSource

jonvex commented on code in PR #8010:
URL: https://github.com/apache/hudi/pull/8010#discussion_r1113557043


##########
hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java:
##########
@@ -720,10 +721,22 @@ public static Schema getNullSchema() {
    * @return sanitized name
    */
   public static String sanitizeName(String name) {
-    if (name.substring(0, 1).matches(INVALID_AVRO_FIRST_CHAR_IN_NAMES)) {
-      name = name.replaceFirst(INVALID_AVRO_FIRST_CHAR_IN_NAMES, MASK_FOR_INVALID_CHARS_IN_NAMES);
+    return sanitizeName(name, MASK_FOR_INVALID_CHARS_IN_NAMES);
+  }
+
+  /**
+   * Sanitizes Name according to Avro rule for names.
+   * Removes characters other than the ones mentioned in https://avro.apache.org/docs/current/spec.html#names .
+   *
+   * @param name input name
+   * @param invalidCharMask replacement for invalid characters.
+   * @return sanitized name
+   */
+  public static String sanitizeName(String name, String invalidCharMask) {
+    if (INVALID_AVRO_FIRST_CHAR_IN_NAMES_PATTERN.matcher(name.substring(0, 1)).matches()) {

Review Comment:
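   According to https://www3.ntu.edu.sg/home/ehchua/programming/howto/Regexe.html#zz-1.1, a `^` as the first character inside square brackets negates the character class, so the class matches any single character that is not listed.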
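
   A minimal sketch of the bracket behavior, in case it helps the review. The pattern strings and the class and method names below are assumptions made for illustration; they are not copied from HoodieAvroUtils. The sketch contrasts `^` inside brackets (negation) with `^` outside brackets (start-of-input anchor), and applies the pattern to `name.substring(0, 1)` the same way the diff does.

```java
import java.util.regex.Pattern;

public class NegatedClassDemo {
  // Assumed pattern, for illustration only: ^ inside the brackets negates the
  // class, so this matches one character that is NOT a letter or underscore.
  static final Pattern INVALID_FIRST_CHAR = Pattern.compile("[^A-Za-z_]");

  // For contrast: ^ outside the brackets is a start-of-input anchor, so this
  // matches one character that IS a letter or underscore.
  static final Pattern VALID_FIRST_CHAR = Pattern.compile("^[A-Za-z_]");

  // Mirrors the check in the diff: test only the first character of the name.
  // Like the diff, this assumes a non-empty name.
  static boolean hasInvalidFirstChar(String name) {
    return INVALID_FIRST_CHAR.matcher(name.substring(0, 1)).matches();
  }

  static boolean hasValidFirstChar(String name) {
    return VALID_FIRST_CHAR.matcher(name.substring(0, 1)).matches();
  }

  public static void main(String[] args) {
    for (String name : new String[] {"9lives", "_id", "name", "$cost"}) {
      System.out.println(name
          + " invalidFirstChar=" + hasInvalidFirstChar(name)
          + " validFirstChar=" + hasValidFirstChar(name));
    }
  }
}
```

   With these assumed patterns the two checks are complements for any non-empty name, which is why the negated class can be used directly to detect a first character that needs masking.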


