You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2021/12/07 15:41:52 UTC

[GitHub] [kafka] twobeeb opened a new pull request #11575: KAFKA-13511: Update TimestampConverter support unix epoch as millis, micros, and seconds

twobeeb opened a new pull request #11575:
URL: https://github.com/apache/kafka/pull/11575


   Currently, the SMT TimestampConverter can convert Timestamp from either source String, Long or Date into target String, Long or Date.
   
   The problem is that Long source or target is required to be epoch in milliseconds.
   
   In many cases, epoch is represented with different precisions. This leads to several Jira tickets :
   
   [KAFKA-12364](https://issues.apache.org/jira/browse/KAFKA-12364): add support for date from int32 to timestampconverter
   [KAFKA-10561](https://issues.apache.org/jira/browse/KAFKA-10561): Support microseconds precision for Timestamps
   I propose to add a new config to TimestampConverter called "epoch.precision" which defaults to "millis" so as to not impact existing code, and allows for more precisions : seconds, millis, micros.
   ````json
   "transforms": "TimestampConverter",
   "transforms.TimestampConverter.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
   "transforms.TimestampConverter.field": "event_date",
   "transforms.TimestampConverter.epoch.precision": "micros",
   "transforms.TimestampConverter.target.type": "Timestamp"
   ````
   Exactly like "format" field which is used as input when the source in String and output when the target.type is string, this new field would be used as input when the field is Long, and as output when the target.type is "unix"
   
   *Summary of testing strategy (including rationale)
   for the feature or bug fix. Unit and/or integration
   tests are expected for any behaviour change and
   system tests should be considered for larger changes.*
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] mimaison commented on pull request #11575: KAFKA-13511: Update TimestampConverter support unix epoch as millis, micros, and seconds

Posted by GitBox <gi...@apache.org>.
mimaison commented on pull request #11575:
URL: https://github.com/apache/kafka/pull/11575#issuecomment-1011045546


   Sorry @twobeeb, I'll try to take a look this afternoon. Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] mimaison commented on a change in pull request #11575: KAFKA-13511: Add support for different unix precisions in TimestampConverter SMT

Posted by GitBox <gi...@apache.org>.
mimaison commented on a change in pull request #11575:
URL: https://github.com/apache/kafka/pull/11575#discussion_r811852872



##########
File path: connect/transforms/src/main/java/org/apache/kafka/connect/transforms/TimestampConverter.java
##########
@@ -83,12 +78,36 @@
     private static final String TYPE_TIMESTAMP = "Timestamp";
     private static final Set<String> VALID_TYPES = new HashSet<>(Arrays.asList(TYPE_STRING, TYPE_UNIX, TYPE_DATE, TYPE_TIME, TYPE_TIMESTAMP));

Review comment:
       `VALID_TYPES` can now be removed too




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] mimaison merged pull request #11575: KAFKA-13511: Add support for different unix precisions in TimestampConverter SMT

Posted by GitBox <gi...@apache.org>.
mimaison merged pull request #11575:
URL: https://github.com/apache/kafka/pull/11575


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] mimaison commented on pull request #11575: KAFKA-13511: Update TimestampConverter support unix epoch as millis, micros, and seconds

Posted by GitBox <gi...@apache.org>.
mimaison commented on pull request #11575:
URL: https://github.com/apache/kafka/pull/11575#issuecomment-989705328


   Hi @twobeeb, thanks for the PR, this looks like a useful addition. However as this is adding a new configuration, this change requires a KIP before we can accept it.
   Take a look at https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals. This page details the KIP process. Let me know if you have any questions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] twobeeb commented on pull request #11575: KAFKA-13511: Update TimestampConverter support unix epoch as millis, micros, and seconds

Posted by GitBox <gi...@apache.org>.
twobeeb commented on pull request #11575:
URL: https://github.com/apache/kafka/pull/11575#issuecomment-1011011375


   Hi @mimaison, 
   I haven't received any update on the [KIP-808](https://lists.apache.org/thread/t51lmxjdt3k4y990s2c378529lwtt0q0) which I created as per your suggestion.
   
   Could you be of any help ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] twobeeb commented on pull request #11575: KAFKA-13511: Add support for different unix precisions in TimestampConverter SMT

Posted by GitBox <gi...@apache.org>.
twobeeb commented on pull request #11575:
URL: https://github.com/apache/kafka/pull/11575#issuecomment-1042696537


   @tombentley 
   Thanks for your inputs. 
   Regarding your last comment which I fixed, the same case could be made for `target.type`.
   https://github.com/apache/kafka/blob/trunk/connect/transforms/src/main/java/org/apache/kafka/connect/transforms/TimestampConverter.java#L254-L257
   Should we address this ? If so, this PR or new one ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] mimaison commented on a change in pull request #11575: KAFKA-13511: Add support for different unix precisions in TimestampConverter SMT

Posted by GitBox <gi...@apache.org>.
mimaison commented on a change in pull request #11575:
URL: https://github.com/apache/kafka/pull/11575#discussion_r811318744



##########
File path: connect/transforms/src/main/java/org/apache/kafka/connect/transforms/TimestampConverter.java
##########
@@ -83,12 +78,36 @@
     private static final String TYPE_TIMESTAMP = "Timestamp";
     private static final Set<String> VALID_TYPES = new HashSet<>(Arrays.asList(TYPE_STRING, TYPE_UNIX, TYPE_DATE, TYPE_TIME, TYPE_TIMESTAMP));
 
+    private static final String UNIX_PRECISION_MILLIS = "milliseconds";
+    private static final String UNIX_PRECISION_MICROS = "microseconds";
+    private static final String UNIX_PRECISION_NANOS = "nanoseconds";
+    private static final String UNIX_PRECISION_SECONDS = "seconds";
+    private static final Set<String> VALID_UNIX_PRECISIONS = new HashSet<>(

Review comment:
       This field is not used, we can remove it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] twobeeb commented on pull request #11575: KAFKA-13511: Add support for different unix precisions in TimestampConverter SMT

Posted by GitBox <gi...@apache.org>.
twobeeb commented on pull request #11575:
URL: https://github.com/apache/kafka/pull/11575#issuecomment-1040105581


   Hi everyone,
   Since [KIP-808](https://cwiki.apache.org/confluence/display/KAFKA/KIP-808%3A+Add+support+for+different+unix+precisions+in+TimestampConverter+SMT) is now adopted, could you review and comment this PR ?
   Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] twobeeb commented on a change in pull request #11575: KAFKA-13511: Add support for different unix precisions in TimestampConverter SMT

Posted by GitBox <gi...@apache.org>.
twobeeb commented on a change in pull request #11575:
URL: https://github.com/apache/kafka/pull/11575#discussion_r808393677



##########
File path: connect/transforms/src/main/java/org/apache/kafka/connect/transforms/TimestampConverter.java
##########
@@ -65,14 +66,22 @@
     public static final String FORMAT_CONFIG = "format";
     private static final String FORMAT_DEFAULT = "";
 
+    public static final String UNIX_PRECISION_CONFIG = "unix.precision";
+    private static final String UNIX_PRECISION_DEFAULT = "milliseconds";
+
     public static final ConfigDef CONFIG_DEF = new ConfigDef()
             .define(FIELD_CONFIG, ConfigDef.Type.STRING, FIELD_DEFAULT, ConfigDef.Importance.HIGH,
                     "The field containing the timestamp, or empty if the entire value is a timestamp")
             .define(TARGET_TYPE_CONFIG, ConfigDef.Type.STRING, ConfigDef.Importance.HIGH,
                     "The desired timestamp representation: string, unix, Date, Time, or Timestamp")
             .define(FORMAT_CONFIG, ConfigDef.Type.STRING, FORMAT_DEFAULT, ConfigDef.Importance.MEDIUM,
                     "A SimpleDateFormat-compatible format for the timestamp. Used to generate the output when type=string "
-                            + "or used to parse the input if the input is a string.");
+                            + "or used to parse the input if the input is a string.")
+            .define(UNIX_PRECISION_CONFIG, ConfigDef.Type.STRING, UNIX_PRECISION_DEFAULT, ConfigDef.Importance.LOW,

Review comment:
       Agreed, I've been misled by existing validation code which spans across 2 fields, making it impossible to use Validator. I'll update accordingly.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] twobeeb commented on pull request #11575: KAFKA-13511: Update TimestampConverter support unix epoch as millis, micros, and seconds

Posted by GitBox <gi...@apache.org>.
twobeeb commented on pull request #11575:
URL: https://github.com/apache/kafka/pull/11575#issuecomment-988049565


   @mimaison @rhauch before considering going further with this PR, any chance I could to get your opinion on this subject ?
   
   The main idea behind this PR is that Avro logical types are not fully supported in Kafka Connect but even if they were, it would not help schema-less messages that need to convert epoch with a precision different than ms.
   
   For this reason, I consider this would be an interesting addition.
   Thanks for your help.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] twobeeb commented on a change in pull request #11575: KAFKA-13511: Add support for different unix precisions in TimestampConverter SMT

Posted by GitBox <gi...@apache.org>.
twobeeb commented on a change in pull request #11575:
URL: https://github.com/apache/kafka/pull/11575#discussion_r811669521



##########
File path: connect/transforms/src/main/java/org/apache/kafka/connect/transforms/TimestampConverter.java
##########
@@ -83,12 +78,36 @@
     private static final String TYPE_TIMESTAMP = "Timestamp";
     private static final Set<String> VALID_TYPES = new HashSet<>(Arrays.asList(TYPE_STRING, TYPE_UNIX, TYPE_DATE, TYPE_TIME, TYPE_TIMESTAMP));
 
+    private static final String UNIX_PRECISION_MILLIS = "milliseconds";
+    private static final String UNIX_PRECISION_MICROS = "microseconds";
+    private static final String UNIX_PRECISION_NANOS = "nanoseconds";
+    private static final String UNIX_PRECISION_SECONDS = "seconds";
+    private static final Set<String> VALID_UNIX_PRECISIONS = new HashSet<>(

Review comment:
       Thanks, sorry about this




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] twobeeb commented on pull request #11575: KAFKA-13511: Add support for different unix precisions in TimestampConverter SMT

Posted by GitBox <gi...@apache.org>.
twobeeb commented on pull request #11575:
URL: https://github.com/apache/kafka/pull/11575#issuecomment-1040105581


   Hi everyone,
   Since [KIP-808](https://cwiki.apache.org/confluence/display/KAFKA/KIP-808%3A+Add+support+for+different+unix+precisions+in+TimestampConverter+SMT) is now adopted, could you review and comment this PR ?
   Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] tombentley commented on a change in pull request #11575: KAFKA-13511: Add support for different unix precisions in TimestampConverter SMT

Posted by GitBox <gi...@apache.org>.
tombentley commented on a change in pull request #11575:
URL: https://github.com/apache/kafka/pull/11575#discussion_r808304776



##########
File path: connect/transforms/src/main/java/org/apache/kafka/connect/transforms/TimestampConverter.java
##########
@@ -65,14 +66,22 @@
     public static final String FORMAT_CONFIG = "format";
     private static final String FORMAT_DEFAULT = "";
 
+    public static final String UNIX_PRECISION_CONFIG = "unix.precision";
+    private static final String UNIX_PRECISION_DEFAULT = "milliseconds";
+
     public static final ConfigDef CONFIG_DEF = new ConfigDef()
             .define(FIELD_CONFIG, ConfigDef.Type.STRING, FIELD_DEFAULT, ConfigDef.Importance.HIGH,
                     "The field containing the timestamp, or empty if the entire value is a timestamp")
             .define(TARGET_TYPE_CONFIG, ConfigDef.Type.STRING, ConfigDef.Importance.HIGH,
                     "The desired timestamp representation: string, unix, Date, Time, or Timestamp")
             .define(FORMAT_CONFIG, ConfigDef.Type.STRING, FORMAT_DEFAULT, ConfigDef.Importance.MEDIUM,
                     "A SimpleDateFormat-compatible format for the timestamp. Used to generate the output when type=string "
-                            + "or used to parse the input if the input is a string.");
+                            + "or used to parse the input if the input is a string.")
+            .define(UNIX_PRECISION_CONFIG, ConfigDef.Type.STRING, UNIX_PRECISION_DEFAULT, ConfigDef.Importance.LOW,

Review comment:
       There's an overload of `define()` that takes a validator, which should mean you don't need to validate the value yourself and also ensure the valid values are documented;
   ```suggestion
               .define(UNIX_PRECISION_CONFIG, ConfigDef.Type.STRING, UNIX_PRECISION_DEFAULT,
                       ConfigDef.ValidString.in(
                               UNIX_PRECISION_NANOS, UNIX_PRECISION_MICROS,
                               UNIX_PRECISION_MILLIS, UNIX_PRECISION_SECONDS), 
                       ConfigDef.Importance.LOW,
   ```

##########
File path: connect/transforms/src/main/java/org/apache/kafka/connect/transforms/TimestampConverter.java
##########
@@ -65,14 +66,22 @@
     public static final String FORMAT_CONFIG = "format";
     private static final String FORMAT_DEFAULT = "";
 
+    public static final String UNIX_PRECISION_CONFIG = "unix.precision";
+    private static final String UNIX_PRECISION_DEFAULT = "milliseconds";
+
     public static final ConfigDef CONFIG_DEF = new ConfigDef()
             .define(FIELD_CONFIG, ConfigDef.Type.STRING, FIELD_DEFAULT, ConfigDef.Importance.HIGH,
                     "The field containing the timestamp, or empty if the entire value is a timestamp")
             .define(TARGET_TYPE_CONFIG, ConfigDef.Type.STRING, ConfigDef.Importance.HIGH,
                     "The desired timestamp representation: string, unix, Date, Time, or Timestamp")
             .define(FORMAT_CONFIG, ConfigDef.Type.STRING, FORMAT_DEFAULT, ConfigDef.Importance.MEDIUM,
                     "A SimpleDateFormat-compatible format for the timestamp. Used to generate the output when type=string "
-                            + "or used to parse the input if the input is a string.");
+                            + "or used to parse the input if the input is a string.")
+            .define(UNIX_PRECISION_CONFIG, ConfigDef.Type.STRING, UNIX_PRECISION_DEFAULT, ConfigDef.Importance.LOW,
+                    "The desired unix precision for the timestamp. Used to generate the output when type=unix " +
+                            "or used to parse the input if the input is a Long." +
+                            "Note: This SMT will cause precision loss during conversions from and to values with sub-milliseconds components.");

Review comment:
       ```suggestion
                               "Note: This SMT will cause precision loss during conversions from, and to, values with sub-millisecond components.");
   ```

##########
File path: connect/transforms/src/main/java/org/apache/kafka/connect/transforms/TimestampConverter.java
##########
@@ -65,14 +66,22 @@
     public static final String FORMAT_CONFIG = "format";
     private static final String FORMAT_DEFAULT = "";
 
+    public static final String UNIX_PRECISION_CONFIG = "unix.precision";
+    private static final String UNIX_PRECISION_DEFAULT = "milliseconds";
+
     public static final ConfigDef CONFIG_DEF = new ConfigDef()
             .define(FIELD_CONFIG, ConfigDef.Type.STRING, FIELD_DEFAULT, ConfigDef.Importance.HIGH,
                     "The field containing the timestamp, or empty if the entire value is a timestamp")
             .define(TARGET_TYPE_CONFIG, ConfigDef.Type.STRING, ConfigDef.Importance.HIGH,
                     "The desired timestamp representation: string, unix, Date, Time, or Timestamp")
             .define(FORMAT_CONFIG, ConfigDef.Type.STRING, FORMAT_DEFAULT, ConfigDef.Importance.MEDIUM,
                     "A SimpleDateFormat-compatible format for the timestamp. Used to generate the output when type=string "
-                            + "or used to parse the input if the input is a string.");
+                            + "or used to parse the input if the input is a string.")
+            .define(UNIX_PRECISION_CONFIG, ConfigDef.Type.STRING, UNIX_PRECISION_DEFAULT, ConfigDef.Importance.LOW,
+                    "The desired unix precision for the timestamp. Used to generate the output when type=unix " +

Review comment:
       ```suggestion
                       "The desired Unix precision for the timestamp. Used to generate the output when type=unix " +
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] mimaison commented on pull request #11575: KAFKA-13511: Add support for different unix precisions in TimestampConverter SMT

Posted by GitBox <gi...@apache.org>.
mimaison commented on pull request #11575:
URL: https://github.com/apache/kafka/pull/11575#issuecomment-1047623886


   > @tombentley Thanks for your inputs. Regarding your last comment which I fixed, the same case could be made for `target.type`. https://github.com/apache/kafka/blob/trunk/connect/transforms/src/main/java/org/apache/kafka/connect/transforms/TimestampConverter.java#L254-L257 Should we address this ? If so, this PR or new one ?
   
   Yes feel free to address it in this PR if you want.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org