Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2021/11/22 16:58:21 UTC

[GitHub] [kafka] joshuagrisham opened a new pull request #11523: KAFKA-10627: Added support for Connect TimestampConverter to convert multiple fields using a comma-separated list, and multiple input formats when parsing from a string (fixed after rebase).

joshuagrisham opened a new pull request #11523:
URL: https://github.com/apache/kafka/pull/11523


   > Note: this PR is copied from #9492, where I mistakenly rebased onto a very old change, which added thousands of commits. This is a second attempt, to get a clean PR for this change.
   
   I have updated the **TimestampConverter** Connect transform to address the main issues that I logged in [KAFKA-10627](https://issues.apache.org/jira/browse/KAFKA-10627).
   
   Namely, it now:
   
   - supports multiple fields via a new configuration parameter `fields`, a comma-separated list of field names. The old parameter `field` is still supported for compatibility; its value is moved to the new parameter.
   - supports a DateTimeFormatter-compatible pattern string that can match multiple timestamp formats when parsing string input to whatever target type you configure (e.g. parsing strings to the Timestamp type).
      - the `format` config is now split into two, `format.input` and `format.output`, but you can still send `format` by itself if you do not need a more complicated input pattern. When only `format` is provided, its pattern is used for both `format.input` and `format.output`.
   
   I realized that Kafka uses `java.util.Date` everywhere as part of its core types (including in Schemas, values, etc.). In theory it would be good to move to the `java.time` classes over time, but at first glance that looks like quite a big overhaul.
   
   So instead I focused on the specific problem at hand: parsing strings into `Date` when the strings can come in different formats. For this part alone I switched to `DateTimeFormatter`, so that multiple patterns can be matched against input strings, which are then converted to a `java.util.Date`.
   
   I also updated how some of the config parameters and values are handled, to bring them in line with the other classes, similar to what I did in #9470.
   
   #### String Input and Output Timestamp Format updates
   
   Because input formats can now match several different possibilities through pattern matching, the same pattern no longer works as the output format when converting a Timestamp to a String (the other direction this transform supports). So I have changed the configuration a bit; there are now three parameters:
   
   - `format` is the original parameter. You can still use it, and it sets both the input format (parsing) and the output format (Date/Timestamp to string).
   - `format.input` is a new parameter, where you can specify a DateTimeFormatter-compatible pattern string that supports multiple different formats in case you have a mix in your data. As one example, you can now use something like this as `format.input` and it will catch a lot of the variations you might see in one timestamp field: `"[yyyy-MM-dd[['T'][ ]HH:mm:ss[.SSSSSSSz][.SSS[XXX][X]]]]"`
   - `format.output` is a new parameter which only controls the output of a Date/Timestamp to a target type of `string`. It behaves as before and still uses `SimpleDateFormat` to create the output string; it is just controlled by a separate parameter now.
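
   To illustrate how one pattern with optional sections can cover several input shapes, here is a minimal sketch. It is not the code from this PR: the class name `MultiFormatParseSketch`, the simplified pattern, the use of `parseBest`, and the choice to treat offset-less values as UTC are all assumptions made for the example.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;
import java.util.Date;

// Hypothetical helper for illustration only; not the implementation in this PR.
class MultiFormatParseSketch {

    // Simplified pattern (an assumption, shorter than the one quoted above):
    // a date, optionally followed by 'T' or a space, a time, optional
    // milliseconds, and an optional offset such as +01:00 or Z.
    static final DateTimeFormatter INPUT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd[['T'][ ]HH:mm:ss[.SSS][XXX]]");

    static Date parse(String text) {
        // parseBest returns the first listed type that the parsed fields can build.
        TemporalAccessor parsed = INPUT.parseBest(
                text, OffsetDateTime::from, LocalDateTime::from, LocalDate::from);
        if (parsed instanceof OffsetDateTime) {
            return Date.from(((OffsetDateTime) parsed).toInstant());
        }
        if (parsed instanceof LocalDateTime) {
            // Assumption: values without an offset are interpreted as UTC.
            return Date.from(((LocalDateTime) parsed).toInstant(ZoneOffset.UTC));
        }
        // Date only: assume midnight UTC.
        return Date.from(((LocalDate) parsed).atStartOfDay(ZoneOffset.UTC).toInstant());
    }
}
```

   With this sketch, `2021-11-22`, `2021-11-22 16:58:21` and `2021-11-22T16:58:21.123+01:00` all parse with the same formatter, and each ends up as a `java.util.Date`.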
   
   I also added some code that checks the values of these three parameters. It forces you to use either `format` alone, or one or both of the new parameters; you cannot mix the old and new together. In the end, `format.input` and `format.output` are the ones used in the rest of the logic, but the code first compares `format` against these values and sets both new parameters depending on what was sent in the config.
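
   The rule described above can be sketched roughly as follows. This is a simplified stand-in, not the code in this PR: the class and method names are hypothetical, the config is modelled as a plain `Map<String, String>`, and `IllegalArgumentException` is used only to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "old or new, never both" rule; not the PR's code.
class FormatConfigSketch {

    static Map<String, String> resolveFormats(Map<String, String> config) {
        String legacy = config.getOrDefault("format", "");
        String input = config.getOrDefault("format.input", "");
        String output = config.getOrDefault("format.output", "");

        boolean hasLegacy = !legacy.isEmpty();
        boolean hasNew = !input.isEmpty() || !output.isEmpty();
        if (hasLegacy && hasNew) {
            // Assumption: a plain exception stands in for whatever the transform throws.
            throw new IllegalArgumentException(
                    "Use either 'format' or 'format.input'/'format.output', not both");
        }

        // The rest of the logic only ever reads the two new keys.
        Map<String, String> resolved = new HashMap<>();
        resolved.put("format.input", hasLegacy ? legacy : input);
        resolved.put("format.output", hasLegacy ? legacy : output);
        return resolved;
    }
}
```

   Passing only `format` yields the same pattern for both resolved keys, while passing `format` together with `format.input` or `format.output` is rejected.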
   
   #### Support for multiple fields instead of one single field
   
   I renamed the `field` parameter to `fields`, which supports multiple values as a comma-separated list. I also used the new `ConfigUtils.translateDeprecatedConfigs` method to translate the old parameter to the new one automatically.
   
   With this change I also updated the `apply` methods so that they loop through each field and check against the list of `fields`.  Now you can specify a comma-separated list of multiple fields to have the same input format/output type applied.
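
   As a rough illustration of the multi-field behaviour for a schemaless (map) value, consider the sketch below. It is not the PR's `apply` implementation: the class, the helper names, and the plain fallback from `fields` to `field` (standing in for the deprecated-config translation) are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of applying one conversion to several named fields.
class MultiFieldSketch {

    // Reads the comma-separated `fields` value; falls back to the old `field` key.
    static Set<String> fieldNames(Map<String, String> config) {
        String raw = config.containsKey("fields")
                ? config.get("fields")
                : config.getOrDefault("field", "");
        Set<String> names = new LinkedHashSet<>();
        for (String name : raw.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                names.add(trimmed);
            }
        }
        return names;
    }

    // Loops over every entry and converts only those whose key is in `fields`.
    static Map<String, Object> apply(Map<String, Object> value,
                                     Set<String> fields,
                                     Function<Object, Object> convert) {
        Map<String, Object> updated = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : value.entrySet()) {
            Object v = entry.getValue();
            updated.put(entry.getKey(), fields.contains(entry.getKey()) ? convert.apply(v) : v);
        }
        return updated;
    }
}
```

   With `fields` set to `created, updated`, both of those entries pass through the conversion function and every other entry is copied unchanged.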
   
   Unit tests have been added for both new updates (string formatting and multiple field support).
   
   While looking at this, I realized that it might be good to add `recursive` support similar to what I did in #9470, but that can come another day!
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] joshuagrisham commented on pull request #11523: KAFKA-10627: Added support for Connect TimestampConverter to convert multiple fields using a comma-separated list, and multiple input formats when parsing from a string (fixed after rebase).

Posted by GitBox <gi...@apache.org>.
joshuagrisham commented on pull request #11523:
URL: https://github.com/apache/kafka/pull/11523#issuecomment-975732575


   @rhauch this is the new PR to replace #9492 per your comment there

