Posted to issues@nifi.apache.org by "Arek Burdach (Jira)" <ji...@apache.org> on 2021/02/25 09:59:00 UTC

[jira] [Commented] (NIFI-8161) NiFi EL: migration from SimpleDateFormat to DateTimeFormatter

    [ https://issues.apache.org/jira/browse/NIFI-8161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17290817#comment-17290817 ] 

Arek Burdach commented on NIFI-8161:
------------------------------------

Could anybody take a look at this change? [~exceptionfactory] [~mtien] maybe? It is not a very controversial change. We have successfully deployed it in production (about 50 flows integrating different sources in different formats), and it required about 5 changes in expressions. These were mainly due to incorrect format usage (e.g. `mm` was used twice instead of `MM`) and some slight differences in how the time zone format is interpreted. Overall, this change caused about a 30% reduction in CPU load on our NiFi instances.
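
For illustration, here is a minimal sketch of how a pattern that uses `mm` twice can go unnoticed with SimpleDateFormat yet fail with DateTimeFormatter. The class name, helper methods, pattern and sample text are hypothetical and are not taken from the NiFi code base or from the affected flows; it only shows plain JDK behavior.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class MinuteVsMonthDemo {

    // Returns true if SimpleDateFormat accepts the text for the given pattern.
    static boolean parsesWithSimpleDateFormat(String pattern, String text) {
        try {
            // SimpleDateFormat is mutable, so a fresh instance is created per call.
            new SimpleDateFormat(pattern).parse(text);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    // Returns true if DateTimeFormatter accepts the text for the given pattern.
    static boolean parsesWithDateTimeFormatter(String pattern, String text) {
        try {
            DateTimeFormatter.ofPattern(pattern).parse(text);
            return true;
        } catch (DateTimeParseException e) {
            // Two different values for minute-of-hour are reported as a parse error.
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "2021-02-25 09:59";
        String wrong = "yyyy-mm-dd HH:mm";   // `mm` (minutes) used where `MM` (month) was meant
        String fixed = "yyyy-MM-dd HH:mm";

        System.out.println(parsesWithSimpleDateFormat(wrong, text));
        System.out.println(parsesWithDateTimeFormatter(wrong, text));
        System.out.println(parsesWithDateTimeFormatter(fixed, text));
    }
}
```

With the wrong pattern, SimpleDateFormat accepts the input without complaint, while DateTimeFormatter rejects it because the two `mm` fields carry conflicting values (02 and 59). This is one plausible way such a mistake only surfaces after the migration.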

> NiFi EL: migration from SimpleDateFormat to DateTimeFormatter
> -------------------------------------------------------------
>
>                 Key: NIFI-8161
>                 URL: https://issues.apache.org/jira/browse/NIFI-8161
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Arek Burdach
>            Priority: Major
>              Labels: perfomance, pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> In NiFi Expression Language we commonly use SimpleDateFormat. It is an old, inefficient approach: because SimpleDateFormat is mutable, it needs to be recreated for each operation. This format is also lax in some places where a user would like it to be stricter. In my opinion, a better approach would be to use the Java 8 DateTimeFormatter instead.
> I've done some benchmarks that you can run on your own in FormatEvaluatorBenchmark. Results on my 8-core i7-1065G7:
> - before change (SimpleDateFormat): 11.230 ± 5.407  us/op
> - after switching to DateTimeFormatter API: 4.747 ± 0.426  us/op
> - after additionally preparing the formatter up front for literal formats: 2.025 ± 0.055  us/op
> This change is not 100% transparent, so some changes might be necessary in users' code. Most of the differences are visible in the modifications that I've made to the TestQuery tests:
> - the backtick (`) is no longer supported for escaping extra characters; only the single quote is supported
> - "repeated" syntax like "dd" for days strictly checks that two digits were provided; anyone who needs laxer syntax must use the single "d" syntax
> [update]
> After switching to lenient mode, parsing is compatible with SimpleDateFormat for the second point ("repeated" syntax).
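
The difference between strict and lenient handling of "repeated" syntax can be sketched with plain JDK classes. This assumes that "lenient mode" refers to `DateTimeFormatterBuilder.parseLenient()`; the actual NiFi change may be implemented differently. The class name, constants and helper method below are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;

public class LenientWidthDemo {

    // DateTimeFormatter is immutable and thread-safe, so a single instance
    // can be built once and shared instead of being recreated per operation.
    static final DateTimeFormatter STRICT = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    // Assumption: lenient parsing enabled before the pattern is appended,
    // which relaxes the fixed width of fields such as "MM" and "dd".
    static final DateTimeFormatter LENIENT = new DateTimeFormatterBuilder()
            .parseLenient()
            .appendPattern("yyyy-MM-dd")
            .toFormatter();

    // Parses the text, returning null instead of throwing on failure.
    static LocalDate parseOrNull(DateTimeFormatter formatter, String text) {
        try {
            return LocalDate.parse(text, formatter);
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrNull(STRICT, "2021-02-05"));  // two-digit fields
        System.out.println(parseOrNull(STRICT, "2021-2-5"));    // single digits, strict
        System.out.println(parseOrNull(LENIENT, "2021-2-5"));   // single digits, lenient
    }
}
```

In this sketch the strict formatter rejects single-digit month and day values for "MM" and "dd", while the lenient one accepts them, which is closer to what SimpleDateFormat does.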


