Posted to issues@flink.apache.org by "Martijn Visser (Jira)" <ji...@apache.org> on 2021/10/06 11:46:00 UTC

[jira] [Commented] (FLINK-24459) Performance improvement of file sink on Nexmark

    [ https://issues.apache.org/jira/browse/FLINK-24459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17424951#comment-17424951 ] 

Martijn Visser commented on FLINK-24459:
----------------------------------------

[~trushev] Thanks for the report and the metrics, much appreciated. I'll bring this up with the team.

> Performance improvement of file sink on Nexmark
> -----------------------------------------------
>
>                 Key: FLINK-24459
>                 URL: https://issues.apache.org/jira/browse/FLINK-24459
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem
>            Reporter: Alexander Trushev
>            Assignee: Alexander Trushev
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: after.jfr.zip, after_cpu.png, after_mem.png, before.jfr.zip, before_cpu.png, before_mem.png
>
>
> h3. Context
> {{PartitionPathUtils.escapePathName}} is a simple method: it takes a {{String}}, allocates a {{StringBuilder}}, appends each char either unchanged or escaped, and returns the resulting {{String}}.
> The filesystem sink calls this method several times per element to determine the bucket id, which makes it a hot spot for workloads that write intensively to a filesystem, such as [nexmark q10|https://github.com/nexmark/nexmark/blob/master/nexmark-flink/src/main/resources/queries/q10.sql]. On my local machine, escaping chars accounts for 9.53% of CPU samples and 17.8% of memory allocations of the whole TaskManager process.
> h3. Proposal
> {{PartitionPathUtils.escapePathName}} improvements
> # Use the more efficient {{Integer.toHexString}} instead of {{String.format}}
> # Do not allocate a new string when the original string contains no escapable char
> # Size the {{StringBuilder}} from the original string length instead of using the default capacity
> h3. Benefit
> Experiment on local machine.
> 1 TaskManager with 6 slots. Job parallelism 6. Nexmark default configuration + object reuse option.
> Before: flink-1.14.0
> After: flink-1.14.0 + patch with the improvements
> || Nexmark q10 || Before || After ||
> | CPU samples of escapePathName() (% of all) | 9.53 | 1.64 |
> | Memory allocations by escapePathName() (% of all) | 17.8 | 2.98 |
> | Throughput/Cores (K/s) | 107.64 | 119.42 |
> Diff: CPU *-7.89* percentage points, Memory *-14.82* percentage points, Throughput *+10.9*%
> Profiling reports are in the attachment.
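
The three changes listed under "Proposal" in the quoted description could look roughly like the Java sketch below. This is an illustration only, not the patch attached to the pull request: the class name PathEscapeSketch, the helper needsEscaping, and the set of escaped characters are assumptions made for the example, and the real list of escapable characters is defined in Flink's PartitionPathUtils.

```java
import java.util.BitSet;

/** Hypothetical stand-in for PartitionPathUtils; not the actual Flink code. */
final class PathEscapeSketch {

    // Assumed escape set for illustration; the real list lives in Flink.
    private static final BitSet ESCAPE = new BitSet(128);

    static {
        for (char c : new char[] {'"', '#', '%', '\'', '*', '/', ':', '=', '?', '\\'}) {
            ESCAPE.set(c);
        }
        // This sketch also escapes ASCII control characters.
        for (char c = 0; c < ' '; c++) {
            ESCAPE.set(c);
        }
    }

    private static boolean needsEscaping(char c) {
        return c < 128 && ESCAPE.get(c);
    }

    static String escapePathName(String path) {
        // Improvement 2: the builder is created lazily, only once an
        // escapable char is actually found.
        StringBuilder sb = null;
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (needsEscaping(c)) {
                if (sb == null) {
                    // Improvement 3: capacity derived from the input length
                    // rather than the default of 16 chars.
                    sb = new StringBuilder(path.length() + 2);
                    sb.append(path, 0, i); // copy the clean prefix in one call
                }
                // Improvement 1: Integer.toHexString instead of String.format.
                sb.append('%');
                if (c < 0x10) {
                    sb.append('0'); // pad to two hex digits
                }
                sb.append(Integer.toHexString(c).toUpperCase());
            } else if (sb != null) {
                sb.append(c);
            }
        }
        // Nothing to escape: hand back the original instance untouched.
        return sb == null ? path : sb.toString();
    }
}
```

With the escape set assumed here, PathEscapeSketch.escapePathName("dt=2021/10/06") returns "dt%3D2021%2F10%2F06", while an input without escapable chars, such as "abc", is returned as the same String instance with no allocation at all.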



--
This message was sent by Atlassian Jira
(v8.3.4#803005)