Posted to dev@flink.apache.org by "Alexander Trushev (Jira)" <ji...@apache.org> on 2021/10/06 07:44:00 UTC

[jira] [Created] (FLINK-24459) Performance improvement of file sink on Nexmark

Alexander Trushev created FLINK-24459:
-----------------------------------------

             Summary: Performance improvement of file sink on Nexmark
                 Key: FLINK-24459
                 URL: https://issues.apache.org/jira/browse/FLINK-24459
             Project: Flink
          Issue Type: Improvement
          Components: Connectors / FileSystem
            Reporter: Alexander Trushev
         Attachments: after.jfr.zip, after_cpu.png, after_mem.png, before.jfr.zip, before_cpu.png, before_mem.png

h3. Context

{{PartitionPathUtils.escapePathName}} is a simple method: it takes a {{String}}, allocates a {{StringBuilder}}, appends each char either as-is or escaped, and returns the resulting {{String}}.

The filesystem sink calls this method several times per element to determine the bucket id, so it becomes a hot spot in workloads that write intensively to the filesystem, such as [nexmark q10|https://github.com/nexmark/nexmark/blob/master/nexmark-flink/src/main/resources/queries/q10.sql]. On my local machine, char escaping accounts for 9.53% of CPU samples and 17.8% of memory allocations of the whole TaskManager process.
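To illustrate why the method is called several times per element, here is a minimal sketch assuming a Hive-style {{key=value/}} partition path in which every partition key and every partition value is escaped separately. The class {{BucketIdSketch}} and method {{bucketIdFor}} are hypothetical, and the escaper is passed in as a counting stand-in rather than Flink's real implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.UnaryOperator;

/** Hypothetical sketch: builds a Hive-style partition path ("k1=v1/k2=v2/") for one record. */
public final class BucketIdSketch {

    private BucketIdSketch() {}

    /** Escapes every partition key and every partition value, i.e. 2 calls per partition column. */
    public static String bucketIdFor(LinkedHashMap<String, String> partitionSpec, UnaryOperator<String> escape) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : partitionSpec.entrySet()) {
            sb.append(escape.apply(e.getKey()))   // one escape call for the key
              .append('=')
              .append(escape.apply(e.getValue())) // and one for the value
              .append('/');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Identity "escaper" that only counts how often it is invoked.
        UnaryOperator<String> countingEscape = s -> {
            calls.incrementAndGet();
            return s;
        };

        // Sample values for a table partitioned by two columns (assumed for illustration).
        LinkedHashMap<String, String> spec = new LinkedHashMap<>();
        spec.put("dt", "2021-10-06");
        spec.put("hm", "0744");

        String id = bucketIdFor(spec, countingEscape);
        // prints "dt=2021-10-06/hm=0744/ built with 4 escape calls"
        System.out.println(id + " built with " + calls.get() + " escape calls");
    }
}
```

Under this assumption, a table with two partition columns costs four escape calls per record, each allocating a {{StringBuilder}} and a result {{String}} in the current implementation.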

h3. Proposal
Improve {{PartitionPathUtils.escapePathName}} as follows:
# Use the more efficient {{Integer.toHexString}} instead of {{String.format}}
# Do not allocate a new string when the original string contains no chars that need escaping
# Size the {{StringBuilder}} based on the original string length instead of using the default capacity
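A minimal Java sketch of the three changes, for illustration only. The class and method names ({{EscapeSketch}}, {{escapeBefore}}, {{escapeAfter}}, {{appendEscaped}}) are hypothetical, the set of escaped chars is an assumption modeled on Hive-style partition path escaping rather than copied from Flink's {{PartitionPathUtils}}, and null/empty input checks are omitted.

```java
import java.util.BitSet;
import java.util.Locale;

/** Hypothetical before/after sketch of the proposed escapePathName changes. */
public final class EscapeSketch {

    // Assumed escape set: control chars 0x01-0x1F, DEL, and a few path-sensitive chars.
    private static final BitSet CHARS_TO_ESCAPE = new BitSet(128);

    static {
        for (char c = 1; c < ' '; c++) {
            CHARS_TO_ESCAPE.set(c);
        }
        for (char c : new char[] {'"', '#', '%', '\'', '*', '/', ':', '=', '?', '\\', '\u007F', '{', '[', ']', '^'}) {
            CHARS_TO_ESCAPE.set(c);
        }
    }

    private EscapeSketch() {}

    static boolean needsEscaping(char c) {
        return CHARS_TO_ESCAPE.get(c); // BitSet.get returns false for indexes beyond the set bits
    }

    /** Baseline: default-capacity builder, String.format per escaped char, always a new String. */
    public static String escapeBefore(String path) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (needsEscaping(c)) {
                sb.append('%').append(String.format("%1$02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Improved variant applying the three proposed changes. */
    public static String escapeAfter(String path) {
        // Change 2: find the first char that needs escaping; if none, return the input as-is.
        int first = -1;
        for (int i = 0; i < path.length(); i++) {
            if (needsEscaping(path.charAt(i))) {
                first = i;
                break;
            }
        }
        if (first < 0) {
            return path;
        }
        // Change 3: size the builder from the input length (the +16 headroom is an arbitrary choice).
        StringBuilder sb = new StringBuilder(path.length() + 16);
        sb.append(path, 0, first); // copy the prefix that needs no escaping
        for (int i = first; i < path.length(); i++) {
            char c = path.charAt(i);
            if (needsEscaping(c)) {
                appendEscaped(sb, c);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Change 1: Integer.toHexString instead of String.format, keeping the "%XX" output shape.
    private static void appendEscaped(StringBuilder sb, char c) {
        sb.append('%');
        if (c < 0x10) {
            sb.append('0'); // toHexString does not zero-pad
        }
        sb.append(Integer.toHexString(c).toUpperCase(Locale.ROOT)); // match %02X casing
    }

    public static void main(String[] args) {
        System.out.println(escapeAfter("2021-10-06")); // prints "2021-10-06"
        System.out.println(escapeAfter("a=b/c"));      // prints "a%3Db%2Fc"
        System.out.println(escapeAfter("x:y"));        // prints "x%3Ay"
    }
}
```

With the assumed char set, {{escapeAfter}} returns the very same instance for inputs that need no escaping, so such values allocate nothing, while zero-padding and upper-casing keep its output identical to {{escapeBefore}}. {{toUpperCase}} still allocates a short string per escaped char; a lookup table of hex digits would avoid that, but the sketch sticks to {{Integer.toHexString}} as proposed.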

h3. Benefit
Experiment on a local machine:
1 TaskManager with 6 slots, job parallelism 6, default Nexmark configuration with object reuse enabled.
Before: flink-1.14.0
After: flink-1.14.0 with a patch containing the improvements above

|| Nexmark q10 || Before || After ||
| CPU samples of escapePathName() (% of all) | 9.53 | 1.64 |
| Memory allocations by escapePathName() (% of all) | 17.8 | 2.98 |
| Throughput/Cores (K/s) | 107.64 | 119.42 |

Diff: CPU share *-7.89* percentage points, memory allocation share *-14.82* percentage points, throughput *+10.9*%
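The diff figures follow directly from the table: the CPU and memory values are absolute differences between whole-process shares (percentage points), while the throughput value is a relative change.

```latex
\begin{aligned}
\Delta_{\text{CPU}} &= 1.64 - 9.53 = -7.89 \ \text{pp} \\
\Delta_{\text{mem}} &= 2.98 - 17.8 = -14.82 \ \text{pp} \\
\Delta_{\text{throughput}} &= \frac{119.42}{107.64} - 1 \approx 0.1094 \approx +10.9\%
\end{aligned}
```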

Profiling reports (before/after JFR recordings and CPU/memory screenshots) are attached.



