Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/06/04 14:22:00 UTC
[jira] [Commented] (FLINK-9435) Remove per-key selection Tuple
instantiation via reflection in ComparableKeySelector and ArrayKeySelector
[ https://issues.apache.org/jira/browse/FLINK-9435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500281#comment-16500281 ]
ASF GitHub Bot commented on FLINK-9435:
---------------------------------------
GitHub user NicoK opened a pull request:
https://github.com/apache/flink/pull/6115
[FLINK-9435][java] Remove per-key selection Tuple instantiation via reflection in ComparableKeySelector and ArrayKeySelector
## What is the purpose of the change
Inside `KeySelectorUtil`, every `ComparableKeySelector#getKey()` call currently creates a new tuple via `Tuple.getTupleClass(keyLength).newInstance()`, which seems expensive. Instead, we could obtain a template tuple once and use `Tuple#copy()`, which creates an instance of the right sub-class more efficiently.
Similarly, `ArrayKeySelector` instantiates new `Tuple` instances via reflection (although it caches the tuple class required for the returned key), which can be changed in the same way.
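To illustrate the idea, here is a minimal, self-contained sketch of the template-and-copy pattern. It deliberately does not use Flink's classes: `MiniTuple`, `MiniTuple2`, `TemplateKeySelector` and `newKeyViaReflection` are hypothetical stand-ins invented for this example, and the actual change in `KeySelectorUtil` may look different.

```java
final class TemplateCopySketch {

    /** Hypothetical stand-in for a tuple base class (not Flink's real API). */
    abstract static class MiniTuple {
        abstract void setField(Object value, int pos);
        abstract Object getField(int pos);
        /** Shallow copy that returns the same concrete sub-class. */
        abstract MiniTuple copy();
    }

    /** Hypothetical two-field tuple. */
    static final class MiniTuple2 extends MiniTuple {
        private Object f0;
        private Object f1;

        MiniTuple2() {}

        @Override
        void setField(Object value, int pos) {
            switch (pos) {
                case 0: f0 = value; break;
                case 1: f1 = value; break;
                default: throw new IndexOutOfBoundsException(String.valueOf(pos));
            }
        }

        @Override
        Object getField(int pos) {
            switch (pos) {
                case 0: return f0;
                case 1: return f1;
                default: throw new IndexOutOfBoundsException(String.valueOf(pos));
            }
        }

        @Override
        MiniTuple copy() {
            // Plain constructor call: no reflective lookup involved.
            MiniTuple2 t = new MiniTuple2();
            t.f0 = f0;
            t.f1 = f1;
            return t;
        }
    }

    /** "Before" pattern: one reflective instantiation per extracted key. */
    static MiniTuple newKeyViaReflection(Class<? extends MiniTuple> tupleClass)
            throws ReflectiveOperationException {
        return tupleClass.getDeclaredConstructor().newInstance();
    }

    /** "After" pattern: the template is determined once, each key starts as a copy. */
    static final class TemplateKeySelector {
        private final MiniTuple template;
        private final int[] keyPositions;

        TemplateKeySelector(MiniTuple template, int... keyPositions) {
            this.template = template;
            this.keyPositions = keyPositions.clone();
        }

        MiniTuple getKey(Object[] record) {
            MiniTuple key = template.copy();
            for (int i = 0; i < keyPositions.length; i++) {
                key.setField(record[keyPositions[i]], i);
            }
            return key;
        }
    }

    public static void main(String[] args) {
        TemplateKeySelector selector = new TemplateKeySelector(new MiniTuple2(), 2, 0);
        MiniTuple key = selector.getKey(new Object[] {"a", 42, "c"});
        System.out.println(key.getField(0) + ", " + key.getField(1)); // prints "c, a"
    }
}
```

The template itself is never handed out or modified, so every call returns a fresh key object, just as the reflective variant does.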
With the micro-benchmarks added for a very simple job basically only doing a `keyBy()` (https://github.com/dataArtisans/flink-benchmarks/pull/5), I get these results:
```
Benchmark                     Mode  Cnt     Score     Error   Units
------------- old ---------------
KeyByBenchmarks.arrayKeyBy   thrpt    9  1055.706 ± 170.221  ops/ms
KeyByBenchmarks.tupleKeyBy   thrpt    9  1537.923 ± 271.665  ops/ms
------------- new ---------------
KeyByBenchmarks.arrayKeyBy   thrpt    9  1213.073 ±  39.672  ops/ms
KeyByBenchmarks.tupleKeyBy   thrpt    9  1848.172 ± 188.013  ops/ms
```
That is roughly 15% higher throughput for the `ArrayKeySelector` and 20% higher for the `ComparableKeySelector`.
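For reference, the percentages can be reproduced from the mean scores in the table. The small helper below is only an illustration of that arithmetic; `ThroughputGain` and `relativeGain` are names made up for this sketch, and the error column is not taken into account.

```java
import java.util.Locale;

final class ThroughputGain {

    /** Relative change of the new mean score over the old one, e.g. 0.15 for +15%. */
    static double relativeGain(double oldScore, double newScore) {
        return newScore / oldScore - 1.0;
    }

    public static void main(String[] args) {
        // Mean scores (ops/ms) copied from the benchmark table.
        double arrayGain = relativeGain(1055.706, 1213.073);
        double tupleGain = relativeGain(1537.923, 1848.172);

        System.out.printf(Locale.ROOT, "arrayKeyBy: %+.1f%%%n", 100 * arrayGain);
        System.out.printf(Locale.ROOT, "tupleKeyBy: %+.1f%%%n", 100 * tupleGain);
    }
}
```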
## Brief change log
- optimise `ComparableKeySelector` using a template `Tuple` instance determined once
- optimise `ArrayKeySelector` using a template `Tuple` instance determined once
## Verifying this change
This change is already covered by existing tests.
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): **no**
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: **no**
- The serializers: **no**
- The runtime per-record code paths (performance sensitive): **yes**
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: **no**
- The S3 file system connector: **no**
## Documentation
- Does this pull request introduce a new feature? **no**
- If yes, how is the feature documented? **not applicable**
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/NicoK/flink flink-9435
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/6115.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #6115
----
commit b0c6944b861f57ade038c492141febf8fa9b7502
Author: Nico Kruber <ni...@...>
Date: 2018-05-24T22:09:37Z
[FLINK-9435][java] optimise ComparableKeySelector for more efficient Tuple creation
commit 77a8349a7c51ec96bf49eee841b0c05acb1815f6
Author: Nico Kruber <ni...@...>
Date: 2018-06-04T11:30:57Z
[FLINK-9435][java] optimise ArrayKeySelector for more efficient Tuple creation
----
> Remove per-key selection Tuple instantiation via reflection in ComparableKeySelector and ArrayKeySelector
> ---------------------------------------------------------------------------------------------------------
>
> Key: FLINK-9435
> URL: https://issues.apache.org/jira/browse/FLINK-9435
> Project: Flink
> Issue Type: Improvement
> Components: Streaming
> Affects Versions: 1.3.0, 1.3.1, 1.4.0, 1.3.2, 1.3.3, 1.5.0, 1.4.1, 1.4.2, 1.6.0
> Reporter: Nico Kruber
> Assignee: Nico Kruber
> Priority: Major
>
> Inside {{KeySelectorUtil}}, every {{ComparableKeySelector#getKey()}} call currently creates a new tuple via {{Tuple.getTupleClass(keyLength).newInstance()}}, which seems expensive. Instead, we could obtain a template tuple once and use {{Tuple#copy()}}, which creates an instance of the right sub-class more efficiently.
> Similarly, {{ArrayKeySelector}} instantiates new {{Tuple}} instances via reflection which can be changed the same way.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)