You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by zentol <gi...@git.apache.org> on 2018/05/07 08:09:20 UTC
[GitHub] flink pull request #5959: [FLINK-9258][metrics] Thread-safe initialization o...
GitHub user zentol opened a pull request:
https://github.com/apache/flink/pull/5959
[FLINK-9258][metrics] Thread-safe initialization of variables map
## What is the purpose of the change
This PR fixes a thread-safety issue in `ComponentMetricGroup` where a map was assigned to the `variables` field in a synchronized block before it was populated.
This meant that a modification could be made to the map that is already visible to other threads, leading to `ConcurrentModificationExceptions`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zentol/flink 9258
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/5959.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #5959
----
----
---
[GitHub] flink pull request #5959: [FLINK-9258][metrics] Thread-safe initialization o...
Posted by StefanRRichter <gi...@git.apache.org>.
Github user StefanRRichter commented on a diff in the pull request:
https://github.com/apache/flink/pull/5959#discussion_r190165058
--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/metrics/groups/ComponentMetricGroup.java ---
@@ -57,11 +57,12 @@ public ComponentMetricGroup(MetricRegistry registry, String[] scope, P parent) {
if (variables == null) { // avoid synchronization for common case
synchronized (this) {
if (variables == null) {
- variables = new HashMap<>();
- putVariables(variables);
+ Map<String, String> tmpVariables = new HashMap<>();
+ putVariables(tmpVariables);
if (parent != null) { // not true for Job-/TaskManagerMetricGroup
- variables.putAll(parent.getAllVariables());
+ tmpVariables.putAll(parent.getAllVariables());
}
+ variables = tmpVariables;
--- End diff --
This looks potentially still broken to me because it does double-checked locking, which is a defective anti-pattern.
---
[GitHub] flink pull request #5959: [FLINK-9258][metrics] Thread-safe initialization o...
Posted by StefanRRichter <gi...@git.apache.org>.
Github user StefanRRichter commented on a diff in the pull request:
https://github.com/apache/flink/pull/5959#discussion_r190168697
--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/metrics/groups/ComponentMetricGroup.java ---
@@ -57,11 +57,12 @@ public ComponentMetricGroup(MetricRegistry registry, String[] scope, P parent) {
if (variables == null) { // avoid synchronization for common case
synchronized (this) {
if (variables == null) {
- variables = new HashMap<>();
- putVariables(variables);
+ Map<String, String> tmpVariables = new HashMap<>();
+ putVariables(tmpVariables);
if (parent != null) { // not true for Job-/TaskManagerMetricGroup
- variables.putAll(parent.getAllVariables());
+ tmpVariables.putAll(parent.getAllVariables());
}
+ variables = tmpVariables;
--- End diff --
EDIT removed my previous comment about double-checked locking because I saw that the variable is `volatile`
---
[GitHub] flink pull request #5959: [FLINK-9258][metrics] Thread-safe initialization o...
Posted by zentol <gi...@git.apache.org>.
Github user zentol closed the pull request at:
https://github.com/apache/flink/pull/5959
---
[GitHub] flink pull request #5959: [FLINK-9258][metrics] Thread-safe initialization o...
Posted by StefanRRichter <gi...@git.apache.org>.
Github user StefanRRichter commented on a diff in the pull request:
https://github.com/apache/flink/pull/5959#discussion_r190165634
--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/metrics/groups/ComponentMetricGroup.java ---
@@ -57,11 +57,12 @@ public ComponentMetricGroup(MetricRegistry registry, String[] scope, P parent) {
if (variables == null) { // avoid synchronization for common case
synchronized (this) {
if (variables == null) {
- variables = new HashMap<>();
- putVariables(variables);
+ Map<String, String> tmpVariables = new HashMap<>();
+ putVariables(tmpVariables);
if (parent != null) { // not true for Job-/TaskManagerMetricGroup
- variables.putAll(parent.getAllVariables());
+ tmpVariables.putAll(parent.getAllVariables());
}
+ variables = tmpVariables;
--- End diff --
To be clear, this is not exactly about the fixed lines, but the line that is commented with "optimization for common case" is the remaining problem.
---
[GitHub] flink issue #5959: [FLINK-9258][metrics] Thread-safe initialization of varia...
Posted by StefanRRichter <gi...@git.apache.org>.
Github user StefanRRichter commented on the issue:
https://github.com/apache/flink/pull/5959
Sounds good. After the mentioned changes this looks ready to merge for me 👍
---
[GitHub] flink issue #5959: [FLINK-9258][metrics] Thread-safe initialization of varia...
Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on the issue:
https://github.com/apache/flink/pull/5959
merging.
---
[GitHub] flink pull request #5959: [FLINK-9258][metrics] Thread-safe initialization o...
Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on a diff in the pull request:
https://github.com/apache/flink/pull/5959#discussion_r186630665
--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/metrics/groups/ComponentMetricGroup.java ---
@@ -57,11 +57,12 @@ public ComponentMetricGroup(MetricRegistry registry, String[] scope, P parent) {
if (variables == null) { // avoid synchronization for common case
synchronized (this) {
if (variables == null) {
- variables = new HashMap<>();
- putVariables(variables);
+ Map<String, String> tmpVariables = new HashMap<>();
+ putVariables(tmpVariables);
if (parent != null) { // not true for Job-/TaskManagerMetricGroup
- variables.putAll(parent.getAllVariables());
+ tmpVariables.putAll(parent.getAllVariables());
}
+ variables = tmpVariables;
--- End diff --
We _could_ write a test, but it would be heavily tied to the implementation and is thus a bit questionable.
---
[GitHub] flink pull request #5959: [FLINK-9258][metrics] Thread-safe initialization o...
Posted by yuqi1129 <gi...@git.apache.org>.
Github user yuqi1129 commented on a diff in the pull request:
https://github.com/apache/flink/pull/5959#discussion_r186603684
--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/metrics/groups/ComponentMetricGroup.java ---
@@ -57,11 +57,12 @@ public ComponentMetricGroup(MetricRegistry registry, String[] scope, P parent) {
if (variables == null) { // avoid synchronization for common case
synchronized (this) {
if (variables == null) {
- variables = new HashMap<>();
- putVariables(variables);
+ Map<String, String> tmpVariables = new HashMap<>();
+ putVariables(tmpVariables);
if (parent != null) { // not true for Job-/TaskManagerMetricGroup
- variables.putAll(parent.getAllVariables());
+ tmpVariables.putAll(parent.getAllVariables());
}
+ variables = tmpVariables;
--- End diff --
Do not need any test that verify this change ?
---
[GitHub] flink issue #5959: [FLINK-9258][metrics] Thread-safe initialization of varia...
Posted by StefanRRichter <gi...@git.apache.org>.
Github user StefanRRichter commented on the issue:
https://github.com/apache/flink/pull/5959
I have two quick questions:
- Why does `ComponentMetricGroup` even override the method `getAllVariables` from `AbstractMetricGroup` with essentially the exact same code?
-Why is it only fixed in `ComponentMetricGroup`? Could it make sense to fix it in `AbstractMetricGroup` and remove the overriding method in the subclass?
---
[GitHub] flink issue #5959: [FLINK-9258][metrics] Thread-safe initialization of varia...
Posted by StefanRRichter <gi...@git.apache.org>.
Github user StefanRRichter commented on the issue:
https://github.com/apache/flink/pull/5959
LGTM 👍
---
[GitHub] flink issue #5959: [FLINK-9258][metrics] Thread-safe initialization of varia...
Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on the issue:
https://github.com/apache/flink/pull/5959
The methods have been identical since FLINK-7692, but I didn't catch it in the review. Thus, `ComponentMetricGroup#getAllVariables()` should be removed, along with `ComponentMetricGroup#putVariables()`, and the fix applied to `AbstractMetricGroup#getAllVariables()`.
We try to construct as many things as possible lazily, to reduce the resource impact in case the metric system isn't used.
---