You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Alex Behm (Code Review)" <ge...@cloudera.org> on 2017/02/13 19:11:28 UTC

[Impala-ASF-CR] IMPALA-4916: Fix maintenance of set of item sets in DisjointSet.

Alex Behm has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5980

Change subject: IMPALA-4916: Fix maintenance of set of item sets in DisjointSet.
......................................................................

IMPALA-4916: Fix maintenance of set of item sets in DisjointSet.

The bug: The DisjointSet maintains a set of unique item sets
using a HashSet<Set<T>>. The problem is that we modified
the Set<T> elements after inserting them into the HashSet.
This caused the removal of elements from the HashSet to fail.
Removal is required for maintaining a consistent DisjointSet.
The removal could even fail for the same Set<T> instance because
the hashCode() changed from when the Set<T> was originally
inserted to when the removal was attempted due to mutation
of the Set<T>.
An inconsistent DisjointSet can lead to incorrect equivalence
classes, which can lead to missing, redundant and even
non-executable predicates. Incorrect results and crashes are
possible.

For most queries, an inconsistent DisjointSet does not alter
the equivalence classes, and even fewer queries have incorrect
plans.
In fact, many of our existing planner tests trigger this bug,
but only 3 of them lead to an incorrect value transfer graph,
and all 3 had correct plans.

The fix: Added a new ReferenceHashSet that stores a set of
unique references. It does not rely on the hashCode() and
equals() of the stored elements, so the same object can be
added and later removed, even when mutated in the meantime.
The DisjointSet maintains the set of item sets using this
new ReferenceHashSet.

Testing:
- Added a Preconditions check in DisjointSet that asserts
  correct removal of an item set. Many of our existing tests
  hit the check before this fix.
- Added a new unit test for DisjointSet which triggers
  the bug.
- Augmented DisjointSet.checkConsistency() to check for
  inconsistency in the set of item sets.
- Added validation of the value-transfer graph in
  single-node planner tests.
- A private core/hdfs run succeeded.

Change-Id: I609c8795c09becd78815605ea8e82e2f99e82212
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/common/Id.java
M fe/src/main/java/org/apache/impala/util/DisjointSet.java
A fe/src/main/java/org/apache/impala/util/ReferenceHashSet.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/util/TestDisjointSet.java
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
7 files changed, 127 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/5980/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I609c8795c09becd78815605ea8e82e2f99e82212
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>

[Impala-ASF-CR] IMPALA-4916: Fix maintenance of set of item sets in DisjointSet.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4916: Fix maintenance of set of item sets in DisjointSet.
......................................................................


IMPALA-4916: Fix maintenance of set of item sets in DisjointSet.

The bug: The DisjointSet maintains a set of unique item sets
using a HashSet<Set<T>>. The problem is that we modified
the Set<T> elements after inserting them into the HashSet.
This caused the removal of elements from the HashSet to fail.
Removal is required for maintaining a consistent DisjointSet.
The removal could even fail for the same Set<T> instance because
the hashCode() changed from when the Set<T> was originally
inserted to when the removal was attempted due to mutation
of the Set<T>.
An inconsistent DisjointSet can lead to incorrect equivalence
classes, which can lead to missing, redundant and even
non-executable predicates. Incorrect results and crashes are
possible.

For most queries, an inconsistent DisjointSet does not alter
the equivalence classes, and even fewer queries have incorrect
plans.
In fact, many of our existing planner tests trigger this bug,
but only 3 of them lead to an incorrect value transfer graph,
and all 3 had correct plans.

The fix: Use an IdentityHashMap to store the set of item sets.
It does not rely on the hashCode() and equals() of the stored
elements, so the same object can be added and later removed,
even when mutated in the meantime.

Testing:
- Added a Preconditions check in DisjointSet that asserts
  correct removal of an item set. Many of our existing tests
  hit the check before this fix.
- Added a new unit test for DisjointSet which triggers
  the bug.
- Augmented DisjointSet.checkConsistency() to check for
  inconsistency in the set of item sets.
- Added validation of the value-transfer graph in
  single-node planner tests.
- A private core/hdfs run succeeded.

Change-Id: I609c8795c09becd78815605ea8e82e2f99e82212
Reviewed-on: http://gerrit.cloudera.org:8080/5980
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/common/Id.java
M fe/src/main/java/org/apache/impala/util/DisjointSet.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/util/TestDisjointSet.java
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
6 files changed, 65 insertions(+), 19 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Alex Behm: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/5980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I609c8795c09becd78815605ea8e82e2f99e82212
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>

[Impala-ASF-CR] IMPALA-4916: Fix maintenance of set of item sets in DisjointSet.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-4916: Fix maintenance of set of item sets in DisjointSet.
......................................................................


Patch Set 3: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I609c8795c09becd78815605ea8e82e2f99e82212
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4916: Fix maintenance of set of item sets in DisjointSet.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-4916: Fix maintenance of set of item sets in DisjointSet.
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5980/2/fe/src/main/java/org/apache/impala/util/DisjointSet.java
File fe/src/main/java/org/apache/impala/util/DisjointSet.java:

PS2, Line 49:  private final IdentityHashMap<Set<T>, Object> uniqueSets_ =
            :       new IdentityHashMap<Set<T>, Object>();
> nice idea :)
It was pretty much your idea :)


PS2, Line 111: checkState(removedValue == DUMMY_VALUE);
> I think checking for not NULL is also ok.
This one seems more explicit because if we got back a non-NULL value that is not DUMMY_VALUE, then something bad has happened.


-- 
To view, visit http://gerrit.cloudera.org:8080/5980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I609c8795c09becd78815605ea8e82e2f99e82212
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4916: Fix maintenance of set of item sets in DisjointSet.

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4916: Fix maintenance of set of item sets in DisjointSet.
......................................................................


Patch Set 2: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5980/2/fe/src/main/java/org/apache/impala/util/DisjointSet.java
File fe/src/main/java/org/apache/impala/util/DisjointSet.java:

PS2, Line 49:  private final IdentityHashMap<Set<T>, Object> uniqueSets_ =
            :       new IdentityHashMap<Set<T>, Object>();
nice idea :)


PS2, Line 111: checkState(removedValue == DUMMY_VALUE);
I think checking for not NULL is also ok.


-- 
To view, visit http://gerrit.cloudera.org:8080/5980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I609c8795c09becd78815605ea8e82e2f99e82212
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4916: Fix maintenance of set of item sets in DisjointSet.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4916: Fix maintenance of set of item sets in DisjointSet.
......................................................................


Patch Set 3: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I609c8795c09becd78815605ea8e82e2f99e82212
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4916: Fix maintenance of set of item sets in DisjointSet.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4916: Fix maintenance of set of item sets in DisjointSet.
......................................................................


Patch Set 3:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/275/

-- 
To view, visit http://gerrit.cloudera.org:8080/5980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I609c8795c09becd78815605ea8e82e2f99e82212
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4916: Fix maintenance of set of item sets in DisjointSet.

Posted by "Dimitris Tsirogiannis (Code Review)" <ge...@cloudera.org>.
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4916: Fix maintenance of set of item sets in DisjointSet.
......................................................................


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/5980/1/fe/src/main/java/org/apache/impala/common/Id.java
File fe/src/main/java/org/apache/impala/common/Id.java:

PS1, Line 40: 
haha. That was useless since the hashCode() for Integer is simply its value.


http://gerrit.cloudera.org:8080/#/c/5980/1/fe/src/main/java/org/apache/impala/util/DisjointSet.java
File fe/src/main/java/org/apache/impala/util/DisjointSet.java:

PS1, Line 41: conventional
object-quality based? maybe make it more explicit what part of the HashSet is causing the problem.


http://gerrit.cloudera.org:8080/#/c/5980/1/fe/src/main/java/org/apache/impala/util/ReferenceHashSet.java
File fe/src/main/java/org/apache/impala/util/ReferenceHashSet.java:

PS1, Line 28: that
remove


PS1, Line 29: for storing elements that are expected to be mutated after
            :  * insertion into this set
Another way to put it is "when reference-equality is needed instead of object-equality". See description of IdentityHashMap for ideas.


PS1, Line 55: private final class Wrapper<T> 
Add a brief comment.


PS1, Line 64: @SuppressWarnings("unchecked")
Maybe you want to add a comment on why the cast is safe and therefore  it is ok to suppress the compiler warning.


-- 
To view, visit http://gerrit.cloudera.org:8080/5980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I609c8795c09becd78815605ea8e82e2f99e82212
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4916: Fix maintenance of set of item sets in DisjointSet.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has uploaded a new patch set (#2).

Change subject: IMPALA-4916: Fix maintenance of set of item sets in DisjointSet.
......................................................................

IMPALA-4916: Fix maintenance of set of item sets in DisjointSet.

The bug: The DisjointSet maintains a set of unique item sets
using a HashSet<Set<T>>. The problem is that we modified
the Set<T> elements after inserting them into the HashSet.
This caused the removal of elements from the HashSet to fail.
Removal is required for maintaining a consistent DisjointSet.
The removal could even fail for the same Set<T> instance because
the hashCode() changed from when the Set<T> was originally
inserted to when the removal was attempted due to mutation
of the Set<T>.
An inconsistent DisjointSet can lead to incorrect equivalence
classes, which can lead to missing, redundant and even
non-executable predicates. Incorrect results and crashes are
possible.

For most queries, an inconsistent DisjointSet does not alter
the equivalence classes, and even fewer queries have incorrect
plans.
In fact, many of our existing planner tests trigger this bug,
but only 3 of them lead to an incorrect value transfer graph,
and all 3 had correct plans.

The fix: Use an IdentityHashMap to store the set of item sets.
It does not rely on the hashCode() and equals() of the stored
elements, so the same object can be added and later removed,
even when mutated in the meantime.

Testing:
- Added a Preconditions check in DisjointSet that asserts
  correct removal of an item set. Many of our existing tests
  hit the check before this fix.
- Added a new unit test for DisjointSet which triggers
  the bug.
- Augmented DisjointSet.checkConsistency() to check for
  inconsistency in the set of item sets.
- Added validation of the value-transfer graph in
  single-node planner tests.
- A private core/hdfs run succeeded.

Change-Id: I609c8795c09becd78815605ea8e82e2f99e82212
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/common/Id.java
M fe/src/main/java/org/apache/impala/util/DisjointSet.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/util/TestDisjointSet.java
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
6 files changed, 65 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/5980/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I609c8795c09becd78815605ea8e82e2f99e82212
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>

[Impala-ASF-CR] IMPALA-4916: Fix maintenance of set of item sets in DisjointSet.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-4916: Fix maintenance of set of item sets in DisjointSet.
......................................................................


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/5980/1/fe/src/main/java/org/apache/impala/common/Id.java
File fe/src/main/java/org/apache/impala/common/Id.java:

PS1, Line 40: 
> haha. That was useless since the hashCode() for Integer is simply its value
Yup.


http://gerrit.cloudera.org:8080/#/c/5980/1/fe/src/main/java/org/apache/impala/util/DisjointSet.java
File fe/src/main/java/org/apache/impala/util/DisjointSet.java:

PS1, Line 41: conventional
> object-quality based? maybe make it more explicit what part of the HashSet 
Expanded explanation why this does not work with HashSet.


http://gerrit.cloudera.org:8080/#/c/5980/1/fe/src/main/java/org/apache/impala/util/ReferenceHashSet.java
File fe/src/main/java/org/apache/impala/util/ReferenceHashSet.java:

PS1, Line 28: that
> remove
Removed this class.


PS1, Line 29: for storing elements that are expected to be mutated after
            :  * insertion into this set
> Another way to put it is "when reference-equality is needed instead of obje
I did not know about IdentityHashMap. I think we should get rid of this new class entirely.


PS1, Line 55: private final class Wrapper<T> 
> Add a brief comment.
Removed this class.


PS1, Line 64: @SuppressWarnings("unchecked")
> Maybe you want to add a comment on why the cast is safe and therefore  it i
Removed this class.


-- 
To view, visit http://gerrit.cloudera.org:8080/5980
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I609c8795c09becd78815605ea8e82e2f99e82212
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dt...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-HasComments: Yes