You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Alex Behm (Code Review)" <ge...@cloudera.org> on 2017/10/05 06:07:48 UTC

[Impala-ASF-CR] IMPALA-6011: Remove use of Guava Hasher.

Alex Behm has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8216


Change subject: IMPALA-6011: Remove use of Guava Hasher.
......................................................................

IMPALA-6011: Remove use of Guava Hasher.

Removes the use of Guava Hasher to avoid dependency
conflicts with other components that may include a
different version of Guava in their jars.

Testing:
- Locally ran planner tests and a few simple queries
  in the shell. I verified that the new code is executed.

Change-Id: I04b8888b436eb93a86b8db24b2a97f14047f828b
---
M fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java
1 file changed, 19 insertions(+), 5 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/8216/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8216
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I04b8888b436eb93a86b8db24b2a97f14047f828b
Gerrit-Change-Number: 8216
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Behm <al...@cloudera.com>

[Impala-ASF-CR] IMPALA-6011: Remove use of Guava Hasher.

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8216 )

Change subject: IMPALA-6011: Remove use of Guava Hasher.
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8216/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8216/1//COMMIT_MSG@15
PS1, Line 15:   in the shell. I verified that the new code is executed.
It looks like lineage.test verifies the exact value of some query hashes too (wanted to convince myself that that didn't accidentally change).


http://gerrit.cloudera.org:8080/#/c/8216/1/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java
File fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java:

http://gerrit.cloudera.org:8080/#/c/8216/1/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java@596
PS1, Line 596:       return "Query hash failed";
Seems like this should be a precondition or throw a runtime error or similar (it looked like Guava throws an AssertionError).


http://gerrit.cloudera.org:8080/#/c/8216/1/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java@601
PS1, Line 601:     // String conversion code taken from Guava's HashCode.toString().
Seems like the string conversion code should be a utility function. Alternatively it looks like commons-codec has  a function that does this: https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/binary/Hex.html#encodeHexString(byte[])

I guess part of the point was to reduce our dependency on external libraries, but I think Apache commons generally has more stable APIs.



-- 
To view, visit http://gerrit.cloudera.org:8080/8216
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I04b8888b436eb93a86b8db24b2a97f14047f828b
Gerrit-Change-Number: 8216
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Oct 2017 15:11:30 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6011: Remove use of Guava Hasher.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8216 )

Change subject: IMPALA-6011: Remove use of Guava Hasher.
......................................................................


Patch Set 1: Code-Review-2

-2 to prevent auto merge


-- 
To view, visit http://gerrit.cloudera.org:8080/8216
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I04b8888b436eb93a86b8db24b2a97f14047f828b
Gerrit-Change-Number: 8216
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Oct 2017 06:10:11 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6011: Remove use of Guava Hasher.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Hello Philip Zeyliger, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8216

to look at the new patch set (#2).

Change subject: IMPALA-6011: Remove use of Guava Hasher.
......................................................................

IMPALA-6011: Remove use of Guava Hasher.

Removes the use of Guava Hasher to avoid dependency
conflicts with other components that may include a
different version of Guava in their jars.

Testing:
- Locally ran planner tests and a few simple queries
  in the shell. I verified that the new code is executed.

Change-Id: I04b8888b436eb93a86b8db24b2a97f14047f828b
---
M fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java
1 file changed, 21 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/8216/2
-- 
To view, visit http://gerrit.cloudera.org:8080/8216
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I04b8888b436eb93a86b8db24b2a97f14047f828b
Gerrit-Change-Number: 8216
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-6011: Remove use of Guava Hasher.

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8216 )

Change subject: IMPALA-6011: Remove use of Guava Hasher.
......................................................................


Patch Set 1: Code-Review+2

I'm ok with merging if you fix the exception thing (I don't think we should fail silently in that case).


-- 
To view, visit http://gerrit.cloudera.org:8080/8216
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I04b8888b436eb93a86b8db24b2a97f14047f828b
Gerrit-Change-Number: 8216
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Oct 2017 15:16:25 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6011: Remove use of Guava Hasher.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8216 )

Change subject: IMPALA-6011: Remove use of Guava Hasher.
......................................................................


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1304/


-- 
To view, visit http://gerrit.cloudera.org:8080/8216
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I04b8888b436eb93a86b8db24b2a97f14047f828b
Gerrit-Change-Number: 8216
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Thu, 05 Oct 2017 06:11:02 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6011: Remove use of Guava Hasher.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8216 )

Change subject: IMPALA-6011: Remove use of Guava Hasher.
......................................................................


Patch Set 1: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/8216
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I04b8888b436eb93a86b8db24b2a97f14047f828b
Gerrit-Change-Number: 8216
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Thu, 05 Oct 2017 10:30:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6011: Remove use of Guava Hasher.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has abandoned this change. ( http://gerrit.cloudera.org:8080/8216 )

Change subject: IMPALA-6011: Remove use of Guava Hasher.
......................................................................


Abandoned

Guava does some weird stuff internally such that the hash it produces for the same query is different than the hash produced by the code in this patch. Probably safer not to pursue this. 

We can probably fix it, but I imagine the new code will be more complex/controversial, so I don't think it's worth the hassle

Sorry for the noise.
-- 
To view, visit http://gerrit.cloudera.org:8080/8216
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I04b8888b436eb93a86b8db24b2a97f14047f828b
Gerrit-Change-Number: 8216
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-6011: Remove use of Guava Hasher.

Posted by "Philip Zeyliger (Code Review)" <ge...@cloudera.org>.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8216 )

Change subject: IMPALA-6011: Remove use of Guava Hasher.
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8216/1/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java
File fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java:

http://gerrit.cloudera.org:8080/#/c/8216/1/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java@602
PS1, Line 602:     final char[] hexDigits = "0123456789abcdef".toCharArray();
nit: I would have expected this to be a class constant 

class Vertex {
  private static final char[] hexDigits = ...;
  ...
}



-- 
To view, visit http://gerrit.cloudera.org:8080/8216
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I04b8888b436eb93a86b8db24b2a97f14047f828b
Gerrit-Change-Number: 8216
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Oct 2017 15:59:32 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6011: Remove use of Guava Hasher.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8216 )

Change subject: IMPALA-6011: Remove use of Guava Hasher.
......................................................................


Patch Set 2:

Afaik these hashes are stored in Navigator and used for finding exact duplicates or something like that. I would need to confirm how these hashes are used. The safer route for now is to not change the hashes to avoid doing more damage.


-- 
To view, visit http://gerrit.cloudera.org:8080/8216
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I04b8888b436eb93a86b8db24b2a97f14047f828b
Gerrit-Change-Number: 8216
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 09 Oct 2017 18:03:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6011: Remove use of Guava Hasher.

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8216 )

Change subject: IMPALA-6011: Remove use of Guava Hasher.
......................................................................


Patch Set 2:

Just getting back to this. For my own education, does it matter if the output of the hash function changes? I.e. does the lineage graph depend on the same query text always hashing to the same thing or does it just need to hash to a unique thing?


-- 
To view, visit http://gerrit.cloudera.org:8080/8216
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I04b8888b436eb93a86b8db24b2a97f14047f828b
Gerrit-Change-Number: 8216
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 09 Oct 2017 16:20:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6011: Remove use of Guava Hasher.

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8216 )

Change subject: IMPALA-6011: Remove use of Guava Hasher.
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8216/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8216/1//COMMIT_MSG@15
PS1, Line 15:   in the shell. I verified that the new code is executed.
> lineage.test is a planner test. It succeeds because we don't compare the qu
So we don't have any testing that the query hash is a sensible value? I don't think you broke it but seems like a test coverage gap.


http://gerrit.cloudera.org:8080/#/c/8216/1/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java
File fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java:

http://gerrit.cloudera.org:8080/#/c/8216/1/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java@601
PS1, Line 601:     // String conversion code taken from Guava's HashCode.toString().
> I bet we are pulling in many versions of Apache commons as well. The API ma
I'm still wondering if this should be a separate utility function.

It seems ok to have our own version of this but in general that argument about dependencies would justify never using external libraries, which doesn't seem right.



-- 
To view, visit http://gerrit.cloudera.org:8080/8216
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I04b8888b436eb93a86b8db24b2a97f14047f828b
Gerrit-Change-Number: 8216
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Oct 2017 17:47:36 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6011: Remove use of Guava Hasher.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8216 )

Change subject: IMPALA-6011: Remove use of Guava Hasher.
......................................................................


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8216/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8216/1//COMMIT_MSG@15
PS1, Line 15:   in the shell. I verified that the new code is executed.
> It looks like lineage.test verifies the exact value of some query hashes to
lineage.test is a planner test. It succeeds because we don't compare the query hash, see ColumnLineageGraph.equals().


http://gerrit.cloudera.org:8080/#/c/8216/1/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java
File fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java:

http://gerrit.cloudera.org:8080/#/c/8216/1/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java@596
PS1, Line 596:       return "Query hash failed";
> Seems like this should be a precondition or throw a runtime error or simila
Done


http://gerrit.cloudera.org:8080/#/c/8216/1/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java@601
PS1, Line 601:     // String conversion code taken from Guava's HashCode.toString().
> Seems like the string conversion code should be a utility function. Alterna
I bet we are pulling in many versions of Apache commons as well. The API may be more stable, but no idea about the behavior. I prefer this version which seems most predictable.


http://gerrit.cloudera.org:8080/#/c/8216/1/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java@602
PS1, Line 602:     final char[] hexDigits = "0123456789abcdef".toCharArray();
> nit: I would have expected this to be a class constant 
Done.



-- 
To view, visit http://gerrit.cloudera.org:8080/8216
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I04b8888b436eb93a86b8db24b2a97f14047f828b
Gerrit-Change-Number: 8216
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Oct 2017 17:36:34 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6011: Remove use of Guava Hasher.

Posted by "Philip Zeyliger (Code Review)" <ge...@cloudera.org>.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8216 )

Change subject: IMPALA-6011: Remove use of Guava Hasher.
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8216/2/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java
File fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java:

http://gerrit.cloudera.org:8080/#/c/8216/2/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java@318
PS2, Line 318:   private final char[] hexDigits_ = "0123456789abcdef".toCharArray();
You can make this static and call it HEX_DIGITS to mark it as a constant?


http://gerrit.cloudera.org:8080/#/c/8216/2/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java@593
PS2, Line 593:   private String getQueryHash(String queryStr) {
It's a bit pedantic, but it'd be nice to write a test that executes this code. Ideally, we'd also confirm that some query hashes to the same result as it used to with the Guava implementation.


http://gerrit.cloudera.org:8080/#/c/8216/2/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java@599
PS2, Line 599:       throw new IllegalStateException("Query hash failed");
typically this would be "throw new IllegalStateException("Query hash failed", e)"

I think that chains the exceptions, which is handy since we'll want to know what e.getMessage() says.

(This will never actually happen, but it's more idiomatic I think.)



-- 
To view, visit http://gerrit.cloudera.org:8080/8216
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I04b8888b436eb93a86b8db24b2a97f14047f828b
Gerrit-Change-Number: 8216
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Oct 2017 17:48:23 +0000
Gerrit-HasComments: Yes