You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Tianyi Wang (Code Review)" <ge...@cloudera.org> on 2017/10/30 19:56:05 UTC

[Impala-ASF-CR] IMPALA-2281: Replace FNV with FastHash in exchange nodes

Tianyi Wang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8417


Change subject: IMPALA-2281: Replace FNV with FastHash in exchange nodes
......................................................................

IMPALA-2281: Replace FNV with FastHash in exchange nodes

FNV is not a good enough hash function. This patch introduces FastHash
into the codebase and uses it in exchange nodes.

Testing: Two test cases involving arbitrary ordering are changed.
Single node performance benchmark shows no performance difference.

Change-Id: I778317d982dcdb94173a369a65b39f32b4f7ded2
---
M be/src/benchmarks/hash-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/llvm-codegen.cc
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-sender.h
M be/src/runtime/data-stream-test.cc
M be/src/runtime/raw-value-test.cc
M be/src/runtime/raw-value.cc
M be/src/runtime/raw-value.h
M be/src/util/hash-util-ir.cc
M be/src/util/hash-util.h
M testdata/workloads/functional-query/queries/QueryTest/nested-types-runtime.test
M testdata/workloads/tpcds/queries/tpcds-q77a.test
13 files changed, 194 insertions(+), 88 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/8417/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8417
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I778317d982dcdb94173a369a65b39f32b4f7ded2
Gerrit-Change-Number: 8417
Gerrit-PatchSet: 1
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>

[Impala-ASF-CR] IMPALA-2281: Replace FNV with FastHash in exchange nodes

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/8417 )

Change subject: IMPALA-2281: Replace FNV with FastHash in exchange nodes
......................................................................

IMPALA-2281: Replace FNV with FastHash in exchange nodes

FNV is not a good enough hash function. This patch introduces FastHash
into the codebase and uses it in exchange nodes.

Testing: Two test cases involving arbitrary ordering are changed.
Single node performance benchmark shows no performance difference.

Change-Id: I778317d982dcdb94173a369a65b39f32b4f7ded2
---
M be/src/benchmarks/hash-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/llvm-codegen.cc
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-sender.h
M be/src/runtime/data-stream-test.cc
M be/src/runtime/raw-value-test.cc
M be/src/runtime/raw-value.cc
M be/src/runtime/raw-value.h
M be/src/util/hash-util-ir.cc
M be/src/util/hash-util.h
M testdata/workloads/functional-query/queries/QueryTest/nested-types-runtime.test
M testdata/workloads/tpcds/queries/tpcds-q77a.test
13 files changed, 201 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/8417/2
-- 
To view, visit http://gerrit.cloudera.org:8080/8417
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I778317d982dcdb94173a369a65b39f32b4f7ded2
Gerrit-Change-Number: 8417
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>

[Impala-ASF-CR] IMPALA-2281: Replace FNV with FastHash in exchange nodes

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/8417 )

Change subject: IMPALA-2281: Replace FNV with FastHash in exchange nodes
......................................................................


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8417/5/be/src/runtime/raw-value-test.cc
File be/src/runtime/raw-value-test.cc:

http://gerrit.cloudera.org:8080/#/c/8417/5/be/src/runtime/raw-value-test.cc@104
PS5, Line 104: FastHash
> "FastHash"
Done


http://gerrit.cloudera.org:8080/#/c/8417/5/be/src/runtime/raw-value.h
File be/src/runtime/raw-value.h:

http://gerrit.cloudera.org:8080/#/c/8417/5/be/src/runtime/raw-value.h@74
PS5, Line 74:   /// Get a 64-bit hash value using the FastHash function.
> Please add a reference. https://www.google.com/search?q=fasthash doesn't di
Done


http://gerrit.cloudera.org:8080/#/c/8417/5/be/src/util/hash-util.h
File be/src/util/hash-util.h:

http://gerrit.cloudera.org:8080/#/c/8417/5/be/src/util/hash-util.h@32
PS5, Line 32:  public:
> Please add the name of this file in the appropriate place in LICENSE.txt. Y
Done. I followed the practice in be/src/kudu/security/init.cc and assumed the license here should be removed.



-- 
To view, visit http://gerrit.cloudera.org:8080/8417
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I778317d982dcdb94173a369a65b39f32b4f7ded2
Gerrit-Change-Number: 8417
Gerrit-PatchSet: 6
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Nov 2017 01:14:14 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2281: Replace FNV with FastHash in exchange nodes

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8417 )

Change subject: IMPALA-2281: Replace FNV with FastHash in exchange nodes
......................................................................

IMPALA-2281: Replace FNV with FastHash in exchange nodes

FNV is not a good enough hash function. This patch introduces FastHash
into the codebase and uses it in exchange nodes.

Testing: Two test cases involving arbitrary ordering are changed.
Single node performance benchmark shows no performance difference.

Change-Id: I778317d982dcdb94173a369a65b39f32b4f7ded2
Reviewed-on: http://gerrit.cloudera.org:8080/8417
Reviewed-by: Jim Apple <jb...@apache.org>
Tested-by: Impala Public Jenkins
---
M LICENSE.txt
M be/src/benchmarks/hash-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/llvm-codegen.cc
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-sender.h
M be/src/runtime/data-stream-test.cc
M be/src/runtime/raw-value-test.cc
M be/src/runtime/raw-value.cc
M be/src/runtime/raw-value.h
M be/src/util/hash-util-ir.cc
M be/src/util/hash-util.h
M testdata/workloads/functional-query/queries/QueryTest/nested-types-runtime.test
13 files changed, 209 insertions(+), 103 deletions(-)

Approvals:
  Jim Apple: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/8417
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I778317d982dcdb94173a369a65b39f32b4f7ded2
Gerrit-Change-Number: 8417
Gerrit-PatchSet: 7
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-2281: Replace FNV with FastHash in exchange nodes

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has posted comments on this change. ( http://gerrit.cloudera.org:8080/8417 )

Change subject: IMPALA-2281: Replace FNV with FastHash in exchange nodes
......................................................................


Patch Set 6: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8417/5/be/src/util/hash-util.h
File be/src/util/hash-util.h:

http://gerrit.cloudera.org:8080/#/c/8417/5/be/src/util/hash-util.h@32
PS5, Line 32:  public:
> Done. I followed the practice in be/src/kudu/security/init.cc and assumed t
I'm not 100% on that, since http://www.apache.org/legal/src-headers.html#3party says, "Do not modify or remove any copyright notices or licenses within third-party works," but if Kudu does it, it's probably the right thing to do.



-- 
To view, visit http://gerrit.cloudera.org:8080/8417
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I778317d982dcdb94173a369a65b39f32b4f7ded2
Gerrit-Change-Number: 8417
Gerrit-PatchSet: 6
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Nov 2017 04:17:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2281: Replace FNV with FastHash in exchange nodes

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/8417 )

Change subject: IMPALA-2281: Replace FNV with FastHash in exchange nodes
......................................................................

IMPALA-2281: Replace FNV with FastHash in exchange nodes

FNV is not a good enough hash function. This patch introduces FastHash
into the codebase and uses it in exchange nodes.

Testing: Two test cases involving arbitrary ordering are changed.
Single node performance benchmark shows no performance difference.

Change-Id: I778317d982dcdb94173a369a65b39f32b4f7ded2
---
M be/src/benchmarks/hash-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/llvm-codegen.cc
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-sender.h
M be/src/runtime/data-stream-test.cc
M be/src/runtime/raw-value-test.cc
M be/src/runtime/raw-value.cc
M be/src/runtime/raw-value.h
M be/src/util/hash-util-ir.cc
M be/src/util/hash-util.h
M testdata/workloads/functional-query/queries/QueryTest/nested-types-runtime.test
M testdata/workloads/tpcds/queries/tpcds-q77a.test
13 files changed, 201 insertions(+), 101 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/8417/4
-- 
To view, visit http://gerrit.cloudera.org:8080/8417
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I778317d982dcdb94173a369a65b39f32b4f7ded2
Gerrit-Change-Number: 8417
Gerrit-PatchSet: 4
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-2281: Replace FNV with FastHash in exchange nodes

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8417 )

Change subject: IMPALA-2281: Replace FNV with FastHash in exchange nodes
......................................................................


Patch Set 6: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/8417
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I778317d982dcdb94173a369a65b39f32b4f7ded2
Gerrit-Change-Number: 8417
Gerrit-PatchSet: 6
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Nov 2017 07:39:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-2281: Replace FNV with FastHash in exchange nodes

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8417 )

Change subject: IMPALA-2281: Replace FNV with FastHash in exchange nodes
......................................................................


Patch Set 2:

(12 comments)

Did a first pass over it.

http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/codegen/gen_ir_descriptions.py
File be/src/codegen/gen_ir_descriptions.py:

http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/codegen/gen_ir_descriptions.py@99
PS2, Line 99:   ["HASH_FAST", "IrFastHash"],
Do we use this? I don't see any places where it's used currently.


http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/runtime/data-stream-sender.cc
File be/src/runtime/data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/runtime/data-stream-sender.cc@a468
PS2, Line 468: 
Lol, weird.


http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/runtime/data-stream-sender.cc@474
PS2, Line 474: partition_expr_evals_[j
Use 'eval' directly.


http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/runtime/raw-value.cc
File be/src/runtime/raw-value.cc:

http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/runtime/raw-value.cc@220
PS2, Line 220:       return HashUtil::FastHash64(v, static_cast<size_t>(type.GetByteSize()), seed);
I think this might be slightly slower - with the previous approach I think we get a specialised version of the hash function for that input length for each data type, whereas here we have to go through another switch in GetByteSize().


http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/util/hash-util.h
File be/src/util/hash-util.h:

http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/util/hash-util.h@235
PS2, Line 235:   /* The MIT License
I think this should go at the top of the file beneath the apache license. Then we can just say that the FastHash64 implementation came with that license. E.g.

/* 
  FastHash64 implementation derived from MIT-licensed code written by Zilong Tan

  The MIT License
  ...
*/


http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/util/hash-util.h@263
PS2, Line 263: size_t
use int64_t - we generally use signed integers for lengths.


http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/util/hash-util.h@266
PS2, Line 266: (const uint64_t *)
We use c-style casts - i.e. reinterpret_cast<>


http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/util/hash-util.h@267
PS2, Line 267: pos
I believe the C++ standard doesn't allow pointer arithmetic on void - we should convert to uint8_t for doing that.


http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/util/hash-util.h@278
PS2, Line 278:  (const unsigned char*
See above comments.


http://gerrit.cloudera.org:8080/#/c/8417/2/be/src/util/hash-util.h@282
PS2, Line 282: (uint64_t)
should use static_cast<uint64_t>()


http://gerrit.cloudera.org:8080/#/c/8417/2/testdata/workloads/functional-query/queries/QueryTest/nested-types-runtime.test
File testdata/workloads/functional-query/queries/QueryTest/nested-types-runtime.test:

http://gerrit.cloudera.org:8080/#/c/8417/2/testdata/workloads/functional-query/queries/QueryTest/nested-types-runtime.test@145
PS2, Line 145: 8,'k1',-1,'k1',1
Why does this make a difference when we have VERIFY_IS_EQUAL_SORTED above?


http://gerrit.cloudera.org:8080/#/c/8417/2/testdata/workloads/tpcds/queries/tpcds-q77a.test
File testdata/workloads/tpcds/queries/tpcds-q77a.test:

http://gerrit.cloudera.org:8080/#/c/8417/2/testdata/workloads/tpcds/queries/tpcds-q77a.test@125
PS2, Line 125: 'catalog channel',NULL,538912.55,2050279.74,-1383554.73
I'll file a JIRA to make this test deterministically pass: IMPALA-6155. In the meantime, can you change this to

   ---- RESULTS: VERIFY_IS_EQUAL_SORTED

That will make it ignore the order of results.



-- 
To view, visit http://gerrit.cloudera.org:8080/8417
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I778317d982dcdb94173a369a65b39f32b4f7ded2
Gerrit-Change-Number: 8417
Gerrit-PatchSet: 2
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Sat, 04 Nov 2017 00:28:16 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2281: Replace FNV with FastHash in exchange nodes

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8417 )

Change subject: IMPALA-2281: Replace FNV with FastHash in exchange nodes
......................................................................


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1446/


-- 
To view, visit http://gerrit.cloudera.org:8080/8417
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I778317d982dcdb94173a369a65b39f32b4f7ded2
Gerrit-Change-Number: 8417
Gerrit-PatchSet: 6
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Nov 2017 04:17:29 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-2281: Replace FNV with FastHash in exchange nodes

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/8417 )

Change subject: IMPALA-2281: Replace FNV with FastHash in exchange nodes
......................................................................

IMPALA-2281: Replace FNV with FastHash in exchange nodes

FNV is not a good enough hash function. This patch introduces FastHash
into the codebase and uses it in exchange nodes.

Testing: Two test cases involving arbitrary ordering are changed.
Single node performance benchmark shows no performance difference.

Change-Id: I778317d982dcdb94173a369a65b39f32b4f7ded2
---
M LICENSE.txt
M be/src/benchmarks/hash-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/llvm-codegen.cc
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-sender.h
M be/src/runtime/data-stream-test.cc
M be/src/runtime/raw-value-test.cc
M be/src/runtime/raw-value.cc
M be/src/runtime/raw-value.h
M be/src/util/hash-util-ir.cc
M be/src/util/hash-util.h
M testdata/workloads/functional-query/queries/QueryTest/nested-types-runtime.test
13 files changed, 209 insertions(+), 103 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/8417/6
-- 
To view, visit http://gerrit.cloudera.org:8080/8417
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I778317d982dcdb94173a369a65b39f32b4f7ded2
Gerrit-Change-Number: 8417
Gerrit-PatchSet: 6
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>