You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@impala.apache.org by "Jim Apple (Code Review)" <ge...@cloudera.org> on 2016/09/06 01:36:33 UTC

[Impala-ASF-CR] Add FNV, Zobrist, and SIMD hash functions to the int hash benchmark.

Jim Apple has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4313

Change subject: Add FNV, Zobrist, and SIMD hash functions to the int hash benchmark.
......................................................................

Add FNV, Zobrist, and SIMD hash functions to the int hash benchmark.

Additionally, change the parameter of rotate to a compile-time
constant, and add "inline" to functions, increasing the performance
dramatically. The compiler can't inline the SIMD versions, because
they use Intel intrinsics -- added a TODO to add these intrinsics to
sse-util.h.

Change-Id: I11d48f8816d5b129858a1f773015e51049dd1d61
---
M be/src/benchmarks/int-hash-benchmark.cc
1 file changed, 241 insertions(+), 57 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/4313/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I11d48f8816d5b129858a1f773015e51049dd1d61
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple <jb...@cloudera.com>

[Impala-ASF-CR] Add FNV, Zobrist, and SIMD hash functions to the int hash benchmark.

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has posted comments on this change.

Change subject: Add FNV, Zobrist, and SIMD hash functions to the int hash benchmark.
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4313/1/be/src/benchmarks/int-hash-benchmark.cc
File be/src/benchmarks/int-hash-benchmark.cc:

Line 271:   unique_ptr<uint32_t[]> ud(new uint32_t[DATA_SIZE]);
> Sounds good to me.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I11d48f8816d5b129858a1f773015e51049dd1d61
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] Add FNV, Zobrist, and SIMD hash functions to the int hash benchmark.

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has uploaded a new patch set (#2).

Change subject: Add FNV, Zobrist, and SIMD hash functions to the int hash benchmark.
......................................................................

Add FNV, Zobrist, and SIMD hash functions to the int hash benchmark.

Additionally, change the parameter of rotate to a compile-time
constant, and add "inline" to functions, increasing the performance
dramatically. The compiler can't inline the SIMD versions, because
they use Intel intrinsics -- added a TODO to add these intrinsics to
sse-util.h.

Change-Id: I11d48f8816d5b129858a1f773015e51049dd1d61
---
M be/src/benchmarks/int-hash-benchmark.cc
1 file changed, 245 insertions(+), 57 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/4313/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I11d48f8816d5b129858a1f773015e51049dd1d61
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] Add FNV, Zobrist, and SIMD hash functions to the int hash benchmark.

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has posted comments on this change.

Change subject: Add FNV, Zobrist, and SIMD hash functions to the int hash benchmark.
......................................................................


Patch Set 3:

Just commenting on an old patch to see if our new gerrit -> ASF mailing list setup is working.

-- 
To view, visit http://gerrit.cloudera.org:8080/4313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I11d48f8816d5b129858a1f773015e51049dd1d61
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] Add FNV, Zobrist, and SIMD hash functions to the int hash benchmark.

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has posted comments on this change.

Change subject: Add FNV, Zobrist, and SIMD hash functions to the int hash benchmark.
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4313/1/be/src/benchmarks/int-hash-benchmark.cc
File be/src/benchmarks/int-hash-benchmark.cc:

Line 41: //                                 FNV               25.7       26     26.4         1X         1X         1X
> Do we have a sense for the relative quality of the different hash functions
I am so glad you asked! :-)

We know that MultiplyShift is weaker than MultiplyAddShift, which in turn is weaker than Zobrist. The others have unknown strength.

I mean this in a formal sense. With constant probability, MultiplyShift is strong enough that the average chain in separate chaining hash tables is O(1), for instance. However, it is not known to be strong enough for O(1) linear probing.


Line 45: //                            Jenkins1               51.8       54       54      2.02X      2.08X      2.05X
> Any reason Multiple<Jenkins..> was left out? It looks competitive based on 
Only that I don't trust their strength.


Line 271:   unique_ptr<uint32_t[]> ud(new uint32_t[DATA_SIZE]);
> It'd be interesting to see how the vectorised hashing performs if the input
That would be interesting. OK with you to just add TODO for now?


-- 
To view, visit http://gerrit.cloudera.org:8080/4313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I11d48f8816d5b129858a1f773015e51049dd1d61
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] Add FNV, Zobrist, and SIMD hash functions to the int hash benchmark.

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: Add FNV, Zobrist, and SIMD hash functions to the int hash benchmark.
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4313/1/be/src/benchmarks/int-hash-benchmark.cc
File be/src/benchmarks/int-hash-benchmark.cc:

Line 271:   unique_ptr<uint32_t[]> ud(new uint32_t[DATA_SIZE]);
> That would be interesting. OK with you to just add TODO for now?
Sounds good to me.


-- 
To view, visit http://gerrit.cloudera.org:8080/4313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I11d48f8816d5b129858a1f773015e51049dd1d61
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] Add FNV, Zobrist, and SIMD hash functions to the int hash benchmark.

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: Add FNV, Zobrist, and SIMD hash functions to the int hash benchmark.
......................................................................


Patch Set 2: Code-Review+2

I see no reason not to merge this as-is - I assume this will be the basis for some follow-on discussions.

-- 
To view, visit http://gerrit.cloudera.org:8080/4313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I11d48f8816d5b129858a1f773015e51049dd1d61
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] Add FNV, Zobrist, and SIMD hash functions to the int hash benchmark.

Posted by "Internal Jenkins (Code Review)" <ge...@cloudera.org>.
Internal Jenkins has posted comments on this change.

Change subject: Add FNV, Zobrist, and SIMD hash functions to the int hash benchmark.
......................................................................


Patch Set 2: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I11d48f8816d5b129858a1f773015e51049dd1d61
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] Add FNV, Zobrist, and SIMD hash functions to the int hash benchmark.

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: Add FNV, Zobrist, and SIMD hash functions to the int hash benchmark.
......................................................................


Patch Set 1:

(3 comments)

Thanks, this is really interesting.

http://gerrit.cloudera.org:8080/#/c/4313/1/be/src/benchmarks/int-hash-benchmark.cc
File be/src/benchmarks/int-hash-benchmark.cc:

Line 41: //                                 FNV               25.7       26     26.4         1X         1X         1X
Do we have a sense for the relative quality of the different hash functions? It's hard to interpret the merits of them without understanding the trade-offs.


Line 45: //                            Jenkins1               51.8       54       54      2.02X      2.08X      2.05X
Any reason Multiple<Jenkins..> was left out? It looks competitive based on these numbers.


Line 271:   unique_ptr<uint32_t[]> ud(new uint32_t[DATA_SIZE]);
It'd be interesting to see how the vectorised hashing performs if the input data is non-contiguous, e.g.  ExprValuesCache.expr_values_array_ might have multiple values to hash per input row. (I guess we could also just transpose that array if we wanted to vectorise it).


-- 
To view, visit http://gerrit.cloudera.org:8080/4313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I11d48f8816d5b129858a1f773015e51049dd1d61
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] Add FNV, Zobrist, and SIMD hash functions to the int hash benchmark.

Posted by "Internal Jenkins (Code Review)" <ge...@cloudera.org>.
Internal Jenkins has submitted this change and it was merged.

Change subject: Add FNV, Zobrist, and SIMD hash functions to the int hash benchmark.
......................................................................


Add FNV, Zobrist, and SIMD hash functions to the int hash benchmark.

Additionally, change the parameter of rotate to a compile-time
constant, and add "inline" to functions, increasing the performance
dramatically. The compiler can't inline the SIMD versions, because
they use Intel intrinsics -- added a TODO to add these intrinsics to
sse-util.h.

Change-Id: I11d48f8816d5b129858a1f773015e51049dd1d61
Reviewed-on: http://gerrit.cloudera.org:8080/4313
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Internal Jenkins
---
M be/src/benchmarks/int-hash-benchmark.cc
1 file changed, 245 insertions(+), 57 deletions(-)

Approvals:
  Internal Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/4313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I11d48f8816d5b129858a1f773015e51049dd1d61
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] Add FNV, Zobrist, and SIMD hash functions to the int hash benchmark.

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has posted comments on this change.

Change subject: Add FNV, Zobrist, and SIMD hash functions to the int hash benchmark.
......................................................................


Patch Set 2:

> I see no reason not to merge this as-is - I assume this will be the
 > basis for some follow-on discussions.

Exactly. I have some ideas on how we might change our hashing strategies, but they are not really ready yet.

-- 
To view, visit http://gerrit.cloudera.org:8080/4313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I11d48f8816d5b129858a1f773015e51049dd1d61
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No