You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Attila Bukor (Code Review)" <ge...@cloudera.org> on 2019/10/02 23:24:42 UTC

[kudu-CR] [WIP] KUDU-1938 Performance tuning pt 2

Hello Adar Dembo, Grant Henke,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/14354

to review the following change.


Change subject: [WIP] KUDU-1938 Performance tuning pt 2
......................................................................

[WIP] KUDU-1938 Performance tuning pt 2

Adds Intel Intrinsics (up to SSE4.2) to speed up the processing of UTF8
character counting in the case of ASCII-only chunks by doubling the
chunk size in a single pass from 64 to 128 bits.

Attilas-MacBook-Pro:debug charger$ ./bin/char_util-test
[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from CharUtilTest
[ RUN      ] CharUtilTest.CorrectnessTestUtf8
[       OK ] CharUtilTest.CorrectnessTestUtf8 (1 ms)
[ RUN      ] CharUtilTest.CorrectnessTestAscii
[       OK ] CharUtilTest.CorrectnessTestAscii (1 ms)
[ RUN      ] CharUtilTest.StressTestUtf8
[       OK ] CharUtilTest.StressTestUtf8 (4496 ms)
[ RUN      ] CharUtilTest.StressTestAscii
[       OK ] CharUtilTest.StressTestAscii (650 ms)
[----------] 4 tests from CharUtilTest (5148 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test case ran. (5149 ms total)
[  PASSED  ] 4 tests.

Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
---
M src/kudu/util/char_util.cc
1 file changed, 9 insertions(+), 6 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/14354/1
-- 
To view, visit http://gerrit.cloudera.org:8080/14354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
Gerrit-Change-Number: 14354
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>

[kudu-CR] KUDU-1938 Make UTF-8 truncation faster pt 2

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14354 )

Change subject: KUDU-1938 Make UTF-8 truncation faster pt 2
......................................................................


Patch Set 14: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14354/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14354/11//COMMIT_MSG@22
PS11, Line 22: [ RUN      ] CharUtilTest.StressTestUtf8
             : [       OK ] CharUtilTest.StressTestUtf8 (9285 ms)
> Out of curiosity I checked what happens if we can use AVX instructions, her
Well, let's not lose sight of the broader context: this functionality micro-optimizes writing VARCHAR data to Kudu. Given that we'd expect most folks to use STRING and not VARCHAR, perhaps we should timebox this endeavour. Personally I'd optimize for UTF-8 over ASCII but it probably doesn't matter much either way.


http://gerrit.cloudera.org:8080/#/c/14354/14/src/kudu/util/char_util.cc
File src/kudu/util/char_util.cc:

PS14: 
New indentation doesn't look right; the bodies of the if/else-if shouldn't align with the condition, they should align with two-chars-after-the-brace.



-- 
To view, visit http://gerrit.cloudera.org:8080/14354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
Gerrit-Change-Number: 14354
Gerrit-PatchSet: 14
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 13 Nov 2019 06:55:04 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1938 Make UTF-8 truncation faster pt 2

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14354

to look at the new patch set (#11).

Change subject: KUDU-1938 Make UTF-8 truncation faster pt 2
......................................................................

KUDU-1938 Make UTF-8 truncation faster pt 2

Adds Intel Intrinsics (up to SSE4.2) to speed up the processing of UTF8
character counting in the case of ASCII-only chunks (fast path) by
doubling the chunk size in a single pass from 64 to 128 bits.

Before:

[ RUN      ] CharUtilTest.StressTestUtf8
[       OK ] CharUtilTest.StressTestUtf8 (7746 ms)
[ RUN      ] CharUtilTest.StressTestAscii
[       OK ] CharUtilTest.StressTestAscii (1028 ms)

After:

[ RUN      ] CharUtilTest.StressTestUtf8
[       OK ] CharUtilTest.StressTestUtf8 (10599 ms)
[ RUN      ] CharUtilTest.StressTestAscii
[       OK ] CharUtilTest.StressTestAscii (676 ms)

Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
---
M src/kudu/util/char_util-test.cc
M src/kudu/util/char_util.cc
2 files changed, 28 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/14354/11
-- 
To view, visit http://gerrit.cloudera.org:8080/14354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
Gerrit-Change-Number: 14354
Gerrit-PatchSet: 11
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] KUDU-1938 Make UTF-8 truncation faster pt 2

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/14354 )

Change subject: KUDU-1938 Make UTF-8 truncation faster pt 2
......................................................................


Patch Set 13:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14354/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14354/11//COMMIT_MSG@22
PS11, Line 22: [ RUN      ] CharUtilTest.StressTestUtf8
             : [       OK ] CharUtilTest.StressTestUtf8 (9285 ms)
> It'll be really hard to provide guidance on when to enable that optimizatio
yeah maybe allow setting it on SetSliceCopy? I tried having a fallback fast path the way you proposed as well, but it was unfortunately slower in both cases:

8 and 16 byte fast paths (aligned):

[ RUN      ] CharUtilTest.StressTestUtf8
[       OK ] CharUtilTest.StressTestUtf8 (11317 ms)
[ RUN      ] CharUtilTest.StressTestAscii
[       OK ] CharUtilTest.StressTestAscii (777 ms)

I believe this is due to the extra branches which I suspect was the case with my previous attempt at making the slow path faster (instead of counting everything that's not the first byte of an UTF8 character, check the first ones only and advance the pointer to the start of the next character based on the first byte).

Interestingly enough I tried again now without aligning to 8/16 bytes and it was faster this way even though aligned seemed better with the non-SSE instructions:

8 byte fast path only with (unaligned):

[ RUN      ] CharUtilTest.StressTestUtf8
[       OK ] CharUtilTest.StressTestUtf8 (7485 ms)
[ RUN      ] CharUtilTest.StressTestAscii
[       OK ] CharUtilTest.StressTestAscii (1160 ms)

16 byte fast path only (unaligned):

[ RUN      ] CharUtilTest.StressTestUtf8
[       OK ] CharUtilTest.StressTestUtf8 (11150 ms)
[ RUN      ] CharUtilTest.StressTestAscii
[       OK ] CharUtilTest.StressTestAscii (754 ms)

It seems unaligned two-lane fast path wins over all other cases though:

8 and 16 byte fast paths (unaligned):

[ RUN      ] CharUtilTest.StressTestUtf8
[       OK ] CharUtilTest.StressTestUtf8 (9285 ms)
[ RUN      ] CharUtilTest.StressTestAscii
[       OK ] CharUtilTest.StressTestAscii (708 ms)


http://gerrit.cloudera.org:8080/#/c/14354/11/src/kudu/util/char_util-test.cc
File src/kudu/util/char_util-test.cc:

http://gerrit.cloudera.org:8080/#/c/14354/11/src/kudu/util/char_util-test.cc@97
PS11, Line 97: TEST_F(CharUtilTest, CorrectnessTestUtf8AndAscii) {
             :   Slice result;
             :   Slice data = "ááááááááááááááááááááááááááááááááaaaaaaaaaaaaaaaaaaaaaaaaaaaa
> Can you do it in one line?
Done


http://gerrit.cloudera.org:8080/#/c/14354/9/src/kudu/util/char_util.cc
File src/kudu/util/char_util.cc:

PS9: 
> Bear in mind that we already have CheckCPUFlags() to verify that the CPU su
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
Gerrit-Change-Number: 14354
Gerrit-PatchSet: 13
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 12 Nov 2019 12:29:43 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1938 Make UTF-8 truncation faster pt 2

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14354

to look at the new patch set (#15).

Change subject: KUDU-1938 Make UTF-8 truncation faster pt 2
......................................................................

KUDU-1938 Make UTF-8 truncation faster pt 2

Adds Intel Intrinsics (up to SSE4.2) to speed up the processing of UTF8
character counting in the case of ASCII-only chunks (fast path) by
doubling the chunk size in a single pass from 64 to 128 bits.

Before:

[ RUN      ] CharUtilTest.StressTestUtf8
[       OK ] CharUtilTest.StressTestUtf8 (7746 ms)
[ RUN      ] CharUtilTest.StressTestAscii
[       OK ] CharUtilTest.StressTestAscii (1028 ms)

After:

[ RUN      ] CharUtilTest.StressTestUtf8
[       OK ] CharUtilTest.StressTestUtf8 (9285 ms)
[ RUN      ] CharUtilTest.StressTestAscii
[       OK ] CharUtilTest.StressTestAscii (708 ms)

Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
---
M src/kudu/util/char_util-test.cc
M src/kudu/util/char_util.cc
2 files changed, 37 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/14354/15
-- 
To view, visit http://gerrit.cloudera.org:8080/14354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
Gerrit-Change-Number: 14354
Gerrit-PatchSet: 15
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] KUDU-1938 Make UTF-8 truncation faster pt 2

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/14354 )

Change subject: KUDU-1938 Make UTF-8 truncation faster pt 2
......................................................................


Patch Set 12:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14354/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14354/11//COMMIT_MSG@22
PS11, Line 22: [ RUN      ] CharUtilTest.StressTestUtf8
             : [       OK ] CharUtilTest.StressTestUtf8 (10599 ms)
> Looks like this got slower? Why?
I think it's due to the fact that this way we can only fast runs of 16 ASCII characters instead of 8 characters. I was wondering if I should make ASCII optimization optional via a flag. What do you think?


http://gerrit.cloudera.org:8080/#/c/14354/11/src/kudu/util/char_util-test.cc
File src/kudu/util/char_util-test.cc:

http://gerrit.cloudera.org:8080/#/c/14354/11/src/kudu/util/char_util-test.cc@97
PS11, Line 97:   Slice data;
             : 
             :   data = "ááááááááááááááááááááááááááááááááaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
> Combine?
I'm not sure what you mean.


http://gerrit.cloudera.org:8080/#/c/14354/9/src/kudu/util/char_util.cc
File src/kudu/util/char_util.cc:

PS9: 
> Think you missed this.
I'm still thinking how to do this exactly. This will be used by the clients instead of the server-side and AFAIK we don't have any CPU restrictions there - if someone wants to write to Kudu through the C++ client from a Raspberry Pi or some embedded system, we should support it as best as we reasonably can.

So I was thinking maybe putting the whole thing behind #if __SSE4_1__ so that we don't even compile the part if the target -march doesn't support it and then do a has_sse41 check just in case and if it doesn't, throw an exception?

Also, if this is not part of the Kudu server targets/binaries and it can be easily disabled with setting the appropriate -march, do we still need to limit ourselves to SSE4.2 or can we use AVX* behind #if?



-- 
To view, visit http://gerrit.cloudera.org:8080/14354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
Gerrit-Change-Number: 14354
Gerrit-PatchSet: 12
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 12 Nov 2019 08:31:03 +0000
Gerrit-HasComments: Yes

[kudu-CR] [WIP] KUDU-1938 Performance tuning pt 2

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14354

to look at the new patch set (#2).

Change subject: [WIP] KUDU-1938 Performance tuning pt 2
......................................................................

[WIP] KUDU-1938 Performance tuning pt 2

Adds Intel Intrinsics (up to SSE4.2) to speed up the processing of UTF8
character counting in the case of ASCII-only chunks (fast path) by
doubling the chunk size in a single pass from 64 to 128 bits.

Attilas-MacBook-Pro:debug charger$ ./bin/char_util-test
[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from CharUtilTest
[ RUN      ] CharUtilTest.CorrectnessTestUtf8
[       OK ] CharUtilTest.CorrectnessTestUtf8 (2 ms)
[ RUN      ] CharUtilTest.CorrectnessTestAscii
[       OK ] CharUtilTest.CorrectnessTestAscii (1 ms)
[ RUN      ] CharUtilTest.StressTestUtf8
[       OK ] CharUtilTest.StressTestUtf8 (4395 ms)
[ RUN      ] CharUtilTest.StressTestAscii
[       OK ] CharUtilTest.StressTestAscii (626 ms)
[----------] 4 tests from CharUtilTest (5024 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test case ran. (5024 ms total)
[  PASSED  ] 4 tests.

Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
---
M src/kudu/util/char_util.cc
1 file changed, 15 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/14354/2
-- 
To view, visit http://gerrit.cloudera.org:8080/14354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
Gerrit-Change-Number: 14354
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [WIP] KUDU-1938 Performance tuning pt 2

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14354

to look at the new patch set (#3).

Change subject: [WIP] KUDU-1938 Performance tuning pt 2
......................................................................

[WIP] KUDU-1938 Performance tuning pt 2

Adds Intel Intrinsics (up to SSE4.2) to speed up the processing of UTF8
character counting in the case of ASCII-only chunks (fast path) by
doubling the chunk size in a single pass from 64 to 128 bits.

Also optimizes the way non-fast-pathed chunks are processed.

Attilas-MacBook-Pro:debug charger$ ninja char_util-test && ./bin/char_util-test
[7/7] Linking CXX executable bin/char_util-test
[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from CharUtilTest
[ RUN      ] CharUtilTest.CorrectnessTestUtf8
[       OK ] CharUtilTest.CorrectnessTestUtf8 (5 ms)
[ RUN      ] CharUtilTest.CorrectnessTestAscii
[       OK ] CharUtilTest.CorrectnessTestAscii (1 ms)
[ RUN      ] CharUtilTest.StressTestUtf8
[       OK ] CharUtilTest.StressTestUtf8 (3173 ms)
[ RUN      ] CharUtilTest.StressTestAscii
[       OK ] CharUtilTest.StressTestAscii (659 ms)
[----------] 4 tests from CharUtilTest (3838 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test case ran. (3838 ms total)
[  PASSED  ] 4 tests.

Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
---
M src/kudu/util/char_util.cc
1 file changed, 40 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/14354/3
-- 
To view, visit http://gerrit.cloudera.org:8080/14354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
Gerrit-Change-Number: 14354
Gerrit-PatchSet: 3
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] KUDU-1938 Make UTF-8 truncation faster pt 2

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/14354 )

Change subject: KUDU-1938 Make UTF-8 truncation faster pt 2
......................................................................


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14354/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14354/11//COMMIT_MSG@22
PS11, Line 22: [ RUN      ] CharUtilTest.StressTestUtf8
             : [       OK ] CharUtilTest.StressTestUtf8 (9285 ms)
> I don't think the improvements (and scope) here justify requiring AVX in Ku
I checked that and that was about AVX2 which was addded in Haswell in 2013. I'm talking about "plain old" AVX which was first added to Sandy Bridge in 2011. Anyway, I'm fine with not adding it here, but do you think it's not worth a discussion on dev@ in general? Also, I'm not advocating adding it as a requirement, only to add it as an optional flag which would be disabled by default for now.



-- 
To view, visit http://gerrit.cloudera.org:8080/14354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
Gerrit-Change-Number: 14354
Gerrit-PatchSet: 15
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 13 Nov 2019 22:22:39 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1938 Make UTF-8 truncation faster pt 2

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14354

to look at the new patch set (#8).

Change subject: KUDU-1938 Make UTF-8 truncation faster pt 2
......................................................................

KUDU-1938 Make UTF-8 truncation faster pt 2

Adds Intel Intrinsics (up to SSE4.2) to speed up the processing of UTF8
character counting in the case of ASCII-only chunks (fast path) by
doubling the chunk size in a single pass from 64 to 128 bits.

Also optimizes the way slow-path chunks are processed.

Before:

[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from CharUtilTest
[ RUN      ] CharUtilTest.CorrectnessTestUtf8
[       OK ] CharUtilTest.CorrectnessTestUtf8 (2 ms)
[ RUN      ] CharUtilTest.CorrectnessTestAscii
[       OK ] CharUtilTest.CorrectnessTestAscii (0 ms)
[ RUN      ] CharUtilTest.StressTestUtf8
[       OK ] CharUtilTest.StressTestUtf8 (3235 ms)
[ RUN      ] CharUtilTest.StressTestAscii
[       OK ] CharUtilTest.StressTestAscii (290 ms)
[----------] 4 tests from CharUtilTest (3527 ms total)

After:

[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from CharUtilTest
[ RUN      ] CharUtilTest.CorrectnessTestUtf8
[       OK ] CharUtilTest.CorrectnessTestUtf8 (2 ms)
[ RUN      ] CharUtilTest.CorrectnessTestAscii
[       OK ] CharUtilTest.CorrectnessTestAscii (1 ms)
[ RUN      ] CharUtilTest.StressTestUtf8
[       OK ] CharUtilTest.StressTestUtf8 (2713 ms)
[ RUN      ] CharUtilTest.StressTestAscii
[       OK ] CharUtilTest.StressTestAscii (226 ms)
[----------] 4 tests from CharUtilTest (2942 ms total)

Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
---
M src/kudu/util/char_util.cc
1 file changed, 41 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/14354/8
-- 
To view, visit http://gerrit.cloudera.org:8080/14354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
Gerrit-Change-Number: 14354
Gerrit-PatchSet: 8
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] KUDU-1938 Make UTF-8 truncation faster pt 2

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/14354 )

Change subject: KUDU-1938 Make UTF-8 truncation faster pt 2
......................................................................


Patch Set 14:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14354/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14354/11//COMMIT_MSG@22
PS11, Line 22: [ RUN      ] CharUtilTest.StressTestUtf8
             : [       OK ] CharUtilTest.StressTestUtf8 (9285 ms)
> yeah maybe allow setting it on SetSliceCopy? I tried having a fallback fast
Out of curiosity I checked what happens if we can use AVX instructions, here are the results:

8, 16 and 32 byte fast paths (unaligned):

[ RUN      ] CharUtilTest.StressTestUtf8
[       OK ] CharUtilTest.StressTestUtf8 (11544 ms)
[ RUN      ] CharUtilTest.StressTestAscii
[       OK ] CharUtilTest.StressTestAscii (399 ms)

I was wondering if I should leave the AVX stuff in behind #ifdef __AVX__ and maybe start a discussion on dev@ about supporting AVX with an optional cmake flag.



-- 
To view, visit http://gerrit.cloudera.org:8080/14354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
Gerrit-Change-Number: 14354
Gerrit-PatchSet: 14
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 12 Nov 2019 14:28:58 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1938 Make UTF-8 truncation faster pt 2

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14354

to look at the new patch set (#10).

Change subject: KUDU-1938 Make UTF-8 truncation faster pt 2
......................................................................

KUDU-1938 Make UTF-8 truncation faster pt 2

Adds Intel Intrinsics (up to SSE4.2) to speed up the processing of UTF8
character counting in the case of ASCII-only chunks (fast path) by
doubling the chunk size in a single pass from 64 to 128 bits.

Before:

[ RUN      ] CharUtilTest.StressTestUtf8
[       OK ] CharUtilTest.StressTestUtf8 (7746 ms)
[ RUN      ] CharUtilTest.StressTestAscii
[       OK ] CharUtilTest.StressTestAscii (1028 ms)

After:

[ RUN      ] CharUtilTest.StressTestUtf8
[       OK ] CharUtilTest.StressTestUtf8 (10599 ms)
[ RUN      ] CharUtilTest.StressTestAscii
[       OK ] CharUtilTest.StressTestAscii (676 ms)

Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
---
M src/kudu/util/char_util-test.cc
M src/kudu/util/char_util.cc
2 files changed, 28 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/14354/10
-- 
To view, visit http://gerrit.cloudera.org:8080/14354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
Gerrit-Change-Number: 14354
Gerrit-PatchSet: 10
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] KUDU-1938 Make UTF-8 truncation faster pt 2

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14354

to look at the new patch set (#16).

Change subject: KUDU-1938 Make UTF-8 truncation faster pt 2
......................................................................

KUDU-1938 Make UTF-8 truncation faster pt 2

Adds Intel Intrinsics (up to SSE4.2) to speed up the processing of UTF8
character counting in the case of ASCII-only chunks (fast path) by
doubling the chunk size in a single pass from 64 to 128 bits.

Before:

[ RUN      ] CharUtilTest.StressTestUtf8
[       OK ] CharUtilTest.StressTestUtf8 (7746 ms)
[ RUN      ] CharUtilTest.StressTestAscii
[       OK ] CharUtilTest.StressTestAscii (1028 ms)

After:

[ RUN      ] CharUtilTest.StressTestUtf8
[       OK ] CharUtilTest.StressTestUtf8 (9285 ms)
[ RUN      ] CharUtilTest.StressTestAscii
[       OK ] CharUtilTest.StressTestAscii (708 ms)

Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
---
M src/kudu/util/char_util-test.cc
M src/kudu/util/char_util.cc
2 files changed, 36 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/14354/16
-- 
To view, visit http://gerrit.cloudera.org:8080/14354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
Gerrit-Change-Number: 14354
Gerrit-PatchSet: 16
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] KUDU-1938 Make UTF-8 truncation faster pt 2

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/14354 )

Change subject: KUDU-1938 Make UTF-8 truncation faster pt 2
......................................................................


Patch Set 15:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14354/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14354/11//COMMIT_MSG@22
PS11, Line 22: [ RUN      ] CharUtilTest.StressTestUtf8
             : [       OK ] CharUtilTest.StressTestUtf8 (9285 ms)
> Well, let's not lose sight of the broader context: this functionality micro
Yep definitely don't want to spend too much more time optimizing this, but also didn't want to leave it too slow. Replacing SSE4.1 with AVX instead of adding it as yet another branch has roughly the same speed-up on the fast path with a smaller overhead to the slow path btw:

[ RUN      ] CharUtilTest.StressTestUtf8
[       OK ] CharUtilTest.StressTestUtf8 (10111 ms)
[ RUN      ] CharUtilTest.StressTestAscii
[       OK ] CharUtilTest.StressTestAscii (412 ms)

Anyway, do you think it's worth starting a discussion on dev@ about AVX in general? AVX has been present since Sandy Bridge (introduced in 2011) on virtually all x86 CPUs and the SSE4.2 dependency rules out non-x86 CPUs anyway.


http://gerrit.cloudera.org:8080/#/c/14354/14/src/kudu/util/char_util.cc
File src/kudu/util/char_util.cc:

PS14: 
> New indentation doesn't look right; the bodies of the if/else-if shouldn't 
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
Gerrit-Change-Number: 14354
Gerrit-PatchSet: 15
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 13 Nov 2019 13:28:10 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1938 Make UTF-8 truncation faster pt 2

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14354 )

Change subject: KUDU-1938 Make UTF-8 truncation faster pt 2
......................................................................


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14354/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14354/11//COMMIT_MSG@22
PS11, Line 22: [ RUN      ] CharUtilTest.StressTestUtf8
             : [       OK ] CharUtilTest.StressTestUtf8 (9285 ms)
> I checked that and that was about AVX2 which was addded in Haswell in 2013.
Ah, sorry for conflating AVX with AVX2.

I'd be supportive of such a discussion if it's important to you. I don't feel strongly because I'm not aware of any other CPU-based optimizations that are currently on hold because of our poor handling of "new" instruction sets. Bear in mind that addressing this correctly probably means changes to both the Kudu and thirdparty builds, and we'll need to make sure we separate the build machine's capabilities from that of the target machine.

So I guess what I'm saying is we should only go down this path if there are meaningful rewards at the end of it.



-- 
To view, visit http://gerrit.cloudera.org:8080/14354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
Gerrit-Change-Number: 14354
Gerrit-PatchSet: 15
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 13 Nov 2019 22:34:38 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1938 Make UTF-8 truncation faster pt 2

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14354 )

Change subject: KUDU-1938 Make UTF-8 truncation faster pt 2
......................................................................


Patch Set 16: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/14354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
Gerrit-Change-Number: 14354
Gerrit-PatchSet: 16
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 04 Dec 2019 00:05:10 +0000
Gerrit-HasComments: No

[kudu-CR] [WIP] KUDU-1938 Performance tuning pt 2

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14354

to look at the new patch set (#4).

Change subject: [WIP] KUDU-1938 Performance tuning pt 2
......................................................................

[WIP] KUDU-1938 Performance tuning pt 2

Adds Intel Intrinsics (up to SSE4.2) to speed up the processing of UTF8
character counting in the case of ASCII-only chunks (fast path) by
doubling the chunk size in a single pass from 64 to 128 bits.

Also optimizes the way non-fast-pathed chunks are processed.

Before:

Attilas-MacBook-Pro:debug charger$ ninja char_util-test && ./bin/char_util-test
[8/8] Linking CXX executable bin/char_util-test
[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from CharUtilTest
[ RUN      ] CharUtilTest.CorrectnessTestUtf8
[       OK ] CharUtilTest.CorrectnessTestUtf8 (1 ms)
[ RUN      ] CharUtilTest.CorrectnessTestAscii
[       OK ] CharUtilTest.CorrectnessTestAscii (1 ms)
[ RUN      ] CharUtilTest.StressTestUtf8
[       OK ] CharUtilTest.StressTestUtf8 (3568 ms)
[ RUN      ] CharUtilTest.StressTestAscii
[       OK ] CharUtilTest.StressTestAscii (902 ms)
[----------] 4 tests from CharUtilTest (4472 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test case ran. (4472 ms total)
[  PASSED  ] 4 tests.

After:

Attilas-MacBook-Pro:debug charger$ ninja char_util-test && ./bin/char_util-test
[7/7] Linking CXX executable bin/char_util-test
[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from CharUtilTest
[ RUN      ] CharUtilTest.CorrectnessTestUtf8
[       OK ] CharUtilTest.CorrectnessTestUtf8 (5 ms)
[ RUN      ] CharUtilTest.CorrectnessTestAscii
[       OK ] CharUtilTest.CorrectnessTestAscii (1 ms)
[ RUN      ] CharUtilTest.StressTestUtf8
[       OK ] CharUtilTest.StressTestUtf8 (3173 ms)
[ RUN      ] CharUtilTest.StressTestAscii
[       OK ] CharUtilTest.StressTestAscii (659 ms)
[----------] 4 tests from CharUtilTest (3838 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test case ran. (3838 ms total)
[  PASSED  ] 4 tests.

Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
---
M src/kudu/util/char_util.cc
1 file changed, 40 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/14354/4
-- 
To view, visit http://gerrit.cloudera.org:8080/14354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
Gerrit-Change-Number: 14354
Gerrit-PatchSet: 4
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [WIP] KUDU-1938 Performance tuning pt 2

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/14354 )

Change subject: [WIP] KUDU-1938 Performance tuning pt 2
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14354/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14354/3//COMMIT_MSG@16
PS3, Line 16: [7/7] Linking CXX executable bin/char_util-test
What was the performance improvement in this case?



-- 
To view, visit http://gerrit.cloudera.org:8080/14354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
Gerrit-Change-Number: 14354
Gerrit-PatchSet: 3
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Thu, 03 Oct 2019 01:39:13 +0000
Gerrit-HasComments: Yes

[kudu-CR] [WIP] KUDU-1938 Performance tuning pt 2

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/14354 )

Change subject: [WIP] KUDU-1938 Performance tuning pt 2
......................................................................


Patch Set 5:

(1 comment)

> Patch Set 3:
> 
> (1 comment)

http://gerrit.cloudera.org:8080/#/c/14354/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14354/3//COMMIT_MSG@16
PS3, Line 16: 
> What was the performance improvement in this case?
sorry, added the "before" times.



-- 
To view, visit http://gerrit.cloudera.org:8080/14354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
Gerrit-Change-Number: 14354
Gerrit-PatchSet: 5
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Thu, 03 Oct 2019 06:59:35 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1938 Make UTF-8 truncation faster pt 2

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14354 )

Change subject: KUDU-1938 Make UTF-8 truncation faster pt 2
......................................................................


Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14354/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14354/9//COMMIT_MSG@30
PS9, Line 30: After:
RELEASE mode?


http://gerrit.cloudera.org:8080/#/c/14354/9/src/kudu/util/char_util.cc
File src/kudu/util/char_util.cc:

PS9: 
We should ensure that these instructions are present on the CPU in our machine. Take a look at util/init.cc for some inspiration (or existing code).



-- 
To view, visit http://gerrit.cloudera.org:8080/14354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
Gerrit-Change-Number: 14354
Gerrit-PatchSet: 9
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sat, 05 Oct 2019 01:01:06 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1938 Make UTF-8 truncation faster pt 2

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14354

to look at the new patch set (#13).

Change subject: KUDU-1938 Make UTF-8 truncation faster pt 2
......................................................................

KUDU-1938 Make UTF-8 truncation faster pt 2

Adds Intel Intrinsics (up to SSE4.2) to speed up the processing of UTF8
character counting in the case of ASCII-only chunks (fast path) by
doubling the chunk size in a single pass from 64 to 128 bits.

Before:

[ RUN      ] CharUtilTest.StressTestUtf8
[       OK ] CharUtilTest.StressTestUtf8 (7746 ms)
[ RUN      ] CharUtilTest.StressTestAscii
[       OK ] CharUtilTest.StressTestAscii (1028 ms)

After:

[ RUN      ] CharUtilTest.StressTestUtf8
[       OK ] CharUtilTest.StressTestUtf8 (9285 ms)
[ RUN      ] CharUtilTest.StressTestAscii
[       OK ] CharUtilTest.StressTestAscii (708 ms)

Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
---
M src/kudu/util/char_util-test.cc
M src/kudu/util/char_util.cc
2 files changed, 37 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/14354/13
-- 
To view, visit http://gerrit.cloudera.org:8080/14354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
Gerrit-Change-Number: 14354
Gerrit-PatchSet: 13
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] KUDU-1938 Make UTF-8 truncation faster pt 2

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14354 )

Change subject: KUDU-1938 Make UTF-8 truncation faster pt 2
......................................................................


Patch Set 11:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14354/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14354/11//COMMIT_MSG@22
PS11, Line 22: [ RUN      ] CharUtilTest.StressTestUtf8
             : [       OK ] CharUtilTest.StressTestUtf8 (10599 ms)
Looks like this got slower? Why?


http://gerrit.cloudera.org:8080/#/c/14354/11/src/kudu/util/char_util-test.cc
File src/kudu/util/char_util-test.cc:

http://gerrit.cloudera.org:8080/#/c/14354/11/src/kudu/util/char_util-test.cc@97
PS11, Line 97:   Slice data;
             : 
             :   data = "ááááááááááááááááááááááááááááááááaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
Combine?


http://gerrit.cloudera.org:8080/#/c/14354/9/src/kudu/util/char_util.cc
File src/kudu/util/char_util.cc:

PS9: 
> We should ensure that these instructions are present on the CPU in our mach
Think you missed this.



-- 
To view, visit http://gerrit.cloudera.org:8080/14354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
Gerrit-Change-Number: 14354
Gerrit-PatchSet: 11
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 12 Nov 2019 06:19:02 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1938 Make UTF-8 truncation faster pt 2

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14354

to look at the new patch set (#14).

Change subject: KUDU-1938 Make UTF-8 truncation faster pt 2
......................................................................

KUDU-1938 Make UTF-8 truncation faster pt 2

Adds Intel Intrinsics (up to SSE4.2) to speed up the processing of UTF8
character counting in the case of ASCII-only chunks (fast path) by
doubling the chunk size in a single pass from 64 to 128 bits.

Before:

[ RUN      ] CharUtilTest.StressTestUtf8
[       OK ] CharUtilTest.StressTestUtf8 (7746 ms)
[ RUN      ] CharUtilTest.StressTestAscii
[       OK ] CharUtilTest.StressTestAscii (1028 ms)

After:

[ RUN      ] CharUtilTest.StressTestUtf8
[       OK ] CharUtilTest.StressTestUtf8 (9285 ms)
[ RUN      ] CharUtilTest.StressTestAscii
[       OK ] CharUtilTest.StressTestAscii (708 ms)

Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
---
M src/kudu/util/char_util-test.cc
M src/kudu/util/char_util.cc
2 files changed, 39 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/14354/14
-- 
To view, visit http://gerrit.cloudera.org:8080/14354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
Gerrit-Change-Number: 14354
Gerrit-PatchSet: 14
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] KUDU-1938 Make UTF-8 truncation faster pt 2

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14354 )

Change subject: KUDU-1938 Make UTF-8 truncation faster pt 2
......................................................................


Patch Set 15: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14354/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14354/11//COMMIT_MSG@22
PS11, Line 22: [ RUN      ] CharUtilTest.StressTestUtf8
             : [       OK ] CharUtilTest.StressTestUtf8 (9285 ms)
> Yep definitely don't want to spend too much more time optimizing this, but 
I don't think the improvements (and scope) here justify requiring AVX in Kudu.

It's worth pointing out that as recently as 2016 we were being careful not to require AVX, and more recently, we even had a contributor who made it so we don't assume that running Linux means using AVX. Look at the git history of thirdparty/build-definitions.sh to see what I mean.



-- 
To view, visit http://gerrit.cloudera.org:8080/14354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
Gerrit-Change-Number: 14354
Gerrit-PatchSet: 15
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 13 Nov 2019 18:13:07 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1938 Make UTF-8 truncation faster pt 2

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14354 )

Change subject: KUDU-1938 Make UTF-8 truncation faster pt 2
......................................................................

KUDU-1938 Make UTF-8 truncation faster pt 2

Adds Intel Intrinsics (up to SSE4.2) to speed up the processing of UTF8
character counting in the case of ASCII-only chunks (fast path) by
doubling the chunk size in a single pass from 64 to 128 bits.

Before:

[ RUN      ] CharUtilTest.StressTestUtf8
[       OK ] CharUtilTest.StressTestUtf8 (7746 ms)
[ RUN      ] CharUtilTest.StressTestAscii
[       OK ] CharUtilTest.StressTestAscii (1028 ms)

After:

[ RUN      ] CharUtilTest.StressTestUtf8
[       OK ] CharUtilTest.StressTestUtf8 (9285 ms)
[ RUN      ] CharUtilTest.StressTestAscii
[       OK ] CharUtilTest.StressTestAscii (708 ms)

Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
Reviewed-on: http://gerrit.cloudera.org:8080/14354
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Tested-by: Kudu Jenkins
---
M src/kudu/util/char_util-test.cc
M src/kudu/util/char_util.cc
2 files changed, 36 insertions(+), 4 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Kudu Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/14354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
Gerrit-Change-Number: 14354
Gerrit-PatchSet: 17
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] KUDU-1938 Make UTF-8 truncation faster pt 2

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14354

to look at the new patch set (#6).

Change subject: KUDU-1938 Make UTF-8 truncation faster pt 2
......................................................................

KUDU-1938 Make UTF-8 truncation faster pt 2

Adds Intel Intrinsics (up to SSE4.2) to speed up the processing of UTF8
character counting in the case of ASCII-only chunks (fast path) by
doubling the chunk size in a single pass from 64 to 128 bits.

Also optimizes the way slow-path chunks are processed.

Before:

[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from CharUtilTest
[ RUN      ] CharUtilTest.CorrectnessTestUtf8
[       OK ] CharUtilTest.CorrectnessTestUtf8 (2 ms)
[ RUN      ] CharUtilTest.CorrectnessTestAscii
[       OK ] CharUtilTest.CorrectnessTestAscii (0 ms)
[ RUN      ] CharUtilTest.StressTestUtf8
[       OK ] CharUtilTest.StressTestUtf8 (3235 ms)
[ RUN      ] CharUtilTest.StressTestAscii
[       OK ] CharUtilTest.StressTestAscii (290 ms)
[----------] 4 tests from CharUtilTest (3527 ms total)

After:

[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from CharUtilTest
[ RUN      ] CharUtilTest.CorrectnessTestUtf8
[       OK ] CharUtilTest.CorrectnessTestUtf8 (2 ms)
[ RUN      ] CharUtilTest.CorrectnessTestAscii
[       OK ] CharUtilTest.CorrectnessTestAscii (1 ms)
[ RUN      ] CharUtilTest.StressTestUtf8
[       OK ] CharUtilTest.StressTestUtf8 (2713 ms)
[ RUN      ] CharUtilTest.StressTestAscii
[       OK ] CharUtilTest.StressTestAscii (226 ms)
[----------] 4 tests from CharUtilTest (2942 ms total)

Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
---
M src/kudu/util/char_util.cc
1 file changed, 41 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/14354/6
-- 
To view, visit http://gerrit.cloudera.org:8080/14354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
Gerrit-Change-Number: 14354
Gerrit-PatchSet: 6
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] KUDU-1938 Make UTF-8 truncation faster pt 2

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14354

to look at the new patch set (#12).

Change subject: KUDU-1938 Make UTF-8 truncation faster pt 2
......................................................................

KUDU-1938 Make UTF-8 truncation faster pt 2

Adds Intel Intrinsics (up to SSE4.2) to speed up the processing of UTF8
character counting in the case of ASCII-only chunks (fast path) by
doubling the chunk size in a single pass from 64 to 128 bits.

Before:

[ RUN      ] CharUtilTest.StressTestUtf8
[       OK ] CharUtilTest.StressTestUtf8 (7746 ms)
[ RUN      ] CharUtilTest.StressTestAscii
[       OK ] CharUtilTest.StressTestAscii (1028 ms)

After:

[ RUN      ] CharUtilTest.StressTestUtf8
[       OK ] CharUtilTest.StressTestUtf8 (10599 ms)
[ RUN      ] CharUtilTest.StressTestAscii
[       OK ] CharUtilTest.StressTestAscii (676 ms)

Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
---
M src/kudu/util/char_util-test.cc
M src/kudu/util/char_util.cc
2 files changed, 28 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/14354/12
-- 
To view, visit http://gerrit.cloudera.org:8080/14354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
Gerrit-Change-Number: 14354
Gerrit-PatchSet: 12
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] KUDU-1938 Make UTF-8 truncation faster pt 2

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14354 )

Change subject: KUDU-1938 Make UTF-8 truncation faster pt 2
......................................................................


Patch Set 12:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14354/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14354/11//COMMIT_MSG@22
PS11, Line 22: [ RUN      ] CharUtilTest.StressTestUtf8
             : [       OK ] CharUtilTest.StressTestUtf8 (10599 ms)
> I think it's due to the fact that this way we can only fast runs of 16 ASCI
It'll be really hard to provide guidance on when to enable that optimization. And if the code runs in the client, it'll have to be configurable via function call rather than a flag.

What about reusing the existing code too? Something like this:

  if next 16 bytes are all ASCII:
    use 16 byte fast path
  else if next 8 bytes are all ASCII:
    use original 8 byte fast path
  else:
    use one byte slow path


http://gerrit.cloudera.org:8080/#/c/14354/11/src/kudu/util/char_util-test.cc
File src/kudu/util/char_util-test.cc:

http://gerrit.cloudera.org:8080/#/c/14354/11/src/kudu/util/char_util-test.cc@97
PS11, Line 97:   Slice data;
             : 
             :   data = "ááááááááááááááááááááááááááááááááaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
> I'm not sure what you mean.
Can you do it in one line?

  Slice data = "...";


http://gerrit.cloudera.org:8080/#/c/14354/9/src/kudu/util/char_util.cc
File src/kudu/util/char_util.cc:

PS9: 
> I'm still thinking how to do this exactly. This will be used by the clients
Bear in mind that we already have CheckCPUFlags() to verify that the CPU supports SSE 4.2 instructions. It's called in the servers as well as in the C++ client.

So this really just boils down to ensuring we don't run this code in tests on old CPUs. Maybe it'd be good enough to call CheckCPUFlags() in KuduTest::KuduTest or KuduTest::Setup?



-- 
To view, visit http://gerrit.cloudera.org:8080/14354
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a491157dd5c8b4815030bbda921a0afc0bafd28
Gerrit-Change-Number: 14354
Gerrit-PatchSet: 12
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 12 Nov 2019 08:49:34 +0000
Gerrit-HasComments: Yes