You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Attila Bukor (Code Review)" <ge...@cloudera.org> on 2019/07/26 14:35:14 UTC

[kudu-CR] KUDU-1938 Add non-copy setters to partial row pt 3

Hello Adar Dembo,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/13928

to review the following change.


Change subject: KUDU-1938 Add non-copy setters to partial row pt 3
......................................................................

KUDU-1938 Add non-copy setters to partial row pt 3

Apache Impala uses KuduPartialRow API to determine which partition a row
will be inserted to distribute the data between executors optimally.

For this purpose the copy is unnecessary and it should be fast. This
commit adds NoCopyUnsafe variants for this purpose which expect the data
to already be truncated (which it is in Impala's case) and only check
that the value's length is lower than the highest possible upper bound:
val.size() < max_length*4 bytes (the maximum size of an UTF8 character)
to avoid having to count each character manually.

Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
---
M src/kudu/common/partial_row-test.cc
M src/kudu/common/partial_row.cc
M src/kudu/common/partial_row.h
3 files changed, 118 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/28/13928/1
-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>

[kudu-CR] KUDU-1938 Add non-copy setters to partial row pt 3

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13928 )

Change subject: KUDU-1938 Add non-copy setters to partial row pt 3
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13928/5/src/kudu/common/partial_row.h
File src/kudu/common/partial_row.h:

http://gerrit.cloudera.org:8080/#/c/13928/5/src/kudu/common/partial_row.h@292
PS5, Line 292:   ///@{
This needs to be terminated somewhere, no?

Also, is Doxygen okay with the nested braces (here and L315)? Could you build the docs and check?


http://gerrit.cloudera.org:8080/#/c/13928/5/src/kudu/common/partial_row.h@352
PS5, Line 352:   /// basic validation that the data is not larger than max_column_length*4 as
Also needs to be updated.



-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 5
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 13 Aug 2019 18:44:37 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1938 Add non-copy setters to partial row pt 3

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13928 )

Change subject: KUDU-1938 Add non-copy setters to partial row pt 3
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13928/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13928/3//COMMIT_MSG@14
PS3, Line 14: to already be truncated (which it is in Impala's case) and only check
> is this a safe assumption? last I was aware, Impala's treatment of "string"
More context on bytes vs. characters (from part 1 of this series): https://gerrit.cloudera.org/c/13760/20/src/kudu/common/schema.h#126).



-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 3
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Sun, 28 Jul 2019 21:43:29 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1938 Add non-copy VARCHAR setters pt 3

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13928

to look at the new patch set (#13).

Change subject: KUDU-1938 Add non-copy VARCHAR setters pt 3
......................................................................

KUDU-1938 Add non-copy VARCHAR setters pt 3

Apache Impala uses KuduPartialRow API to determine which partition a row
will be inserted to distribute the data between executors optimally.

For this purpose the copy is unnecessary and it should be fast. This
commit adds NoCopyUnsafe variants for this purpose which expect the data
to already be truncated (which it is in Impala's case) and only check
that the value's length is lower than the highest possible upper bound:
val.size() < max_length*4 bytes (the maximum size of an UTF8 character)
to avoid having to count each character manually.

Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
---
M src/kudu/common/partial_row-test.cc
M src/kudu/common/partial_row.cc
M src/kudu/common/partial_row.h
3 files changed, 88 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/28/13928/13
-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 13
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1938 Add non-copy setters to partial row pt 3

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13928

to look at the new patch set (#6).

Change subject: KUDU-1938 Add non-copy setters to partial row pt 3
......................................................................

KUDU-1938 Add non-copy setters to partial row pt 3

Apache Impala uses KuduPartialRow API to determine which partition a row
will be inserted to distribute the data between executors optimally.

For this purpose the copy is unnecessary and it should be fast. This
commit adds NoCopyUnsafe variants for this purpose which expect the data
to already be truncated (which it is in Impala's case) and only check
that the value's length is lower than the highest possible upper bound:
val.size() < max_length*4 bytes (the maximum size of an UTF8 character)
to avoid having to count each character manually.

Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
---
M src/kudu/common/partial_row-test.cc
M src/kudu/common/partial_row.cc
M src/kudu/common/partial_row.h
3 files changed, 130 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/28/13928/6
-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 6
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1938 Add non-copy setters to partial row pt 3

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13928

to look at the new patch set (#8).

Change subject: KUDU-1938 Add non-copy setters to partial row pt 3
......................................................................

KUDU-1938 Add non-copy setters to partial row pt 3

Apache Impala uses KuduPartialRow API to determine which partition a row
will be inserted to distribute the data between executors optimally.

For this purpose the copy is unnecessary and it should be fast. This
commit adds NoCopyUnsafe variants for this purpose which expect the data
to already be truncated (which it is in Impala's case) and only check
that the value's length is lower than the highest possible upper bound:
val.size() < max_length*4 bytes (the maximum size of an UTF8 character)
to avoid having to count each character manually.

Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
---
M src/kudu/common/partial_row-test.cc
M src/kudu/common/partial_row.cc
M src/kudu/common/partial_row.h
3 files changed, 128 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/28/13928/8
-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 8
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1938 Add non-copy VARCHAR setters pt 3

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13928

to look at the new patch set (#11).

Change subject: KUDU-1938 Add non-copy VARCHAR setters pt 3
......................................................................

KUDU-1938 Add non-copy VARCHAR setters pt 3

Apache Impala uses KuduPartialRow API to determine which partition a row
will be inserted to distribute the data between executors optimally.

For this purpose the copy is unnecessary and it should be fast. This
commit adds NoCopyUnsafe variants for this purpose which expect the data
to already be truncated (which it is in Impala's case) and only check
that the value's length is lower than the highest possible upper bound:
val.size() < max_length*4 bytes (the maximum size of an UTF8 character)
to avoid having to count each character manually.

Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
---
M src/kudu/common/partial_row-test.cc
M src/kudu/common/partial_row.cc
M src/kudu/common/partial_row.h
3 files changed, 88 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/28/13928/11
-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 11
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1938 Add non-copy setters to partial row pt 3

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13928 )

Change subject: KUDU-1938 Add non-copy setters to partial row pt 3
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13928/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13928/3//COMMIT_MSG@14
PS3, Line 14: to already be truncated (which it is in Impala's case) and only check
is this a safe assumption? last I was aware, Impala's treatment of "string" is actually not UTF8, so their CHAR(8) is 8 bytes, not 8 unicode characters. Based on the rest of this commit message it sounds like we treat CHAR(8) as 8 unicode characters, which might be more than 8 bytes


http://gerrit.cloudera.org:8080/#/c/13928/3//COMMIT_MSG@17
PS3, Line 17: to avoid having to count each character manually.
is the unicode character counting not already fast-pathed for the ASCII subset of utf8? it seems like that should be a pretty easy optimization. It's still O(n) but probably can be several bytes per cycle (eg load 8 butes and & with 0x8080808080808080 to check for high bits)



-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 3
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 26 Jul 2019 22:52:42 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1938 Add non-copy VARCHAR setters pt 3

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/13928 )

Change subject: KUDU-1938 Add non-copy VARCHAR setters pt 3
......................................................................


Patch Set 20: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 20
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 22 Oct 2019 13:25:35 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-1938 Add non-copy setters to partial row pt 3

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13928

to look at the new patch set (#2).

Change subject: KUDU-1938 Add non-copy setters to partial row pt 3
......................................................................

KUDU-1938 Add non-copy setters to partial row pt 3

Apache Impala uses KuduPartialRow API to determine which partition a row
will be inserted to distribute the data between executors optimally.

For this purpose the copy is unnecessary and it should be fast. This
commit adds NoCopyUnsafe variants for this purpose which expect the data
to already be truncated (which it is in Impala's case) and only check
that the value's length is lower than the highest possible upper bound:
val.size() < max_length*4 bytes (the maximum size of an UTF8 character)
to avoid having to count each character manually.

Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
---
M src/kudu/common/partial_row-test.cc
M src/kudu/common/partial_row.cc
M src/kudu/common/partial_row.h
3 files changed, 128 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/28/13928/2
-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] KUDU-1938 Add non-copy setters to partial row pt 3

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/13928 )

Change subject: KUDU-1938 Add non-copy setters to partial row pt 3
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/13928/1/src/kudu/common/partial_row.h
File src/kudu/common/partial_row.h:

http://gerrit.cloudera.org:8080/#/c/13928/1/src/kudu/common/partial_row.h@297
PS1, Line 297:  used. This is
             :   /// subject to change in the
> The doxygen comments start with:
sorry, I missed that one. Should I just change the name to Advanced/Unstable API and push down the current title to the description? Should I leave the current description as well as an explanation?


http://gerrit.cloudera.org:8080/#/c/13928/1/src/kudu/common/partial_row.h@307
PS1, Line 307: 
> When I read "Unsafe" I tend to think of memory accesses, or of APIs that pl
These methods shouldn't be used by clients other than Impala so that part of the 'unsafe' contract is there. It doesn't cause memory though but clients could abuse it and add entries with 4 * max_column_length characters if only ASCII or 'traditional' code pages are use which is why I thought it should be unsafe.


http://gerrit.cloudera.org:8080/#/c/13928/1/src/kudu/common/partial_row.cc
File src/kudu/common/partial_row.cc:

http://gerrit.cloudera.org:8080/#/c/13928/1/src/kudu/common/partial_row.cc@423
PS1, Line 423: Status KuduPartialRow::SetCharNoCopyUnsafe(int col_idx, const Slice& val) {
> Hmm, I guess I can see that. Could you doc this limitation in the CHAR doxy
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Jul 2019 22:43:37 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1938 Add non-copy VARCHAR setters pt 3

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13928 )

Change subject: KUDU-1938 Add non-copy VARCHAR setters pt 3
......................................................................


Patch Set 13: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 13
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 01 Oct 2019 19:05:54 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-1938 Add non-copy VARCHAR setters pt 3

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Attila Bukor has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/13928 )

Change subject: KUDU-1938 Add non-copy VARCHAR setters pt 3
......................................................................

KUDU-1938 Add non-copy VARCHAR setters pt 3

Apache Impala uses KuduPartialRow API to determine which partition a row
will be inserted to distribute the data between executors optimally.

For this purpose the copy is unnecessary and it should be fast. This
commit adds NoCopyUnsafe variants for this purpose which expect the data
to already be truncated (which it is in Impala's case) and only check
that the value's length is lower than the highest possible upper bound:
val.size() < max_length*4 bytes (the maximum size of an UTF8 character)
to avoid having to count each character manually.

Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Reviewed-on: http://gerrit.cloudera.org:8080/13928
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Reviewed-by: Grant Henke <gr...@apache.org>
Tested-by: Kudu Jenkins
---
M src/kudu/common/partial_row-test.cc
M src/kudu/common/partial_row.cc
M src/kudu/common/partial_row.h
3 files changed, 88 insertions(+), 0 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Grant Henke: Looks good to me, approved
  Kudu Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 24
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1938 Add non-copy VARCHAR setters pt 3

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Grant Henke, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13928

to look at the new patch set (#23).

Change subject: KUDU-1938 Add non-copy VARCHAR setters pt 3
......................................................................

KUDU-1938 Add non-copy VARCHAR setters pt 3

Apache Impala uses KuduPartialRow API to determine which partition a row
will be inserted to distribute the data between executors optimally.

For this purpose the copy is unnecessary and it should be fast. This
commit adds NoCopyUnsafe variants for this purpose which expect the data
to already be truncated (which it is in Impala's case) and only check
that the value's length is lower than the highest possible upper bound:
val.size() < max_length*4 bytes (the maximum size of an UTF8 character)
to avoid having to count each character manually.

Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
---
M src/kudu/common/partial_row-test.cc
M src/kudu/common/partial_row.cc
M src/kudu/common/partial_row.h
3 files changed, 88 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/28/13928/23
-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 23
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1938 Add non-copy VARCHAR setters pt 3

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13928 )

Change subject: KUDU-1938 Add non-copy VARCHAR setters pt 3
......................................................................


Patch Set 23: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 23
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 22 Oct 2019 22:09:17 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-1938 Add non-copy setters to partial row pt 3

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/13928 )

Change subject: KUDU-1938 Add non-copy setters to partial row pt 3
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13928/1/src/kudu/common/partial_row.h
File src/kudu/common/partial_row.h:

http://gerrit.cloudera.org:8080/#/c/13928/1/src/kudu/common/partial_row.h@296
PS1, Line 296: 
> Maybe clearer as "...not larger than the maximum column length (as indicate
Done


http://gerrit.cloudera.org:8080/#/c/13928/1/src/kudu/common/partial_row.h@297
PS1, Line 297: 
             :   /// These methods expect the
> I think the former makes more sense. Basically, make it consistent with how
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 5
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 13 Aug 2019 10:20:07 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1938 Add non-copy VARCHAR setters pt 3

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/13928 )

Change subject: KUDU-1938 Add non-copy VARCHAR setters pt 3
......................................................................


Patch Set 19:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13928/19/src/kudu/common/partial_row.h
File src/kudu/common/partial_row.h:

http://gerrit.cloudera.org:8080/#/c/13928/19/src/kudu/common/partial_row.h@306
PS19, Line 306:   Status SetVarcharNoCopyUnsafe(const Slice& col_name, const Slice& val) WARN_UNUSED_RESULT;
> yep we talked about it with Adar here: https://gerrit.cloudera.org/c/13928/
Sorry I missed that. 

Is the validation expensive enough that we really want to skip it? I understand avoiding the copy, but are we opening our selves up to bugs in the future without validation here?



-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 19
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 02 Oct 2019 15:46:27 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1938 Add non-copy setters to partial row pt 3

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13928 )

Change subject: KUDU-1938 Add non-copy setters to partial row pt 3
......................................................................


Patch Set 4:

(2 comments)

Could you bug Todd to respond to your comments?

http://gerrit.cloudera.org:8080/#/c/13928/1/src/kudu/common/partial_row.h
File src/kudu/common/partial_row.h:

http://gerrit.cloudera.org:8080/#/c/13928/1/src/kudu/common/partial_row.h@296
PS1, Line 296: max_column
> Done
Maybe clearer as "...not larger than the maximum column length (as indicated by the schema) multiplied by 4, as that's..."


http://gerrit.cloudera.org:8080/#/c/13928/1/src/kudu/common/partial_row.h@297
PS1, Line 297:  used. They also
             :   /// don't check for trailing
> sorry, I missed that one. Should I just change the name to Advanced/Unstabl
I think the former makes more sense. Basically, make it consistent with how KuduScanner::SetRowFormatFlags is Doxygen-commented.



-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 4
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 13 Aug 2019 06:30:02 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1938 Add non-copy setters to partial row pt 3

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/13928 )

Change subject: KUDU-1938 Add non-copy setters to partial row pt 3
......................................................................


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/13928/1/src/kudu/common/partial_row.h
File src/kudu/common/partial_row.h:

http://gerrit.cloudera.org:8080/#/c/13928/1/src/kudu/common/partial_row.h@274
PS1, Line 274:   /// @note The specified data must remain valid until the corresponding
             :   ///   RPC calls are completed to be able to access error buffers,
             :   ///   if any errors happened (the errors can be fetched using the
             :   ///   KuduSession::GetPendingErrors() method).
> Should this @note be copied in?
Done


http://gerrit.cloudera.org:8080/#/c/13928/1/src/kudu/common/partial_row.h@296
PS1, Line 296: max_column
> It's not obvious what max_length is here. Could you clarify?
Done


http://gerrit.cloudera.org:8080/#/c/13928/1/src/kudu/common/partial_row.h@297
PS1, Line 297:  used. This is
             :   /// subject to change in the
> See KuduScanner::SetRowFormatFlags for a more formal way to specify this.
I'm not sure what you mean, I haven't found anything that would mean the behavior would be subject to change there.


http://gerrit.cloudera.org:8080/#/c/13928/1/src/kudu/common/partial_row.h@307
PS1, Line 307: 
> Curious why you chose to add the Unsafe suffix? Are these fundamentally dif
I wanted to indicate that this is not for 'normal' as it doesn't guarantee the behavior one might expect (no truncation). Should I remove it or do you have a better way to indicate this?


http://gerrit.cloudera.org:8080/#/c/13928/1/src/kudu/common/partial_row.cc
File src/kudu/common/partial_row.cc:

http://gerrit.cloudera.org:8080/#/c/13928/1/src/kudu/common/partial_row.cc@423
PS1, Line 423: Status KuduPartialRow::SetCharNoCopyUnsafe(int col_idx, const Slice& val) {
> Should we validate that there's no trailing whitespace in the CHAR method? 
Well, it would mean checking one character only which shouldn't be too expensive, but it seems kind of arbitrary and weird to not check for actual length but still check whether the last character is a whitespace or not. Let me know what you think.



-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Jul 2019 20:02:44 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1938 Add non-copy VARCHAR setters pt 3

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/13928 )

Change subject: KUDU-1938 Add non-copy VARCHAR setters pt 3
......................................................................


Patch Set 19:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13928/19/src/kudu/common/partial_row.h
File src/kudu/common/partial_row.h:

http://gerrit.cloudera.org:8080/#/c/13928/19/src/kudu/common/partial_row.h@306
PS19, Line 306:   Status SetVarcharNoCopyUnsafe(const Slice& col_name, const Slice& val) WARN_UNUSED_RESULT;
> Is there a reason these add "Unsafe" to the end while the other NoCopy meth
yep we talked about it with Adar here: https://gerrit.cloudera.org/c/13928/1/src/kudu/common/partial_row.h#307
let me know if you have a better idea, I'm happy to change it



-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 19
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 02 Oct 2019 15:23:25 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1938 Add non-copy setters to partial row pt 3

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13928

to look at the new patch set (#4).

Change subject: KUDU-1938 Add non-copy setters to partial row pt 3
......................................................................

KUDU-1938 Add non-copy setters to partial row pt 3

Apache Impala uses KuduPartialRow API to determine which partition a row
will be inserted to distribute the data between executors optimally.

For this purpose the copy is unnecessary and it should be fast. This
commit adds NoCopyUnsafe variants for this purpose which expect the data
to already be truncated (which it is in Impala's case) and only check
that the value's length is lower than the highest possible upper bound:
val.size() < max_length*4 bytes (the maximum size of an UTF8 character)
to avoid having to count each character manually.

Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
---
M src/kudu/common/partial_row-test.cc
M src/kudu/common/partial_row.cc
M src/kudu/common/partial_row.h
3 files changed, 128 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/28/13928/4
-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 4
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1938 Add non-copy setters to partial row pt 3

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13928

to look at the new patch set (#3).

Change subject: KUDU-1938 Add non-copy setters to partial row pt 3
......................................................................

KUDU-1938 Add non-copy setters to partial row pt 3

Apache Impala uses KuduPartialRow API to determine which partition a row
will be inserted to distribute the data between executors optimally.

For this purpose the copy is unnecessary and it should be fast. This
commit adds NoCopyUnsafe variants for this purpose which expect the data
to already be truncated (which it is in Impala's case) and only check
that the value's length is lower than the highest possible upper bound:
val.size() < max_length*4 bytes (the maximum size of an UTF8 character)
to avoid having to count each character manually.

Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
---
M src/kudu/common/partial_row-test.cc
M src/kudu/common/partial_row.cc
M src/kudu/common/partial_row.h
3 files changed, 128 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/28/13928/3
-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 3
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] KUDU-1938 Add non-copy VARCHAR setters pt 3

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/13928 )

Change subject: KUDU-1938 Add non-copy VARCHAR setters pt 3
......................................................................


Patch Set 19:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13928/19/src/kudu/common/partial_row.h
File src/kudu/common/partial_row.h:

http://gerrit.cloudera.org:8080/#/c/13928/19/src/kudu/common/partial_row.h@306
PS19, Line 306:   Status SetVarcharNoCopyUnsafe(const Slice& col_name, const Slice& val) WARN_UNUSED_RESULT;
Is there a reason these add "Unsafe" to the end while the other NoCopy methods don't?



-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 19
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 02 Oct 2019 14:05:02 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1938 Add non-copy setters to partial row pt 3

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/13928 )

Change subject: KUDU-1938 Add non-copy setters to partial row pt 3
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13928/5/src/kudu/common/partial_row.h
File src/kudu/common/partial_row.h:

http://gerrit.cloudera.org:8080/#/c/13928/5/src/kudu/common/partial_row.h@292
PS5, Line 292:   ///@{
> This needs to be terminated somewhere, no?
Actually it seems to work (just tested), but I left the @{ in L315 by mistake, removing it now.


http://gerrit.cloudera.org:8080/#/c/13928/5/src/kudu/common/partial_row.h@352
PS5, Line 352:   /// basic validation that the data is not larger than max_column_length*4 as
> Also needs to be updated.
Done. Problem is that Doxygen seems to group the methods by @name and only one of the docs are used if @names match for all methods with the @name, so I had to put the setters part in the @name. Hope the way I did is okay, let me know if I should change it.



-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 5
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 13 Aug 2019 21:35:02 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1938 Add non-copy VARCHAR setters pt 3

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/13928 )

Change subject: KUDU-1938 Add non-copy VARCHAR setters pt 3
......................................................................


Patch Set 19:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13928/19/src/kudu/common/partial_row.h
File src/kudu/common/partial_row.h:

http://gerrit.cloudera.org:8080/#/c/13928/19/src/kudu/common/partial_row.h@306
PS19, Line 306:   Status SetVarcharNoCopyUnsafe(const Slice& col_name, const Slice& val) WARN_UNUSED_RESULT;
> Sorry I missed that. 
Ideally no-one would use this to actually persist data, Impala only uses it to decide which bucket a given primary key falls to and by then the data is pre-truncated. The check is there only to make sure no one uses this method to bypass the length limitation of the VARCHAR to persist huge amounts of data.

Even assuming only ASCII characters would be used and with the fast path that is coming in https://gerrit.cloudera.org/c/14353 we need to load the whole value and perform bitwise operations on it chunk by chunk. This seems to me like a bigger performance hit than a simple memcpy which we still optimized out in SetStringNoCopy and SetBinaryNoCopy.

The potential bugs are why I adde the "Unsafe" suffix, hopefully this would deter users from using this version.

Todd also mentioned doing a proper check here instead in https://gerrit.cloudera.org/c/13928/3//COMMIT_MSG#17, but as I said to him I'm happy to change it if you think we should suffer the performance hit rather than allow abusing the API that would potentially lead to bugs.



-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 19
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 03 Oct 2019 07:11:51 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1938 Add non-copy VARCHAR setters pt 3

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13928

to look at the new patch set (#12).

Change subject: KUDU-1938 Add non-copy VARCHAR setters pt 3
......................................................................

KUDU-1938 Add non-copy VARCHAR setters pt 3

Apache Impala uses KuduPartialRow API to determine which partition a row
will be inserted to distribute the data between executors optimally.

For this purpose the copy is unnecessary and it should be fast. This
commit adds NoCopyUnsafe variants for this purpose which expect the data
to already be truncated (which it is in Impala's case) and only check
that the value's length is lower than the highest possible upper bound:
val.size() < max_length*4 bytes (the maximum size of an UTF8 character)
to avoid having to count each character manually.

Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
---
M src/kudu/client/predicate-test.cc
M src/kudu/common/partial_row-test.cc
M src/kudu/common/partial_row.cc
M src/kudu/common/partial_row.h
M src/kudu/integration-tests/varchar-itest.cc
5 files changed, 91 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/28/13928/12
-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 12
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1938 Add non-copy setters to partial row pt 3

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13928 )

Change subject: KUDU-1938 Add non-copy setters to partial row pt 3
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/13928/1/src/kudu/common/partial_row.h
File src/kudu/common/partial_row.h:

http://gerrit.cloudera.org:8080/#/c/13928/1/src/kudu/common/partial_row.h@297
PS1, Line 297:  used. This is
             :   /// subject to change in the
> I'm not sure what you mean, I haven't found anything that would mean the be
The doxygen comments start with:

  /// @name Advanced/Unstable API
  //
  ///@{

And then end with:

  ///@}


http://gerrit.cloudera.org:8080/#/c/13928/1/src/kudu/common/partial_row.h@307
PS1, Line 307: 
> I wanted to indicate that this is not for 'normal' as it doesn't guarantee 
When I read "Unsafe" I tend to think of memory accesses, or of APIs that plain aren't safe to use and should be avoided.

Maybe NoTruncate would be more appropriate here? That's not perfect either though because it doesn't describe the lax length checking.

Maybe just leave Unsafe then.


http://gerrit.cloudera.org:8080/#/c/13928/1/src/kudu/common/partial_row.cc
File src/kudu/common/partial_row.cc:

http://gerrit.cloudera.org:8080/#/c/13928/1/src/kudu/common/partial_row.cc@423
PS1, Line 423: Status KuduPartialRow::SetCharNoCopyUnsafe(int col_idx, const Slice& val) {
> Well, it would mean checking one character only which shouldn't be too expe
Hmm, I guess I can see that. Could you doc this limitation in the CHAR doxygen comment?



-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 2
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Jul 2019 20:40:55 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1938 Add non-copy VARCHAR setters pt 3

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13928

to look at the new patch set (#10).

Change subject: KUDU-1938 Add non-copy VARCHAR setters pt 3
......................................................................

KUDU-1938 Add non-copy VARCHAR setters pt 3

Apache Impala uses KuduPartialRow API to determine which partition a row
will be inserted to distribute the data between executors optimally.

For this purpose the copy is unnecessary and it should be fast. This
commit adds NoCopyUnsafe variants for this purpose which expect the data
to already be truncated (which it is in Impala's case) and only check
that the value's length is lower than the highest possible upper bound:
val.size() < max_length*4 bytes (the maximum size of an UTF8 character)
to avoid having to count each character manually.

Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
---
M src/kudu/common/partial_row-test.cc
M src/kudu/common/partial_row.cc
M src/kudu/common/partial_row.h
3 files changed, 88 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/28/13928/10
-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 10
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1938 Add non-copy VARCHAR setters pt 3

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/13928 )

Change subject: KUDU-1938 Add non-copy VARCHAR setters pt 3
......................................................................


Patch Set 23: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 23
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 22 Oct 2019 23:16:55 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-1938 Add non-copy VARCHAR setters pt 3

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/13928 )

Change subject: KUDU-1938 Add non-copy VARCHAR setters pt 3
......................................................................


Patch Set 19:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13928/19/src/kudu/common/partial_row.h
File src/kudu/common/partial_row.h:

http://gerrit.cloudera.org:8080/#/c/13928/19/src/kudu/common/partial_row.h@306
PS19, Line 306:   Status SetVarcharNoCopyUnsafe(const Slice& col_name, const Slice& val) WARN_UNUSED_RESULT;
> Ideally no-one would use this to actually persist data, Impala only uses it
I don't have a strong feeling either way about this I guess. We can always add the check later should it be needed.



-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 19
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 04 Oct 2019 16:44:38 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1938 Add non-copy VARCHAR setters pt 3

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/13928 )

Change subject: KUDU-1938 Add non-copy VARCHAR setters pt 3
......................................................................


Patch Set 19: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 19
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 04 Oct 2019 16:44:43 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-1938 Add non-copy VARCHAR setters pt 3

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13928 )

Change subject: KUDU-1938 Add non-copy VARCHAR setters pt 3
......................................................................


Patch Set 21: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 21
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 22 Oct 2019 16:59:32 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-1938 Add non-copy setters to partial row pt 3

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13928 )

Change subject: KUDU-1938 Add non-copy setters to partial row pt 3
......................................................................


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/13928/1/src/kudu/common/partial_row.h
File src/kudu/common/partial_row.h:

http://gerrit.cloudera.org:8080/#/c/13928/1/src/kudu/common/partial_row.h@274
PS1, Line 274:   /// @note The specified data must remain valid until the corresponding
             :   ///   RPC calls are completed to be able to access error buffers,
             :   ///   if any errors happened (the errors can be fetched using the
             :   ///   KuduSession::GetPendingErrors() method).
Should this @note be copied in?


http://gerrit.cloudera.org:8080/#/c/13928/1/src/kudu/common/partial_row.h@296
PS1, Line 296: max_length
It's not obvious what max_length is here. Could you clarify?


http://gerrit.cloudera.org:8080/#/c/13928/1/src/kudu/common/partial_row.h@297
PS1, Line 297: This is subject
             :   /// to change in the future.
See KuduScanner::SetRowFormatFlags for a more formal way to specify this.


http://gerrit.cloudera.org:8080/#/c/13928/1/src/kudu/common/partial_row.h@307
PS1, Line 307: Unsafe
Curious why you chose to add the Unsafe suffix? Are these fundamentally different than the existing NoCopy functions?


http://gerrit.cloudera.org:8080/#/c/13928/1/src/kudu/common/partial_row.cc
File src/kudu/common/partial_row.cc:

http://gerrit.cloudera.org:8080/#/c/13928/1/src/kudu/common/partial_row.cc@423
PS1, Line 423: Status KuduPartialRow::SetCharNoCopyUnsafe(int col_idx, const Slice& val) {
Should we validate that there's no trailing whitespace in the CHAR method? Or is that too expensive?



-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Jul 2019 18:17:39 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1938 Add non-copy setters to partial row pt 3

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/13928 )

Change subject: KUDU-1938 Add non-copy setters to partial row pt 3
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13928/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13928/3//COMMIT_MSG@14
PS3, Line 14: to already be truncated (which it is in Impala's case) and only check
> More context on bytes vs. characters (from part 1 of this series): https://
We do treat it as UTF8 characters, I didn't know Impala treated them differently. For some additional context, I did some testing, Oracle treats lengths based as either bytes or characters based on the NLS_LENGTH_SEMANTICS parameter, but it can be explicitly set in table creation time (e.g. CHAR(10 CHAR) or CHAR(10 BYTE)). MySQL is supposed to treat them as characters in recent versions, but in my version at least I couldn't make it work. Postgres treats them as characters.

Anyway, if Impala treats them as bytes, it won't fail here as the length will be always <= than the allowed max length.


http://gerrit.cloudera.org:8080/#/c/13928/3//COMMIT_MSG@17
PS3, Line 17: to avoid having to count each character manually.
> is the unicode character counting not already fast-pathed for the ASCII sub
We could do this, but & 0x8080808080808080 would only tell us whether it has UTF8 characters and it would still need to check the whole string. In this case I decided to sacrifice absolute correctness for performance, let me know if I should still check properly.

The & 0x8080808080808080 is a good idea to optimize where I actually count the characters and thinking about it gave me an idea to optimize it a bit better even, I'll do some perf tests and submit another patch with the optimizations later if it makes sense.



-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 3
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 29 Jul 2019 16:02:14 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1938 Add non-copy setters to partial row pt 3

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13928

to look at the new patch set (#5).

Change subject: KUDU-1938 Add non-copy setters to partial row pt 3
......................................................................

KUDU-1938 Add non-copy setters to partial row pt 3

Apache Impala uses KuduPartialRow API to determine which partition a row
will be inserted to distribute the data between executors optimally.

For this purpose the copy is unnecessary and it should be fast. This
commit adds NoCopyUnsafe variants for this purpose which expect the data
to already be truncated (which it is in Impala's case) and only check
that the value's length is lower than the highest possible upper bound:
val.size() < max_length*4 bytes (the maximum size of an UTF8 character)
to avoid having to count each character manually.

Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
---
M src/kudu/common/partial_row-test.cc
M src/kudu/common/partial_row.cc
M src/kudu/common/partial_row.h
3 files changed, 132 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/28/13928/5
-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 5
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1938 Add non-copy setters to partial row pt 3

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13928 )

Change subject: KUDU-1938 Add non-copy setters to partial row pt 3
......................................................................


Patch Set 6: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 6
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 13 Aug 2019 22:01:20 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-1938 Add non-copy VARCHAR setters pt 3

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/13928 )

Change subject: KUDU-1938 Add non-copy VARCHAR setters pt 3
......................................................................


Patch Set 21: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/13928
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1f2aba098d649eb94e0314f6606cc33600e8d766
Gerrit-Change-Number: 13928
Gerrit-PatchSet: 21
Gerrit-Owner: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 22 Oct 2019 16:20:07 +0000
Gerrit-HasComments: No