You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/06/16 21:26:59 UTC

[GitHub] [arrow] wesm commented on a change in pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

wesm commented on a change in pull request #7449:
URL: https://github.com/apache/arrow/pull/7449#discussion_r441146931



##########
File path: cpp/src/arrow/compute/kernels/scalar_string.cc
##########
@@ -30,6 +31,21 @@ namespace internal {
 
 namespace {
 
+// Code units in the range [a-z] can only be an encoding of an ascii
+// character/codepoint, not the 2nd, 3rd or 4th code unit (byte) of an different
+// codepoint. This guaranteed by non-overlap design of the unicode standard. (see
+// section 2.5 of Unicode Standard Core Specification v13.0)
+
+uint8_t ascii_tolower(uint8_t utf8_code_unit) {

Review comment:
       I think you will want this to be `static inline` (not sure all compilers will inline it otherwise)

##########
File path: cpp/src/arrow/compute/kernels/scalar_string.cc
##########
@@ -159,11 +282,23 @@ void MakeUnaryStringBatchKernel(std::string name, ArrayKernelExec exec,
   DCHECK_OK(registry->AddFunction(std::move(func)));
 }
 
+template <template <typename> typename Transformer>

Review comment:
       I think some compilers demand that "class" be used for the last "typename"




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org