Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/08/30 11:44:54 UTC

[GitHub] [arrow] edponce commented on a change in pull request #10869: ARROW-12714: [C++] String title case kernel

edponce commented on a change in pull request #10869:
URL: https://github.com/apache/arrow/pull/10869#discussion_r698416337



##########
File path: cpp/src/arrow/compute/kernels/scalar_string.cc
##########
@@ -527,6 +517,55 @@ struct Utf8CapitalizeTransform : public StringTransformBase {
 template <typename Type>
 using Utf8Capitalize = StringTransformExec<Type, Utf8CapitalizeTransform>;
 
+struct Utf8TitleTransform : public StringTransformCodepointBase {
+  int64_t Transform(const uint8_t* input, int64_t input_string_ncodeunits,
+                    uint8_t* output) {
+    uint8_t* output_start = output;
+    const uint8_t* end = input + input_string_ncodeunits;
+    const uint8_t* curr = NULLPTR;
+    uint32_t codepoint = 0;
+
+    do {
+      // Uppercase first alpha character of current word
+      while (input < end) {
+        curr = input;
+        if (ARROW_PREDICT_FALSE(!util::UTF8Decode(&curr, &codepoint))) {
+          return kTransformError;
+        }
+        if (IsCasedCharacterUnicode(codepoint)) {
+          output =
+              util::UTF8Encode(output, UTF8UpperTransform::TransformCodepoint(codepoint));
+          input = curr;
+          break;
+        }
+        output = std::copy(input, curr, output);
+        input = curr;
+      }
+
+      // Lowercase characters until a whitespace is found

Review comment:
       Thanks for catching this. I had the semantics of titlecase wrong. Lesson learned: always check multiple languages and sources for the semantics of compute functions.
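
       For illustration only, here is a minimal sketch of one common titlecase definition, the one Python's `str.title` documents: a word is a run of consecutive letters, so any non-letter (not only whitespace) starts a new word. It is ASCII-only, and `AsciiTitle` is a hypothetical helper name, not an Arrow function and not necessarily the semantics this PR ends up with.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper, not part of Arrow: ASCII-only title casing.
// Assumption: a word is a maximal run of letters, so any non-letter
// character (digit, punctuation, whitespace) ends the current word.
std::string AsciiTitle(const std::string& input) {
  std::string output;
  output.reserve(input.size());
  bool at_word_start = true;
  for (unsigned char c : input) {
    if (std::isalpha(c)) {
      // Uppercase the first letter of a word, lowercase the rest.
      output.push_back(
          static_cast<char>(at_word_start ? std::toupper(c) : std::tolower(c)));
      at_word_start = false;
    } else {
      // Non-letters are copied unchanged and reset the word boundary.
      output.push_back(static_cast<char>(c));
      at_word_start = true;
    }
  }
  return output;
}
```

       Under this definition `AsciiTitle("hello-world foo")` gives `Hello-World Foo`, whereas a loop that lowercases until whitespace would give `Hello-world Foo`.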



