Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/04 18:10:29 UTC

[GitHub] [arrow] pitrou commented on a change in pull request #8621: ARROW-9128: [C++] Implement string space trimming kernels: trim, ltrim, and rtrim

pitrou commented on a change in pull request #8621:
URL: https://github.com/apache/arrow/pull/8621#discussion_r551437108



##########
File path: cpp/src/arrow/util/utf8.h
##########
@@ -456,6 +456,67 @@ static inline bool UTF8Transform(const uint8_t* first, const uint8_t* last,
   return true;
 }
 
+template <class Predicate>
+static inline bool UTF8FindIf(const uint8_t* first, const uint8_t* last,
+                              Predicate&& predicate, const uint8_t** position) {
+  const uint8_t* i = first;
+  while (i < last) {
+    uint32_t codepoint = 0;
+    const uint8_t* current = i;
+    if (ARROW_PREDICT_FALSE(!UTF8Decode(&i, &codepoint))) {
+      return false;
+    }
+    if (predicate(codepoint)) {
+      *position = current;
+      return true;
+    }
+  }
+  *position = last;
+  return true;
+}
+
+// same semantics as std::find_if using reverse iterators, with the return value
+// having the same semantics as std::reverse_iterator<..>.base()
+template <class Predicate>
+static inline bool UTF8FindIfReverse(const uint8_t* first, const uint8_t* last,
+                                     Predicate&& predicate, const uint8_t** position) {
+  const uint8_t* i = last - 1;
+  while (i >= first) {
+    uint32_t codepoint = 0;
+    const uint8_t* current = i;
+    if (ARROW_PREDICT_FALSE(!UTF8DecodeReverse(&i, &codepoint))) {
+      return false;
+    }
+    if (predicate(codepoint)) {
+      *position = current + 1;

Review comment:
       This is a bit weird. It returns the position of the next codepoint? The docstring should be a bit clearer about that (the current wording is cryptic to me).
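The convention the comment points to comes from the standard library: if `rit` is a `std::reverse_iterator`, then `rit.base()` refers to the element one past `*rit`. The sketch below is ASCII-only and uses plain `std::find_if`, not Arrow's UTF-8 helpers; `FindLastNotSpaceBase` and `RTrimAscii` are hypothetical names. It illustrates why the reviewed code writes `*position = current + 1`, assuming `UTF8DecodeReverse` starts from the last byte of a codepoint (as `i = last - 1` suggests): the reported position is one past the matched codepoint, which is also where the following codepoint starts.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Predicate: true for bytes that are not ASCII whitespace.
inline bool NotAsciiSpace(unsigned char c) { return !std::isspace(c); }

// Hypothetical helper: the offset that rit.base() refers to, where rit is the
// result of std::find_if over reverse iterators. If s[k] is the last non-space
// byte, this returns k + 1 (one past the match), analogous to
// `*position = current + 1`. If nothing matches, rit == s.rend() and
// rit.base() == s.begin(), so this returns 0.
std::size_t FindLastNotSpaceBase(const std::string& s) {
  auto rit = std::find_if(s.rbegin(), s.rend(), NotAsciiSpace);
  return static_cast<std::size_t>(rit.base() - s.begin());
}

// Hypothetical right-trim built on that convention: the returned offset is
// directly usable as the end of the kept range, with no +1/-1 adjustment.
std::string RTrimAscii(const std::string& s) {
  return s.substr(0, FindLastNotSpaceBase(s));
}
```

For example, `FindLastNotSpaceBase("abc  ")` yields 3, one past the index of `'c'`, so `RTrimAscii("abc  ")` is `"abc"`; an all-space input yields 0 and an empty result. This "one past the match" result is what makes the `.base()` phrasing convenient for rtrim, but it needs to be spelled out in the docstring, since `std::find_if` on forward iterators reports the match itself.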

##########
File path: cpp/src/arrow/compute/kernels/scalar_string.cc
##########
@@ -186,6 +172,40 @@ struct UTF8Transform {
   }
 };
 
+#ifdef ARROW_WITH_UTF8PROC
+
+template <typename Type, typename Derived>
+struct UTF8Transform : StringTransform<Type, Derived> {

Review comment:
       I don't exactly understand this refactor. There's a `UTF8Transform` with a `Transform` method for utf8 kernels but no corresponding class with a `Transform` method for ascii kernels, is that right?
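To make the layering in the question concrete, here is a minimal CRTP sketch of the shape `UTF8Transform : StringTransform<Type, Derived>` suggests. It is a simplified, hypothetical illustration, not Arrow's actual classes or signatures. The base owns the driver and calls `Derived::Transform`, an intermediate layer supplies `Transform` once, and a concrete kernel only provides a per-unit mapping. For brevity it maps single bytes rather than decoded UTF-8 codepoints. The question is whether ASCII kernels get an equivalent intermediate layer or implement `Transform` themselves.

```cpp
#include <cctype>
#include <string>

// Hypothetical CRTP base: owns the per-value driver and delegates the
// per-string work to Derived::Transform.
template <typename Derived>
struct StringTransformSketch {
  static std::string Apply(const std::string& in) {
    std::string out;
    Derived::Transform(in, &out);
    return out;
  }
};

// Hypothetical intermediate layer: supplies Transform once, so concrete
// kernels only define Map. Maps bytes, not decoded codepoints.
template <typename Derived>
struct PerUnitTransformSketch : StringTransformSketch<Derived> {
  static void Transform(const std::string& in, std::string* out) {
    out->reserve(in.size());
    for (unsigned char c : in) {
      out->push_back(static_cast<char>(Derived::Map(c)));
    }
  }
};

// Hypothetical concrete kernel: only the per-unit mapping.
struct AsciiUpperSketch : PerUnitTransformSketch<AsciiUpperSketch> {
  static unsigned char Map(unsigned char c) {
    return static_cast<unsigned char>(std::toupper(c));
  }
};
```

With this layout, `AsciiUpperSketch::Apply("abc1")` returns `"ABC1"`. A kernel that derived from `StringTransformSketch` directly would have to define its own `Transform`, which is the asymmetry the question is about.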



