Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/13 21:44:22 UTC

[GitHub] [arrow] nirandaperera commented on a change in pull request #10317: ARROW-12713 [C++] String reverse kernel

nirandaperera commented on a change in pull request #10317:
URL: https://github.com/apache/arrow/pull/10317#discussion_r632123654



##########
File path: cpp/src/arrow/compute/kernels/scalar_string.cc
##########
@@ -266,6 +266,52 @@ void EnsureLookupTablesFilled() {}
 
 #endif  // ARROW_WITH_UTF8PROC
 
+template <typename Type>
+struct AsciiReverse : StringTransform<Type, AsciiReverse<Type>> {
+  using Base = StringTransform<Type, AsciiReverse<Type>>;
+  using offset_type = typename Base::offset_type;
+
+  bool Transform(const uint8_t* input, offset_type input_string_ncodeunits,
+                 uint8_t* output, offset_type* output_written) {
+    uint8_t utf8_char_found = 0;
+    for (offset_type i = 0; i < input_string_ncodeunits; i++) {
+      // if a utf8 char is found, report to utf8_char_found
+      utf8_char_found |= input[i] & 0x80;

Review comment:
   IMO `ascii_reverse` is slightly different from the other `ascii_*` routines. Routines such as lower and upper can leave non-ASCII bytes untouched and pass them through, but reverse can't do that, because reversing byte by byte would scramble any multi-byte UTF-8 sequence. Therefore, I added a flag `utf8_char_found` inside the loop, and if any non-ASCII byte is found, `ascii_reverse` would return an invalid status.
   I am not sure whether this check is needed. If we can assume that the user passes ASCII-only input, we can simply use `std::reverse_copy`. So I'd like to get a second opinion on that.
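
   To make the two options concrete, here is a minimal standalone sketch, outside Arrow's `StringTransform` machinery. The names `ReverseAsciiChecked` and `ReverseAsciiUnchecked` are hypothetical and not part of this PR; the checked variant only illustrates the high-bit test from the diff above, it is not the kernel itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not Arrow APIs.

// Option 1: reverse and validate in the same pass.
// Returns false if any byte has its high bit set (i.e. is not ASCII).
// In that case the contents of `output` should not be used.
inline bool ReverseAsciiChecked(const uint8_t* input, int64_t n, uint8_t* output) {
  assert(n >= 0);
  uint8_t non_ascii_found = 0;
  for (int64_t i = 0; i < n; i++) {
    // Accumulate the high bit of every byte; non-zero means non-ASCII input.
    non_ascii_found = static_cast<uint8_t>(non_ascii_found | (input[i] & 0x80));
    output[n - 1 - i] = input[i];
  }
  return non_ascii_found == 0;
}

// Option 2: trust the caller to pass ASCII-only data and just copy in reverse.
inline void ReverseAsciiUnchecked(const uint8_t* input, int64_t n, uint8_t* output) {
  assert(n >= 0);
  std::reverse_copy(input, input + n, output);
}
```

   With the unchecked variant, a two-byte sequence such as `0xC3 0xA9` ("é") would come out as `0xA9 0xC3`, which is no longer valid UTF-8. That is the trade-off in question: one extra OR per byte versus silently producing invalid output.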



