Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/06/09 17:24:49 UTC

[GitHub] [arrow] pitrou opened a new pull request #10496: ARROW-12951: [C++] Reduce generated code size for string kernels

pitrou opened a new pull request #10496:
URL: https://github.com/apache/arrow/pull/10496


   Factor out type-agnostic string operations (such as finding a split pattern)
   into separate classes, so that they are compiled only once instead of once
   per instantiation of the typed kernel execution classes.
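The refactoring described above is a common remedy for template code bloat. Logic that does not depend on a template parameter moves into a non-template function, so the compiler emits it once instead of once per instantiation. The following is a minimal sketch under that assumption. `SplitOnPattern` and `SplitValue` are hypothetical names for illustration, not Arrow's API.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>
#include <vector>

// Type-agnostic helper (hypothetical): compiled once, whatever the offset width.
// Splits `input` on every occurrence of `pattern`, keeping empty pieces.
std::vector<std::string_view> SplitOnPattern(std::string_view input,
                                             std::string_view pattern) {
  std::vector<std::string_view> parts;
  if (pattern.empty()) {
    parts.push_back(input);
    return parts;
  }
  size_t start = 0;
  while (true) {
    size_t pos = input.find(pattern, start);
    if (pos == std::string_view::npos) {
      parts.push_back(input.substr(start));
      break;
    }
    parts.push_back(input.substr(start, pos - start));
    start = pos + pattern.size();
  }
  return parts;
}

// Typed wrapper (hypothetical): one instantiation per offset type, but it only
// slices the i-th value out of the data buffer and delegates the string logic.
template <typename OffsetType>
std::vector<std::string_view> SplitValue(const uint8_t* data,
                                         const OffsetType* offsets, int64_t i,
                                         std::string_view pattern) {
  const OffsetType begin = offsets[i];
  const OffsetType length = offsets[i + 1] - begin;
  std::string_view value(reinterpret_cast<const char*>(data) + begin,
                         static_cast<size_t>(length));
  return SplitOnPattern(value, pattern);
}
```

With a 32-bit instantiation (as for `utf8`) and a 64-bit one (as for `large_utf8`), only the thin `SplitValue` wrapper is duplicated. The search loop is shared.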


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #10496: ARROW-12951: [C++] Reduce generated code size for string kernels

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #10496:
URL: https://github.com/apache/arrow/pull/10496#discussion_r648536998



##########
File path: cpp/src/arrow/compute/kernels/scalar_string.cc
##########
@@ -2539,167 +2581,136 @@ struct TrimStateUTF8 {
   }
 };
 
-template <typename Type, bool left, bool right, typename Derived>
-struct UTF8TrimBase : StringTransform<Type, Derived> {
-  using Base = StringTransform<Type, Derived>;
-  using offset_type = typename Base::offset_type;
-  using State = KernelStateFromFunctionOptions<TrimStateUTF8, TrimOptions>;
-  TrimStateUTF8 state_;
+template <bool TrimLeft, bool TrimRight>
+struct UTF8TrimTransform : public StringTransformBase {
+  using State = KernelStateFromFunctionOptions<UTF8TrimState, TrimOptions>;
 
-  explicit UTF8TrimBase(TrimStateUTF8 state) : state_(std::move(state)) {}
+  const UTF8TrimState& state_;
 
-  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
-    TrimStateUTF8 state = State::Get(ctx);
-    RETURN_NOT_OK(state.status_);
-    return Derived(state).Execute(ctx, batch, out);
-  }
+  explicit UTF8TrimTransform(const UTF8TrimState& state) : state_(state) {}
 
-  Status Execute(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
-    EnsureLookupTablesFilled();
-    return Base::Execute(ctx, batch, out);
+  Status PreExec(KernelContext* ctx, const ExecBatch& batch, Datum* out) override {
+    return state_.status_;

Review comment:
       Should we call EnsureLookupTablesFilled here?
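In the hunk above, the new `UTF8TrimTransform` is templated only on the two trim flags. It derives from a non-template `StringTransformBase` and has a virtual `PreExec` hook that returns the status stored in the kernel state. The following is a simplified, self-contained sketch of that shape. `Status`, `TrimState`, `Run` and the byte-wise (not UTF-8-aware) trimming are hypothetical stand-ins, not Arrow's actual types or logic.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical stand-in for arrow::Status: an empty message means OK.
struct Status {
  std::string error;
  bool ok() const { return error.empty(); }
};

// Non-template base: the shared driver is compiled once.
struct StringTransformBase {
  virtual ~StringTransformBase() = default;
  // Hook run before processing; the default does nothing.
  virtual Status PreExec() { return Status{}; }
  // Per-value transformation supplied by subclasses.
  virtual std::string_view Transform(std::string_view input) = 0;

  // Hypothetical driver: bail out if PreExec fails, else transform.
  Status Run(std::string_view input, std::string_view* out) {
    Status st = PreExec();
    if (!st.ok()) return st;
    *out = Transform(input);
    return Status{};
  }
};

// Hypothetical simplification of UTF8TrimState: the status recorded while
// parsing the options, plus the (single-byte) characters to trim.
struct TrimState {
  Status status_;
  std::string characters;
};

// Templated only on the flags: at most four instantiations, independent of
// the string type being processed.
template <bool TrimLeft, bool TrimRight>
struct TrimTransform : public StringTransformBase {
  const TrimState& state_;  // borrowed; the state must outlive the transform
  explicit TrimTransform(const TrimState& state) : state_(state) {}

  // Surface any error recorded while building the state from the options.
  Status PreExec() override { return state_.status_; }

  // Byte-wise trimming for illustration only (not UTF-8 aware).
  std::string_view Transform(std::string_view input) override {
    if constexpr (TrimLeft) {
      size_t first = input.find_first_not_of(state_.characters);
      input.remove_prefix(first == std::string_view::npos ? input.size() : first);
    }
    if constexpr (TrimRight) {
      size_t last = input.find_last_not_of(state_.characters);
      input = (last == std::string_view::npos) ? std::string_view{}
                                               : input.substr(0, last + 1);
    }
    return input;
  }
};
```

Holding the state by `const&` rather than by value avoids copying it for each kernel invocation. The trade-off is that the kernel state has to outlive the transform object.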

Review comment:
       Actually it looks like UTF8FindIf doesn't need the tables after all.
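A find-if over UTF-8 text only needs to decode code points and apply a predicate. When the predicate is membership in a user-supplied character set, as when trimming explicit characters, no Unicode category tables are involved. The following is a minimal sketch assuming well-formed UTF-8. `DecodeUtf8`, `FindIfUtf8` and `FindFirstNotIn` are hypothetical names, not Arrow's `UTF8FindIf` implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>
#include <unordered_set>

// Decodes the code point starting at *pos and advances *pos past it.
// Assumes well-formed UTF-8; validation is out of scope for this sketch.
uint32_t DecodeUtf8(std::string_view s, size_t* pos) {
  const auto b0 = static_cast<uint8_t>(s[*pos]);
  int len;
  uint32_t cp;
  if (b0 < 0x80) { len = 1; cp = b0; }
  else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
  else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
  else { len = 4; cp = b0 & 0x07; }
  for (int k = 1; k < len; ++k) {
    cp = (cp << 6) | (static_cast<uint8_t>(s[*pos + k]) & 0x3F);
  }
  *pos += len;
  return cp;
}

// Returns the byte offset of the first code point satisfying `pred`,
// or s.size() if none does.
template <typename Predicate>
size_t FindIfUtf8(std::string_view s, Predicate&& pred) {
  size_t pos = 0;
  while (pos < s.size()) {
    size_t start = pos;
    if (pred(DecodeUtf8(s, &pos))) return start;
  }
  return s.size();
}

// Where left-trimming a set of characters would stop: a plain set lookup,
// so no Unicode property tables are consulted.
size_t FindFirstNotIn(std::string_view s, const std::unordered_set<uint32_t>& chars) {
  return FindIfUtf8(s, [&](uint32_t cp) { return chars.count(cp) == 0; });
}
```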







[GitHub] [arrow] pitrou commented on a change in pull request #10496: ARROW-12951: [C++] Reduce generated code size for string kernels

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #10496:
URL: https://github.com/apache/arrow/pull/10496#discussion_r648543624



Review comment:
      No, the tables are only needed when looking up Unicode categories (whitespace, etc.).
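One common way to handle such tables is to fill them lazily and exactly once, so kernels that never query Unicode categories never pay for them. The following is a minimal sketch of that pattern using `std::call_once`. The table contents and the names `EnsureTablesFilled` and `IsSpaceLatin1` are hypothetical, not Arrow's `EnsureLookupTablesFilled`. The table holds only the Unicode White_Space code points in U+0000..U+00FF.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <mutex>

namespace {
// Hypothetical table: one flag per code point in U+0000..U+00FF only.
std::array<bool, 256> g_is_space_table;  // zero-initialized (static storage)
std::once_flag g_tables_once;

void FillTables() {
  for (uint32_t cp = 0; cp < 256; ++cp) {
    // White_Space code points in this range: U+0009..U+000D, U+0020,
    // U+0085 and U+00A0.
    g_is_space_table[cp] = (cp >= 0x09 && cp <= 0x0D) || cp == 0x20 ||
                           cp == 0x85 || cp == 0xA0;
  }
}
}  // namespace

// Thread-safe and idempotent: only category-based kernels need to call it
// before querying the table.
void EnsureTablesFilled() { std::call_once(g_tables_once, FillTables); }

// Code points outside the tabulated range report false in this sketch.
bool IsSpaceLatin1(uint32_t cp) { return cp < 256 && g_is_space_table[cp]; }
```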







[GitHub] [arrow] github-actions[bot] commented on pull request #10496: ARROW-12951: [C++] Reduce generated code size for string kernels

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10496:
URL: https://github.com/apache/arrow/pull/10496#issuecomment-857917015


   https://issues.apache.org/jira/browse/ARROW-12951





[GitHub] [arrow] pitrou commented on pull request #10496: ARROW-12951: [C++] Reduce generated code size for string kernels

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #10496:
URL: https://github.com/apache/arrow/pull/10496#issuecomment-857889228


   This reduces the code size of `compute/kernels/scalar_string.cc.o` by about 5% (in release mode). Not a terrific improvement, but a worthwhile cleanup IMHO.





[GitHub] [arrow] pitrou closed pull request #10496: ARROW-12951: [C++] Reduce generated code size for string kernels

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #10496:
URL: https://github.com/apache/arrow/pull/10496


   

