You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "ildipo (via GitHub)" <gi...@apache.org> on 2023/03/15 16:37:19 UTC

[GitHub] [arrow] ildipo opened a new pull request, #34575: GH-15280: [C++] Second block of refactoring to move acero out of libarrow

ildipo opened a new pull request, #34575:
URL: https://github.com/apache/arrow/pull/34575

   ### Rationale for this change
   
   This is the second and last block of refactoring before moving acero out of libarrow. This PR removes the remaining dependencies from libarrow into the compute/exec directory. 
   
   ### What changes are included in this PR?
   
   In detail:
   
   * compute/exec/expression is moved inside compute since it is needed in several kernels
   * all compute/exec specific include files are removed from compute/api.h
   * light_array_exec_test.cc i(introduced in the previous refactoring) s merged back into light_array_test
   * testing generators are refactored to remove the need for exec/options.h
   * some utilities are moved from compute/exec/util to compute/util
   * updated python and r libraries usage of moved files
   
   ### Are these changes tested?
   
   Yes using existing tests (some of which have been updated due to refactoring).
   
   ### Are there any user-facing changes?
   
   no


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on pull request #34575: GH-15280: [C++] Second block of refactoring to move acero out of libarrow

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on PR #34575:
URL: https://github.com/apache/arrow/pull/34575#issuecomment-1470777247

   > Yes I feel what is in expression.h is used in many other places (e.g. inside kernels) so it belongs to libarrow
   
   Got it - makes sense to me. @westonpace does that sound about right?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] assignUser commented on pull request #34575: GH-34630: [C++] Second block of refactoring to move acero out of libarrow

Posted by "assignUser (via GitHub)" <gi...@apache.org>.
assignUser commented on PR #34575:
URL: https://github.com/apache/arrow/pull/34575#issuecomment-1474700951

   Please create separate issues and track them in an umbrella issue to avoid having to re-open issues to avoid confusion now and during changelog creation. 
   
   I have created #34630 for this PR @ildipo  please comment "take" on it to have it assigned (automatic assignment failed as you have not interacted with the issue - a GH spam protection)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #34575: GH-34630: [C++] Second block of refactoring to move acero out of libarrow

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34575:
URL: https://github.com/apache/arrow/pull/34575#issuecomment-1474698953

   :warning: GitHub issue #34630 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on pull request #34575: GH-15280: [C++] Second block of refactoring to move acero out of libarrow

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on PR #34575:
URL: https://github.com/apache/arrow/pull/34575#issuecomment-1470920713

   > Got it - makes sense to me. @westonpace does that sound about right?
   
   Yes, `expression` is pretty stable.  So if the goal is a stable "compute" module then I think moving it inside `compute` should be safe.
   
   Thinking beyond just "stability" I think there is a pretty reasonable separation in Arrow between "compute functions" and "the execution engine" and I think it makes sense for `Expression` to live in the former.  There is also some desire for a Substrait API for expression evaluation.  In theory that could be done without "exec" but that would require splitting the substrait module into two pieces and I don't think that's worth our time (e.g. the expression evaluation API can just depend on "exec" even though it doesn't strictly need to)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on pull request #34575: GH-15280: [C++] Second block of refactoring to move acero out of libarrow

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on PR #34575:
URL: https://github.com/apache/arrow/pull/34575#issuecomment-1470628004

   I also removed most of the reviewers - I think the changes to python/R binding is trivial but if @westonpace think this need review from R maintainers we can add back.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on a diff in pull request #34575: GH-15280: [C++] Second block of refactoring to move acero out of libarrow

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on code in PR #34575:
URL: https://github.com/apache/arrow/pull/34575#discussion_r1137615763


##########
cpp/src/arrow/compute/exec/plan_test.cc:
##########
@@ -1016,9 +1019,14 @@ TEST(ExecPlanExecution, ProjectMaintainsOrder) {
   constexpr int kRandomSeed = 42;
   constexpr int64_t kRowsPerBatch = 1;
   constexpr int kNumBatches = 16;
-  auto source_node = gen::Gen({{"x", gen::Step()}})
-                         ->FailOnError()
-                         ->SourceNode(kRowsPerBatch, kNumBatches);
+
+  auto generator = gen::Gen({{"x", gen::Step()}})->FailOnError();

Review Comment:
   Curious why this change?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ildipo commented on a diff in pull request #34575: GH-15280: [C++] Second block of refactoring to move acero out of libarrow

Posted by "ildipo (via GitHub)" <gi...@apache.org>.
ildipo commented on code in PR #34575:
URL: https://github.com/apache/arrow/pull/34575#discussion_r1137696508


##########
cpp/src/arrow/compute/exec/plan_test.cc:
##########
@@ -1016,9 +1019,14 @@ TEST(ExecPlanExecution, ProjectMaintainsOrder) {
   constexpr int kRandomSeed = 42;
   constexpr int64_t kRowsPerBatch = 1;
   constexpr int kNumBatches = 16;
-  auto source_node = gen::Gen({{"x", gen::Step()}})
-                         ->FailOnError()
-                         ->SourceNode(kRowsPerBatch, kNumBatches);
+
+  auto generator = gen::Gen({{"x", gen::Step()}})->FailOnError();

Review Comment:
   This was using generator.h with compute/exec/options.h. I removed that include (and the associated exec options from generator, that is `SourceNode()`) and expanded it in this test, which was the only place where it was used.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ildipo commented on pull request #34575: GH-15280: [C++] Second block of refactoring to move acero out of libarrow

Posted by "ildipo (via GitHub)" <gi...@apache.org>.
ildipo commented on PR #34575:
URL: https://github.com/apache/arrow/pull/34575#issuecomment-1470750758

   I'm not sure why one check failed (apparently it was cancelled???)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #34575: GH-15280: [C++] Second block of refactoring to move acero out of libarrow

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34575:
URL: https://github.com/apache/arrow/pull/34575#issuecomment-1472971521

   Revision: a3c7b3834b6eeeff185ed3f7619b833fac826d69
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-b5e5c878a5](https://github.com/ursacomputing/crossbow/branches/all?query=actions-b5e5c878a5)
   
   |Task|Status|
   |----|------|
   |test-alpine-linux-cpp|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-b5e5c878a5-github-test-alpine-linux-cpp)](https://github.com/ursacomputing/crossbow/actions/runs/4443183411/jobs/7800199096)|
   |test-build-cpp-fuzz|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-b5e5c878a5-github-test-build-cpp-fuzz)](https://github.com/ursacomputing/crossbow/actions/runs/4443183569/jobs/7800199444)|
   |test-conda-cpp|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-b5e5c878a5-github-test-conda-cpp)](https://github.com/ursacomputing/crossbow/actions/runs/4443184009/jobs/7800200388)|
   |test-conda-cpp-valgrind|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-b5e5c878a5-azure-test-conda-cpp-valgrind)](https://github.com/ursacomputing/crossbow/runs/12068435350)|
   |test-cuda-cpp|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-b5e5c878a5-github-test-cuda-cpp)](https://github.com/ursacomputing/crossbow/actions/runs/4443183877/jobs/7800200072)|
   |test-debian-10-cpp-amd64|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-b5e5c878a5-github-test-debian-10-cpp-amd64)](https://github.com/ursacomputing/crossbow/actions/runs/4443182882/jobs/7800198234)|
   |test-debian-10-cpp-i386|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-b5e5c878a5-github-test-debian-10-cpp-i386)](https://github.com/ursacomputing/crossbow/actions/runs/4443185068/jobs/7800202586)|
   |test-debian-11-cpp-amd64|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-b5e5c878a5-github-test-debian-11-cpp-amd64)](https://github.com/ursacomputing/crossbow/actions/runs/4443183723/jobs/7800199755)|
   |test-debian-11-cpp-i386|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-b5e5c878a5-github-test-debian-11-cpp-i386)](https://github.com/ursacomputing/crossbow/actions/runs/4443184683/jobs/7800201692)|
   |test-fedora-35-cpp|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-b5e5c878a5-github-test-fedora-35-cpp)](https://github.com/ursacomputing/crossbow/actions/runs/4443185677/jobs/7800203782)|
   |test-ubuntu-18.04-cpp|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-b5e5c878a5-github-test-ubuntu-18.04-cpp)](https://github.com/ursacomputing/crossbow/actions/runs/4443184851/jobs/7800202060)|
   |test-ubuntu-18.04-cpp-release|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-b5e5c878a5-github-test-ubuntu-18.04-cpp-release)](https://github.com/ursacomputing/crossbow/actions/runs/4443184529/jobs/7800201321)|
   |test-ubuntu-18.04-cpp-static|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-b5e5c878a5-github-test-ubuntu-18.04-cpp-static)](https://github.com/ursacomputing/crossbow/actions/runs/4443184165/jobs/7800200691)|
   |test-ubuntu-20.04-cpp|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-b5e5c878a5-github-test-ubuntu-20.04-cpp)](https://github.com/ursacomputing/crossbow/actions/runs/4443184944/jobs/7800202255)|
   |test-ubuntu-20.04-cpp-20|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-b5e5c878a5-github-test-ubuntu-20.04-cpp-20)](https://github.com/ursacomputing/crossbow/actions/runs/4443183012/jobs/7800198389)|
   |test-ubuntu-20.04-cpp-bundled|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-b5e5c878a5-github-test-ubuntu-20.04-cpp-bundled)](https://github.com/ursacomputing/crossbow/actions/runs/4443185540/jobs/7800203534)|
   |test-ubuntu-20.04-cpp-thread-sanitizer|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-b5e5c878a5-github-test-ubuntu-20.04-cpp-thread-sanitizer)](https://github.com/ursacomputing/crossbow/actions/runs/4443185311/jobs/7800203069)|
   |test-ubuntu-22.04-cpp|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-b5e5c878a5-github-test-ubuntu-22.04-cpp)](https://github.com/ursacomputing/crossbow/actions/runs/4443185401/jobs/7800203234)|


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on pull request #34575: GH-34630: [C++] Second block of refactoring to move acero out of libarrow

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on PR #34575:
URL: https://github.com/apache/arrow/pull/34575#issuecomment-1474884647

   Oh sure sorry didn’t realize that. Will certainly do next time.
   
   On Sat, Mar 18, 2023 at 11:36 AM Sutou Kouhei ***@***.***>
   wrote:
   
   > @icexelloss <https://github.com/icexelloss> Could you use
   > https://github.com/apache/arrow/blob/main/dev/merge_arrow_pr.py instead
   > of GitHub's merge button as much as possible? The script does some more
   > preparations.
   >
   > —
   > Reply to this email directly, view it on GitHub
   > <https://github.com/apache/arrow/pull/34575#issuecomment-1474880291>, or
   > unsubscribe
   > <https://github.com/notifications/unsubscribe-auth/AAGBXLERU2KEKSNTXIDS6ADW4XI6JANCNFSM6AAAAAAV4C26LY>
   > .
   > You are receiving this because you were mentioned.Message ID:
   > ***@***.***>
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ildipo commented on pull request #34575: GH-15280: [C++] Second block of refactoring to move acero out of libarrow

Posted by "ildipo (via GitHub)" <gi...@apache.org>.
ildipo commented on PR #34575:
URL: https://github.com/apache/arrow/pull/34575#issuecomment-1474026781

   nothing else to add to this PR


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on pull request #34575: GH-15280: [C++] Second block of refactoring to move acero out of libarrow

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on PR #34575:
URL: https://github.com/apache/arrow/pull/34575#issuecomment-1473881311

   @westonpace @ildipo I approved. Is there anything you would like to check/change? If not I will merge this end of day today. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ildipo commented on a diff in pull request #34575: GH-15280: [C++] Second block of refactoring to move acero out of libarrow

Posted by "ildipo (via GitHub)" <gi...@apache.org>.
ildipo commented on code in PR #34575:
URL: https://github.com/apache/arrow/pull/34575#discussion_r1138683733


##########
cpp/src/arrow/compute/type_fwd.h:
##########
@@ -50,14 +50,6 @@ struct KernelState;
 
 struct Declaration;

Review Comment:
   good point, moving in `exec/type_fwd.h`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ildipo commented on pull request #34575: GH-15280: [C++] Second block of refactoring to move acero out of libarrow

Posted by "ildipo (via GitHub)" <gi...@apache.org>.
ildipo commented on PR #34575:
URL: https://github.com/apache/arrow/pull/34575#issuecomment-1470746388

   Yes I feel what is in expression.h is used in many other places (e.g. inside kernels) so it belongs to libarrow
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on pull request #34575: GH-15280: [C++] Second block of refactoring to move acero out of libarrow

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on PR #34575:
URL: https://github.com/apache/arrow/pull/34575#issuecomment-1470920922

   I'll try and get a closer look at this tonight but it might be tomorrow.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on a diff in pull request #34575: GH-15280: [C++] Second block of refactoring to move acero out of libarrow

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on code in PR #34575:
URL: https://github.com/apache/arrow/pull/34575#discussion_r1138101813


##########
cpp/src/arrow/compute/api.h:
##########
@@ -38,24 +38,16 @@
 /// @{
 /// @}
 
-#include "arrow/compute/exec/expression.h"  // IWYU pragma: export
-
-/// \defgroup execnode-options Concrete option classes for ExecNode options
-/// @{
-/// @}
-
-#include "arrow/compute/exec/options.h"  // IWYU pragma: export
+#include "arrow/compute/expression.h"  // IWYU pragma: export
 
 /// \defgroup execnode-row Utilities for working with data in a row-major format
 /// @{
 /// @}
 
 #include "arrow/compute/row/grouper.h"  // IWYU pragma: export
 
-/// \defgroup execnode-components Components associated with ExecNode
+/// \defgroup execnode-components Components associated with ExecBatch
 /// @{
 /// @}
 
-#include "arrow/compute/exec.h"            // IWYU pragma: export
-#include "arrow/compute/exec/exec_plan.h"  // IWYU pragma: export
-#include "arrow/compute/exec/groupby.h"    // IWYU pragma: export
+#include "arrow/compute/exec.h"  // IWYU pragma: export

Review Comment:
   This could probably be split someday (no need to do it in this PR).  `ExecContext` / `ExecSpan` / `ExecResult` / `ExecValue` / `CallFunction` / `GetFunctionExecutor` belong in compute.  `ExecBatch` belongs in `exec`.  `SelectionVector` could probably be removed at this point and added back in later if we ever decide to do something with it.



##########
cpp/src/arrow/compute/type_fwd.h:
##########
@@ -50,14 +50,6 @@ struct KernelState;
 
 struct Declaration;

Review Comment:
   I think `Declaration` is part of `exec` too.



##########
cpp/src/arrow/testing/generator.cc:
##########
@@ -373,19 +362,15 @@ class GTestDataGeneratorImpl : public GTestDataGenerator {
     EXPECT_OK_AND_ASSIGN(auto batches, target_->ExecBatches(rows_per_batch, num_batches));
     return batches;
   }
-  ::arrow::compute::Declaration SourceNode(int64_t rows_per_batch,
-                                           int num_batches) override {
-    EXPECT_OK_AND_ASSIGN(auto source_node,
-                         target_->SourceNode(rows_per_batch, num_batches));
-    return source_node;
-  }
 
   std::shared_ptr<::arrow::Table> Table(int64_t rows_per_chunk, int num_chunks) override {
     EXPECT_OK_AND_ASSIGN(auto table, target_->Table(rows_per_chunk, num_chunks));
     return table;
   }
   std::shared_ptr<::arrow::Schema> Schema() override { return target_->Schema(); }
 
+  std::shared_ptr<DataGenerator> Target() { return target_; }
+

Review Comment:
   Is this needed anywhere?



##########
cpp/src/arrow/compute/exec/plan_test.cc:
##########
@@ -1016,9 +1019,14 @@ TEST(ExecPlanExecution, ProjectMaintainsOrder) {
   constexpr int kRandomSeed = 42;
   constexpr int64_t kRowsPerBatch = 1;
   constexpr int kNumBatches = 16;
-  auto source_node = gen::Gen({{"x", gen::Step()}})
-                         ->FailOnError()
-                         ->SourceNode(kRowsPerBatch, kNumBatches);
+
+  auto generator = gen::Gen({{"x", gen::Step()}})->FailOnError();

Review Comment:
   I am planning on starting to use these utilities in more places.  However, we can introduce `exec::gen::Gen` (or something like that) which wraps the generator and adds exec-specific utilities.  So I think this is tolerable for now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ildipo commented on pull request #34575: GH-15280: [C++] Second block of refactoring to move acero out of libarrow

Posted by "ildipo (via GitHub)" <gi...@apache.org>.
ildipo commented on PR #34575:
URL: https://github.com/apache/arrow/pull/34575#issuecomment-1474027518

   just don't close the associated issue - I think github will close it automatically
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on pull request #34575: GH-15280: [C++] Second block of refactoring to move acero out of libarrow

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on PR #34575:
URL: https://github.com/apache/arrow/pull/34575#issuecomment-1470625387

   @ildipo High level looks good to me. Is the general idea of moving "expression" from "compute/exec" to "compute" such that it is consider part of "libarrow" and not "libarrow_acero"?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ildipo commented on a diff in pull request #34575: GH-15280: [C++] Second block of refactoring to move acero out of libarrow

Posted by "ildipo (via GitHub)" <gi...@apache.org>.
ildipo commented on code in PR #34575:
URL: https://github.com/apache/arrow/pull/34575#discussion_r1138682749


##########
cpp/src/arrow/compute/api.h:
##########
@@ -38,24 +38,16 @@
 /// @{
 /// @}
 
-#include "arrow/compute/exec/expression.h"  // IWYU pragma: export
-
-/// \defgroup execnode-options Concrete option classes for ExecNode options
-/// @{
-/// @}
-
-#include "arrow/compute/exec/options.h"  // IWYU pragma: export
+#include "arrow/compute/expression.h"  // IWYU pragma: export
 
 /// \defgroup execnode-row Utilities for working with data in a row-major format
 /// @{
 /// @}
 
 #include "arrow/compute/row/grouper.h"  // IWYU pragma: export
 
-/// \defgroup execnode-components Components associated with ExecNode
+/// \defgroup execnode-components Components associated with ExecBatch
 /// @{
 /// @}
 
-#include "arrow/compute/exec.h"            // IWYU pragma: export
-#include "arrow/compute/exec/exec_plan.h"  // IWYU pragma: export
-#include "arrow/compute/exec/groupby.h"    // IWYU pragma: export
+#include "arrow/compute/exec.h"  // IWYU pragma: export

Review Comment:
   I'll fix the other things in this PR and then make another one to move ExecBatch



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #34575: GH-34630: [C++] Second block of refactoring to move acero out of libarrow

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34575:
URL: https://github.com/apache/arrow/pull/34575#issuecomment-1474698945

   * Closes: #34630


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ildipo commented on pull request #34575: GH-15280: [C++] Second block of refactoring to move acero out of libarrow

Posted by "ildipo (via GitHub)" <gi...@apache.org>.
ildipo commented on PR #34575:
URL: https://github.com/apache/arrow/pull/34575#issuecomment-1473749031

   The above failures seem unrelated to this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ildipo commented on pull request #34575: GH-15280: [C++] Second block of refactoring to move acero out of libarrow

Posted by "ildipo (via GitHub)" <gi...@apache.org>.
ildipo commented on PR #34575:
URL: https://github.com/apache/arrow/pull/34575#issuecomment-1472678091

   Updated the PR with the changes detailed above.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ildipo commented on a diff in pull request #34575: GH-15280: [C++] Second block of refactoring to move acero out of libarrow

Posted by "ildipo (via GitHub)" <gi...@apache.org>.
ildipo commented on code in PR #34575:
URL: https://github.com/apache/arrow/pull/34575#discussion_r1139098658


##########
cpp/src/arrow/testing/generator.cc:
##########
@@ -373,19 +362,15 @@ class GTestDataGeneratorImpl : public GTestDataGenerator {
     EXPECT_OK_AND_ASSIGN(auto batches, target_->ExecBatches(rows_per_batch, num_batches));
     return batches;
   }
-  ::arrow::compute::Declaration SourceNode(int64_t rows_per_batch,
-                                           int num_batches) override {
-    EXPECT_OK_AND_ASSIGN(auto source_node,
-                         target_->SourceNode(rows_per_batch, num_batches));
-    return source_node;
-  }
 
   std::shared_ptr<::arrow::Table> Table(int64_t rows_per_chunk, int num_chunks) override {
     EXPECT_OK_AND_ASSIGN(auto table, target_->Table(rows_per_chunk, num_chunks));
     return table;
   }
   std::shared_ptr<::arrow::Schema> Schema() override { return target_->Schema(); }
 
+  std::shared_ptr<DataGenerator> Target() { return target_; }
+

Review Comment:
   removing



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss merged pull request #34575: GH-34630: [C++] Second block of refactoring to move acero out of libarrow

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss merged PR #34575:
URL: https://github.com/apache/arrow/pull/34575


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #34575: GH-34630: [C++] Second block of refactoring to move acero out of libarrow

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #34575:
URL: https://github.com/apache/arrow/pull/34575#issuecomment-1474995341

   Benchmark runs are scheduled for baseline = 682c112cc33b4e39ef114d718c4410a3122a872b and contender = 85245722531d5a7ee8f5b02c22b722d2ff7e7015. 85245722531d5a7ee8f5b02c22b722d2ff7e7015 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/ff39498fa6fa4718af8a3ff5c461bf70...939eaee4d7824a07891662e3c5cb565b/)
   [Failed :arrow_down:0.45% :arrow_up:0.15%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/858491d765a049d48792d05ed36fba6e...db734088f8c2424ba7a9a14d98cd6e22/)
   [Finished :arrow_down:0.26% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/3648a173e3d348f4ade160e55cdb9d15...b48b0daa721146319085e4472e3eb8e5/)
   [Finished :arrow_down:1.01% :arrow_up:0.06%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/33827743482240f4a0d1d8156a72ad04...3b472f2a8d7541c6a090677f8ab00b8b/)
   Buildkite builds:
   [Finished] [`85245722` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2542)
   [Finished] [`85245722` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2572)
   [Finished] [`85245722` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2540)
   [Finished] [`85245722` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2563)
   [Finished] [`682c112c` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2541)
   [Failed] [`682c112c` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2571)
   [Finished] [`682c112c` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2539)
   [Finished] [`682c112c` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2562)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] kou commented on pull request #34575: GH-34630: [C++] Second block of refactoring to move acero out of libarrow

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on PR #34575:
URL: https://github.com/apache/arrow/pull/34575#issuecomment-1474880291

   @icexelloss Could you use https://github.com/apache/arrow/blob/main/dev/merge_arrow_pr.py instead of GitHub's merge button as much as possible? The script does some more preparations.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #34575: GH-34630: [C++] Second block of refactoring to move acero out of libarrow

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #34575:
URL: https://github.com/apache/arrow/pull/34575#issuecomment-1474995600

   ['Python', 'R'] benchmarks have high level of regressions.
   [test-mac-arm](https://conbench.ursa.dev/compare/runs/858491d765a049d48792d05ed36fba6e...db734088f8c2424ba7a9a14d98cd6e22/)
   [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/3648a173e3d348f4ade160e55cdb9d15...b48b0daa721146319085e4472e3eb8e5/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #34575: GH-15280: [C++] Second block of refactoring to move acero out of libarrow

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34575:
URL: https://github.com/apache/arrow/pull/34575#issuecomment-1470372296

   * Closes: #15280


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on pull request #34575: GH-15280: [C++] Second block of refactoring to move acero out of libarrow

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on PR #34575:
URL: https://github.com/apache/arrow/pull/34575#issuecomment-1470765981

   @ildipo Error is this
   ```
   FAILED: src/arrow/compute/CMakeFiles/arrow-compute-internals-test.dir/Unity/unity_0_cxx.cxx.obj 
   "C:\Program Files\Git\usr\bin\ccache.exe" C:\PROGRA~2\MICROS~2\2019\ENTERP~1\VC\Tools\MSVC\1429~1.301\bin\Hostx64\x64\cl.exe  /nologo /TP -DARROW_HAVE_RUNTIME_AVX2 -DARROW_HAVE_RUNTIME_AVX512 -DARROW_HAVE_RUNTIME_BMI2 -DARROW_HAVE_RUNTIME_SSE4_2 -DARROW_HAVE_SSE4_2 -DARROW_HDFS -DARROW_MIMALLOC -DARROW_WITH_BENCHMARKS_REFERENCE -DARROW_WITH_LZ4 -DARROW_WITH_RE2 -DARROW_WITH_SNAPPY -DARROW_WITH_UTF8PROC -DARROW_WITH_ZLIB -DARROW_WITH_ZSTD -DBOOST_ALL_NO_LIB -DGMOCK_LINKED_AS_SHARED_LIBRARY=1 -DGTEST_LINKED_AS_SHARED_LIBRARY=1 -DURI_STATIC_BUILD -DUTF8PROC_STATIC -D_CRT_SECURE_NO_WARNINGS -D_ENABLE_EXTENDED_ALIGNED_STORAGE -ID:\a\arrow\arrow\build\cpp\src -ID:\a\arrow\arrow\cpp\src -ID:\a\arrow\arrow\cpp\src\generated -external:ID:\a\arrow\arrow\cpp\thirdparty\flatbuffers\include -external:ID:\a\arrow\arrow\build\cpp\rapidjson_ep\src\rapidjson_ep-install\include -external:ID:\a\arrow\arrow\cpp\thirdparty\hadoop\include -external:ID:\a\arrow\arrow\build\cpp\boost_ep-prefix\src\boost_
 ep -external:I"C:\Program Files\OpenSSL-Win64\include" -external:ID:\a\arrow\arrow\build\cpp\orc_ep-install\include -external:ID:\a\arrow\arrow\build\cpp\lz4_ep-install\include -external:ID:\a\arrow\arrow\build\cpp\zlib_ep\src\zlib_ep-install\include -external:ID:\a\arrow\arrow\build\cpp\zstd_ep-install\include -external:ID:\a\arrow\arrow\build\cpp\snappy_ep\src\snappy_ep-install\include -external:ID:\a\arrow\arrow\build\cpp\protobuf_ep-install\include -external:ID:\a\arrow\arrow\build\cpp\utf8proc_ep-install\include -external:ID:\a\arrow\arrow\build\cpp\re2_ep-install\include -external:ID:\a\arrow\arrow\build\cpp\xsimd_ep\src\xsimd_ep-install\include -external:ID:\a\arrow\arrow\build\cpp\mimalloc_ep\src\mimalloc_ep\include\mimalloc-2.0 -external:ID:\a\arrow\arrow\build\cpp\googletest_ep-prefix\include -external:W0 /D_SILENCE_TR1_NAMESPACE_DEPRECATION_WARNING /W3 /EHsc /wd5105 /bigobj /utf-8 /W3 /wd4365 /wd4267 /wd4838 /wd4800 /wd4996 /wd4065   /MDd /Z7 /Ob0 /Od /RTC1 /WX -std:c++17
  /showIncludes /Fosrc\arrow\compute\CMakeFiles\arrow-compute-internals-test.dir\Unity\unity_0_cxx.cxx.obj /Fdsrc\arrow\compute\CMakeFiles\arrow-compute-internals-test.dir\ /FS -c D:\a\arrow\arrow\build\cpp\src\arrow\compute\CMakeFiles\arrow-compute-internals-test.dir\Unity\unity_0_cxx.cxx
   D:\a\arrow\arrow\cpp\src\arrow/compute/exec/test_util.h(49): error C2375: 'arrow::compute::ExecBatchFromJSON': redefinition; different linkage
   D:/a/arrow/arrow/cpp/src/arrow/compute/light_array_test.cc(37): note: see declaration of 'arrow::compute::ExecBatchFromJSON'
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] icexelloss commented on pull request #34575: GH-15280: [C++] Second block of refactoring to move acero out of libarrow

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on PR #34575:
URL: https://github.com/apache/arrow/pull/34575#issuecomment-1473874320

   @westonpace 
   
   There seems to be a failure in recent CI runs (test-alpine-linux-cpp):
   ```
    68/103 Test  #60: arrow-dataset-file-json-test ..............***Failed   61.16 sec
   Running arrow-dataset-file-json-test, redirecting output into /build/cpp/build/test-logs/arrow-dataset-file-json-test.txt (attempt 1/1)
   /arrow/cpp/build-support/run-test.sh: line 88: 28113 Segmentation fault      (core dumped) 
   ```
   
   Also some valgrind failures on the following tests:
   https://dev.azure.com/ursacomputing/crossbow/_build/results?buildId=45749&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=d9b15392-e4ce-5e4c-0c8c-b69645229181
   
   I suspect those are not caused by this PR - do we have these failures on default branch as well?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] assignUser commented on pull request #34575: GH-15280: [C++] Second block of refactoring to move acero out of libarrow

Posted by "assignUser (via GitHub)" <gi...@apache.org>.
assignUser commented on PR #34575:
URL: https://github.com/apache/arrow/pull/34575#issuecomment-1472969494

   @github-actions crossbow submit -g cpp


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org