You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/22 15:15:43 UTC

[GitHub] [arrow] lidavidm commented on a diff in pull request #13679: ARROW-17115: [C++] HashJoin fails if it encounters a batch with more than 32Ki rows

lidavidm commented on code in PR #13679:
URL: https://github.com/apache/arrow/pull/13679#discussion_r927750234


##########
cpp/src/arrow/compute/exec/exec_plan.h:
##########
@@ -144,10 +146,24 @@ class ARROW_EXPORT ExecPlan : public std::enable_shared_from_this<ExecPlan> {
   /// \brief Return the plan's attached metadata
   std::shared_ptr<const KeyValueMetadata> metadata() const;
 
+  /// \brief Should the plan use a legacy batching strategy
+  ///
+  /// This is currently in place only to support the Scanner::ToTable
+  /// method.  This method relies on batch indices from the scanner
+  /// remaining consistent.  This is impractical in the ExecPlan which
+  /// might slice batches as needed (e.g. for a join)
+  ///
+  /// However, it still works for simple plans and this is the only way
+  /// we have at the moment for maintaining implicit order.
+  bool UseLegacyBatching() const { return use_legacy_batching_; }

Review Comment:
   I'm not super enthused about this flag, but it seems it'll be hard to get around it. Is there a plan for being able to get rid of it? 
   
   Also, ScanBatches is the underlying method that requires sequencing right?



##########
cpp/src/arrow/compute/exec/exec_plan.h:
##########
@@ -144,10 +146,24 @@ class ARROW_EXPORT ExecPlan : public std::enable_shared_from_this<ExecPlan> {
   /// \brief Return the plan's attached metadata
   std::shared_ptr<const KeyValueMetadata> metadata() const;
 
+  /// \brief Should the plan use a legacy batching strategy
+  ///
+  /// This is currently in place only to support the Scanner::ToTable
+  /// method.  This method relies on batch indices from the scanner
+  /// remaining consistent.  This is impractical in the ExecPlan which
+  /// might slice batches as needed (e.g. for a join)
+  ///
+  /// However, it still works for simple plans and this is the only way
+  /// we have at the moment for maintaining implicit order.
+  bool UseLegacyBatching() const { return use_legacy_batching_; }

Review Comment:
   In principle is it not possible to renumber the batches? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org