You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/24 12:50:11 UTC

[GitHub] [arrow] lidavidm commented on a change in pull request #9972: ARROW-12315: [R] add max_partitions argument to write_dataset()

lidavidm commented on a change in pull request #9972:
URL: https://github.com/apache/arrow/pull/9972#discussion_r637918633



##########
File path: cpp/src/arrow/dataset/file_base.cc
##########
@@ -450,6 +450,12 @@ Status WriteNextBatch(WriteState& state, const std::shared_ptr<Fragment>& fragme
   ARROW_ASSIGN_OR_RAISE(auto groups, state.write_options.partitioning->Partition(batch));
   batch.reset();  // drop to hopefully conserve memory
 
+  if (static_cast<int>(state.write_options.max_partitions) < static_cast<int>(0)) {

Review comment:
       There's no need to cast here since max_partitions is already an int. (The other place had to cast since one side was an int and the other side was size_t.)

##########
File path: cpp/src/arrow/dataset/file_base.cc
##########
@@ -450,6 +450,12 @@ Status WriteNextBatch(WriteState& state, const std::shared_ptr<Fragment>& fragme
   ARROW_ASSIGN_OR_RAISE(auto groups, state.write_options.partitioning->Partition(batch));
   batch.reset();  // drop to hopefully conserve memory
 
+  if (static_cast<int>(state.write_options.max_partitions) < static_cast<int>(0)) {
+    return Status::Invalid("A number of ", state.write_options.max_partitions,
+                           " partitions is not feasible.",
+                           " Try with max_partitions = 200, 100 or any other positive integer.");

Review comment:
       ```suggestion
       return Status::Invalid("max_partitions must be positive (was ",
                              state.write_options.max_partitions, ")");
   ```

##########
File path: cpp/src/arrow/dataset/file_base.cc
##########
@@ -450,6 +450,12 @@ Status WriteNextBatch(WriteState& state, const std::shared_ptr<Fragment>& fragme
   ARROW_ASSIGN_OR_RAISE(auto groups, state.write_options.partitioning->Partition(batch));
   batch.reset();  // drop to hopefully conserve memory
 
+  if (static_cast<int>(state.write_options.max_partitions) < static_cast<int>(0)) {

Review comment:
       ```suggestion
     if (state.write_options.max_partitions <= 0) {
   ```

##########
File path: cpp/src/arrow/dataset/file_base.cc
##########
@@ -450,6 +450,12 @@ Status WriteNextBatch(WriteState& state, const std::shared_ptr<Fragment>& fragme
   ARROW_ASSIGN_OR_RAISE(auto groups, state.write_options.partitioning->Partition(batch));
   batch.reset();  // drop to hopefully conserve memory
 
+  if (static_cast<int>(state.write_options.max_partitions) < static_cast<int>(0)) {

Review comment:
       Also, I'd probably put this validation in WriteInternal() below so it gets checked immediately.

##########
File path: r/R/dataset-write.R
##########
@@ -40,6 +40,8 @@
 #' will yield `"part-0.feather", ...`.
 #' @param hive_style logical: write partition segments as Hive-style
 #' (`key1=value1/key2=value2/file.ext`) or as just bare values. Default is `TRUE`.
+#' @param max_partitions maximum number of partitions any batch may be
+#' written into. Default is default 1024L (integer).

Review comment:
       ```suggestion
   #' written into. Default is 1024L (integer).
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org