You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/12/14 12:45:47 UTC

[GitHub] [arrow] dhruv9vats opened a new pull request #11946: ARROW-13663: [C++] RecordBatchReader STL-like iteration

dhruv9vats opened a new pull request #11946:
URL: https://github.com/apache/arrow/pull/11946


   Use a nested class like in https://github.com/apache/arrow/blob/e401246fc0d500fb43e0bad328854e6c8772509b/cpp/src/arrow/util/iterator.h#L134 to support STL-like iteration of the RecordBatchReader.
   
   TODO:
   
   - [ ] Add tests


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] dhruv9vats commented on a change in pull request #11946: ARROW-13663: [C++] RecordBatchReader STL-like iteration

Posted by GitBox <gi...@apache.org>.
dhruv9vats commented on a change in pull request #11946:
URL: https://github.com/apache/arrow/pull/11946#discussion_r782060241



##########
File path: cpp/src/arrow/record_batch.h
##########
@@ -234,6 +234,67 @@ class ARROW_EXPORT RecordBatchReader {
     return batch;
   }
 
+  class RecordBatchReaderIterator {

Review comment:
       Other options: 
   - `RangeIterator` as in https://github.com/apache/arrow/blob/d88e23273fd4eb7945a5fb94cfdb6315f412ea83/cpp/src/arrow/util/iterator.h#L134




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] dhruv9vats commented on a change in pull request #11946: ARROW-13663: [C++] RecordBatchReader STL-like iteration

Posted by GitBox <gi...@apache.org>.
dhruv9vats commented on a change in pull request #11946:
URL: https://github.com/apache/arrow/pull/11946#discussion_r768633153



##########
File path: cpp/src/arrow/record_batch.h
##########
@@ -234,6 +234,53 @@ class ARROW_EXPORT RecordBatchReader {
     return batch;
   }
 
+  class RecordBatchReaderIterator {
+   public:
+    RecordBatchReaderIterator() : batch_(RecordBatchEnd()) {}
+
+    explicit RecordBatchReaderIterator(RecordBatchReader* reader)

Review comment:
       Using non-owning raw-pointers as suggested by @pitrou.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #11946: ARROW-13663: [C++] RecordBatchReader STL-like iteration

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #11946:
URL: https://github.com/apache/arrow/pull/11946#issuecomment-1011286586


   Thank you very much @dhruv9vats !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #11946: ARROW-13663: [C++] RecordBatchReader STL-like iteration

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #11946:
URL: https://github.com/apache/arrow/pull/11946#discussion_r781218663



##########
File path: cpp/src/arrow/record_batch.h
##########
@@ -234,6 +234,67 @@ class ARROW_EXPORT RecordBatchReader {
     return batch;
   }
 
+  class RecordBatchReaderIterator {

Review comment:
       Nit, but since this is a nested class, perhaps calling it `Iterator` is sufficient?

##########
File path: cpp/src/arrow/record_batch.h
##########
@@ -234,6 +234,67 @@ class ARROW_EXPORT RecordBatchReader {
     return batch;
   }
 
+  class RecordBatchReaderIterator {
+   public:
+    using iterator_category = std::input_iterator_tag;
+    using difference_type = std::ptrdiff_t;
+    using value_type = std::shared_ptr<RecordBatch>;
+    using pointer = value_type const*;
+    using reference = value_type const&;
+
+    RecordBatchReaderIterator() : batch_(RecordBatchEnd()), reader_(NULLPTR) {}
+
+    explicit RecordBatchReaderIterator(RecordBatchReader* reader)
+        : batch_(RecordBatchEnd()), reader_(reader) {
+      Next();
+    }
+
+    bool operator==(const RecordBatchReaderIterator& other) const {
+      return batch_ == other.batch_;
+    }
+
+    bool operator!=(const RecordBatchReaderIterator& other) const {
+      return !(*this == other);
+    }
+
+    Result<std::shared_ptr<RecordBatch>> operator*() {
+      ARROW_RETURN_NOT_OK(batch_.status());
+
+      return batch_;
+    }
+
+    RecordBatchReaderIterator& operator++() {
+      Next();
+      return *this;
+    }
+
+    RecordBatchReaderIterator operator++(int) {
+      RecordBatchReaderIterator tmp(*this);
+      Next();
+      return tmp;
+    }
+
+   private:
+    std::shared_ptr<RecordBatch> RecordBatchEnd() {
+      return std::shared_ptr<RecordBatch>(NULLPTR);
+    }
+
+    void Next() {

Review comment:
       I don't think anything more is required, no. The caller should notice that the iterator is equal to `end()`, and then stop reading.

##########
File path: cpp/src/arrow/record_batch_test.cc
##########
@@ -350,4 +350,64 @@ TEST_F(TestRecordBatch, MakeEmpty) {
   ASSERT_EQ(empty->num_rows(), 0);
 }
 
+class TestRecordBatchReader : public ::testing::Test {
+ public:
+  void SetUp() override { MakeBatchesAndReader(100); }
+
+ protected:
+  void MakeBatchesAndReader(int length) {
+    auto field1 = field("f1", int32());
+    auto field2 = field("f2", uint8());
+    auto field3 = field("f3", int16());
+
+    auto schema = ::arrow::schema({field1, field2, field3});
+
+    random::RandomArrayGenerator gen(42);
+
+    auto array1_1 = gen.ArrayOf(int32(), length);
+    auto array1_2 = gen.ArrayOf(int32(), length);
+    auto array1_3 = gen.ArrayOf(int32(), length);
+
+    auto array2_1 = gen.ArrayOf(uint8(), length);
+    auto array2_2 = gen.ArrayOf(uint8(), length);
+    auto array2_3 = gen.ArrayOf(uint8(), length);
+
+    auto array3_1 = gen.ArrayOf(int16(), length);
+    auto array3_2 = gen.ArrayOf(int16(), length);
+    auto array3_3 = gen.ArrayOf(int16(), length);
+
+    auto batch1 = RecordBatch::Make(schema, length, {array1_1, array2_1, array3_1});
+    auto batch2 = RecordBatch::Make(schema, length, {array1_2, array2_2, array3_2});
+    auto batch3 = RecordBatch::Make(schema, length, {array1_3, array2_3, array3_3});
+
+    batches_ = {batch1, batch2, batch3};
+
+    ASSERT_OK_AND_ASSIGN(reader_, RecordBatchReader::Make(batches_));
+  }
+  std::vector<std::shared_ptr<RecordBatch>> batches_;
+  std::shared_ptr<RecordBatchReader> reader_;
+};
+
+TEST_F(TestRecordBatchReader, RangeForLoop) {
+  int64_t i = 0;
+  ASSERT_LT(i, static_cast<int64_t>(batches_.size()));

Review comment:
       This check was meant to go inside the loop, before `i` is incremented.

##########
File path: cpp/src/arrow/record_batch.h
##########
@@ -234,6 +234,67 @@ class ARROW_EXPORT RecordBatchReader {
     return batch;
   }
 
+  class RecordBatchReaderIterator {
+   public:
+    using iterator_category = std::input_iterator_tag;
+    using difference_type = std::ptrdiff_t;
+    using value_type = std::shared_ptr<RecordBatch>;
+    using pointer = value_type const*;
+    using reference = value_type const&;
+
+    RecordBatchReaderIterator() : batch_(RecordBatchEnd()), reader_(NULLPTR) {}
+
+    explicit RecordBatchReaderIterator(RecordBatchReader* reader)
+        : batch_(RecordBatchEnd()), reader_(reader) {
+      Next();
+    }
+
+    bool operator==(const RecordBatchReaderIterator& other) const {
+      return batch_ == other.batch_;
+    }
+
+    bool operator!=(const RecordBatchReaderIterator& other) const {
+      return !(*this == other);
+    }
+
+    Result<std::shared_ptr<RecordBatch>> operator*() {
+      ARROW_RETURN_NOT_OK(batch_.status());
+
+      return batch_;
+    }
+
+    RecordBatchReaderIterator& operator++() {
+      Next();
+      return *this;
+    }
+
+    RecordBatchReaderIterator operator++(int) {
+      RecordBatchReaderIterator tmp(*this);
+      Next();
+      return tmp;
+    }
+
+   private:
+    std::shared_ptr<RecordBatch> RecordBatchEnd() {
+      return std::shared_ptr<RecordBatch>(NULLPTR);
+    }
+
+    void Next() {
+      if (reader_ == NULLPTR) {
+        batch_ = RecordBatchEnd();
+        return;
+      }
+      batch_ = reader_->Next();
+    }
+
+    Result<std::shared_ptr<RecordBatch>> batch_;
+    RecordBatchReader* reader_;
+  };
+
+  RecordBatchReaderIterator begin() { return RecordBatchReaderIterator(this); }
+
+  RecordBatchReaderIterator end() { return RecordBatchReaderIterator(); }

Review comment:
       Can you add a terse docstring to these two public methods?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #11946: ARROW-13663: [C++] RecordBatchReader STL-like iteration

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11946:
URL: https://github.com/apache/arrow/pull/11946#issuecomment-993504264


   https://issues.apache.org/jira/browse/ARROW-13663


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou closed pull request #11946: ARROW-13663: [C++] RecordBatchReader STL-like iteration

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #11946:
URL: https://github.com/apache/arrow/pull/11946


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot commented on pull request #11946: ARROW-13663: [C++] RecordBatchReader STL-like iteration

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #11946:
URL: https://github.com/apache/arrow/pull/11946#issuecomment-1011287163


   Benchmark runs are scheduled for baseline = 7303b51ad2f9ac3f0c59bee7221771552ed4eb46 and contender = 7b955fc6c980e033dc112466f502b325cf5575ce. 7b955fc6c980e033dc112466f502b325cf5575ce is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/1c173a5cb70b410a8fb9c54181295aad...19db914461e04f27939ace96d7df7d9d/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/f9a5a884a48447148be5a8ba9ba0f689...ed090a9faefb4b33a943285916c349fb/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/4f575a69492747e3a6a1ee430b2e7b9f...1bbf1e64d0ab4970a725bf145f37a3cf/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #11946: ARROW-13663: [C++] RecordBatchReader STL-like iteration

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #11946:
URL: https://github.com/apache/arrow/pull/11946#issuecomment-1011287163


   Benchmark runs are scheduled for baseline = 7303b51ad2f9ac3f0c59bee7221771552ed4eb46 and contender = 7b955fc6c980e033dc112466f502b325cf5575ce. 7b955fc6c980e033dc112466f502b325cf5575ce is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/1c173a5cb70b410a8fb9c54181295aad...19db914461e04f27939ace96d7df7d9d/)
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/f9a5a884a48447148be5a8ba9ba0f689...ed090a9faefb4b33a943285916c349fb/)
   [Finished :arrow_down:0.66% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/4f575a69492747e3a6a1ee430b2e7b9f...1bbf1e64d0ab4970a725bf145f37a3cf/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #11946: ARROW-13663: [C++] RecordBatchReader STL-like iteration

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #11946:
URL: https://github.com/apache/arrow/pull/11946#discussion_r778168422



##########
File path: cpp/src/arrow/record_batch.h
##########
@@ -234,6 +234,53 @@ class ARROW_EXPORT RecordBatchReader {
     return batch;
   }
 
+  class RecordBatchReaderIterator {
+   public:
+    RecordBatchReaderIterator() : batch_(RecordBatchEnd()) {}
+
+    explicit RecordBatchReaderIterator(RecordBatchReader* reader)
+        : batch_(RecordBatchEnd()), reader_(reader) {
+      Next();
+    }
+
+    bool operator!=(const RecordBatchReaderIterator& other) const {
+      return batch_ != other.batch_;
+    }
+
+    Result<std::shared_ptr<RecordBatch>> operator*() {
+      ARROW_RETURN_NOT_OK(batch_.status());
+
+      auto batch = std::move(batch_);
+      batch_ = RecordBatchEnd();

Review comment:
       It is rather weird for `operator*` to change the stored batch. That means if I use this operator twice, I will get two different results; that sounds rather undesired.

##########
File path: cpp/src/arrow/record_batch.h
##########
@@ -234,6 +234,53 @@ class ARROW_EXPORT RecordBatchReader {
     return batch;
   }
 
+  class RecordBatchReaderIterator {
+   public:
+    RecordBatchReaderIterator() : batch_(RecordBatchEnd()) {}
+
+    explicit RecordBatchReaderIterator(RecordBatchReader* reader)
+        : batch_(RecordBatchEnd()), reader_(reader) {
+      Next();
+    }
+
+    bool operator!=(const RecordBatchReaderIterator& other) const {
+      return batch_ != other.batch_;
+    }
+
+    Result<std::shared_ptr<RecordBatch>> operator*() {
+      ARROW_RETURN_NOT_OK(batch_.status());
+
+      auto batch = std::move(batch_);
+      batch_ = RecordBatchEnd();
+      return batch;
+    }
+
+    RecordBatchReaderIterator& operator++() {
+      Next();
+      return *this;
+    }
+
+   private:
+    std::shared_ptr<RecordBatch> RecordBatchEnd() {
+      return std::shared_ptr<RecordBatch>(NULLPTR);
+    }
+
+    void Next() {
+      if (!batch_.ok()) {
+        batch_ = RecordBatchEnd();

Review comment:
       This means that if an error occurs, the iterator is ended afterwards. I'm not sure why we should do that?

##########
File path: cpp/src/arrow/record_batch.h
##########
@@ -234,6 +234,53 @@ class ARROW_EXPORT RecordBatchReader {
     return batch;
   }
 
+  class RecordBatchReaderIterator {
+   public:

Review comment:
       It would be nicer to satisfy the [input iterator requirement](https://en.cppreference.com/w/cpp/named_req/InputIterator), i.e. add type definitions (`value_type`, `reference`) as well as equality and post-increment operators.

##########
File path: cpp/src/arrow/record_batch_test.cc
##########
@@ -332,4 +332,72 @@ TEST_F(TestRecordBatch, MakeEmpty) {
   ASSERT_EQ(empty->num_rows(), 0);
 }
 
+class TestRecordBatchReader : public TestBase {};
+
+TEST_F(TestRecordBatchReader, RangeForLoop) {
+  const int length = 10;
+
+  auto f0 = field("f0", int32());
+  auto f1 = field("f1", uint8());
+  auto f2 = field("f2", int16());
+
+  auto metadata = key_value_metadata({"foo"}, {"bar"});
+
+  std::vector<std::shared_ptr<Field>> fields = {f0, f1, f2};
+  auto schema = ::arrow::schema({f0, f1, f2});
+  auto schema2 = ::arrow::schema({f0, f1});
+  auto schema3 = ::arrow::schema({f0, f1, f2}, metadata);
+
+  auto a0 = MakeRandomArray<Int32Array>(length);
+  auto a1 = MakeRandomArray<UInt8Array>(length);
+  auto a2 = MakeRandomArray<Int16Array>(length);
+
+  auto b1 = RecordBatch::Make(schema, length, {a0, a1, a2});
+  auto b2 = RecordBatch::Make(schema3, length, {a0, a1, a2});
+  auto b3 = RecordBatch::Make(schema2, length, {a0, a1});
+  auto b4 = RecordBatch::Make(schema, length, {a0, a1, a1});
+
+  std::vector<std::shared_ptr<RecordBatch>> batches = {b1, b2, b3, b4};
+
+  ASSERT_OK_AND_ASSIGN(auto reader, RecordBatchReader::Make(batches));
+
+  int64_t i = 0;
+  for (auto maybe_batch : *reader) {
+    ASSERT_OK_AND_ASSIGN(auto batch, maybe_batch);
+    AssertBatchesEqual(*batch, *batches[i++]);
+  }
+}
+
+TEST_F(TestRecordBatchReader, BeginEndForLoop) {
+  const int length = 10;
+
+  auto field1 = field("f1", int32());
+  auto field2 = field("f2", uint8());
+  auto field3 = field("f3", int16());

Review comment:
       Can you factor out (in the fixture class above) and reuse the test data for these two tests?

##########
File path: cpp/src/arrow/record_batch_test.cc
##########
@@ -332,4 +332,72 @@ TEST_F(TestRecordBatch, MakeEmpty) {
   ASSERT_EQ(empty->num_rows(), 0);
 }
 
+class TestRecordBatchReader : public TestBase {};
+
+TEST_F(TestRecordBatchReader, RangeForLoop) {
+  const int length = 10;
+
+  auto f0 = field("f0", int32());
+  auto f1 = field("f1", uint8());
+  auto f2 = field("f2", int16());
+
+  auto metadata = key_value_metadata({"foo"}, {"bar"});
+
+  std::vector<std::shared_ptr<Field>> fields = {f0, f1, f2};
+  auto schema = ::arrow::schema({f0, f1, f2});
+  auto schema2 = ::arrow::schema({f0, f1});
+  auto schema3 = ::arrow::schema({f0, f1, f2}, metadata);
+
+  auto a0 = MakeRandomArray<Int32Array>(length);
+  auto a1 = MakeRandomArray<UInt8Array>(length);
+  auto a2 = MakeRandomArray<Int16Array>(length);
+
+  auto b1 = RecordBatch::Make(schema, length, {a0, a1, a2});
+  auto b2 = RecordBatch::Make(schema3, length, {a0, a1, a2});
+  auto b3 = RecordBatch::Make(schema2, length, {a0, a1});
+  auto b4 = RecordBatch::Make(schema, length, {a0, a1, a1});
+
+  std::vector<std::shared_ptr<RecordBatch>> batches = {b1, b2, b3, b4};
+
+  ASSERT_OK_AND_ASSIGN(auto reader, RecordBatchReader::Make(batches));
+
+  int64_t i = 0;
+  for (auto maybe_batch : *reader) {
+    ASSERT_OK_AND_ASSIGN(auto batch, maybe_batch);
+    AssertBatchesEqual(*batch, *batches[i++]);
+  }
+}

Review comment:
       After the loop, check that all batches were seen: `ASSERT_EQ(i, static_cast<int64_t>(batches.size()))`

##########
File path: cpp/src/arrow/record_batch.h
##########
@@ -234,6 +234,53 @@ class ARROW_EXPORT RecordBatchReader {
     return batch;
   }
 
+  class RecordBatchReaderIterator {
+   public:
+    RecordBatchReaderIterator() : batch_(RecordBatchEnd()) {}
+
+    explicit RecordBatchReaderIterator(RecordBatchReader* reader)
+        : batch_(RecordBatchEnd()), reader_(reader) {
+      Next();
+    }
+
+    bool operator!=(const RecordBatchReaderIterator& other) const {
+      return batch_ != other.batch_;
+    }
+
+    Result<std::shared_ptr<RecordBatch>> operator*() {
+      ARROW_RETURN_NOT_OK(batch_.status());
+
+      auto batch = std::move(batch_);
+      batch_ = RecordBatchEnd();
+      return batch;
+    }
+
+    RecordBatchReaderIterator& operator++() {
+      Next();
+      return *this;
+    }
+
+   private:
+    std::shared_ptr<RecordBatch> RecordBatchEnd() {
+      return std::shared_ptr<RecordBatch>(NULLPTR);
+    }
+
+    void Next() {
+      if (!batch_.ok()) {
+        batch_ = RecordBatchEnd();
+        return;
+      }
+      batch_ = reader_->Next();

Review comment:
       What if `reader_` is null, for example if this is called on `RecordBatchReader::end()`?

##########
File path: cpp/src/arrow/record_batch.h
##########
@@ -234,6 +234,53 @@ class ARROW_EXPORT RecordBatchReader {
     return batch;
   }
 
+  class RecordBatchReaderIterator {
+   public:
+    RecordBatchReaderIterator() : batch_(RecordBatchEnd()) {}

Review comment:
       Do you leave `reader_` undefined here?

##########
File path: cpp/src/arrow/record_batch_test.cc
##########
@@ -332,4 +332,72 @@ TEST_F(TestRecordBatch, MakeEmpty) {
   ASSERT_EQ(empty->num_rows(), 0);
 }
 
+class TestRecordBatchReader : public TestBase {};
+
+TEST_F(TestRecordBatchReader, RangeForLoop) {
+  const int length = 10;
+
+  auto f0 = field("f0", int32());
+  auto f1 = field("f1", uint8());
+  auto f2 = field("f2", int16());
+
+  auto metadata = key_value_metadata({"foo"}, {"bar"});
+
+  std::vector<std::shared_ptr<Field>> fields = {f0, f1, f2};
+  auto schema = ::arrow::schema({f0, f1, f2});
+  auto schema2 = ::arrow::schema({f0, f1});
+  auto schema3 = ::arrow::schema({f0, f1, f2}, metadata);
+
+  auto a0 = MakeRandomArray<Int32Array>(length);
+  auto a1 = MakeRandomArray<UInt8Array>(length);
+  auto a2 = MakeRandomArray<Int16Array>(length);
+
+  auto b1 = RecordBatch::Make(schema, length, {a0, a1, a2});
+  auto b2 = RecordBatch::Make(schema3, length, {a0, a1, a2});
+  auto b3 = RecordBatch::Make(schema2, length, {a0, a1});
+  auto b4 = RecordBatch::Make(schema, length, {a0, a1, a1});
+
+  std::vector<std::shared_ptr<RecordBatch>> batches = {b1, b2, b3, b4};
+
+  ASSERT_OK_AND_ASSIGN(auto reader, RecordBatchReader::Make(batches));
+
+  int64_t i = 0;
+  for (auto maybe_batch : *reader) {
+    ASSERT_OK_AND_ASSIGN(auto batch, maybe_batch);
+    AssertBatchesEqual(*batch, *batches[i++]);

Review comment:
       To avoid crashes or undefined behabiour, first `ASSERT_LT(i, static_cast<int64_t>(batches.size()))`.

##########
File path: cpp/src/arrow/record_batch_test.cc
##########
@@ -332,4 +332,72 @@ TEST_F(TestRecordBatch, MakeEmpty) {
   ASSERT_EQ(empty->num_rows(), 0);
 }
 
+class TestRecordBatchReader : public TestBase {};
+
+TEST_F(TestRecordBatchReader, RangeForLoop) {
+  const int length = 10;
+
+  auto f0 = field("f0", int32());
+  auto f1 = field("f1", uint8());
+  auto f2 = field("f2", int16());
+
+  auto metadata = key_value_metadata({"foo"}, {"bar"});
+
+  std::vector<std::shared_ptr<Field>> fields = {f0, f1, f2};
+  auto schema = ::arrow::schema({f0, f1, f2});
+  auto schema2 = ::arrow::schema({f0, f1});
+  auto schema3 = ::arrow::schema({f0, f1, f2}, metadata);

Review comment:
       Hmm, it does not really make sense to iterate over record batches with different schemas. You should simplify this by using the same schema.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] dhruv9vats commented on a change in pull request #11946: ARROW-13663: [C++] RecordBatchReader STL-like iteration

Posted by GitBox <gi...@apache.org>.
dhruv9vats commented on a change in pull request #11946:
URL: https://github.com/apache/arrow/pull/11946#discussion_r768633153



##########
File path: cpp/src/arrow/record_batch.h
##########
@@ -234,6 +234,53 @@ class ARROW_EXPORT RecordBatchReader {
     return batch;
   }
 
+  class RecordBatchReaderIterator {
+   public:
+    RecordBatchReaderIterator() : batch_(RecordBatchEnd()) {}
+
+    explicit RecordBatchReaderIterator(RecordBatchReader* reader)

Review comment:
       Using non-owning raw-pointers




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #11946: ARROW-13663: [C++] RecordBatchReader STL-like iteration

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #11946:
URL: https://github.com/apache/arrow/pull/11946#discussion_r782143512



##########
File path: cpp/src/arrow/record_batch.h
##########
@@ -234,6 +234,68 @@ class ARROW_EXPORT RecordBatchReader {
     return batch;
   }
 
+  class RecordBatchReaderIterator {
+   public:
+    using iterator_category = std::input_iterator_tag;
+    using difference_type = std::ptrdiff_t;
+    using value_type = std::shared_ptr<RecordBatch>;
+    using pointer = value_type const*;
+    using reference = value_type const&;
+
+    RecordBatchReaderIterator() : batch_(RecordBatchEnd()), reader_(NULLPTR) {}
+
+    explicit RecordBatchReaderIterator(RecordBatchReader* reader)
+        : batch_(RecordBatchEnd()), reader_(reader) {
+      Next();
+    }
+
+    bool operator==(const RecordBatchReaderIterator& other) const {
+      return batch_ == other.batch_;
+    }
+
+    bool operator!=(const RecordBatchReaderIterator& other) const {
+      return !(*this == other);
+    }
+
+    Result<std::shared_ptr<RecordBatch>> operator*() {
+      ARROW_RETURN_NOT_OK(batch_.status());
+
+      return batch_;
+    }
+
+    RecordBatchReaderIterator& operator++() {
+      Next();
+      return *this;
+    }
+
+    RecordBatchReaderIterator operator++(int) {
+      RecordBatchReaderIterator tmp(*this);
+      Next();
+      return tmp;
+    }
+
+   private:
+    std::shared_ptr<RecordBatch> RecordBatchEnd() {
+      return std::shared_ptr<RecordBatch>(NULLPTR);
+    }
+
+    void Next() {
+      if (reader_ == NULLPTR) {
+        batch_ = RecordBatchEnd();
+        return;
+      }
+      batch_ = reader_->Next();
+    }
+
+    Result<std::shared_ptr<RecordBatch>> batch_;
+    RecordBatchReader* reader_;
+  };
+  /// \brief Returns an iterator to the first record batch in the stream
+  RecordBatchReaderIterator begin() { return RecordBatchReaderIterator(this); }

Review comment:
       It's ok, however please use "Return" (infinitive) not "Returns" (present tense).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] dhruv9vats commented on a change in pull request #11946: ARROW-13663: [C++] RecordBatchReader STL-like iteration

Posted by GitBox <gi...@apache.org>.
dhruv9vats commented on a change in pull request #11946:
URL: https://github.com/apache/arrow/pull/11946#discussion_r782293251



##########
File path: cpp/src/arrow/record_batch.h
##########
@@ -234,6 +234,68 @@ class ARROW_EXPORT RecordBatchReader {
     return batch;
   }
 
+  class RecordBatchReaderIterator {
+   public:
+    using iterator_category = std::input_iterator_tag;
+    using difference_type = std::ptrdiff_t;
+    using value_type = std::shared_ptr<RecordBatch>;
+    using pointer = value_type const*;
+    using reference = value_type const&;
+
+    RecordBatchReaderIterator() : batch_(RecordBatchEnd()), reader_(NULLPTR) {}
+
+    explicit RecordBatchReaderIterator(RecordBatchReader* reader)
+        : batch_(RecordBatchEnd()), reader_(reader) {
+      Next();
+    }
+
+    bool operator==(const RecordBatchReaderIterator& other) const {
+      return batch_ == other.batch_;
+    }
+
+    bool operator!=(const RecordBatchReaderIterator& other) const {
+      return !(*this == other);
+    }
+
+    Result<std::shared_ptr<RecordBatch>> operator*() {
+      ARROW_RETURN_NOT_OK(batch_.status());
+
+      return batch_;
+    }
+
+    RecordBatchReaderIterator& operator++() {
+      Next();
+      return *this;
+    }
+
+    RecordBatchReaderIterator operator++(int) {
+      RecordBatchReaderIterator tmp(*this);
+      Next();
+      return tmp;
+    }
+
+   private:
+    std::shared_ptr<RecordBatch> RecordBatchEnd() {
+      return std::shared_ptr<RecordBatch>(NULLPTR);
+    }
+
+    void Next() {
+      if (reader_ == NULLPTR) {
+        batch_ = RecordBatchEnd();
+        return;
+      }
+      batch_ = reader_->Next();
+    }
+
+    Result<std::shared_ptr<RecordBatch>> batch_;
+    RecordBatchReader* reader_;
+  };
+  /// \brief Returns an iterator to the first record batch in the stream
+  RecordBatchReaderIterator begin() { return RecordBatchReaderIterator(this); }

Review comment:
       Sure




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] dhruv9vats commented on a change in pull request #11946: ARROW-13663: [C++] RecordBatchReader STL-like iteration

Posted by GitBox <gi...@apache.org>.
dhruv9vats commented on a change in pull request #11946:
URL: https://github.com/apache/arrow/pull/11946#discussion_r782081749



##########
File path: cpp/src/arrow/record_batch.h
##########
@@ -234,6 +234,68 @@ class ARROW_EXPORT RecordBatchReader {
     return batch;
   }
 
+  class RecordBatchReaderIterator {
+   public:
+    using iterator_category = std::input_iterator_tag;
+    using difference_type = std::ptrdiff_t;
+    using value_type = std::shared_ptr<RecordBatch>;
+    using pointer = value_type const*;
+    using reference = value_type const&;
+
+    RecordBatchReaderIterator() : batch_(RecordBatchEnd()), reader_(NULLPTR) {}
+
+    explicit RecordBatchReaderIterator(RecordBatchReader* reader)
+        : batch_(RecordBatchEnd()), reader_(reader) {
+      Next();
+    }
+
+    bool operator==(const RecordBatchReaderIterator& other) const {
+      return batch_ == other.batch_;
+    }
+
+    bool operator!=(const RecordBatchReaderIterator& other) const {
+      return !(*this == other);
+    }
+
+    Result<std::shared_ptr<RecordBatch>> operator*() {
+      ARROW_RETURN_NOT_OK(batch_.status());
+
+      return batch_;
+    }
+
+    RecordBatchReaderIterator& operator++() {
+      Next();
+      return *this;
+    }
+
+    RecordBatchReaderIterator operator++(int) {
+      RecordBatchReaderIterator tmp(*this);
+      Next();
+      return tmp;
+    }
+
+   private:
+    std::shared_ptr<RecordBatch> RecordBatchEnd() {
+      return std::shared_ptr<RecordBatch>(NULLPTR);
+    }
+
+    void Next() {
+      if (reader_ == NULLPTR) {
+        batch_ = RecordBatchEnd();
+        return;
+      }
+      batch_ = reader_->Next();
+    }
+
+    Result<std::shared_ptr<RecordBatch>> batch_;
+    RecordBatchReader* reader_;
+  };
+  /// \brief Returns an iterator to the first record batch in the stream
+  RecordBatchReaderIterator begin() { return RecordBatchReaderIterator(this); }

Review comment:
       Is the use of "in the stream" okay here, or should it be removed or replaced?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] dhruv9vats commented on pull request #11946: ARROW-13663: [C++] RecordBatchReader STL-like iteration

Posted by GitBox <gi...@apache.org>.
dhruv9vats commented on pull request #11946:
URL: https://github.com/apache/arrow/pull/11946#issuecomment-1004303644


   Added some basic tests. 
   If more sophisticated ones are also needed, could you give a brief outline of those? @pitrou 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] dhruv9vats commented on a change in pull request #11946: ARROW-13663: [C++] RecordBatchReader STL-like iteration

Posted by GitBox <gi...@apache.org>.
dhruv9vats commented on a change in pull request #11946:
URL: https://github.com/apache/arrow/pull/11946#discussion_r778203973



##########
File path: cpp/src/arrow/record_batch.h
##########
@@ -234,6 +234,53 @@ class ARROW_EXPORT RecordBatchReader {
     return batch;
   }
 
+  class RecordBatchReaderIterator {
+   public:

Review comment:
       Pardon the lack of rigor, this was a direct translation of the nested class at https://github.com/apache/arrow/blob/31a07be1d9dc2f7c9720cc0fdcd7f083d947aba1/cpp/src/arrow/util/iterator.h#L134 to RecordBatchReader. I'll be sure to add these definitions.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] dhruv9vats commented on a change in pull request #11946: ARROW-13663: [C++] RecordBatchReader STL-like iteration

Posted by GitBox <gi...@apache.org>.
dhruv9vats commented on a change in pull request #11946:
URL: https://github.com/apache/arrow/pull/11946#discussion_r779597891



##########
File path: cpp/src/arrow/record_batch.h
##########
@@ -234,6 +234,67 @@ class ARROW_EXPORT RecordBatchReader {
     return batch;
   }
 
+  class RecordBatchReaderIterator {
+   public:
+    using iterator_category = std::input_iterator_tag;
+    using difference_type = std::ptrdiff_t;
+    using value_type = std::shared_ptr<RecordBatch>;
+    using pointer = value_type const*;
+    using reference = value_type const&;
+
+    RecordBatchReaderIterator() : batch_(RecordBatchEnd()), reader_(NULLPTR) {}
+
+    explicit RecordBatchReaderIterator(RecordBatchReader* reader)
+        : batch_(RecordBatchEnd()), reader_(reader) {
+      Next();
+    }
+
+    bool operator==(const RecordBatchReaderIterator& other) const {
+      return batch_ == other.batch_;
+    }
+
+    bool operator!=(const RecordBatchReaderIterator& other) const {
+      return !(*this == other);
+    }
+
+    Result<std::shared_ptr<RecordBatch>> operator*() {
+      ARROW_RETURN_NOT_OK(batch_.status());
+
+      return batch_;
+    }
+
+    RecordBatchReaderIterator& operator++() {
+      Next();
+      return *this;
+    }
+
+    RecordBatchReaderIterator operator++(int) {
+      RecordBatchReaderIterator tmp(*this);
+      Next();
+      return tmp;
+    }
+
+   private:
+    std::shared_ptr<RecordBatch> RecordBatchEnd() {
+      return std::shared_ptr<RecordBatch>(NULLPTR);
+    }
+
+    void Next() {

Review comment:
       I've tried to add all the basic things you mentioned. But is this the correct way to implement the `Next` function here, because it feels like something is missing here?
   Is there supposed to be an additional check here to see if the end has been reached? (Sorry if I'm overlooking something obvious and this is a naive question) @pitrou 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #11946: ARROW-13663: [C++] RecordBatchReader STL-like iteration

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #11946:
URL: https://github.com/apache/arrow/pull/11946#discussion_r782142890



##########
File path: cpp/src/arrow/record_batch.h
##########
@@ -234,6 +234,67 @@ class ARROW_EXPORT RecordBatchReader {
     return batch;
   }
 
+  class RecordBatchReaderIterator {

Review comment:
       Ah, fair enough. We can keep it like this, then.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #11946: ARROW-13663: [C++] RecordBatchReader STL-like iteration

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #11946:
URL: https://github.com/apache/arrow/pull/11946#issuecomment-1011287163


   Benchmark runs are scheduled for baseline = 7303b51ad2f9ac3f0c59bee7221771552ed4eb46 and contender = 7b955fc6c980e033dc112466f502b325cf5575ce. 7b955fc6c980e033dc112466f502b325cf5575ce is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/1c173a5cb70b410a8fb9c54181295aad...19db914461e04f27939ace96d7df7d9d/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/f9a5a884a48447148be5a8ba9ba0f689...ed090a9faefb4b33a943285916c349fb/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/4f575a69492747e3a6a1ee430b2e7b9f...1bbf1e64d0ab4970a725bf145f37a3cf/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] dhruv9vats commented on a change in pull request #11946: ARROW-13663: [C++] RecordBatchReader STL-like iteration

Posted by GitBox <gi...@apache.org>.
dhruv9vats commented on a change in pull request #11946:
URL: https://github.com/apache/arrow/pull/11946#discussion_r768639599



##########
File path: cpp/src/arrow/record_batch.h
##########
@@ -234,6 +234,53 @@ class ARROW_EXPORT RecordBatchReader {
     return batch;
   }
 
+  class RecordBatchReaderIterator {
+   public:
+    RecordBatchReaderIterator() : batch_(RecordBatchEnd()) {}
+
+    explicit RecordBatchReaderIterator(RecordBatchReader* reader)
+        : batch_(RecordBatchEnd()), reader_(reader) {
+      Next();
+    }
+
+    bool operator!=(const RecordBatchReaderIterator& other) const {
+      return batch_ != other.batch_;
+    }
+
+    Result<std::shared_ptr<RecordBatch>> operator*() {
+      ARROW_RETURN_NOT_OK(batch_.status());
+
+      auto batch = std::move(batch_);
+      batch_ = RecordBatchEnd();
+      return batch;
+    }
+
+    RecordBatchReaderIterator& operator++() {
+      Next();
+      return *this;
+    }
+
+   private:
+    std::shared_ptr<RecordBatch> RecordBatchEnd() {
+      return std::shared_ptr<RecordBatch>(NULLPTR);

Review comment:
       This is what `IterationTraits<std::shared_ptr<RecordBatch>>::End()` would have returned, but using it showed the error:
   ```
   incomplete type 'arrow::IterationTraits<std::shared_ptr<arrow::RecordBatch> >' used in nested name specifier
   ```
   which seems due to the abstract nature of the class.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] dhruv9vats commented on a change in pull request #11946: ARROW-13663: [C++] RecordBatchReader STL-like iteration

Posted by GitBox <gi...@apache.org>.
dhruv9vats commented on a change in pull request #11946:
URL: https://github.com/apache/arrow/pull/11946#discussion_r782012491



##########
File path: cpp/src/arrow/record_batch.h
##########
@@ -234,6 +234,67 @@ class ARROW_EXPORT RecordBatchReader {
     return batch;
   }
 
+  class RecordBatchReaderIterator {

Review comment:
       In https://github.com/apache/arrow/blob/d88e23273fd4eb7945a5fb94cfdb6315f412ea83/cpp/src/arrow/record_batch.cc#L369 the name `Iterator` is in conflict.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #11946: ARROW-13663: [C++] RecordBatchReader STL-like iteration

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #11946:
URL: https://github.com/apache/arrow/pull/11946#issuecomment-1011287163


   Benchmark runs are scheduled for baseline = 7303b51ad2f9ac3f0c59bee7221771552ed4eb46 and contender = 7b955fc6c980e033dc112466f502b325cf5575ce. 7b955fc6c980e033dc112466f502b325cf5575ce is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/1c173a5cb70b410a8fb9c54181295aad...19db914461e04f27939ace96d7df7d9d/)
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/f9a5a884a48447148be5a8ba9ba0f689...ed090a9faefb4b33a943285916c349fb/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/4f575a69492747e3a6a1ee430b2e7b9f...1bbf1e64d0ab4970a725bf145f37a3cf/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org