Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/06/01 14:43:12 UTC

[GitHub] [arrow] lidavidm commented on a change in pull request #10270: ARROW-12598: [C++][Dataset] Speed up CountRows for CSV

lidavidm commented on a change in pull request #10270:
URL: https://github.com/apache/arrow/pull/10270#discussion_r643169048



##########
File path: cpp/src/arrow/dataset/file_csv.h
##########
@@ -61,6 +61,10 @@ class ARROW_DS_EXPORT CsvFileFormat : public FileFormat {
       const std::shared_ptr<ScanOptions>& scan_options,
       const std::shared_ptr<FileFragment>& file) const override;
 
+  Future<util::optional<int64_t>> CountRows(
+      const std::shared_ptr<FileFragment>& file, compute::Expression predicate,
+      std::shared_ptr<ScanOptions> options) override;

Review comment:
       I think I had patterned everything on Fragment::Scan (which takes it by value), but all the new methods take it by const reference, so I'll adjust that.
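
       For reference, a minimal sketch of the difference between the two parameter styles. `Options` is a hypothetical stand-in type, not Arrow's ScanOptions, and both function names are made up for illustration.

```cpp
#include <memory>

// Hypothetical stand-in for an options struct; not Arrow's ScanOptions.
struct Options {
  bool use_threads = false;
};

// By value: the shared_ptr is copied into the parameter, so the reference
// count is incremented for the duration of the call.
long UseCountByValue(std::shared_ptr<Options> options) {
  return options.use_count();
}

// By const reference: no copy is made, so the reference count is unchanged.
long UseCountByConstRef(const std::shared_ptr<Options>& options) {
  return options.use_count();
}
```

       Called with a single owning pointer, the by-value version observes a count of 2 and the const-reference version a count of 1. Taking the pointer by value is usually reserved for functions that keep their own copy.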

##########
File path: cpp/src/arrow/dataset/file_csv_test.cc
##########
@@ -50,7 +50,10 @@ class CsvFormatHelper {
   }
 
   static std::shared_ptr<CsvFileFormat> MakeFormat() {
-    return std::make_shared<CsvFileFormat>();
+    auto format = std::make_shared<CsvFileFormat>();
+    // Required for CountRows
+    format->parse_options.ignore_empty_lines = false;

Review comment:
       It is logical rows of data; I'll clarify this.
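
       As a rough illustration of how skipping empty lines changes a row count, here is a sketch under stated assumptions. `CountLines` is a hypothetical helper, not the CountRows implementation in this PR, and it ignores quoting and `\r\n` line endings.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper, not part of Arrow: counts the lines in `text`,
// optionally skipping empty ones. Quoted newlines and \r\n are not handled.
int64_t CountLines(const std::string& text, bool ignore_empty_lines) {
  int64_t count = 0;
  std::size_t start = 0;
  while (start < text.size()) {
    // Find the end of the current line (or the end of the input).
    std::size_t end = text.find('\n', start);
    if (end == std::string::npos) end = text.size();
    const bool empty = (end == start);
    if (!empty || !ignore_empty_lines) ++count;
    start = end + 1;
  }
  return count;
}
```

       For the input "a\n1\n\n2\n", this counts 4 lines when empty lines are kept and 3 when they are skipped, so the two settings disagree whenever the data contains blank lines.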



