Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/19 18:06:03 UTC

[GitHub] [arrow] n3world commented on a change in pull request #10321: ARROW-12675: [C++] CSV parsing report row on which error occurred

n3world commented on a change in pull request #10321:
URL: https://github.com/apache/arrow/pull/10321#discussion_r635433863



##########
File path: python/pyarrow/tests/test_csv.py
##########
@@ -974,6 +976,36 @@ def read_csv(self, *args, validate_full=True, **kwargs):
         table.validate(full=validate_full)
         return table
 
+    def test_row_numbers_in_errors(self):
+        """ Row numbers are only correctly counted in serial reads """
+        csv, _ = make_random_csv(4, 100, write_names=False)

Review comment:
       I did that to work with the skip rows tests, so that I knew what the row names were; otherwise the row names are random values. I'll add some tests which do not skip rows and use the headers from the CSV.

##########
File path: cpp/src/arrow/csv/parser_test.cc
##########
@@ -623,5 +643,48 @@ TEST(BlockParser, QuotedEscape) {
   }
 }
 
+TEST(BlockParser, RowNumberAppendedToError) {
+  auto options = ParseOptions::Defaults();
+  auto csv = "a,b,c\nd,e,f\ng,h,i\n";
+  {
+    BlockParser parser(options, -1, 0);
+    ASSERT_NO_FATAL_FAILURE(AssertParseOk(parser, csv));
+    int row = 0;
+    auto status = parser.VisitColumn(
+        0, [row](const uint8_t* data, uint32_t size, bool quoted) mutable -> Status {
+          return ++row == 2 ? Status::Invalid("Bad value") : Status::OK();
+        });
+    ASSERT_RAISES(Invalid, status);
+    ASSERT_NE(std::string::npos, status.message().find("Row 1: Bad value"))
+        << status.message();
+  }
+
+  {
+    BlockParser parser(options, -1, 100);
+    ASSERT_NO_FATAL_FAILURE(AssertParseOk(parser, csv));
+    int row = 0;
+    auto status = parser.VisitColumn(
+        0, [row](const uint8_t* data, uint32_t size, bool quoted) mutable -> Status {
+          return ++row == 3 ? Status::Invalid("Bad value") : Status::OK();
+        });
+    ASSERT_RAISES(Invalid, status);
+    ASSERT_NE(std::string::npos, status.message().find("Row 102: Bad value"))
+        << status.message();
+  }
+
+  // No first row specified should not append row information
+  {
+    BlockParser parser(options, -1, -1);
+    ASSERT_NO_FATAL_FAILURE(AssertParseOk(parser, csv));
+    int row = 0;
+    auto status = parser.VisitColumn(
+        0, [row](const uint8_t* data, uint32_t size, bool quoted) mutable -> Status {
+          return ++row == 3 ? Status::Invalid("Bad value") : Status::OK();
+        });
+    ASSERT_RAISES(Invalid, status);
+    ASSERT_EQ(std::string::npos, status.message().find("Row")) << status.message();

Review comment:
       Sorry, I didn't know gmock was used by Arrow, so I only used gtest features.
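
       For reference, the behaviour that the three test blocks above check can be sketched in isolation. This is only an illustration under stated assumptions: `WithRowNumber` is a hypothetical helper, not part of Arrow's API, and it assumes a negative first row means "unknown", matching the `-1` case in the test.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not Arrow's API): prefix an error message with the
// absolute row number when the first row of the block is known.
// Assumption: first_row < 0 means the first row is unknown, so the
// message is returned unchanged.
std::string WithRowNumber(int64_t first_row, int64_t row_in_block,
                          const std::string& message) {
  if (first_row < 0) {
    return message;
  }
  // Absolute row = first row of the block + zero-based offset inside it.
  return "Row " + std::to_string(first_row + row_in_block) + ": " + message;
}
```

       With a first row of 0 and an error on the second row of the block (offset 1), this yields "Row 1: Bad value"; with a first row of 100 and an error on the third row (offset 2), it yields "Row 102: Bad value"; with a first row of -1 the message stays "Bad value".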



