Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/12/23 14:01:16 UTC

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #12012: ARROW-15116: [Python] Expose invalid_row_handler for CSV reader

jorisvandenbossche commented on a change in pull request #12012:
URL: https://github.com/apache/arrow/pull/12012#discussion_r774584400



##########
File path: python/pyarrow/tests/test_csv.py
##########
@@ -109,6 +109,27 @@ def check_options_class_pickling(cls, **attr_values):
         assert getattr(new_opts, name) == value
 
 
+class InvalidRowHandler:
+    def __init__(self, result):
+        self.result = result
+        self.rows = []
+
+    def __call__(self, row):
+        self.rows.append(row)
+        if isinstance(self.result, Exception):
+            raise self.result

Review comment:
       Is this branch used in the tests?
   (I expected it to be used to test what happens when the callback raises instead of returning "skip"/"error", but I don't see such a test.)
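
A minimal sketch of the behaviour such a test would cover, calling the handler directly rather than through the CSV reader. It assumes `__call__` goes on to return `self.result` after the lines quoted above (the hunk is cut off at the commented line), and the rows are placeholder strings, not real `InvalidRow` objects:

```python
class InvalidRowHandler:
    def __init__(self, result):
        self.result = result
        self.rows = []

    def __call__(self, row):
        # Record every row the handler is asked about.
        self.rows.append(row)
        if isinstance(self.result, Exception):
            raise self.result
        # Assumption: this return is not shown in the quoted hunk.
        return self.result


# A handler configured with a string simply returns it.
skipping = InvalidRowHandler("skip")
skip_result = skipping("placeholder row 1")

# A handler configured with an exception instance raises it.
failing = InvalidRowHandler(ValueError("handler failed"))
try:
    failing("placeholder row 2")
    raised = False
except ValueError:
    raised = True
```

A test for the erroring case could pass a handler like `failing` to the reader and assert on whatever the reader is expected to do with the exception.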

##########
File path: python/pyarrow/tests/test_csv.py
##########
@@ -109,6 +109,27 @@ def check_options_class_pickling(cls, **attr_values):
         assert getattr(new_opts, name) == value
 
 
+class InvalidRowHandler:
+    def __init__(self, result):
+        self.result = result
+        self.rows = []
+
+    def __call__(self, row):
+        self.rows.append(row)
+        if isinstance(self.result, Exception):
+            raise self.result

Review comment:
       Maybe also add a test where the handler returns an invalid string (something other than "skip" or "error")?

##########
File path: python/pyarrow/_csv.pyx
##########
@@ -254,15 +275,21 @@ cdef class ParseOptions(_Weakrefable):
         Whether empty lines are ignored in CSV input.
         If False, an empty line is interpreted as containing a single empty
         value (assuming a one-column CSV file).
+    invalid_row_handler : callable, optional (default None)
+        If not None, this object is called for each CSV row that fails
+        parsing (because of a mismatching number of columns).
+        It should accept a single InvalidRow argument and return either

Review comment:
       This `InvalidRow` argument is not described any further anywhere, I think? (e.g. which fields it has)
   
   (I don't think namedtuple would support adding "docstrings" for the fields)
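
For reference, on Python 3.5 and later the per-field docstrings of a namedtuple are writable, so the fields could be documented in place. A sketch with hypothetical field names and descriptions (the quoted diff does not show the actual `InvalidRow` definition):

```python
from collections import namedtuple

# Hypothetical fields; the real InvalidRow definition may differ.
InvalidRow = namedtuple(
    "InvalidRow", ["expected_columns", "actual_columns", "number", "text"])

# Class-level and per-field docstrings can be assigned directly.
InvalidRow.__doc__ = "Description of a CSV row with a mismatching number of columns."
InvalidRow.expected_columns.__doc__ = "Number of columns expected in the row."
InvalidRow.actual_columns.__doc__ = "Number of columns actually found in the row."
InvalidRow.number.__doc__ = "Row number, if known."
InvalidRow.text.__doc__ = "Raw text of the row."

# Instances behave like any other namedtuple.
row = InvalidRow(expected_columns=3, actual_columns=2, number=5, text="a,b")
```

With this in place, `help(InvalidRow)` lists each field together with its description.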



