Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/10 15:29:15 UTC

[GitHub] [arrow] n3world edited a comment on pull request #10202: ARROW-12673: [C++] Add parser handler for incorrect column counts

n3world edited a comment on pull request #10202:
URL: https://github.com/apache/arrow/pull/10202#issuecomment-836840781


   > > At a minimum I would like to be able to skip the rows and collect information about the skipped rows so it could be presented to a user saying when and where malformed rows were found
   > 
   > I agree that being able to point the row number where an error occurred is useful, but we shouldn't need a callback for that.
   
   For that alone, no. But once you consider the combinations of ways these rows could be handled, it gets complex quickly. For both short rows and long rows you could error, skip, or fix, and if you don't error you then have to decide whether the row is reported or handled silently. Describing that would take 5 options (error, skip silently, skip and report, fix silently, fix and report) for each of short and long rows, and you would need to be able to express any combination of them. The distinction between a silent skip and a reported skip matters because currently the best way to report a row is to include its entire text; if a good number of rows need to be reported, that could add noticeable overhead for a caller who just wants the handling done silently. Because of this I was thinking it would be easier to expose a callback along with some simple predefined implementations. That way more complex options could be implemented by the user.
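
To make the callback idea above concrete, here is a rough sketch of what such a handler could look like. Every name in it (`InvalidRow`, `RowAction`, `InvalidRowHandler`, `SkipAndRecord`, and so on) is made up for illustration and is not part of the existing Arrow API; the "fix" case is left out to keep it short.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical names throughout; none of this is Arrow's actual API.
enum class RowAction { kError, kSkip };

// Description of a row whose column count did not match the expected count.
struct InvalidRow {
  int32_t expected_columns;
  int32_t actual_columns;
  int64_t row_number;
  std::string_view text;  // assumed to be valid only during the callback
};

using InvalidRowHandler = std::function<RowAction(const InvalidRow&)>;

// Predefined handler: fail on any malformed row.
inline InvalidRowHandler ErrorOnInvalidRow() {
  return [](const InvalidRow&) { return RowAction::kError; };
}

// Predefined handler: drop malformed rows without copying their text.
inline InvalidRowHandler SkipSilently() {
  return [](const InvalidRow&) { return RowAction::kSkip; };
}

struct SkippedRow {
  int64_t row_number;
  std::string text;
};

// Predefined handler: drop malformed rows and keep a copy for later reporting.
// The caller owns `out` and must keep it alive while the handler is in use.
inline InvalidRowHandler SkipAndRecord(std::vector<SkippedRow>* out) {
  return [out](const InvalidRow& row) {
    out->push_back({row.row_number, std::string(row.text)});
    return RowAction::kSkip;
  };
}

// A user-defined combination: skip short rows, error on long rows.
inline InvalidRowHandler SkipShortErrorLong() {
  return [](const InvalidRow& row) {
    return row.actual_columns < row.expected_columns ? RowAction::kSkip
                                                     : RowAction::kError;
  };
}
```

Only `SkipAndRecord` pays for copying the row text, so a caller who wants silent handling avoids that overhead entirely.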
   
   If we wanted to avoid the callback and still support that matrix of options, the best way might be two enums, one for short rows and one for long rows, plus a mechanism to track the rows that are to be reported.
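
For comparison, the enum-based alternative could be sketched roughly as follows. Again, all names (`RowPolicy`, `ColumnCountOptions`, `ColumnCountTracker`, `HandleRow`) are hypothetical, and the tracker records only row numbers here as a simplifying assumption; it could just as well store the row text.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical names throughout; none of this is Arrow's actual API.
// The five per-row behaviours from the discussion above.
enum class RowPolicy { kError, kSkipSilent, kSkipReport, kFixSilent, kFixReport };

// One policy for rows with too few columns, one for rows with too many.
struct ColumnCountOptions {
  RowPolicy short_rows = RowPolicy::kError;
  RowPolicy long_rows = RowPolicy::kError;
};

// Collects the rows that a "report" policy asked to be tracked.
struct ColumnCountTracker {
  std::vector<int64_t> reported_rows;
};

// What the parser should do with the row after the policy is applied.
enum class Outcome { kFail, kDrop, kRepair };

// Precondition: actual != expected.
inline Outcome HandleRow(const ColumnCountOptions& opts, int32_t expected,
                         int32_t actual, int64_t row_number,
                         ColumnCountTracker* tracker) {
  const RowPolicy policy = actual < expected ? opts.short_rows : opts.long_rows;
  switch (policy) {
    case RowPolicy::kError:
      return Outcome::kFail;
    case RowPolicy::kSkipReport:
      tracker->reported_rows.push_back(row_number);
      return Outcome::kDrop;
    case RowPolicy::kSkipSilent:
      return Outcome::kDrop;
    case RowPolicy::kFixReport:
      tracker->reported_rows.push_back(row_number);
      return Outcome::kRepair;
    case RowPolicy::kFixSilent:
      return Outcome::kRepair;
  }
  return Outcome::kFail;  // unreachable for valid enum values
}
```

This covers the 5 x 5 matrix without a callback, but anything outside those fixed behaviours would need a new enum value.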
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org