Posted to jira@arrow.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/07/06 13:41:00 UTC

[jira] [Updated] (ARROW-13252) [C++] CSV Add byte offset for error messages

     [ https://issues.apache.org/jira/browse/ARROW-13252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-13252:
-----------------------------------
    Labels: pull-request-available  (was: )

> [C++] CSV Add byte offset for error messages
> --------------------------------------------
>
>                 Key: ARROW-13252
>                 URL: https://issues.apache.org/jira/browse/ARROW-13252
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Nate Clark
>            Assignee: Nate Clark
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> CSV parsing error messages contain the row number when parallel parsing is not enabled, but when parallel parsing is enabled there is no indication of where in the input the error occurred. To add that context, the byte offset of the row can be included in the error message.
>  
> This can be done relatively easily for the parser, but associating byte offsets with the data or row being decoded would require more metadata to be maintained in the DataBatch, potentially doubling the size of ParsedValueDesc.
>  
> This was mentioned and discussed in comments [here|https://github.com/apache/arrow/pull/10202#issuecomment-870796708]
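
A minimal sketch of the idea described above, not the code from the linked pull request. It assumes each block handed to the parser carries the byte offset at which it starts in the input, so a row's absolute offset can be reported even when the global row number is not known. The names RowStartOffsets and FormatRowError are hypothetical and are not part of Arrow's API. For brevity the sketch treats every '\n' as a row terminator and ignores newlines inside quoted fields.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (not an Arrow function): returns the absolute byte
// offset of the start of every row in `block`, given the offset at which the
// block itself begins in the input.
// Simplification: every '\n' ends a row; quoted newlines are not handled.
std::vector<int64_t> RowStartOffsets(std::string_view block, int64_t block_offset) {
  std::vector<int64_t> offsets;
  if (block.empty()) return offsets;
  offsets.push_back(block_offset);  // the first row starts where the block starts
  for (std::size_t i = 0; i + 1 < block.size(); ++i) {
    if (block[i] == '\n') {
      // The next row starts right after this newline.
      offsets.push_back(block_offset + static_cast<int64_t>(i) + 1);
    }
  }
  return offsets;
}

// Hypothetical helper (not an Arrow function): attaches the row's byte offset
// to an error description, for use when no row number is available.
std::string FormatRowError(const std::string& what, int64_t row_byte_offset) {
  return "CSV parse error: " + what + " (row starts at byte offset " +
         std::to_string(row_byte_offset) + ")";
}
```

For a block "a,b\n1,2\n3\n" that begins at byte 100 of the input, RowStartOffsets yields 100, 104 and 108, so an error on the third row can be reported against offset 108. Keeping one offset per row in the parser is cheap; keeping one per decoded value is the extra metadata cost the description warns about.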



--
This message was sent by Atlassian Jira
(v8.3.4#803005)