Posted to jira@arrow.apache.org by "Matthew Topol (Jira)" <ji...@apache.org> on 2022/06/29 17:50:00 UTC

[jira] [Resolved] (ARROW-16926) csv reader errors clobbered by subsequent reads

     [ https://issues.apache.org/jira/browse/ARROW-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew Topol resolved ARROW-16926.
-----------------------------------
    Fix Version/s: 9.0.0
       Resolution: Fixed

Issue resolved by pull request 13451
[https://github.com/apache/arrow/pull/13451]

> csv reader errors clobbered by subsequent reads
> -----------------------------------------------
>
>                 Key: ARROW-16926
>                 URL: https://issues.apache.org/jira/browse/ARROW-16926
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Go
>            Reporter: Whispell Whispell
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 9.0.0
>
>   Original Estimate: 168h
>          Time Spent: 40m
>  Remaining Estimate: 167h 20m
>
> To reproduce this issue, read a CSV file that contains garbage string values where float64 values are expected. If the bad data is in the first part of the file, a subsequent call to r.r.Read() clobbers the parse error that was set inside r.read(rec).
> At the bottom of the loop body, r.read(rec) is called and ends up in func (r *Reader) parseFloat64(field array.Builder, str string),
> which encounters an error and sets err on the reader:
>     v, err := strconv.ParseFloat(str, 64)
>     if err != nil && r.err == nil {
>         r.err = err
>         field.AppendNull()
>         return
>     }
> However, when control returns from r.read(rec) to the loop, the for loop advances without checking r.err, and the next call to r.r.Read() overwrites it.
> As a result, if the last chunk has no error, r.Err() returns nil after the CSV has been read, even though an error occurred during parsing.
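
The clobbering pattern described above can be reproduced outside of Arrow. The following is a minimal sketch, not the Arrow code itself: chunkReader, readAllBuggy, readAllFixed, newChunkReader and sampleData are hypothetical names invented for illustration, and the guard in readAllFixed is only one possible way to preserve the error, not necessarily what pull request 13451 does.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// sampleData has a bad float in the first row and valid floats in the last row.
const sampleData = "garbage,1.5\n2.0,3.0\n"

// chunkReader is a hypothetical, simplified stand-in for the reader in the
// issue: it wraps an encoding/csv reader and keeps a sticky error field.
type chunkReader struct {
	r   *csv.Reader
	err error
}

func newChunkReader(data string) *chunkReader {
	return &chunkReader{r: csv.NewReader(strings.NewReader(data))}
}

// parseFloat64 mirrors the quoted snippet: it records only the first parse error.
func (c *chunkReader) parseFloat64(str string) float64 {
	v, err := strconv.ParseFloat(str, 64)
	if err != nil && c.err == nil {
		c.err = err
	}
	return v
}

// readAllBuggy assigns the result of every Read directly to c.err, so a parse
// error recorded in the previous iteration is overwritten (often with nil).
func (c *chunkReader) readAllBuggy() {
	for {
		var rec []string
		rec, c.err = c.r.Read() // overwrites any earlier parse error
		if c.err != nil {
			if c.err == io.EOF {
				c.err = nil // end of input is not reported as an error
			}
			return
		}
		for _, s := range rec {
			c.parseFloat64(s)
		}
	}
}

// readAllFixed checks c.err before advancing and never overwrites it with nil.
func (c *chunkReader) readAllFixed() {
	for c.err == nil {
		rec, err := c.r.Read()
		if err != nil {
			if err != io.EOF {
				c.err = err
			}
			return
		}
		for _, s := range rec {
			c.parseFloat64(s)
		}
	}
}

func main() {
	b := newChunkReader(sampleData)
	b.readAllBuggy()
	fmt.Println("buggy loop error:", b.err)

	f := newChunkReader(sampleData)
	f.readAllFixed()
	fmt.Println("fixed loop error:", f.err)
}
```

In this sketch the buggy loop finishes with a nil error because the second row parses cleanly, while the guarded loop stops after the first row and keeps the strconv error.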



--
This message was sent by Atlassian Jira
(v8.20.10#820010)