Posted to issues@arrow.apache.org by "Whispell Whispell (Jira)" <ji...@apache.org> on 2022/06/28 15:26:00 UTC

[jira] [Created] (ARROW-16926) csv reader errors clobbered by subsequent reads

[Whispell Whispell](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=wwhispell)
**created** an issue

[Apache Arrow](https://issues.apache.org/jira/browse/ARROW) /
[ARROW-16926](https://issues.apache.org/jira/browse/ARROW-16926)  
[csv reader errors clobbered by subsequent
reads](https://issues.apache.org/jira/browse/ARROW-16926)

| Issue Type: | Bug |
|---|---|
| Assignee: | Unassigned |
| Components: | Go |
| Created: | 28/Jun/22 15:25 |
| Priority: | Minor |
| Reporter: | [Whispell Whispell](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=wwhispell) |
| Original Estimate: | 168h |
| Remaining Estimate: | 168h |

This issue can be reproduced by reading a CSV file that contains garbage
string values where float64 values are expected. If the bad data is in the
first part of the file, a subsequent call to r.r.Read() clobbers the parse
error that was set inside r.read(rec).

At the bottom of the loop body, r.read(rec) is called and we end up in
func (r *Reader) parseFloat64(field array.Builder, str string).  
It encounters an error and sets err on the reader:

    v, err := strconv.ParseFloat(str, 64)
    if err != nil && r.err == nil {
        r.err = err
        field.AppendNull()
        return
    }

However, when we return from that call to the loop, the for loop advances
without checking the error, and the subsequent call to r.r.Read() overwrites
r.err.
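
The overwrite can be illustrated with a small standalone sketch. It does not use the Arrow reader itself: sketchReader, next, and errAfterChunk are hypothetical names, and the loop shape (assigning the result of Read straight into the error field) is an assumption based on the description above.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// sketchReader is a hypothetical stand-in for the reader in this report.
type sketchReader struct {
	r    *csv.Reader
	err  error
	done bool
}

// read parses every field as float64 and keeps the first parse error.
func (s *sketchReader) read(rec []string) {
	for _, str := range rec {
		if _, err := strconv.ParseFloat(str, 64); err != nil && s.err == nil {
			s.err = err
		}
	}
}

// next reads up to chunk records. Assigning the result of Read directly to
// s.err overwrites any parse error stored by the previous call to read.
func (s *sketchReader) next(chunk int) {
	for i := 0; i < chunk && !s.done; i++ {
		var rec []string
		rec, s.err = s.r.Read()
		if s.err != nil {
			s.done = true
			if s.err == io.EOF {
				s.err = nil
			}
			return
		}
		s.read(rec)
	}
}

// errAfterChunk returns the reader's error after one chunk of the given size.
func errAfterChunk(data string, chunk int) error {
	s := &sketchReader{r: csv.NewReader(strings.NewReader(data))}
	s.next(chunk)
	return s.err
}

func main() {
	data := "garbage\n1.5\n"
	fmt.Println(errAfterChunk(data, 1) != nil) // chunk ends on the bad row: error kept
	fmt.Println(errAfterChunk(data, 2) != nil) // a good row follows: error lost
}
```

With a chunk size of 1 the parse error is still set when the chunk ends; with a chunk size of 2 the read of the good row resets it to nil.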

This means that if the last chunk has no error, calls to r.Err() after
reading the CSV return nil, even though an error occurred during
parsing.  
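
One possible way to keep the parse error, shown as a rough sketch and not as the actual fix: hold the result of Read in a local variable so the stored error is only written when it is still nil. guardedReader, next, and errAfterGuardedChunk are hypothetical names used for illustration, not part of the Arrow package.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// guardedReader is a hypothetical reader that never overwrites a stored error.
type guardedReader struct {
	r    *csv.Reader
	err  error
	done bool
}

// read parses every field as float64 and keeps the first parse error.
func (g *guardedReader) read(rec []string) {
	for _, str := range rec {
		if _, err := strconv.ParseFloat(str, 64); err != nil && g.err == nil {
			g.err = err
		}
	}
}

// next reads up to chunk records, keeping the Read error in a local variable
// so an earlier parse error in g.err is left untouched.
func (g *guardedReader) next(chunk int) {
	for i := 0; i < chunk && !g.done; i++ {
		rec, err := g.r.Read()
		if err != nil {
			g.done = true
			if err != io.EOF && g.err == nil {
				g.err = err
			}
			return
		}
		g.read(rec)
	}
}

// errAfterGuardedChunk returns the reader's error after one chunk.
func errAfterGuardedChunk(data string, chunk int) error {
	g := &guardedReader{r: csv.NewReader(strings.NewReader(data))}
	g.next(chunk)
	return g.err
}

func main() {
	fmt.Println(errAfterGuardedChunk("garbage\n1.5\n", 2) != nil) // error survives
	fmt.Println(errAfterGuardedChunk("1.5\n2.5\n", 2) != nil)     // clean data: no error
}
```

An alternative design would be to stop reading as soon as a parse error is recorded; which behaviour is preferable depends on whether callers expect the remaining rows to be consumed.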
  
This message was sent by Atlassian Jira (v8.20.10#820010-sha1:ace47f9)