Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2022/03/08 18:05:00 UTC

[jira] [Commented] (ARROW-9818) [Python] Obscure C++ Error when Calling to_pandas on a RecordBatch

    [ https://issues.apache.org/jira/browse/ARROW-9818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503095#comment-17503095 ] 

Joris Van den Bossche commented on ARROW-9818:
----------------------------------------------

Since there is no clearly reproducible issue, I am going to close this. [~Lachrymatory] if you run into it again, feel free to reopen.

> [Python] Obscure C++ Error when Calling to_pandas on a RecordBatch
> ------------------------------------------------------------------
>
>                 Key: ARROW-9818
>                 URL: https://issues.apache.org/jira/browse/ARROW-9818
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 1.0.0
>         Environment: AWS Lambda with pyarrow 1.0.0
>            Reporter: Nolo Ogbirner
>            Priority: Critical
>
> I'm using pyarrow to stream a CSV from an input over HTTP and then converting each RecordBatch to a pandas DataFrame for manipulation. For testing, I'm using the NYPD Motor Vehicle Collisions Open source dataset. However, for every file larger than the 5MB one (e.g. 240MB, 1GB), my code, which runs in an AWS Lambda, fails with a RuntimeError because of
> terminate called after throwing an instance of 'std::logic_error'
>  what(): basic_string::_S_construct null not valid
> after calling to_pandas() on the first batch. Why is this happening? How can I fix it? This happened when 7 of the 28 columns were inferred to be of type null, so I set strings_can_be_null=True on my ReadOptions for CSV reading and provided a schema that forced those null columns to be strings. This didn't help. I suspect it has something to do with the size of the file, but I am unsure.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)