Posted to issues@arrow.apache.org by "Nolo Ogbirner (Jira)" <ji...@apache.org> on 2020/08/21 09:19:00 UTC
[jira] [Created] (ARROW-9818) Obscure C++ Error when Calling to_pandas on a RecordBatch
Nolo Ogbirner created ARROW-9818:
------------------------------------
Summary: Obscure C++ Error when Calling to_pandas on a RecordBatch
Key: ARROW-9818
URL: https://issues.apache.org/jira/browse/ARROW-9818
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 1.0.0
Environment: AWS Lambda with pyarrow 1.0.0
Reporter: Nolo Ogbirner
I'm using pyarrow to stream a CSV from an input over HTTP and then convert each RecordBatch to a pandas DataFrame for manipulation. For testing, I'm using the NYPD Motor Vehicle Collisions open dataset. However, for any file larger than the 5MB one (e.g. 240MB, 1GB), my code running in an AWS Lambda fails with a RuntimeError because of
terminate called after throwing an instance of 'std::logic_error'
what(): basic_string::_S_construct null not valid
after calling to_pandas() on the first batch. Why is this happening? How can I fix it?
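
For reference, the pipeline described above might look roughly like the sketch below. The reporter's actual code is not included in this message, so this is only an assumption about its shape: the helper names `open_csv_stream` and `iter_frames`, the `block_size` value, and the use of `pyarrow.csv.open_csv` as the streaming reader are all hypothetical. It is not a fix for the error, only a minimal starting point for reproducing it.

```python
def open_csv_stream(source, block_size=1 << 20):
    """Open a streaming CSV reader over a path or file-like object.

    Hypothetical helper: `source` could be an HTTP response body wrapped
    in a file-like object. pyarrow is imported here, not at module level,
    so that iter_frames() can be exercised without it.
    """
    from pyarrow import csv

    read_options = csv.ReadOptions(block_size=block_size)
    return csv.open_csv(source, read_options=read_options)


def iter_frames(reader):
    """Yield one pandas DataFrame per RecordBatch from `reader`.

    `reader` only needs a read_next_batch() method that raises
    StopIteration at the end of the stream, as pyarrow readers do.
    """
    while True:
        try:
            batch = reader.read_next_batch()
        except StopIteration:
            break
        # This is the call after which the reported crash occurs.
        yield batch.to_pandas()


# Example use (assumes pyarrow and pandas are installed, and that
# "collisions.csv" is a local copy of the dataset):
#
#     for frame in iter_frames(open_csv_stream("collisions.csv")):
#         print(len(frame))
```

Pulling batches one at a time with `read_next_batch()` makes it easy to stop after the first batch, which is where the failure is reported.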
--
This message was sent by Atlassian Jira
(v8.3.4#803005)