Posted to issues@arrow.apache.org by "Nolo Ogbirner (Jira)" <ji...@apache.org> on 2020/08/21 09:19:00 UTC

[jira] [Created] (ARROW-9818) Obscure C++ Error when Calling to_pandas on a RecordBatch

Nolo Ogbirner created ARROW-9818:
------------------------------------

             Summary: Obscure C++ Error when Calling to_pandas on a RecordBatch
                 Key: ARROW-9818
                 URL: https://issues.apache.org/jira/browse/ARROW-9818
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 1.0.0
         Environment: AWS Lambda with pyarrow 1.0.0
            Reporter: Nolo Ogbirner


I'm using PyArrow to stream a CSV from an input over HTTP, converting each RecordBatch to a pandas DataFrame for manipulation. For testing, I'm using the NYPD Motor Vehicle Collisions open dataset. However, for any file larger than the 5 MB one (e.g. 240 MB or 1 GB), my code, which runs in an AWS Lambda function, fails with a RuntimeError caused by

terminate called after throwing an instance of 'std::logic_error'
 what(): basic_string::_S_construct null not valid

after calling to_pandas() on the first batch. Why is this happening? How can I fix it?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)