Posted to dev@arrow.apache.org by "Sascha Hofmann (Jira)" <ji...@apache.org> on 2019/10/18 12:26:00 UTC

[jira] [Created] (ARROW-6934) [Python] Choose string column encoding in csv reader

Sascha Hofmann created ARROW-6934:
-------------------------------------

             Summary: [Python] Choose string column encoding in csv reader
                 Key: ARROW-6934
                 URL: https://issues.apache.org/jira/browse/ARROW-6934
             Project: Apache Arrow
          Issue Type: Wish
            Reporter: Sascha Hofmann


Is it possible to specify a different encoding for string columns in the parse options of the CSV reader in pyarrow? I saw that there is a check for whether a column is UTF-8 encoded. By default, it seems that if this check fails, the column is interpreted as binary.

Is there any way to have a fallback option, meaning that if the UTF-8 check (check_utf8) fails, the reader tries latin-1 before falling back to binary?
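
A minimal sketch of the requested fallback, written in plain Python rather than against the pyarrow API. The helper name decode_column and its encodings parameter are hypothetical and only illustrate the intended semantics: try each encoding in order for the whole column, and keep the raw bytes if none of them applies.

```python
from typing import Iterable, List, Sequence, Union


def decode_column(
    values: Iterable[bytes],
    encodings: Sequence[str] = ("utf-8", "latin-1"),
) -> Union[List[str], List[bytes]]:
    """Hypothetical helper (not part of pyarrow).

    Decode every value of a column with the first encoding that is valid
    for all values. If no encoding fits, return the raw bytes unchanged,
    which corresponds to treating the column as binary.
    """
    raw = list(values)
    for encoding in encodings:
        try:
            # The whole column must decode cleanly with this encoding.
            return [value.decode(encoding) for value in raw]
        except UnicodeDecodeError:
            # At least one value is invalid in this encoding; try the next.
            continue
    # Binary fallback: no candidate encoding matched.
    return raw


if __name__ == "__main__":
    # b"caf\xe9" is not valid UTF-8, so latin-1 is used for the column.
    print(decode_column([b"caf\xe9", b"abc"]))
```

Note that latin-1 assigns a character to every possible byte value, so a latin-1 attempt never fails. With latin-1 in the list, the binary fallback is only reached if the encoding list is changed to stricter candidates.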
