Posted to issues@arrow.apache.org by "Kai Fricke (Jira)" <ji...@apache.org> on 2022/09/07 10:07:00 UTC
[jira] [Created] (ARROW-17641) [python] Deserializing ParseOptions does not set up invalid row handler correctly
Kai Fricke created ARROW-17641:
----------------------------------
Summary: [python] Deserializing ParseOptions does not set up invalid row handler correctly
Key: ARROW-17641
URL: https://issues.apache.org/jira/browse/ARROW-17641
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 9.0.0
Reporter: Kai Fricke
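Serializing and deserializing a {{csv.ParseOptions}} object that has an {{invalid_row_handler}} set leaves the handler unusable: a {{csv.read_csv}} call that succeeds with the original object fails with the round-tripped one. This is likely because the {{__setstate__}} method does not call the {{invalid_row_handler}} setter correctly. Reassigning the handler to itself after deserialization (the commented-out line in the script) works around the problem.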
Reproduction script:
{code:python}
import cloudpickle
from pyarrow import csv

invalid_csv = """f1,f2
3,4
5,6
\x00\x00
7,8"""

source = "test.csv"
with open(source, "w") as f:
    f.write(invalid_csv)


def read_file(path, parse_options):
    # Uncomment this for a fix!
    # parse_options.invalid_row_handler = parse_options.invalid_row_handler
    with open(path, "rb") as f:
        return csv.read_csv(f, parse_options=parse_options)


parse_options = csv.ParseOptions(delimiter=",", invalid_row_handler=lambda i: "skip")
# Will succeed
print(read_file(source, parse_options=parse_options))

parse_options = cloudpickle.loads(cloudpickle.dumps(parse_options))
# Will fail
print(read_file(source, parse_options=parse_options))
{code}
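For illustration only, the suspected failure mode can be sketched in pure Python without pyarrow. Everything in this sketch is hypothetical: {{ToyOptions}}, its {{handler}} property, and the {{_native_callback}} attribute are made-up stand-ins and are not pyarrow's actual implementation. The sketch assumes the setter has a side effect (here, wiring up a second attribute) that {{__setstate__}} skips by writing the backing attribute directly, which would explain why reassigning the handler to itself repairs the object.

```python
def skip_handler(row):
    """Example handler; always asks for the row to be skipped."""
    return "skip"


class ToyOptions:
    """Hypothetical stand-in for an options object (NOT pyarrow code)."""

    def __init__(self, handler=None):
        self._handler = None
        self._native_callback = None
        self.handler = handler  # goes through the property setter

    @property
    def handler(self):
        return self._handler

    @handler.setter
    def handler(self, value):
        self._handler = value
        # Assumed side effect: the setter also registers the callback
        # that the reader would actually invoke.
        self._native_callback = value

    def __getstate__(self):
        return {"handler": self._handler}

    def __setstate__(self, state):
        # Assumed bug: the backing attribute is restored directly,
        # so the setter's side effect never runs.
        self._handler = state["handler"]
        self._native_callback = None


# Mimic what unpickling does: build a bare instance with __new__,
# then hand the saved state to __setstate__.
original = ToyOptions(skip_handler)
restored = ToyOptions.__new__(ToyOptions)
restored.__setstate__(original.__getstate__())

handler_visible = restored.handler is skip_handler       # handler looks intact
callback_wired = restored._native_callback is not None   # but is not wired up

# The workaround from the reproduction script: reassign through the setter.
restored.handler = restored.handler
callback_rewired = restored._native_callback is skip_handler
```

In this toy model, a fix would be for {{__setstate__}} to assign through the property ({{self.handler = state["handler"]}}) rather than to the backing attribute. Whether the real pyarrow fix looks like this is an assumption.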
--
This message was sent by Atlassian Jira
(v8.20.10#820010)