Posted to jira@arrow.apache.org by "Kai Fricke (Jira)" <ji...@apache.org> on 2022/09/07 11:17:00 UTC

[jira] [Assigned] (ARROW-17641) [python] Deserializing ParseOptions does not set up invalid row handler correctly

     [ https://issues.apache.org/jira/browse/ARROW-17641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kai Fricke reassigned ARROW-17641:
----------------------------------

    Assignee: Kai Fricke

> [python] Deserializing ParseOptions does not set up invalid row handler correctly
> ---------------------------------------------------------------------------------
>
>                 Key: ARROW-17641
>                 URL: https://issues.apache.org/jira/browse/ARROW-17641
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 9.0.0
>            Reporter: Kai Fricke
>            Assignee: Kai Fricke
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
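> Serializing and deserializing a {{csv.ParseOptions}} object that has an {{invalid_row_handler}} leaves the handler unusable. After the round trip, reading a CSV file with the restored options fails on the invalid row instead of skipping it. This is likely because {{__setstate__}} does not restore the handler through the {{invalid_row_handler}} setter, so the handler is never actually applied. Reassigning the attribute to itself after deserialization (the commented-out workaround in the script) avoids the failure.
> Reproduction script: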
>  
> {code:python}
> import cloudpickle
> from pyarrow import csv
> invalid_csv = """f1,f2
> 3,4
> 5,6
> \x00\x00
> 7,8"""
> source = "test.csv"
> with open(source, "w") as f:
>     f.write(invalid_csv)
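> def read_file(path, parse_options):
>     # Workaround: uncomment to re-run the invalid_row_handler setter,
>     # which makes the deserialized options work again.
>     # parse_options.invalid_row_handler = parse_options.invalid_row_handler
>     with open(path, "rb") as f:
>         return csv.read_csv(f, parse_options=parse_options)
> # The handler is called for each invalid row; returning "skip" drops that row.
> parse_options = csv.ParseOptions(delimiter=",", invalid_row_handler=lambda row: "skip")
> # Succeeds: the row of NUL bytes has one column instead of two and is skipped.
> print(read_file(source, parse_options=parse_options))
> # cloudpickle is used because the standard pickle module cannot serialize lambdas.
> parse_options = cloudpickle.loads(cloudpickle.dumps(parse_options))
> # Fails: after the round trip the handler is no longer applied to the invalid row.
> print(read_file(source, parse_options=parse_options))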
> {code}
>  
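The following is a minimal pure-Python sketch of the suspected failure mode, assuming the wrapper keeps the handler in two places: a Python-side attribute returned by the getter, and the underlying C++ parse options that parsing actually consults. It is not pyarrow's implementation; {{NativeOptions}}, {{SketchParseOptions}}, {{BuggyParseOptions}}, {{FixedParseOptions}}, {{read_rows}} and {{skip}} are all hypothetical. {{copy.deepcopy}} is used because it drives the same {{__reduce__}}/{{__setstate__}} protocol that unpickling uses, without requiring the handler itself to be picklable.

{code:python}
import copy
from typing import Callable, List, Optional, Sequence, Tuple

Handler = Callable[[Tuple[str, ...]], str]


class NativeOptions:
    """Hypothetical stand-in for the underlying C++ parse options."""

    def __init__(self) -> None:
        self.invalid_row_handler: Optional[Handler] = None


class SketchParseOptions:
    """Hypothetical Python wrapper; NOT pyarrow's actual ParseOptions."""

    def __init__(self, invalid_row_handler: Optional[Handler] = None) -> None:
        self._native = NativeOptions()
        self._invalid_row_handler: Optional[Handler] = None
        if invalid_row_handler is not None:
            self.invalid_row_handler = invalid_row_handler

    @property
    def invalid_row_handler(self) -> Optional[Handler]:
        # The getter only reports the Python-side attribute.
        return self._invalid_row_handler

    @invalid_row_handler.setter
    def invalid_row_handler(self, value: Optional[Handler]) -> None:
        # The setter keeps both sides in sync: it stores the callable
        # and registers it on the native options that parsing consults.
        self._invalid_row_handler = value
        self._native.invalid_row_handler = value

    def __reduce__(self):
        # Rebuild with default arguments, then restore state via __setstate__.
        return (type(self), (), self.__getstate__())

    def __getstate__(self):
        return (self.invalid_row_handler,)


class BuggyParseOptions(SketchParseOptions):
    def __setstate__(self, state) -> None:
        # Suspected bug pattern: writes the private attribute directly,
        # so the native handler is never registered.
        (self._invalid_row_handler,) = state


class FixedParseOptions(SketchParseOptions):
    def __setstate__(self, state) -> None:
        # Fix pattern: assigning the public attribute runs the setter.
        (self.invalid_row_handler,) = state


def read_rows(options: SketchParseOptions,
              rows: Sequence[Tuple[str, ...]]) -> List[Tuple[str, ...]]:
    """Hypothetical two-column reader that consults only the native handler."""
    handler = options._native.invalid_row_handler
    result = []
    for row in rows:
        if len(row) == 2:
            result.append(row)
        elif handler is not None and handler(row) == "skip":
            continue
        else:
            raise ValueError(f"invalid row: {row!r}")
    return result


def skip(row: Tuple[str, ...]) -> str:
    return "skip"


rows = [("3", "4"), ("5", "6"), ("\x00\x00",), ("7", "8")]
fixed = copy.deepcopy(FixedParseOptions(invalid_row_handler=skip))
buggy = copy.deepcopy(BuggyParseOptions(invalid_row_handler=skip))
print(read_rows(fixed, rows))       # the invalid row is skipped
print(buggy.invalid_row_handler)    # the getter still reports the handler...
try:
    read_rows(buggy, rows)          # ...but it was never registered natively
except ValueError as exc:
    print("buggy:", exc)
{code}

Routing {{__setstate__}} through the public setter keeps both sides in sync, which would also explain why the workaround in the reproduction script (reassigning {{invalid_row_handler}} to itself) restores the expected behavior.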



--
This message was sent by Atlassian Jira
(v8.20.10#820010)