Posted to dev@arrow.apache.org by "German I. Ramirez-Espinoza (Jira)" <ji...@apache.org> on 2020/05/31 11:23:00 UTC

[jira] [Created] (ARROW-8987) [C++][Python] Make reading functions to return consistent exceptions

German I. Ramirez-Espinoza created ARROW-8987:
-------------------------------------------------

             Summary: [C++][Python] Make reading functions to return consistent exceptions
                 Key: ARROW-8987
                 URL: https://issues.apache.org/jira/browse/ARROW-8987
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++, Python
    Affects Versions: 0.17.1
            Reporter: German I. Ramirez-Espinoza


Reading functions such as {{dataset.dataset}}, {{feather.read_table}}, {{parquet.read_table}}, and {{csv.read_csv}} raise different exceptions when given a missing file or an empty file. See the table below.

The most interesting case is {{dataset.dataset}}, since the {{format}} parameter changes which exception is raised for an empty file: {{OSError}} for "parquet", but {{ArrowInvalid}} for "feather" and "csv".

 
||Function||Missing file||Empty File||
|feather.read_table|FileNotFoundError|ArrowInvalid|
|parquet.read_table|OSError|ArrowInvalid|
|csv.read_csv|FileNotFoundError|ArrowInvalid|
|dataset.dataset "feather"|FileNotFoundError|ArrowInvalid|
|dataset.dataset "parquet"|FileNotFoundError|OSError|
|dataset.dataset "csv"|FileNotFoundError|ArrowInvalid|

 

Code to reproduce issue:
{code:python}
import pathlib
import sys
import tempfile

import pyarrow as pa

import pyarrow.csv as csv
import pyarrow.dataset as dataset
import pyarrow.feather as feather
import pyarrow.parquet as parquet

tempdir = pathlib.Path(tempfile.mkdtemp())

with open(str(tempdir / "empty_feather.feather"), 'wb') as f:
    pass

with open(str(tempdir / "empty_parquet.parquet"), 'wb') as f:
    pass

with open(str(tempdir / "empty_csv.csv"), 'wb') as f:
    pass

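def report(label, func, path, **kwargs):
    # Every call below is expected to raise, so catch the exception and
    # print its type instead of stopping at the first failure.
    try:
        func(str(path), **kwargs)
    except Exception as exc:
        print("{}: {}".format(label, type(exc).__name__))
    else:
        print("{}: no exception".format(label))

# Empty File
report("feather.read_table (empty)", feather.read_table, tempdir / "empty_feather.feather")
report("parquet.read_table (empty)", parquet.read_table, tempdir / "empty_parquet.parquet")
report("csv.read_csv (empty)", csv.read_csv, tempdir / "empty_csv.csv")
report("dataset feather (empty)", dataset.dataset, tempdir / "empty_feather.feather", format="feather")
report("dataset parquet (empty)", dataset.dataset, tempdir / "empty_parquet.parquet", format="parquet")
report("dataset csv (empty)", dataset.dataset, tempdir / "empty_csv.csv", format="csv")

# Missing File
report("feather.read_table (missing)", feather.read_table, tempdir / "non_existent.feather")
report("parquet.read_table (missing)", parquet.read_table, tempdir / "non_existent.parquet")
report("csv.read_csv (missing)", csv.read_csv, tempdir / "non_existent.csv")
report("dataset feather (missing)", dataset.dataset, tempdir / "non_existent.feather", format="feather")
report("dataset parquet (missing)", dataset.dataset, tempdir / "non_existent.parquet", format="parquet")
report("dataset csv (missing)", dataset.dataset, tempdir / "non_existent.csv", format="csv")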

{code}
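
Until the readers behave consistently, a caller could check the path before handing it to any of them. The following is only a sketch of such a workaround, not a proposed fix in Arrow itself. The helper names {{precheck_path}} and {{classify}} are hypothetical, and using {{ValueError}} for the empty-file case is an assumption made for illustration. It uses only the standard library.

```python
import os
import tempfile


def precheck_path(path):
    """Raise a predictable exception before `path` reaches a reader.

    Hypothetical helper, not part of pyarrow. The choice of ValueError
    for an empty file is an assumption made for this sketch.
    """
    path = str(path)
    if not os.path.exists(path):
        raise FileNotFoundError("No such file: {}".format(path))
    if os.path.isfile(path) and os.path.getsize(path) == 0:
        raise ValueError("File is empty: {}".format(path))
    return path


def classify(path):
    """Return 'missing', 'empty', or 'ok' for `path`."""
    try:
        precheck_path(path)
    except FileNotFoundError:
        return "missing"
    except ValueError:
        return "empty"
    return "ok"


# Small demonstration with a zero-byte file and a path that does not exist.
demo_dir = tempfile.mkdtemp()
empty_path = os.path.join(demo_dir, "empty_parquet.parquet")
with open(empty_path, "wb"):
    pass

print(classify(empty_path))
print(classify(os.path.join(demo_dir, "non_existent.parquet")))
```

A reader call would then look like {{parquet.read_table(precheck_path(p))}}, so the missing-file and empty-file cases fail the same way regardless of format. The check is not atomic: the file can still change between the check and the read, so the reader's own exceptions still need handling.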


