Posted to issues@arrow.apache.org by "German I. Ramirez-Espinoza (Jira)" <ji...@apache.org> on 2020/05/31 11:23:00 UTC
[jira] [Created] (ARROW-8987) [C++][Python] Make reading functions to return consistent exceptions
German I. Ramirez-Espinoza created ARROW-8987:
-------------------------------------------------
Summary: [C++][Python] Make reading functions to return consistent exceptions
Key: ARROW-8987
URL: https://issues.apache.org/jira/browse/ARROW-8987
Project: Apache Arrow
Issue Type: Bug
Components: C++, Python
Affects Versions: 0.17.1
Reporter: German I. Ramirez-Espinoza
Reading functions such as {{dataset.dataset}} and the {{read_table}} / {{read_csv}} functions in the feather, parquet, and csv modules raise inconsistent exception types for the same condition, namely reading a missing file or an empty file. See the table below.
Most interesting is the case of {{dataset.dataset}}, since the {{format}} parameter changes which exception is raised when reading an empty file.
||Function||Missing file||Empty file||
|feather.read_table|FileNotFoundError|ArrowInvalid|
|parquet.read_table|OSError|ArrowInvalid|
|csv.read_csv|FileNotFoundError|ArrowInvalid|
|dataset.dataset "feather"|FileNotFoundError|ArrowInvalid|
|dataset.dataset "parquet"|FileNotFoundError|OSError|
|dataset.dataset "csv"|FileNotFoundError|ArrowInvalid|
Code to reproduce issue:
{code:python}
import pathlib
import sys
import tempfile
import pyarrow as pa
import pyarrow.csv as csv
import pyarrow.dataset as dataset
import pyarrow.feather as feather
import pyarrow.parquet as parquet
tempdir = pathlib.Path(tempfile.mkdtemp())
with open(str(tempdir / "empty_feather.feather"), 'wb') as f:
    pass
with open(str(tempdir / "empty_parquet.parquet"), 'wb') as f:
    pass
with open(str(tempdir / "empty_csv.csv"), 'wb') as f:
    pass
# Empty file: each call below raises, so run them one at a time
feather.read_table(str(tempdir / "empty_feather.feather"))
parquet.read_table(str(tempdir / "empty_parquet.parquet"))
csv.read_csv(str(tempdir / "empty_csv.csv"))
dataset.dataset(str(tempdir / "empty_feather.feather"), format="feather")
dataset.dataset(str(tempdir / "empty_parquet.parquet"), format="parquet")
dataset.dataset(str(tempdir / "empty_csv.csv"), format="csv")
# Missing file: each call below raises, so run them one at a time
feather.read_table(str(tempdir / "non_existent.feather"))
parquet.read_table(str(tempdir / "non_existent.parquet"))
csv.read_csv(str(tempdir / "non_existent.csv"))
dataset.dataset(str(tempdir / "non_existent.feather"), format="feather")
dataset.dataset(str(tempdir / "non_existent.parquet"), format="parquet")
dataset.dataset(str(tempdir / "non_existent.csv"), format="csv")
{code}
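Because every call in the script above raises, running it top to bottom stops at the first line. A minimal sketch of how the table could be regenerated in one pass follows. The helper names {{raised_exception_name}}, {{pyarrow_readers}}, and {{survey}} are hypothetical and not part of pyarrow; only the reader functions already used in the reproduction script are called, and the results may differ between pyarrow versions.

```python
import pathlib
import tempfile


def raised_exception_name(func, *args, **kwargs):
    """Call func and return the class name of the exception it raises, or None."""
    try:
        func(*args, **kwargs)
    except Exception as exc:
        return type(exc).__name__
    return None


def pyarrow_readers():
    """Return (label, callable, kwargs, suffix) tuples for the readers in the table.

    pyarrow is imported lazily so the generic helpers work without it.
    """
    import pyarrow.csv as csv
    import pyarrow.dataset as dataset
    import pyarrow.feather as feather
    import pyarrow.parquet as parquet

    return [
        ("feather.read_table", feather.read_table, {}, ".feather"),
        ("parquet.read_table", parquet.read_table, {}, ".parquet"),
        ("csv.read_csv", csv.read_csv, {}, ".csv"),
        ('dataset.dataset "feather"', dataset.dataset, {"format": "feather"}, ".feather"),
        ('dataset.dataset "parquet"', dataset.dataset, {"format": "parquet"}, ".parquet"),
        ('dataset.dataset "csv"', dataset.dataset, {"format": "csv"}, ".csv"),
    ]


def survey(readers, tempdir):
    """Return (label, missing-file exception, empty-file exception) per reader."""
    rows = []
    for label, func, kwargs, suffix in readers:
        empty = tempdir / ("empty" + suffix)
        empty.touch()  # create a zero-byte file
        missing = tempdir / ("non_existent" + suffix)  # never created
        rows.append((
            label,
            raised_exception_name(func, str(missing), **kwargs),
            raised_exception_name(func, str(empty), **kwargs),
        ))
    return rows
```

With pyarrow installed, {{survey(pyarrow_readers(), pathlib.Path(tempfile.mkdtemp()))}} returns one row per function in the same column order as the table, with {{None}} wherever no exception was raised.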