Posted to jira@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2022/10/14 15:05:00 UTC

[jira] [Commented] (ARROW-18060) [C++] Writing a dataset with 0 rows doesn't create any files

    [ https://issues.apache.org/jira/browse/ARROW-18060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17617786#comment-17617786 ] 

Neal Richardson commented on ARROW-18060:
-----------------------------------------

See also ARROW-16575. 

> [C++] Writing a dataset with 0 rows doesn't create any files
> ------------------------------------------------------------
>
>                 Key: ARROW-18060
>                 URL: https://issues.apache.org/jira/browse/ARROW-18060
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>    Affects Versions: 9.0.0
>            Reporter: David Li
>            Priority: Major
>
> If the input data has no rows, no files are created. This is potentially surprising, since it looks like "nothing happened". It might be nicer to create an empty file. With partitioning, though, that gets awkward (there are no partition values), so an error might make more sense instead.
> Reproduction in Python:
> {code:python}
> import tempfile
> from pathlib import Path
> import pyarrow
> import pyarrow.dataset
> print("PyArrow version:", pyarrow.__version__)
> table = pyarrow.table([
>     [],
> ], schema=pyarrow.schema([
>     ("ints", "int64"),
> ]))
> with tempfile.TemporaryDirectory() as d:
>     pyarrow.dataset.write_dataset(table, d, format="feather")
>     print(list(Path(d).iterdir()))
> {code}
> Output
> {noformat}
> > python repro.py
> PyArrow version: 9.0.0
> []
> {noformat}


