Posted to jira@arrow.apache.org by "David Li (Jira)" <ji...@apache.org> on 2022/10/14 14:53:00 UTC

[jira] [Created] (ARROW-18060) [C++] Writing a dataset with 0 rows doesn't create any files

David Li created ARROW-18060:
--------------------------------

             Summary: [C++] Writing a dataset with 0 rows doesn't create any files
                 Key: ARROW-18060
                 URL: https://issues.apache.org/jira/browse/ARROW-18060
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
    Affects Versions: 9.0.0
            Reporter: David Li


If the input data has no rows, no files get created. This is potentially unexpected, since it looks like "nothing happened". It might be nicer to create an empty file. With partitioning, though, that gets awkward (there are no partition values), so raising an error might make more sense instead.

Reproduction in Python
{code:python}
import tempfile
from pathlib import Path

import pyarrow
import pyarrow.dataset

print("PyArrow version:", pyarrow.__version__)

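# Build a table with one int64 column and zero rows
table = pyarrow.table([
    [],
], schema=pyarrow.schema([
    ("ints", "int64"),
]))

with tempfile.TemporaryDirectory() as d:
    pyarrow.dataset.write_dataset(table, d, format="feather")
    # List what was written; prints [] because no files were created
    print(list(Path(d).iterdir()))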
{code}
Output
{noformat}
> python repro.py
PyArrow version: 9.0.0
[]
{noformat}
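
Until the behavior is settled, a caller could detect the "nothing happened" case on its own side. The following is only a sketch of that idea, not part of Arrow: the names write_and_verify and NoFilesWrittenError are hypothetical, and the helper assumes the write is wrapped in a callable that takes the target directory. It uses only the standard library and compares the files present before and after the write.

```python
from pathlib import Path
from typing import Callable, List, Set


class NoFilesWrittenError(RuntimeError):
    """Hypothetical error: a dataset write produced no new files."""


def _files_under(root: Path) -> Set[Path]:
    # Collect every regular file below root (empty set if root is missing).
    if not root.exists():
        return set()
    return {p for p in root.rglob("*") if p.is_file()}


def write_and_verify(write: Callable[[str], object], base_dir: str) -> List[Path]:
    """Run write(base_dir) and raise if no new files appeared under base_dir.

    Limitation: files that are overwritten in place are not counted as new.
    """
    root = Path(base_dir)
    before = _files_under(root)
    write(base_dir)
    created = sorted(_files_under(root) - before)
    if not created:
        raise NoFilesWrittenError(
            f"no files were written under {base_dir!r}; was the input empty?"
        )
    return created


# Possible use with the reproduction above (assumes pyarrow is installed):
#   write_and_verify(
#       lambda base: pyarrow.dataset.write_dataset(table, base, format="feather"),
#       d,
#   )
```

Checking the directory rather than the row count keeps the guard independent of the input type, at the cost of one extra directory walk before and after the write.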



--
This message was sent by Atlassian Jira
(v8.20.10#820010)