Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2021/06/07 13:30:00 UTC
[jira] [Created] (ARROW-12988) [CI] The kartothek nightly
integration build is failing (test_update_dataset_from_ddf_empty)
Joris Van den Bossche created ARROW-12988:
---------------------------------------------
Summary: [CI] The kartothek nightly integration build is failing (test_update_dataset_from_ddf_empty)
Key: ARROW-12988
URL: https://issues.apache.org/jira/browse/ARROW-12988
Project: Apache Arrow
Issue Type: Bug
Components: Continuous Integration, Python
Reporter: Joris Van den Bossche
The nightly "kartothek" integration builds are failing.
More specifically, the {{test_update_dataset_from_ddf_empty}} test is failing with:
{code}
=================================== FAILURES ===================================
___________________ test_update_dataset_from_ddf_empty[True] ___________________
store_factory = functools.partial(<function get_store_from_url at 0x7f1434733050>, 'hfs:///tmp/pytest-of-root/pytest-0/test_update_dataset_from_ddf_e0/store')
shuffle = True
    @pytest.mark.parametrize("shuffle", [True, False])
    def test_update_dataset_from_ddf_empty(store_factory, shuffle):
        with pytest.raises(ValueError, match="Cannot store empty datasets"):
            update_dataset_from_ddf(
>               dask.dataframe.from_delayed([], meta=(("a", int),)),
                store_factory,
                dataset_uuid="output_dataset_uuid",
                table="core",
                shuffle=shuffle,
                partition_on=["a"],
            ).compute()
tests/io/dask/dataframe/test_update.py:57:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
dfs = [], meta = (('a', <class 'int'>),), divisions = None
prefix = 'from-delayed', verify_meta = True
    @insert_meta_param_description
    def from_delayed(
        dfs, meta=None, divisions=None, prefix="from-delayed", verify_meta=True
    ):
        """Create Dask DataFrame from many Dask Delayed objects
        Parameters
        ----------
        dfs : list of Delayed
            An iterable of ``dask.delayed.Delayed`` objects, such as come from
            ``dask.delayed`` These comprise the individual partitions of the
            resulting dataframe.
        $META
        divisions : tuple, str, optional
            Partition boundaries along the index.
            For tuple, see https://docs.dask.org/en/latest/dataframe-design.html#partitions
            For string 'sorted' will compute the delayed values to find index
            values.  Assumes that the indexes are mutually sorted.
            If None, then won't use index information
        prefix : str, optional
            Prefix to prepend to the keys.
        verify_meta : bool, optional
            If True check that the partitions have consistent metadata, defaults to True.
        """
        from dask.delayed import Delayed
        if isinstance(dfs, Delayed):
            dfs = [dfs]
        dfs = [
            delayed(df) if not isinstance(df, Delayed) and hasattr(df, "key") else df
            for df in dfs
        ]
        for df in dfs:
            if not isinstance(df, Delayed):
                raise TypeError("Expected Delayed object, got %s" % type(df).__name__)
>       parent_meta = delayed(make_meta)(dfs[0]).compute()
E       IndexError: list index out of range
/opt/conda/envs/arrow/lib/python3.7/site-packages/dask/dataframe/io/io.py:591: IndexError
{code}
(from https://github.com/ursacomputing/crossbow/runs/2756067090)
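A note on reading the traceback: the test expects a ValueError ("Cannot store empty datasets"), but the exception that surfaces is an IndexError raised while indexing an empty list of partitions. Since IndexError is not a subclass of ValueError, {{pytest.raises(ValueError)}} does not intercept it and the test fails. The following is only a minimal sketch of that mechanism in plain Python; the function names are hypothetical stand-ins and are not dask or kartothek APIs.

```python
# Hypothetical stand-in for the failing line in the traceback: it indexes the
# first element of the partition list without checking that the list is non-empty.
def parent_meta_from_first(dfs):
    return dfs[0]  # raises IndexError when dfs is an empty list


def classify_failure(dfs):
    """Return the name of the exception raised for the given partition list."""
    try:
        parent_meta_from_first(dfs)
    except ValueError:
        return "ValueError"  # what the kartothek test expects to see
    except IndexError:
        return "IndexError"  # what the traceback actually shows
    return "no error"


# IndexError derives from LookupError, not ValueError, so a handler for
# ValueError (like pytest.raises(ValueError)) lets it propagate.
print(issubclass(IndexError, ValueError))
print(classify_failure([]))
```

Under these assumptions, an empty partition list produces an IndexError before any "empty dataset" check that raises ValueError gets a chance to run.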
It is not immediately clear whether this is a kartothek issue or a pyarrow issue, but I also opened an issue on the kartothek side: https://github.com/JDASoftwareGroup/kartothek/issues/475