Posted to issues@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2020/04/06 07:15:00 UTC

[jira] [Commented] (ARROW-8342) [Python] dask and kartothek integration tests are failing

    [ https://issues.apache.org/jira/browse/ARROW-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17076082#comment-17076082 ] 

Joris Van den Bossche commented on ARROW-8342:
----------------------------------------------

So some things that are failing right now with the new KeyValueMetadata:

- You can no longer set values ({{metadata[b"key"] = ...}})
- The dict-style {{update()}} method no longer exists
- {{write_to_dataset}} now writes parquet files with two "pandas" entries in the metadata (and one of them is wrong)
- The double entry is probably related to the fact that you can now create KeyValueMetadata with duplicate keys, which was not possible before (or at least, {{dict.update}} was used before, which overwrites existing keys with the passed ones).
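
To illustrate the last point, here is a minimal pure-Python sketch of the difference between the two behaviours. It does not use pyarrow: the list of pairs is only an assumed stand-in for a container that allows duplicate keys, and {{first_value}} and {{dedupe_last_wins}} are hypothetical helpers, not part of any library.

```python
# Old behaviour: metadata held in a plain dict, merged with dict.update.
old_style = {b"pandas": b"old schema"}
old_style.update({b"pandas": b"new schema"})  # overwrites the existing key

# Assumed stand-in for a container that permits duplicate keys:
# appending a pair keeps the stale entry next to the new one.
new_style = [(b"pandas", b"old schema")]
new_style.append((b"pandas", b"new schema"))


def first_value(pairs, key):
    """Return the first value stored under key (hypothetical naive reader)."""
    for k, v in pairs:
        if k == key:
            return v
    raise KeyError(key)


def dedupe_last_wins(pairs):
    """Collapse duplicate keys so the last value wins, as dict.update would."""
    return dict(pairs)
```

A reader that picks the first matching entry from {{new_style}} gets the stale value, which matches the "wrong pandas entry" symptom described above. Collapsing the pairs with last-wins semantics gives the same result as the old dict-based merge.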

All of the above can be solved (and probably rather easily). But since these are already several issues that I have only quickly listed, there may be other corner cases that have changed as well. So I am wondering whether, given how close we are to the release, we should revert the change for now, so we have a bit more time to let this settle after 0.17.

cc [~kszucs] [~apitrou] [~wesm]

> [Python] dask and kartothek integration tests are failing
> ---------------------------------------------------------
>
>                 Key: ARROW-8342
>                 URL: https://issues.apache.org/jira/browse/ARROW-8342
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Joris Van den Bossche
>            Priority: Blocker
>             Fix For: 0.17.0
>
>
> The integration tests for both dask and kartothek, against both master and the latest released version of each, started failing in the last few days.
> Dask latest: https://circleci.com/gh/ursa-labs/crossbow/10629?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link 
> Kartothek latest: https://circleci.com/gh/ursa-labs/crossbow/10604?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link
> I think both are related to the KeyValueMetadata changes (ARROW-8079).
> The kartothek one is clearly related, as it gives: TypeError: 'pyarrow.lib.KeyValueMetadata' object does not support item assignment
> And I think the dask one is related to the "pandas" key now being present twice, and therefore it is using the "wrong" one.


