
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/06/18 09:38:25 UTC

[GitHub] [arrow] jorisvandenbossche commented on pull request #7474: ARROW-8802: [C++][Dataset] Preserve dataset schema's metadata on column projection

jorisvandenbossche commented on pull request #7474:
URL: https://github.com/apache/arrow/pull/7474#issuecomment-645904783


   So this was actually a "time bomb" in the extension type testing code: we were using a test extension type with the same name as an extension type that has since been implemented in pandas. Depending on whether pandas had already registered this type with pyarrow, registering our test type would trigger the error "A type extension with name pandas.period already defined".
   
   Using the pandas parquet functionality triggers pandas to register this extension type. Until now, when running the tests in the normal order, no test used pandas' parquet functions before `test_extension_type.py` ran. But with this PR, a test using pandas' `to_parquet` was added to `test_dataset.py`, which runs before `test_extension_type.py`.
   
   If I explicitly tell pytest to run e.g. `test_parquet.py` first and then `test_extension_type.py`, I get the exact same failure on master.
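
To illustrate the order dependence, here is a minimal pure-Python sketch. It does not use pyarrow or pandas: the `Registry` class, the helper functions, and the name `test.period` are all hypothetical stand-ins, assuming only that the registry is keyed by type name and rejects duplicates.

```python
class Registry:
    """Hypothetical stand-in for a global, name-keyed extension type registry."""

    def __init__(self):
        self._types = {}

    def register(self, name, cls):
        # Registering a second type under an existing name is an error.
        if name in self._types:
            raise KeyError(f"A type extension with name {name} already defined")
        self._types[name] = cls

    def unregister(self, name):
        del self._types[name]

    def __contains__(self, name):
        return name in self._types


def pandas_to_parquet(registry):
    # Stand-in for the side effect: the first use of the parquet
    # functionality registers the "pandas.period" type.
    if "pandas.period" not in registry:
        registry.register("pandas.period", object)


def extension_type_test(registry, name):
    # Stand-in for the test module: register a test type, then clean up.
    try:
        registry.register(name, object)
    except KeyError:
        return "failed"
    registry.unregister(name)
    return "passed"


def run_session(parquet_first, test_type_name):
    # One test session with a fresh registry; the order of steps matters.
    registry = Registry()
    if parquet_first:
        pandas_to_parquet(registry)
    return extension_type_test(registry, test_type_name)
```

In this sketch, `run_session(False, "pandas.period")` passes, `run_session(True, "pandas.period")` fails, and `run_session(True, "test.period")` passes again, which is the effect of giving the test type a name that cannot clash.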
   
   TLDR: pushed a fix to rename our test extension type class ;)

