Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/06/08 16:11:03 UTC
[GitHub] [arrow] lidavidm opened a new pull request #10483: ARROW-12827: [C++] Improve error message for dataset discovery failure
lidavidm opened a new pull request #10483:
URL: https://github.com/apache/arrow/pull/10483
This adds a bit more context to the error messages, though maybe this is a bit wordy?
```
>>> ds.dataset('dataset4', format="ipc")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/lidavidm/Code/upstream/arrow-12827/python/pyarrow/dataset.py", line 655, in dataset
    return _filesystem_dataset(source, **kwargs)
  File "/home/lidavidm/Code/upstream/arrow-12827/python/pyarrow/dataset.py", line 410, in _filesystem_dataset
    return factory.finish(schema)
  File "pyarrow/_dataset.pyx", line 2262, in pyarrow._dataset.DatasetFactory.finish
    return Dataset.wrap(GetResultValue(result))
  File "pyarrow/error.pxi", line 141, in pyarrow.lib.pyarrow_internal_check_status
    return check_status(status)
  File "pyarrow/error.pxi", line 97, in pyarrow.lib.check_status
    raise ArrowInvalid(message)
pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from 'dataset4/foo.parquet': Could not open IPC input source 'dataset4/foo.parquet': File is too small: 9. Is this a 'ipc' file?
>>> ds.dataset('dataset5', format="parquet")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/lidavidm/Code/upstream/arrow-12827/python/pyarrow/dataset.py", line 655, in dataset
    return _filesystem_dataset(source, **kwargs)
  File "/home/lidavidm/Code/upstream/arrow-12827/python/pyarrow/dataset.py", line 410, in _filesystem_dataset
    return factory.finish(schema)
  File "pyarrow/_dataset.pyx", line 2262, in pyarrow._dataset.DatasetFactory.finish
    return Dataset.wrap(GetResultValue(result))
  File "pyarrow/error.pxi", line 141, in pyarrow.lib.pyarrow_internal_check_status
    return check_status(status)
  File "pyarrow/error.pxi", line 112, in pyarrow.lib.check_status
    raise IOError(message)
OSError: Error creating dataset. Could not read schema from 'dataset5/foo.parquet': Could not open Parquet input source 'dataset5/foo.parquet': Invalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.. Is this a 'parquet' file?
```
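The layering in these messages can be sketched in Python roughly as follows. This is only an illustration of how each level adds context around the reader's own error, not the actual C++ change; the helper name `wrap_discovery_error` is hypothetical, and the wording is copied from the output above.

```python
def wrap_discovery_error(path: str, format_name: str, inner: str) -> str:
    """Hypothetical helper: add dataset-level context around a reader error."""
    return (
        "Error creating dataset. "
        f"Could not read schema from '{path}': {inner}. "
        f"Is this a '{format_name}' file?"
    )


# The inner text is the low-level reader error from the first example above.
inner = "Could not open IPC input source 'dataset4/foo.parquet': File is too small: 9"
message = wrap_discovery_error("dataset4/foo.parquet", "ipc", inner)
print(message)
```

If the inner error already ends with a period, as the Parquet one does, this scheme produces the doubled `..` seen in the second message.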
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] lidavidm commented on pull request #10483: ARROW-12827: [C++] Improve error message for dataset discovery failure
Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #10483:
URL: https://github.com/apache/arrow/pull/10483#issuecomment-861682228
Is everyone happy with the error message here? :slightly_smiling_face:
[GitHub] [arrow] lidavidm commented on pull request #10483: ARROW-12827: [C++] Improve error message for dataset discovery failure
Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #10483:
URL: https://github.com/apache/arrow/pull/10483#issuecomment-857688984
Alright, I added back the 'Is this a XYZ file' message.
[GitHub] [arrow] pitrou commented on pull request #10483: ARROW-12827: [C++] Improve error message for dataset discovery failure
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #10483:
URL: https://github.com/apache/arrow/pull/10483#issuecomment-857686569
I think the "Is this a XYZ file?" conveys the information quite clearly (and invites the user to check).
[GitHub] [arrow] github-actions[bot] commented on pull request #10483: ARROW-12827: [C++] Improve error message for dataset discovery failure
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10483:
URL: https://github.com/apache/arrow/pull/10483#issuecomment-856906270
https://issues.apache.org/jira/browse/ARROW-12827
[GitHub] [arrow] lidavidm closed pull request #10483: ARROW-12827: [C++] Improve error message for dataset discovery failure
Posted by GitBox <gi...@apache.org>.
lidavidm closed pull request #10483:
URL: https://github.com/apache/arrow/pull/10483
[GitHub] [arrow] lidavidm commented on pull request #10483: ARROW-12827: [C++] Improve error message for dataset discovery failure
Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #10483:
URL: https://github.com/apache/arrow/pull/10483#issuecomment-857649332
This is even more wordy, but perhaps `If reading a different format than 'Parquet', pass the intended format to the dataset/factory`?
[GitHub] [arrow] jorisvandenbossche commented on pull request #10483: ARROW-12827: [C++] Improve error message for dataset discovery failure
Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on pull request #10483:
URL: https://github.com/apache/arrow/pull/10483#issuecomment-857614846
I think the "Is this a XYZ file?" is actually quite useful, and not too verbose, because this error is easy to hit when you read a non-Parquet file and forget to specify the format (the default is "parquet"; the format is not inferred from the file, as users might expect).
This relates to a PR @thisisnic did for improving this error message on the R side -> https://github.com/apache/arrow/pull/10326 (this PR might cover the custom handling you added in R?)
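Because the format is not inferred, one way to answer "Is this a XYZ file?" before creating the dataset is to look at the file's magic bytes. A minimal sketch, assuming the usual markers (`PAR1` at both ends of a Parquet file, `ARROW1` at the start of an Arrow IPC file); `sniff_format` is a hypothetical helper, not part of pyarrow, and it does not validate the rest of the file.

```python
import os
import tempfile

PARQUET_MAGIC = b"PAR1"      # first and last 4 bytes of a Parquet file
ARROW_IPC_MAGIC = b"ARROW1"  # first 6 bytes of an Arrow IPC file


def sniff_format(path):
    """Hypothetical helper: guess 'parquet' or 'ipc' from magic bytes, else None."""
    with open(path, "rb") as f:
        head = f.read(8)
        f.seek(0, os.SEEK_END)
        size = f.tell()
        tail = b""
        if size >= 4:
            f.seek(-4, os.SEEK_END)
            tail = f.read(4)
    # Smallest possible Parquet layout: magic + 4-byte footer length + magic.
    if size >= 12 and head.startswith(PARQUET_MAGIC) and tail == PARQUET_MAGIC:
        return "parquet"
    if head.startswith(ARROW_IPC_MAGIC):
        return "ipc"
    return None


# Demo with a small file that is neither format, despite its extension.
with tempfile.TemporaryDirectory() as tmp:
    bogus = os.path.join(tmp, "foo.parquet")
    with open(bogus, "wb") as f:
        f.write(b"not arrow")
    print(sniff_format(bogus))
```

A result of `None` only means that neither marker was found; the file may still be CSV or another format that has to be passed explicitly via `format=`.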
[GitHub] [arrow] jorisvandenbossche commented on pull request #10483: ARROW-12827: [C++] Improve error message for dataset discovery failure
Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on pull request #10483:
URL: https://github.com/apache/arrow/pull/10483#issuecomment-857617366
> I think the "Is this a XYZ file?" is actually quite useful, and not too verbose
Of course the "Could not open Parquet input source" part also already gives a hint for that