Posted to jira@arrow.apache.org by "Adam Kirby (Jira)" <ji...@apache.org> on 2022/07/21 20:56:00 UTC
[jira] [Updated] (ARROW-17174) FileSystemDataset FilenamePartitioning error - fsspec filesystem
[ https://issues.apache.org/jira/browse/ARROW-17174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adam Kirby updated ARROW-17174:
-------------------------------
Affects Version/s: 8.0.0
(was: 8.0.1)
> FileSystemDataset FilenamePartitioning error - fsspec filesystem
> ----------------------------------------------------------------
>
> Key: ARROW-17174
> URL: https://issues.apache.org/jira/browse/ARROW-17174
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Affects Versions: 8.0.0
> Reporter: Adam Kirby
> Priority: Major
> Attachments: zip_of_csvs_test.py
>
>
> Unless this is user error (which it may well be!), Dataset FilenamePartitioning on read doesn't seem to work with an fsspec filesystem. From what I can glean, the filenames parse successfully when passed to the partitioning's parse() method, but the fields are not extracted from the filenames passed to dataset(); they appear as nulls instead. When I try the partitioning discover() method instead (assuming this is a reasonable thing to try), I get the traceback below. (Repro Python script attached.)
> Traceback (most recent call last):
>   File "/zip_of_csvs_test.py", line 82, in <module>
>     ds_partitioned = pds.dataset(
>   File "/.pyenv/versions/3.8.2/lib/python3.8/site-packages/pyarrow/dataset.py", line 697, in dataset
>     return _filesystem_dataset(source, **kwargs)
>   File "/.pyenv/versions/3.8.2/lib/python3.8/site-packages/pyarrow/dataset.py", line 449, in _filesystem_dataset
>     return factory.finish(schema)
>   File "pyarrow/_dataset.pyx", line 1857, in pyarrow._dataset.DatasetFactory.finish
>   File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
>   File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: No non-null segments were available for field 'frequency'; couldn't infer type
--
This message was sent by Atlassian Jira
(v8.20.10#820010)