Posted to jira@arrow.apache.org by "Will Jones (Jira)" <ji...@apache.org> on 2022/10/12 21:34:00 UTC
[jira] [Assigned] (ARROW-17069) [Python][R] GCSFileSystem reports cannot resolve host on public buckets
[ https://issues.apache.org/jira/browse/ARROW-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Will Jones reassigned ARROW-17069:
----------------------------------
Assignee: Will Jones
> [Python][R] GCSFileSystem reports cannot resolve host on public buckets
> -----------------------------------------------------------------------
>
> Key: ARROW-17069
> URL: https://issues.apache.org/jira/browse/ARROW-17069
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python, R
> Affects Versions: 8.0.0
> Reporter: Will Jones
> Assignee: Will Jones
> Priority: Critical
> Labels: pull-request-available
> Fix For: 10.0.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> GCSFileSystem returns {{Couldn't resolve host name}} for a public bucket if you don't supply {{anonymous}} as the user:
> {code:python}
> >>> import pyarrow.dataset as ds
> >>> # Fails:
> >>> dataset = ds.dataset("gs://voltrondata-labs-datasets/nyc-taxi/?retry_limit_seconds=3")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Users/willjones/Documents/arrows/arrow/python/pyarrow/dataset.py", line 749, in dataset
>     return _filesystem_dataset(source, **kwargs)
>   File "/Users/willjones/Documents/arrows/arrow/python/pyarrow/dataset.py", line 441, in _filesystem_dataset
>     fs, paths_or_selector = _ensure_single_source(source, filesystem)
>   File "/Users/willjones/Documents/arrows/arrow/python/pyarrow/dataset.py", line 408, in _ensure_single_source
>     file_info = filesystem.get_file_info(path)
>   File "pyarrow/_fs.pyx", line 444, in pyarrow._fs.FileSystem.get_file_info
>     info = GetResultValue(self.fs.GetFileInfo(path))
>   File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
>     return check_status(status)
>   File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
>     raise IOError(message)
> OSError: google::cloud::Status(UNAVAILABLE: Retry policy exhausted in GetObjectMetadata: EasyPerform() - CURL error [6]=Couldn't resolve host name)
> >>> # This works fine:
> >>> dataset = ds.dataset("gs://anonymous@voltrondata-labs-datasets/nyc-taxi/?retry_limit_seconds=3")
> {code}
> I would expect the connection to succeed without specifying {{anonymous}}, since the bucket is public.
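Until a fix lands, the workaround in the report (putting {{anonymous}} in the user part of the URI) could be applied programmatically. The following is only a minimal sketch using the Python standard library; the helper name with_anonymous_user is hypothetical and not part of pyarrow, and it assumes the URI uses the gs scheme as in the report.

```python
from urllib.parse import urlsplit, urlunsplit


def with_anonymous_user(uri: str) -> str:
    """Return `uri` with `anonymous@` in front of the bucket name.

    Hypothetical helper (not a pyarrow API): it only rewrites the string,
    mirroring the working URI shown in the report.
    """
    parts = urlsplit(uri)
    if parts.scheme != "gs":
        raise ValueError(f"expected a gs:// URI, got scheme {parts.scheme!r}")
    if "@" in parts.netloc:
        # A user is already present; leave the URI untouched.
        return uri
    # Path and query string (e.g. retry_limit_seconds) are preserved as-is.
    return urlunsplit(parts._replace(netloc="anonymous@" + parts.netloc))


if __name__ == "__main__":
    print(with_anonymous_user(
        "gs://voltrondata-labs-datasets/nyc-taxi/?retry_limit_seconds=3"
    ))
```

The rewritten string can then be passed to ds.dataset() exactly as in the working call above. Constructing the filesystem explicitly, for example with pyarrow.fs.GcsFileSystem(anonymous=True), may be an alternative, but that path was not exercised in this report.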
--
This message was sent by Atlassian Jira
(v8.20.10#820010)