Posted to jira@arrow.apache.org by "Will Jones (Jira)" <ji...@apache.org> on 2022/10/12 21:34:00 UTC

[jira] [Assigned] (ARROW-17069) [Python][R] GCSFileSystem reports cannot resolve host on public buckets

     [ https://issues.apache.org/jira/browse/ARROW-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Will Jones reassigned ARROW-17069:
----------------------------------

    Assignee: Will Jones

> [Python][R] GCSFileSystem reports cannot resolve host on public buckets
> -----------------------------------------------------------------------
>
>                 Key: ARROW-17069
>                 URL: https://issues.apache.org/jira/browse/ARROW-17069
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python, R
>    Affects Versions: 8.0.0
>            Reporter: Will Jones
>            Assignee: Will Jones
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 10.0.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> When reading a public bucket, GCSFileSystem returns {{Couldn't resolve host name}} unless you supply {{anonymous}} as the user:
> {code:python}
> >>> import pyarrow.dataset as ds
> # Fails:
> >>> dataset = ds.dataset("gs://voltrondata-labs-datasets/nyc-taxi/?retry_limit_seconds=3")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Users/willjones/Documents/arrows/arrow/python/pyarrow/dataset.py", line 749, in dataset
>     return _filesystem_dataset(source, **kwargs)
>   File "/Users/willjones/Documents/arrows/arrow/python/pyarrow/dataset.py", line 441, in _filesystem_dataset
>     fs, paths_or_selector = _ensure_single_source(source, filesystem)
>   File "/Users/willjones/Documents/arrows/arrow/python/pyarrow/dataset.py", line 408, in _ensure_single_source
>     file_info = filesystem.get_file_info(path)
>   File "pyarrow/_fs.pyx", line 444, in pyarrow._fs.FileSystem.get_file_info
>     info = GetResultValue(self.fs.GetFileInfo(path))
>   File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
>     return check_status(status)
>   File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
>     raise IOError(message)
> OSError: google::cloud::Status(UNAVAILABLE: Retry policy exhausted in GetObjectMetadata: EasyPerform() - CURL error [6]=Couldn't resolve host name)
> # This works fine:
> >>> dataset = ds.dataset("gs://anonymous@voltrondata-labs-datasets/nyc-taxi/?retry_limit_seconds=3")
> {code}
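> I would expect the first call to connect as well, since the bucket is public and should not require {{anonymous}} to be specified explicitly.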

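Until the fix lands (Fix For: 10.0.0), the working call above suggests a workaround: put {{anonymous@}} in front of the bucket name. The snippet below is a minimal sketch of that rewrite using only the standard library. The helper name {{with_anonymous_user}} is hypothetical and not part of pyarrow, and it assumes the input is a {{gs://}} URI whose path and query string should be kept unchanged.

{code:python}
from urllib.parse import urlsplit, urlunsplit


def with_anonymous_user(uri: str) -> str:
    """Hypothetical helper: add the 'anonymous' user to a gs:// URI.

    Sketch of the workaround shown in the report; the path and query
    string (e.g. retry_limit_seconds) are passed through untouched.
    """
    parts = urlsplit(uri)
    if parts.scheme != "gs":
        raise ValueError(f"expected a gs:// URI, got {uri!r}")
    if "@" in parts.netloc:
        # User info is already present (e.g. "anonymous@bucket"); keep it.
        return uri
    # Prefix the bucket (the netloc) with the "anonymous" user.
    return urlunsplit(parts._replace(netloc="anonymous@" + parts.netloc))


print(with_anonymous_user(
    "gs://voltrondata-labs-datasets/nyc-taxi/?retry_limit_seconds=3"
))
{code}

The rewritten URI matches the one in the working example, so it can be passed straight to {{ds.dataset()}}. Passing an explicit {{filesystem=pyarrow.fs.GcsFileSystem(anonymous=True)}} together with a scheme-less path should have the same effect, though this sketch has not been checked against the affected 8.0.0 release.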


--
This message was sent by Atlassian Jira
(v8.20.10#820010)