Posted to issues@arrow.apache.org by "KellyWalker (via GitHub)" <gi...@apache.org> on 2023/09/29 20:06:38 UTC
[GitHub] [arrow] KellyWalker opened a new issue, #37960: Add support for GCS URI (gs://) to pyarrow.parquet.read_table
KellyWalker opened a new issue, #37960:
URL: https://github.com/apache/arrow/issues/37960
### Describe the enhancement requested
Currently, this works:
```python
from gcsfs import GCSFileSystem
import pyarrow.parquet as pq
gcs = GCSFileSystem()
parquet_file = pq.read_table("/bucket/path/to/file.parquet", filesystem=gcs)
```
But this does not:
```python
import pyarrow.parquet as pq
parquet_file = pq.read_table("gs://bucket/path/to/file.parquet")
```
It would be nice if the latter worked directly without needing to specify the filesystem.
### Component(s)
Parquet, Python
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] kou commented on issue #37960: [Python] Add support for GCS URI (gs://) to pyarrow.parquet.read_table
Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on issue #37960:
URL: https://github.com/apache/arrow/issues/37960#issuecomment-1741521034
Which wheel are you using?
Wheel list: https://pypi.org/project/pyarrow/#files
[GitHub] [arrow] KellyWalker commented on issue #37960: [Python] Add support for GCS URI (gs://) to pyarrow.parquet.read_table
Posted by "KellyWalker (via GitHub)" <gi...@apache.org>.
KellyWalker commented on issue #37960:
URL: https://github.com/apache/arrow/issues/37960#issuecomment-1741506624
Yes.
[GitHub] [arrow] KellyWalker closed issue #37960: [Python] Add support for GCS URI (gs://) to pyarrow.parquet.read_table
Posted by "KellyWalker (via GitHub)" <gi...@apache.org>.
KellyWalker closed issue #37960: [Python] Add support for GCS URI (gs://) to pyarrow.parquet.read_table
URL: https://github.com/apache/arrow/issues/37960
[GitHub] [arrow] KellyWalker commented on issue #37960: [Python] Add support for GCS URI (gs://) to pyarrow.parquet.read_table
Posted by "KellyWalker (via GitHub)" <gi...@apache.org>.
KellyWalker commented on issue #37960:
URL: https://github.com/apache/arrow/issues/37960#issuecomment-1741523600
pyarrow-12.0.1-cp38-cp38-win_amd64.whl
Listed here for 12.0.1: https://pypi.org/project/pyarrow/12.0.1/#files
[GitHub] [arrow] kou commented on issue #37960: [Python] Add support for GCS URI (gs://) to pyarrow.parquet.read_table
Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on issue #37960:
URL: https://github.com/apache/arrow/issues/37960#issuecomment-1741503538
Does this work?
```python
import pyarrow.dataset as ds
dataset = ds.dataset("gs://bucket/path/to/file.parquet", format="parquet")
parquet_file = dataset.to_table()
```
[GitHub] [arrow] kou commented on issue #37960: [Python] Add support for GCS URI (gs://) to pyarrow.parquet.read_table
Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on issue #37960:
URL: https://github.com/apache/arrow/issues/37960#issuecomment-1741524631
OK. pyarrow 13.0.0 will solve your problem.
See also: #35255/#35193
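For anyone who wants to check an environment against the version mentioned above, here is a minimal sketch. The helper names `parse_version` and `should_upgrade_for_gcs_uri` are made up for illustration and are not part of pyarrow, and the `13.0.0` threshold is taken only from this comment; other platforms or builds may differ.

```python
def parse_version(version):
    """Turn a version string such as "12.0.1" into a tuple like (12, 0, 1).

    Only the leading digits of the first three dot-separated pieces are
    used, so suffixes such as "rc1" or ".dev12" are ignored.
    """
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as "13" to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def should_upgrade_for_gcs_uri(installed, minimum="13.0.0"):
    """Return True if `installed` is older than `minimum`.

    Assumption: `minimum` is the release named in this thread, not a
    value read from pyarrow itself.
    """
    return parse_version(installed) < parse_version(minimum)
```

With pyarrow installed, `should_upgrade_for_gcs_uri(pyarrow.__version__)` would give the answer for the current environment.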
[GitHub] [arrow] kou commented on issue #37960: [Python] Add support for GCS URI (gs://) to pyarrow.parquet.read_table
Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on issue #37960:
URL: https://github.com/apache/arrow/issues/37960#issuecomment-1741514555
How did you install PyArrow? From a wheel, or via conda?
[GitHub] [arrow] KellyWalker commented on issue #37960: [Python] Add support for GCS URI (gs://) to pyarrow.parquet.read_table
Posted by "KellyWalker (via GitHub)" <gi...@apache.org>.
KellyWalker commented on issue #37960:
URL: https://github.com/apache/arrow/issues/37960#issuecomment-1741508963
No.
In both cases the following error is generated:
`pyarrow.lib.ArrowNotImplementedError: Got GCS URI but Arrow compiled without GCS support`
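Until the installed build has GCS support, one possible workaround is to catch this error and fall back to the explicit `gcsfs` filesystem from the original report. This is only a sketch: `is_missing_gcs_support` and `read_parquet_from_gcs` are hypothetical helper names, the check matches the message text quoted above (which may change between releases), and it relies on `ArrowNotImplementedError` being a subclass of the built-in `NotImplementedError`.

```python
GCS_SCHEME = "gs://"


def is_missing_gcs_support(exc):
    """Return True if the exception text matches the error quoted above."""
    return "compiled without GCS support" in str(exc)


def read_parquet_from_gcs(uri):
    """Read a Parquet file, falling back to gcsfs if the URI is rejected."""
    # Imported lazily so the helper above is usable without pyarrow installed.
    import pyarrow.parquet as pq

    try:
        return pq.read_table(uri)
    except NotImplementedError as exc:
        # Re-raise anything that is not the "no GCS support" case.
        if not (uri.startswith(GCS_SCHEME) and is_missing_gcs_support(exc)):
            raise
        # Assumption: gcsfs is installed and credentials are configured.
        from gcsfs import GCSFileSystem

        # Drop the scheme so the path has the "bucket/path/to/file" form.
        bucket_path = uri[len(GCS_SCHEME):]
        return pq.read_table(bucket_path, filesystem=GCSFileSystem())
```

Calling `read_parquet_from_gcs("gs://bucket/path/to/file.parquet")` would then behave the same on builds with and without GCS support, at the cost of an extra dependency on `gcsfs`.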
[GitHub] [arrow] KellyWalker commented on issue #37960: [Python] Add support for GCS URI (gs://) to pyarrow.parquet.read_table
Posted by "KellyWalker (via GitHub)" <gi...@apache.org>.
KellyWalker commented on issue #37960:
URL: https://github.com/apache/arrow/issues/37960#issuecomment-1741516297
From a wheel, installed through Poetry.
Poetry resolved the dependency to pyarrow 12.0.1.
[GitHub] [arrow] KellyWalker commented on issue #37960: [Python] Add support for GCS URI (gs://) to pyarrow.parquet.read_table
Posted by "KellyWalker (via GitHub)" <gi...@apache.org>.
KellyWalker commented on issue #37960:
URL: https://github.com/apache/arrow/issues/37960#issuecomment-1741540034
I confirmed that it does. Thank you so much for the help.
I did search for the issue before I reported it, but maybe I was only looking at open tickets.