Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/08/31 17:55:47 UTC

[GitHub] [arrow-cookbook] Nlte opened a new issue #56: [Python] Make pytest fails

Nlte opened a new issue #56:
URL: https://github.com/apache/arrow-cookbook/issues/56


   I'm getting one failing test when running the `make pytest` target.
   
   ```
   Document: io
   ------------
   **********************************************************************
   File "io.rst", line 799, in default
   Failed example:
       dataset = ds.dataset("s3://ursa-labs-taxi-data/2011",
                            partitioning=["month"])
       for f in dataset.files[:10]:
           print(f)
       print("...")
   Exception raised:
       Traceback (most recent call last):
         File "/usr/local/Cellar/python@3.9/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/doctest.py", line 1336, in __run
           exec(compile(example.source, filename, "single",
         File "<doctest default[0]>", line 1, in <module>
           dataset = ds.dataset("s3://ursa-labs-taxi-data/2011",
         File "/Users/nathanael.leaute/Documents/github/arrow-cookbook/venv/lib/python3.9/site-packages/pyarrow/dataset.py", line 655, in dataset
           return _filesystem_dataset(source, **kwargs)
         File "/Users/nathanael.leaute/Documents/github/arrow-cookbook/venv/lib/python3.9/site-packages/pyarrow/dataset.py", line 410, in _filesystem_dataset
           return factory.finish(schema)
         File "pyarrow/_dataset.pyx", line 2402, in pyarrow._dataset.DatasetFactory.finish
         File "pyarrow/error.pxi", line 143, in pyarrow.lib.pyarrow_internal_check_status
         File "pyarrow/error.pxi", line 114, in pyarrow.lib.check_status
       OSError: Error creating dataset. Could not read schema from 'ursa-labs-taxi-data/2011/01/data.parquet': Could not open Parquet input source 'ursa-labs-taxi-data/2011/01/data.parquet': AWS Error [code 15]: Access Denied. Is this a 'parquet' file?
   **********************************************************************
   1 items had failures:
      1 of  27 in default
   27 tests in 1 items.
   26 passed and 1 failed.
   ***Test Failed*** 1 failures.
   
   ```
   
   It seems like the ACL on the ursa-labs-taxi-data bucket doesn't allow public access. I don't know whether you want to open up the bucket / prefix to the public and incur the AWS bandwidth costs, though. Those are definitely a thing.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-cookbook] westonpace commented on issue #56: [Python] Makefile: pytest target fails

Posted by GitBox <gi...@apache.org>.
westonpace commented on issue #56:
URL: https://github.com/apache/arrow-cookbook/issues/56#issuecomment-909484752


   How to store largeish files is a good question.  Git submodules or Git LFS could be used, but I don't really love either of them.  I don't think we'd want any examples that are too big in the cookbook repo, and any examples that we do use should be downloadable from the cookbook site itself.  I suppose we could sort of bootstrap and host sample files on `arrow.apache.org/cookbook/test-data` too.
   
   @thisisnic @amol- @jorisvandenbossche thoughts?
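
A minimal sketch of how an example could pull a hosted sample file and cache it locally, assuming the files were served over plain HTTP(S). The function name `fetch_sample` and its parameters are hypothetical, and the base URL is whatever location ends up hosting the data; nothing here is an existing cookbook helper.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_sample(name, base_url, cache_dir):
    """Download `name` from `base_url` into `cache_dir` unless already cached.

    Hypothetical helper for illustration only.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    if target.exists():
        # Already downloaded on a previous run; skip the network entirely.
        return target

    url = base_url.rstrip("/") + "/" + name
    # Write under a temporary name first so an interrupted download
    # is not mistaken for a cached file on the next run.
    tmp = target.with_name(target.name + ".part")
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)
    tmp.replace(target)
    return target
```

Caching keeps repeated test runs from downloading the same file again, which matters if bandwidth on the hosting side is a concern.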





[GitHub] [arrow-cookbook] westonpace commented on issue #56: [Python] Makefile: pytest target fails

Posted by GitBox <gi...@apache.org>.
westonpace commented on issue #56:
URL: https://github.com/apache/arrow-cookbook/issues/56#issuecomment-953325204


   I believe this issue has been addressed.  I'm going to go ahead and close it but feel free to reopen if #70 wasn't sufficient.





[GitHub] [arrow-cookbook] Nlte edited a comment on issue #56: [Python] Makefile: pytest target fails

Posted by GitBox <gi...@apache.org>.
Nlte edited a comment on issue #56:
URL: https://github.com/apache/arrow-cookbook/issues/56#issuecomment-914470885


   Ah, my bad. I tried again, this time without any AWS configuration on my laptop, and the test passes.
   pyarrow was loading the AWS credentials from env vars or ~/.aws/credentials and signing requests with them, and that is what made the test fail.
   The bucket is indeed open to the public.
   
   The only solution I see would be to pass an anonymous filesystem. This passes regardless of what's configured on the machine.
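   ```
   import pyarrow.dataset as ds
   from pyarrow import fs

   # anonymous=True skips credential lookup, so requests are sent unsigned
   s3 = fs.S3FileSystem(region="us-east-2", anonymous=True)

   dataset = ds.dataset("ursa-labs-taxi-data/2011/",
                        partitioning=["month"], filesystem=s3)
   ```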
   However, it makes the simple example more complex.
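
To check whether a machine is in the situation described above, a minimal sketch along these lines could look for the two credential sources mentioned (environment variables and `~/.aws/credentials`). The helper name `aws_credentials_configured` and its parameters are hypothetical and not part of pyarrow. The AWS SDK also consults other sources (for example profile configuration or instance metadata) that this check does not cover.

```python
import os
from pathlib import Path


def aws_credentials_configured(environ=None, home=None):
    """Return True if AWS credentials are visible via env vars or a credentials file.

    Hypothetical helper for illustration; it only checks the two sources
    mentioned in this thread, not the full AWS credential chain.
    """
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else Path(home)

    # Both variables are needed for a static key pair from the environment.
    has_env_keys = bool(environ.get("AWS_ACCESS_KEY_ID")) and bool(
        environ.get("AWS_SECRET_ACCESS_KEY")
    )

    # AWS_SHARED_CREDENTIALS_FILE overrides the default file location when set.
    cred_file = Path(
        environ.get("AWS_SHARED_CREDENTIALS_FILE") or home / ".aws" / "credentials"
    )
    return has_env_keys or cred_file.is_file()
```

When this returns True, the default S3 filesystem will likely sign requests with those credentials, which is the case where passing `anonymous=True` as in the snippet above makes a difference.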





[GitHub] [arrow-cookbook] westonpace closed issue #56: [Python] Makefile: pytest target fails

Posted by GitBox <gi...@apache.org>.
westonpace closed issue #56:
URL: https://github.com/apache/arrow-cookbook/issues/56


   





[GitHub] [arrow-cookbook] amol- commented on issue #56: [Python] Makefile: pytest target fails

Posted by GitBox <gi...@apache.org>.
amol- commented on issue #56:
URL: https://github.com/apache/arrow-cookbook/issues/56#issuecomment-911549944


   That's strange, the test passes fine for me. I used that bucket explicitly because it is the one we mention in other tests too, and it is meant to be public as far as I know.
   
   ```
   Document: io
   ------------
   1 items passed all tests:
     31 tests in default
   31 tests in 1 items.
   31 passed and 0 failed.
   Test passed.
   ```




