Posted to issues@arrow.apache.org by "collimarco (via GitHub)" <gi...@apache.org> on 2023/06/01 18:12:40 UTC

[GitHub] [arrow] collimarco opened a new issue, #35877: [Ruby] Cannot read Parquet file from Cloudflare R2

collimarco opened a new issue, #35877:
URL: https://github.com/apache/arrow/issues/35877

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   I am trying to read a Parquet file from Cloudflare R2 with this code:
   
   ```ruby
   require 'arrow'
   require 'parquet'
   require 'arrow-dataset'
   
   s3_uri = URI("s3://accesskey:secretkey@abc123.r2.cloudflarestorage.com/bucket-name/sample.parquet")
   table = Arrow::Table.load(s3_uri, format: :parquet)
   ```
   
   Replace the placeholders in the URI with the actual values for your bucket. I have checked my values several times and they are correct.
   
   The problem is that the Apache Arrow library raises this error:
   
   ```
   /Users/collimarco/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/gobject-introspection-4.1.6/lib/gobject-introspection/loader.rb:705:in `invoke': [file-system-dataset-factory][set-file-system-uri]: IOError: Bucket 'abc123.r2.cloudflarestorage.com' not found (Arrow::Error::Io)
   	from /Users/collimarco/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/gobject-introspection-4.1.6/lib/gobject-introspection/loader.rb:705:in `invoke'
   	from /Users/collimarco/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/gobject-introspection-4.1.6/lib/gobject-introspection/loader.rb:573:in `set_file_system_uri'
   	from /Users/collimarco/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/red-arrow-dataset-12.0.0/lib/arrow-dataset/file-system-dataset-factory.rb:35:in `set_file_system_uri'
   	from /Users/collimarco/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/red-arrow-dataset-12.0.0/lib/arrow-dataset/arrow-table-loadable.rb:42:in `block in internal_load_from_uri'
   	from /Users/collimarco/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/red-arrow-dataset-12.0.0/lib/arrow-dataset/dataset.rb:24:in `build'
   	from /Users/collimarco/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/red-arrow-dataset-12.0.0/lib/arrow-dataset/arrow-table-loadable.rb:41:in `internal_load_from_uri'
   	from /Users/collimarco/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/red-arrow-dataset-12.0.0/lib/arrow-dataset/arrow-table-loadable.rb:35:in `load_from_uri'
   	from /Users/collimarco/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/red-arrow-12.0.0/lib/arrow/table-loader.rb:49:in `block in load'
   	from /Users/collimarco/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/red-arrow-12.0.0/lib/arrow/table-loader.rb:47:in `each'
   	from /Users/collimarco/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/red-arrow-12.0.0/lib/arrow/table-loader.rb:47:in `load'
   	from /Users/collimarco/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/red-arrow-12.0.0/lib/arrow/table-loader.rb:24:in `load'
   	from /Users/collimarco/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/red-arrow-12.0.0/lib/arrow/table.rb:30:in `load'
   	from searchparquetr2.rb:6:in `<main>'
   /tmp/apache-arrow-20230508-11129-1jbn0fn/apache-arrow-12.0.0/cpp/src/arrow/filesystem/s3fs.cc:2598:  arrow::fs::FinalizeS3 was not called even though S3 was initialized.  This could lead to a segmentation fault at exit
   ```
   
   This must be a bug because the bucket with that host name actually exists.
   
   It should be possible to reproduce this by creating a free R2 bucket on Cloudflare (not public) and then running the above code.
   
   ### Component(s)
   
   Ruby


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] collimarco commented on issue #35877: [Ruby] Cannot read Parquet file from Cloudflare R2

Posted by "collimarco (via GitHub)" <gi...@apache.org>.
collimarco commented on issue #35877:
URL: https://github.com/apache/arrow/issues/35877#issuecomment-1575693669

   @kou If I call it explicitly (at the end of the Ruby program reported above), I get another error:
   
   ```
   in `<main>': uninitialized constant Arrow::S3 (NameError)
   
   Arrow::S3.finalize
        ^^^^
   /tmp/apache-arrow-20230508-11129-1jbn0fn/apache-arrow-12.0.0/cpp/src/arrow/filesystem/s3fs.cc:2598:  arrow::fs::FinalizeS3 was not called even though S3 was initialized.  This could lead to a segmentation fault at exit
   ```
   




[GitHub] [arrow] collimarco commented on issue #35877: [Ruby] Cannot read Parquet file from Cloudflare R2

Posted by "collimarco (via GitHub)" <gi...@apache.org>.
collimarco commented on issue #35877:
URL: https://github.com/apache/arrow/issues/35877#issuecomment-1573423856

   @kou Thank you! It works!
   
   The only remaining issue is that the program above doesn't terminate (it never stops running) and displays this error:
   
   ```
   /tmp/apache-arrow-20230508-11129-1jbn0fn/apache-arrow-12.0.0/cpp/src/arrow/filesystem/s3fs.cc:2598:  arrow::fs::FinalizeS3 was not called even though S3 was initialized.  This could lead to a segmentation fault at exit
   ```
   




[GitHub] [arrow] kou commented on issue #35877: [Ruby] Cannot read Parquet file from Cloudflare R2

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on issue #35877:
URL: https://github.com/apache/arrow/issues/35877#issuecomment-1575727280

   Ah, sorry. It was `Arrow.s3_finalize`.
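
One way to make sure the finalizer always runs at the end of the program is to wrap the S3 work in a method with an `ensure` clause. The sketch below is only an illustration: `with_s3_finalizer` is a hypothetical helper, not part of Red Arrow, and a stand-in lambda replaces `Arrow.s3_finalize` so the snippet runs without the red-arrow gems installed.

```ruby
# Hypothetical helper (not part of Red Arrow): runs the block and then
# invokes the given finalizer, even if the block raises.
def with_s3_finalizer(finalizer)
  yield
ensure
  finalizer.call
end

# Stand-in finalizer so the sketch runs without the red-arrow gems.
# In the program from this issue it would be: -> { Arrow.s3_finalize }
calls = []

result = with_s3_finalizer(-> { calls << :finalized }) do
  # Placeholder for: Arrow::Table.load(s3_uri, format: :parquet)
  :table_loaded
end
```

In the real script, the block would contain the `Arrow::Table.load` call from the original report, and the lambda would call `Arrow.s3_finalize`.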




[GitHub] [arrow] kou commented on issue #35877: [Ruby] Cannot read Parquet file from Cloudflare R2

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on issue #35877:
URL: https://github.com/apache/arrow/issues/35877#issuecomment-1572828828

   I think the correct URL is something like `s3://accesskey:secretkey@bucket-name/sample.parquet?endpoint_override=abc123.r2.cloudflarestorage.com`: the bucket name goes in the host position, and the R2 endpoint is passed as `endpoint_override`.
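
As a rough sketch of that URL layout, the snippet below assembles the URI with Ruby's standard `uri` library only. `r2_parquet_uri` is a hypothetical helper name, not a Red Arrow API, and the credentials, bucket, and endpoint are the placeholders from the report. It assumes the access key and secret key contain no characters that need percent-encoding.

```ruby
require "uri"

# Hypothetical helper (not part of Red Arrow): puts the bucket name in the
# host position and passes the R2 endpoint via the endpoint_override query
# parameter. Assumes the keys need no percent-encoding.
def r2_parquet_uri(access_key:, secret_key:, bucket:, key:, endpoint:)
  query = URI.encode_www_form(endpoint_override: endpoint)
  URI("s3://#{access_key}:#{secret_key}@#{bucket}/#{key}?#{query}")
end

s3_uri = r2_parquet_uri(
  access_key: "accesskey",
  secret_key: "secretkey",
  bucket: "bucket-name",
  key: "sample.parquet",
  endpoint: "abc123.r2.cloudflarestorage.com"
)

puts s3_uri
# With the red-arrow gems loaded, this URI would be passed as in the report:
#   table = Arrow::Table.load(s3_uri, format: :parquet)
```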




[GitHub] [arrow] kou commented on issue #35877: [Ruby] Cannot read Parquet file from Cloudflare R2

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on issue #35877:
URL: https://github.com/apache/arrow/issues/35877#issuecomment-1575193614

   Could you call `Arrow::S3.finalize` explicitly?




[GitHub] [arrow] collimarco commented on issue #35877: [Ruby] Cannot read Parquet file from Cloudflare R2

Posted by "collimarco (via GitHub)" <gi...@apache.org>.
collimarco commented on issue #35877:
URL: https://github.com/apache/arrow/issues/35877#issuecomment-1576447970

   @kou Thanks, it works now




[GitHub] [arrow] collimarco closed issue #35877: [Ruby] Cannot read Parquet file from Cloudflare R2

Posted by "collimarco (via GitHub)" <gi...@apache.org>.
collimarco closed issue #35877: [Ruby] Cannot read Parquet file from Cloudflare R2
URL: https://github.com/apache/arrow/issues/35877

