Posted to github@arrow.apache.org by "Tom-Newton (via GitHub)" <gi...@apache.org> on 2023/05/15 17:52:28 UTC

[GitHub] [arrow] Tom-Newton commented on pull request #12914: ARROW-2034: [C++] Filesystem implementation for Azure Blob Storage

Tom-Newton commented on PR #12914:
URL: https://github.com/apache/arrow/pull/12914#issuecomment-1548290926

   I've been able to give this a bit of a test. On average, I got an 11x performance improvement in my use case compared to using adlfs (https://pypi.org/project/adlfs/), the Python `fsspec` implementation for Azure.
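
For anyone who wants to measure a comparison like this themselves, here is a minimal timing sketch. It is not the script behind the figure above. The two reader functions are hypothetical stand-ins that only sleep, so they would need to be swapped for real reads of the same data through `adlfs` and through the filesystem from this PR.

```python
import statistics
import time
from typing import Callable


def median_seconds(read_fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock time of calling read_fn `repeats` times."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        read_fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def speedup(
    baseline_fn: Callable[[], object],
    candidate_fn: Callable[[], object],
    repeats: int = 5,
) -> float:
    """Return how many times faster candidate_fn is than baseline_fn."""
    return median_seconds(baseline_fn, repeats) / median_seconds(candidate_fn, repeats)


# Hypothetical stand-ins (assumptions, not real readers): replace the sleeps
# with reads of the same dataset through each filesystem implementation.
def read_with_adlfs() -> None:
    time.sleep(0.05)


def read_with_native_fs() -> None:
    time.sleep(0.005)


if __name__ == "__main__":
    ratio = speedup(read_with_adlfs, read_with_native_fs)
    print(f"candidate is {ratio:.1f}x faster than baseline")
```

Using the median rather than the mean keeps a single slow network request from skewing the ratio.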
   
   I'm therefore very interested in this finding its way into an official Arrow release. Is anyone currently working on it? @srilman, have you made any progress on a skeleton PR?
   
   It's possible I _might_ be able to help. I've never committed to `arrow` before and my knowledge of C++ is severely lacking, but I have already spent a bit of time looking at this PR. I've succeeded in fixing the build, and I also hacked together some of the Python bindings so I could run my test. This makes me think I could at least be of some help. Additionally, there is a small chance I could get one of my colleagues who actually understands C++ to help.
   
   Steps I took to fix the build:
   1. A new release of zlib (https://www.zlib.net/) means that https://zlib.net/zlib-1.2.12.tar.gz no longer exists. That seems to have caused the Azure SDK build to fail. I resolved this by using a newer version of the Azure SDK.
   2. There were linker errors about `X509_CRL_load_http` relating to OpenSSL 1 vs OpenSSL 3. I believe this problem was introduced when I updated the Azure SDK in the step above. I followed the instructions [here](https://azure.github.io/azure-sdk-for-cpp/#openssl-version) and was able to get this working.

