Posted to issues@arrow.apache.org by "Tom-Newton (via GitHub)" <gi...@apache.org> on 2023/09/01 08:47:42 UTC

[GitHub] [arrow] Tom-Newton opened a new issue, #37511: Implement file reads for Azure filesystem

Tom-Newton opened a new issue, #37511:
URL: https://github.com/apache/arrow/issues/37511

   ### Describe the enhancement requested
   
   Read support probably requires an Azure implementation of `arrow::io::RandomAccessFile`, which can then be used to implement the `OpenInputStream` and `OpenInputFile` methods of the `AzureFileSystem`.
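
   To make the shape of that concrete, here is a minimal sketch of a random-access reader layered over ranged blob downloads. It deliberately uses only the standard library: `FakeBlobClient`, `DownloadRange` and `BlobRandomAccessFile` are hypothetical names invented for illustration, and exceptions stand in for Arrow's `Result`/`Status` error handling. The method names `GetSize`, `Tell`, `Seek`, `Read` and `ReadAt` are meant to mirror `arrow::io::RandomAccessFile`, but the signatures are simplified and should not be read as the real API.

   ```cpp
   #include <algorithm>
   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <stdexcept>
   #include <string>
   #include <utility>

   // Hypothetical stand-in for a blob client. A real implementation would
   // issue a ranged download against Azure Blob Storage; this one serves
   // bytes from an in-memory string so the sketch is self-contained.
   class FakeBlobClient {
    public:
     explicit FakeBlobClient(std::string content) : content_(std::move(content)) {}

     int64_t Size() const { return static_cast<int64_t>(content_.size()); }

     // Returns the bytes in [offset, offset + length), clamped at the end.
     std::string DownloadRange(int64_t offset, int64_t length) const {
       return content_.substr(static_cast<std::size_t>(offset),
                              static_cast<std::size_t>(length));
     }

    private:
     std::string content_;
   };

   // Simplified random-access reader: positional reads (ReadAt) that leave
   // the cursor alone, plus stream-style reads (Read) that advance it.
   class BlobRandomAccessFile {
    public:
     explicit BlobRandomAccessFile(FakeBlobClient client)
         : client_(std::move(client)), size_(client_.Size()) {}

     int64_t GetSize() const { return size_; }
     int64_t Tell() const { return pos_; }

     void Seek(int64_t position) {
       if (position < 0 || position > size_) {
         throw std::out_of_range("seek position outside blob");
       }
       pos_ = position;
     }

     // Positional read: does not move the cursor and returns fewer bytes
     // than requested when the range runs past the end of the blob.
     std::string ReadAt(int64_t position, int64_t nbytes) const {
       assert(nbytes >= 0);
       if (position < 0 || position > size_) {
         throw std::out_of_range("read position outside blob");
       }
       const int64_t n = std::min(nbytes, size_ - position);
       return client_.DownloadRange(position, n);
     }

     // Sequential read: a positional read at the cursor, then advance.
     std::string Read(int64_t nbytes) {
       std::string out = ReadAt(pos_, nbytes);
       pos_ += static_cast<int64_t>(out.size());
       return out;
     }

    private:
     FakeBlobClient client_;
     int64_t size_;
     int64_t pos_ = 0;
   };
   ```

   With something of this shape in place, `OpenInputFile` could hand back the reader directly and `OpenInputStream` could reuse it through its sequential `Read`, which is the reuse the paragraph above is hoping for.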
   
   https://github.com/apache/arrow/pull/12914 implemented all of these features, so this will largely be a matter of extracting the relevant parts from there. One modification I would suggest is to avoid branching logic based on whether the Azure storage account has the hierarchical namespace enabled. Using features of the hierarchical namespace can make renames and listing faster, but for just reading blobs it shouldn't make any difference.
   
   If we want to use features of the hierarchical namespace, that adds some complexities:
   1. It makes things harder to test because it's not supported by Azurite: https://github.com/Azure/Azurite/issues/553
   2. It's a bit difficult to query the storage account to determine whether it supports the hierarchical namespace. `ServiceClient::GetAccountInfo()` requires [Storage Account Contributor](https://learn.microsoft.com/en-us/azure/role-based-access-control/built-in-roles#storage-blob-data-contributor) permissions (https://learn.microsoft.com/en-us/rest/api/storageservices/get-blob-service-properties?tabs=azure-ad#authorization), which is a significantly elevated level of access. Hadoop solves this by essentially calling `PathClient::GetAccessControlList()` and treating an exception as meaning the hierarchical namespace is not supported: https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.java#L356-L385.
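
   The Hadoop-style probe in point 2 could be sketched roughly as follows. This is only an illustration under stated assumptions: `RequestFailed` is a hypothetical error type standing in for whatever the storage SDK throws, `get_root_acl` is a hypothetical callable wrapping something like `PathClient::GetAccessControlList()` on the root path, and the choice of HTTP 400 as the "not supported" signal is an assumption of this sketch rather than something confirmed here.

   ```cpp
   #include <cassert>
   #include <functional>
   #include <stdexcept>
   #include <string>

   // Hypothetical error type standing in for the exception a storage SDK
   // throws when a request fails; `status_code` is the HTTP status.
   struct RequestFailed : std::runtime_error {
     RequestFailed(int status, const std::string& message)
         : std::runtime_error(message), status_code(status) {}
     int status_code;
   };

   // Returns true if the ACL probe succeeds, and false if it fails in the
   // way assumed to indicate a flat (non-hierarchical) namespace. Any other
   // failure propagates, so an auth or network error is not mistaken for
   // "hierarchical namespace not supported".
   bool ProbeHierarchicalNamespace(const std::function<void()>& get_root_acl) {
     assert(static_cast<bool>(get_root_acl));  // the probe must be callable

     // Assumption for this sketch: a 400 response means ACLs, and hence the
     // hierarchical namespace, are unavailable on this account.
     constexpr int kAssumedUnsupportedStatus = 400;

     try {
       get_root_acl();
       return true;
     } catch (const RequestFailed& e) {
       if (e.status_code == kAssumedUnsupportedStatus) {
         return false;
       }
       throw;  // unrelated failure: let the caller see it
     }
   }
   ```

   The result would presumably be cached per filesystem instance so the probe costs one extra request rather than one per operation.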
   
   **Related Issues:**
   - https://github.com/apache/arrow/issues/18014 (is a child of)
   
   ### Component(s)
   
   C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] felipecrv commented on issue #37511: [C++] Implement file reads for Azure filesystem

Posted by "felipecrv (via GitHub)" <gi...@apache.org>.
felipecrv commented on issue #37511:
URL: https://github.com/apache/arrow/issues/37511#issuecomment-1718131125

   @Tom-Newton ok. This makes sense. Just make sure you make any work in progress you have as visible as possible.




[GitHub] [arrow] felipecrv commented on issue #37511: [C++] Implement file reads for Azure filesystem

Posted by "felipecrv (via GitHub)" <gi...@apache.org>.
felipecrv commented on issue #37511:
URL: https://github.com/apache/arrow/issues/37511#issuecomment-1718100571

   > > @Tom-Newton are you taking over the work on #12914?
   > 
   > I wouldn't say I'm taking over but I'm keen to push it along. So far @srilman and I have both merged PRs that implement a subset of what #12914 implemented.
   
   Thank you! Would you mind rebasing that PR to incorporate the work that has been done on the other PRs?




[GitHub] [arrow] Tom-Newton commented on issue #37511: [C++] Implement file reads for Azure filesystem

Posted by "Tom-Newton (via GitHub)" <gi...@apache.org>.
Tom-Newton commented on issue #37511:
URL: https://github.com/apache/arrow/issues/37511#issuecomment-1717940860

   > @Tom-Newton are you taking over the work on #12914?
   
   I wouldn't say I'm taking over but I'm keen to push it along. So far @srilman and I have both merged PRs that implement a subset of what https://github.com/apache/arrow/pull/12914 implemented. 




Re: [I] [C++] Implement file reads for Azure filesystem [arrow]

Posted by "Tom-Newton (via GitHub)" <gi...@apache.org>.
Tom-Newton commented on issue #37511:
URL: https://github.com/apache/arrow/issues/37511#issuecomment-1764195952

   I think the PR is ready for review https://github.com/apache/arrow/pull/38269. 
   Hopefully what I've done makes sense; I'm still very inexperienced at writing C++.




[GitHub] [arrow] Tom-Newton commented on issue #37511: [C++] Implement file reads for Azure filesystem

Posted by "Tom-Newton (via GitHub)" <gi...@apache.org>.
Tom-Newton commented on issue #37511:
URL: https://github.com/apache/arrow/issues/37511#issuecomment-1718127511

   > Thank you! Would you mind rebasing that PR to incorporate the work that has been done on the other PRs?
   
   The plan was to keep merging small sections until it's feature complete. I think that's more likely to get done by extracting small sections from https://github.com/apache/arrow/pull/12914 than by rebasing it and trying to merge it all in one go. This was the approach taken for the GCS filesystem and recommended by @kou.




[GitHub] [arrow] Tom-Newton commented on issue #37511: [C++] Implement file reads for Azure filesystem

Posted by "Tom-Newton (via GitHub)" <gi...@apache.org>.
Tom-Newton commented on issue #37511:
URL: https://github.com/apache/arrow/issues/37511#issuecomment-1740738430

   I'm going to start working on this. 




[GitHub] [arrow] Tom-Newton commented on issue #37511: [C++] Implement file reads for Azure filesystem

Posted by "Tom-Newton (via GitHub)" <gi...@apache.org>.
Tom-Newton commented on issue #37511:
URL: https://github.com/apache/arrow/issues/37511#issuecomment-1702403456

   cc @srilman since you mentioned you might be able to help out 
   If nobody else is able to pick it up I will probably start working on this within a couple of weeks.




[GitHub] [arrow] felipecrv commented on issue #37511: [C++] Implement file reads for Azure filesystem

Posted by "felipecrv (via GitHub)" <gi...@apache.org>.
felipecrv commented on issue #37511:
URL: https://github.com/apache/arrow/issues/37511#issuecomment-1717807232

   @Tom-Newton are you taking over the work on #12914?
   
   cc @zeroshade @bkietz 




Re: [I] [C++] Implement file reads for Azure filesystem [arrow]

Posted by "bkietz (via GitHub)" <gi...@apache.org>.
bkietz closed issue #37511: [C++] Implement file reads for Azure filesystem
URL: https://github.com/apache/arrow/issues/37511

