You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "vmuddassir-msft (via GitHub)" <gi...@apache.org> on 2023/07/26 20:34:54 UTC

[GitHub] [arrow-rs] vmuddassir-msft opened a new pull request, #4573: Changes required for onelake-fix issue#1418

vmuddassir-msft opened a new pull request, #4573:
URL: https://github.com/apache/arrow-rs/pull/4573

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] vmuddassir-msft commented on pull request #4573: Add Support for Microsoft Fabric / OneLake

Posted by "vmuddassir-msft (via GitHub)" <gi...@apache.org>.
vmuddassir-msft commented on PR #4573:
URL: https://github.com/apache/arrow-rs/pull/4573#issuecomment-1683333102

   @tustvold - Any update on the release for changes to object_store crate


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] vmuddassir-msft commented on a diff in pull request #4573: fix: add support for microsoft onelake

Posted by "vmuddassir-msft (via GitHub)" <gi...@apache.org>.
vmuddassir-msft commented on code in PR #4573:
URL: https://github.com/apache/arrow-rs/pull/4573#discussion_r1288937026


##########
object_store/src/azure/mod.rs:
##########
@@ -911,7 +928,13 @@ impl MicrosoftAzureBuilder {
             (true, url, credential, account_name)
         } else {
             let account_name = self.account_name.ok_or(Error::MissingAccount {})?;
-            let account_url = format!("https://{}.blob.core.windows.net", &account_name);
+            let account_url = if account_name.contains("onelake") {

Review Comment:
   @tustvold You can create an account on app.fabric.com. Once you have an account, you can follow the steps mentioned in https://learn.microsoft.com/en-us/fabric/onelake/create-lakehouse-onelake to create a lakehouse and select files to load data.
   
   Clicking on Properties for a file used in the data load will display the Fabric URL and Azure Blob File System (ABFS) path. I have added a unit test[ object_store/tests/azure_object_store.rs](https://github.com/apache/arrow-rs/pull/4573/files#diff-aab9873dcec4984aee119db66108eb45b50b7d03f709913be92dcaf13bd460d8), which uses the OneLake URL scheme for read/write operations.
   
   ![image](https://github.com/apache/arrow-rs/assets/140655500/dcaa6825-b70c-4ee7-b3ca-d4945644441f)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold merged pull request #4573: Add Support for Microsoft Fabric / OneLake

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold merged PR #4573:
URL: https://github.com/apache/arrow-rs/pull/4573


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] vmuddassir-msft commented on a diff in pull request #4573: fix: add support for microsoft onelake

Posted by "vmuddassir-msft (via GitHub)" <gi...@apache.org>.
vmuddassir-msft commented on code in PR #4573:
URL: https://github.com/apache/arrow-rs/pull/4573#discussion_r1290842815


##########
object_store/tests/azure_object_store.rs:
##########
@@ -0,0 +1,114 @@
+use async_trait::async_trait;
+use bytes::Bytes;
+use futures::stream::BoxStream;
+use object_store::azure::{MicrosoftAzureBuilder, MicrosoftAzure};
+use object_store::path::Path;
+use object_store::{
+    GetOptions, GetResult, ListResult, MultipartId, ObjectMeta, ObjectStore,
+};
+use std::fmt::Formatter;
+use tokio::io::AsyncWrite;
+
+#[derive(Debug)]
+struct AzureStore(MicrosoftAzure);
+
+impl std::fmt::Display for AzureStore {
+    fn fmt(&self, _: &mut Formatter<'_>) -> std::fmt::Result {
+        todo!()
+    }
+}
+
+#[async_trait]
+impl ObjectStore for AzureStore {
+    async fn put(&self, path: &Path, data: Bytes) -> object_store::Result<()> {
+        self.0.put(path, data).await
+    }
+
+    async fn put_multipart(
+        &self,
+        _: &Path,
+    ) -> object_store::Result<(MultipartId, Box<dyn AsyncWrite + Unpin + Send>)> {
+        todo!()
+    }
+
+    async fn abort_multipart(
+        &self,
+        _: &Path,
+        _: &MultipartId,
+    ) -> object_store::Result<()> {
+        todo!()
+    }
+
+    async fn get_opts(
+        &self,
+        location: &Path,
+        options: GetOptions,
+    ) -> object_store::Result<GetResult> {
+        self.0.get_opts(location, options).await
+    }
+
+    async fn head(&self, _: &Path) -> object_store::Result<ObjectMeta> {
+        todo!()
+    }
+
+    async fn delete(&self, _: &Path) -> object_store::Result<()> {
+        todo!()
+    }
+
+    async fn list(
+        &self,
+        _: Option<&Path>,
+    ) -> object_store::Result<BoxStream<'_, object_store::Result<ObjectMeta>>> {
+        todo!()
+    }
+
+    async fn list_with_delimiter(
+        &self,
+        _: Option<&Path>,
+    ) -> object_store::Result<ListResult> {
+        todo!()
+    }
+
+    async fn copy(&self, _: &Path, _: &Path) -> object_store::Result<()> {
+        todo!()
+    }
+
+    async fn copy_if_not_exists(&self, _: &Path, _: &Path) -> object_store::Result<()> {
+        todo!()
+    }
+}
+
+
+#[tokio::test]
+async fn test_fabric() {        
+    //Format:https://onelake.dfs.fabric.microsoft.com/<workspaceGUID>/<itemGUID>/Files/test.csv
+    //Example:https://onelake.dfs.fabric.microsoft.com/86bc63cf-5086-42e0-b16d-6bc580d1dc87/17d3977c-d46e-4bae-8fed-ff467e674aed/Files/SampleCustomerList.csv
+    //Account Name : onelake
+    //Container Name : workspaceGUID
+
+    let daily_store = AzureStore(
+        MicrosoftAzureBuilder::new()
+        .with_container_name("86bc63cf-5086-42e0-b16d-6bc580d1dc87")
+        .with_account("onelake")
+        .with_bearer_token_authorization("jwt-token")
+        .build()
+        .unwrap());
+    
+    let path = Path::from("17d3977c-d46e-4bae-8fed-ff467e674aed/Files/SampleCustomerList.csv");

Review Comment:
   Correct  ,when Performing a List on 17d3977c-d46e-4bae-8fed-ff467e674aed/Files/,  the ObjectMeta returned will contain the full path 
    ObjectMeta { location: Path { raw: "17d3977c-d46e-4bae-8fed-ff467e674aed/Files/SampleCustomerList.csv" }, last_modified: 2023-08-10T18:31:52Z, size: 11, e_tag: Some("0x8DB99D01129EA9D") }
   
   <img width="809" alt="image" src="https://github.com/apache/arrow-rs/assets/140655500/d7c0a3fc-6e09-444d-abd1-57bfc01a3684">
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #4573: Add Support for Microsoft Fabric / OneLake

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #4573:
URL: https://github.com/apache/arrow-rs/pull/4573#issuecomment-1683428966

   The RC was cut on Tuesday, Apache mandates at least a 3 day voting period, and so the release should be published today


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #4573: Add Support for Microsoft Fabric / OneLake

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #4573:
URL: https://github.com/apache/arrow-rs/pull/4573#issuecomment-1677086506

   I intend to cut a RC today or tomorrow, with a view to publishing a release later this week


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4573: fix: add support for microsoft onelake

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4573:
URL: https://github.com/apache/arrow-rs/pull/4573#discussion_r1289143355


##########
object_store/tests/azure_object_store.rs:
##########
@@ -0,0 +1,114 @@
+use async_trait::async_trait;

Review Comment:
   I think we should probably remove this before merge, but thank you for creating it to show how you are expecting to interact with this system



##########
object_store/tests/azure_object_store.rs:
##########
@@ -0,0 +1,114 @@
+use async_trait::async_trait;
+use bytes::Bytes;
+use futures::stream::BoxStream;
+use object_store::azure::{MicrosoftAzureBuilder, MicrosoftAzure};
+use object_store::path::Path;
+use object_store::{
+    GetOptions, GetResult, ListResult, MultipartId, ObjectMeta, ObjectStore,
+};
+use std::fmt::Formatter;
+use tokio::io::AsyncWrite;
+
+#[derive(Debug)]
+struct AzureStore(MicrosoftAzure);
+
+impl std::fmt::Display for AzureStore {
+    fn fmt(&self, _: &mut Formatter<'_>) -> std::fmt::Result {
+        todo!()
+    }
+}
+
+#[async_trait]
+impl ObjectStore for AzureStore {
+    async fn put(&self, path: &Path, data: Bytes) -> object_store::Result<()> {
+        self.0.put(path, data).await
+    }
+
+    async fn put_multipart(
+        &self,
+        _: &Path,
+    ) -> object_store::Result<(MultipartId, Box<dyn AsyncWrite + Unpin + Send>)> {
+        todo!()
+    }
+
+    async fn abort_multipart(
+        &self,
+        _: &Path,
+        _: &MultipartId,
+    ) -> object_store::Result<()> {
+        todo!()
+    }
+
+    async fn get_opts(
+        &self,
+        location: &Path,
+        options: GetOptions,
+    ) -> object_store::Result<GetResult> {
+        self.0.get_opts(location, options).await
+    }
+
+    async fn head(&self, _: &Path) -> object_store::Result<ObjectMeta> {
+        todo!()
+    }
+
+    async fn delete(&self, _: &Path) -> object_store::Result<()> {
+        todo!()
+    }
+
+    async fn list(
+        &self,
+        _: Option<&Path>,
+    ) -> object_store::Result<BoxStream<'_, object_store::Result<ObjectMeta>>> {
+        todo!()
+    }
+
+    async fn list_with_delimiter(
+        &self,
+        _: Option<&Path>,
+    ) -> object_store::Result<ListResult> {
+        todo!()
+    }
+
+    async fn copy(&self, _: &Path, _: &Path) -> object_store::Result<()> {
+        todo!()
+    }
+
+    async fn copy_if_not_exists(&self, _: &Path, _: &Path) -> object_store::Result<()> {
+        todo!()
+    }
+}
+
+
+#[tokio::test]
+async fn test_fabric() {        
+    //Format:https://onelake.dfs.fabric.microsoft.com/<workspaceGUID>/<itemGUID>/Files/test.csv
+    //Example:https://onelake.dfs.fabric.microsoft.com/86bc63cf-5086-42e0-b16d-6bc580d1dc87/17d3977c-d46e-4bae-8fed-ff467e674aed/Files/SampleCustomerList.csv
+    //Account Name : onelake
+    //Container Name : workspaceGUID
+
+    let daily_store = AzureStore(
+        MicrosoftAzureBuilder::new()
+        .with_container_name("86bc63cf-5086-42e0-b16d-6bc580d1dc87")
+        .with_account("onelake")
+        .with_bearer_token_authorization("jwt-token")
+        .build()
+        .unwrap());
+    
+    let path = Path::from("17d3977c-d46e-4bae-8fed-ff467e674aed/Files/SampleCustomerList.csv");

Review Comment:
   This is the bit that I found rather surprising, I would have naively thought that the corollary for a container would be the files within an item, i.e. the user would just specify `SampleCustomerList.csv` in the path, with everything else provided at construction time of the builder.
   
   This does appear to be what the docs suggest, so :shrug: To confirm if I were to perform a List of `17d3977c-d46e-4bae-8fed-ff467e674aed/Files/` it would return the full path, i.e. `17d3977c-d46e-4bae-8fed-ff467e674aed/Files/SampleCustomerList.csv` and not `SampleCustomerList.csv`?



##########
object_store/src/azure/mod.rs:
##########
@@ -911,7 +928,13 @@ impl MicrosoftAzureBuilder {
             (true, url, credential, account_name)
         } else {
             let account_name = self.account_name.ok_or(Error::MissingAccount {})?;
-            let account_url = format!("https://{}.blob.core.windows.net", &account_name);
+            let account_url = if account_name.contains("onelake") {

Review Comment:
   What happens if a user has a regular storage account called onelake or containing within its name onelake, this will break it. I think we probably need some sort of `with_use_fabric_endpoint`, to explicitly gate this. This could then be set appropriately on URL extraction. What do you think?



##########
object_store/src/azure/mod.rs:
##########
@@ -889,7 +896,17 @@ impl MicrosoftAzureBuilder {
             self.parse_url(&url)?;
         }
 
-        let container = self.container_name.ok_or(Error::MissingContainerName {})?;
+        let use_ok_or = match &self.account_name {
+            Some(account_name) => !account_name.contains("onelake"),
+            None => true,
+        };
+
+        let container = if use_ok_or {
+            self.container_name.ok_or(Error::MissingContainerName {})?
+        } else {
+            self.container_name.unwrap_or_default()
+        };

Review Comment:
   What is the motivation for this change, AFAICT a container name is necessary to specify the workspace name?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] vmuddassir-msft commented on a diff in pull request #4573: fix: add support for microsoft onelake

Posted by "vmuddassir-msft (via GitHub)" <gi...@apache.org>.
vmuddassir-msft commented on code in PR #4573:
URL: https://github.com/apache/arrow-rs/pull/4573#discussion_r1290551821


##########
object_store/src/azure/mod.rs:
##########
@@ -889,7 +896,17 @@ impl MicrosoftAzureBuilder {
             self.parse_url(&url)?;
         }
 
-        let container = self.container_name.ok_or(Error::MissingContainerName {})?;
+        let use_ok_or = match &self.account_name {
+            Some(account_name) => !account_name.contains("onelake"),
+            None => true,
+        };
+
+        let container = if use_ok_or {
+            self.container_name.ok_or(Error::MissingContainerName {})?
+        } else {
+            self.container_name.unwrap_or_default()
+        };

Review Comment:
   I removed this check. The original intention was to skip the container name check if it was set using the URL scheme "onelake.dfs.fabric.microsoft.com/workspaceGUID". This behavior has now been addressed in the '[parse_url](https://github.com/apache/arrow-rs/pull/4573/files#diff-36f8b096521bf44c962a023c2914a4725d58cceac21d818229be2366e4c890f8R760)' function 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #4573: fix: add support for microsoft onelake

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #4573:
URL: https://github.com/apache/arrow-rs/pull/4573#issuecomment-1674745053

   I took the liberty of pushing some minor changes to prepare this for merge


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] vmuddassir-msft commented on a diff in pull request #4573: fix: add support for microsoft onelake

Posted by "vmuddassir-msft (via GitHub)" <gi...@apache.org>.
vmuddassir-msft commented on code in PR #4573:
URL: https://github.com/apache/arrow-rs/pull/4573#discussion_r1288937026


##########
object_store/src/azure/mod.rs:
##########
@@ -911,7 +928,13 @@ impl MicrosoftAzureBuilder {
             (true, url, credential, account_name)
         } else {
             let account_name = self.account_name.ok_or(Error::MissingAccount {})?;
-            let account_url = format!("https://{}.blob.core.windows.net", &account_name);
+            let account_url = if account_name.contains("onelake") {

Review Comment:
   @tustvold You can create an account on app.fabric.com. Once you have an account, you can follow the steps mentioned in https://learn.microsoft.com/en-us/fabric/onelake/create-lakehouse-onelake to create a lakehouse and select files to load data.
   
   Clicking on Properties for a file used in the data load will display the Fabric URL and Azure Blob File System (ABFS) path. I have added a unit test object_store/tests/azure_object_store.rs, which uses the OneLake URL scheme for read/write operations.
   
   ![image](https://github.com/apache/arrow-rs/assets/140655500/dcaa6825-b70c-4ee7-b3ca-d4945644441f)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] vmuddassir-msft commented on a diff in pull request #4573: fix: add support for microsoft onelake

Posted by "vmuddassir-msft (via GitHub)" <gi...@apache.org>.
vmuddassir-msft commented on code in PR #4573:
URL: https://github.com/apache/arrow-rs/pull/4573#discussion_r1290551821


##########
object_store/src/azure/mod.rs:
##########
@@ -889,7 +896,17 @@ impl MicrosoftAzureBuilder {
             self.parse_url(&url)?;
         }
 
-        let container = self.container_name.ok_or(Error::MissingContainerName {})?;
+        let use_ok_or = match &self.account_name {
+            Some(account_name) => !account_name.contains("onelake"),
+            None => true,
+        };
+
+        let container = if use_ok_or {
+            self.container_name.ok_or(Error::MissingContainerName {})?
+        } else {
+            self.container_name.unwrap_or_default()
+        };

Review Comment:
   I removed this check. The original intention was to skip the container name check if it was set using the URL scheme "onelake.dfs.fabric.microsoft.com/workspaceGUID". This behavior has now been addressed in the 'parse_url' function



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] vmuddassir-msft commented on a diff in pull request #4573: fix: add support for microsoft onelake

Posted by "vmuddassir-msft (via GitHub)" <gi...@apache.org>.
vmuddassir-msft commented on code in PR #4573:
URL: https://github.com/apache/arrow-rs/pull/4573#discussion_r1290842815


##########
object_store/tests/azure_object_store.rs:
##########
@@ -0,0 +1,114 @@
+use async_trait::async_trait;
+use bytes::Bytes;
+use futures::stream::BoxStream;
+use object_store::azure::{MicrosoftAzureBuilder, MicrosoftAzure};
+use object_store::path::Path;
+use object_store::{
+    GetOptions, GetResult, ListResult, MultipartId, ObjectMeta, ObjectStore,
+};
+use std::fmt::Formatter;
+use tokio::io::AsyncWrite;
+
+#[derive(Debug)]
+struct AzureStore(MicrosoftAzure);
+
+impl std::fmt::Display for AzureStore {
+    fn fmt(&self, _: &mut Formatter<'_>) -> std::fmt::Result {
+        todo!()
+    }
+}
+
+#[async_trait]
+impl ObjectStore for AzureStore {
+    async fn put(&self, path: &Path, data: Bytes) -> object_store::Result<()> {
+        self.0.put(path, data).await
+    }
+
+    async fn put_multipart(
+        &self,
+        _: &Path,
+    ) -> object_store::Result<(MultipartId, Box<dyn AsyncWrite + Unpin + Send>)> {
+        todo!()
+    }
+
+    async fn abort_multipart(
+        &self,
+        _: &Path,
+        _: &MultipartId,
+    ) -> object_store::Result<()> {
+        todo!()
+    }
+
+    async fn get_opts(
+        &self,
+        location: &Path,
+        options: GetOptions,
+    ) -> object_store::Result<GetResult> {
+        self.0.get_opts(location, options).await
+    }
+
+    async fn head(&self, _: &Path) -> object_store::Result<ObjectMeta> {
+        todo!()
+    }
+
+    async fn delete(&self, _: &Path) -> object_store::Result<()> {
+        todo!()
+    }
+
+    async fn list(
+        &self,
+        _: Option<&Path>,
+    ) -> object_store::Result<BoxStream<'_, object_store::Result<ObjectMeta>>> {
+        todo!()
+    }
+
+    async fn list_with_delimiter(
+        &self,
+        _: Option<&Path>,
+    ) -> object_store::Result<ListResult> {
+        todo!()
+    }
+
+    async fn copy(&self, _: &Path, _: &Path) -> object_store::Result<()> {
+        todo!()
+    }
+
+    async fn copy_if_not_exists(&self, _: &Path, _: &Path) -> object_store::Result<()> {
+        todo!()
+    }
+}
+
+
+#[tokio::test]
+async fn test_fabric() {        
+    //Format:https://onelake.dfs.fabric.microsoft.com/<workspaceGUID>/<itemGUID>/Files/test.csv
+    //Example:https://onelake.dfs.fabric.microsoft.com/86bc63cf-5086-42e0-b16d-6bc580d1dc87/17d3977c-d46e-4bae-8fed-ff467e674aed/Files/SampleCustomerList.csv
+    //Account Name : onelake
+    //Container Name : workspaceGUID
+
+    let daily_store = AzureStore(
+        MicrosoftAzureBuilder::new()
+        .with_container_name("86bc63cf-5086-42e0-b16d-6bc580d1dc87")
+        .with_account("onelake")
+        .with_bearer_token_authorization("jwt-token")
+        .build()
+        .unwrap());
+    
+    let path = Path::from("17d3977c-d46e-4bae-8fed-ff467e674aed/Files/SampleCustomerList.csv");

Review Comment:
   Correct  , the ObjectMeta will contain the full path 
    ObjectMeta { location: Path { raw: "17d3977c-d46e-4bae-8fed-ff467e674aed/Files/SampleCustomerList.csv" }, last_modified: 2023-08-10T18:31:52Z, size: 11, e_tag: Some("0x8DB99D01129EA9D") }
   
   <img width="809" alt="image" src="https://github.com/apache/arrow-rs/assets/140655500/d7c0a3fc-6e09-444d-abd1-57bfc01a3684">
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] vmuddassir-msft commented on a diff in pull request #4573: fix: add support for microsoft onelake

Posted by "vmuddassir-msft (via GitHub)" <gi...@apache.org>.
vmuddassir-msft commented on code in PR #4573:
URL: https://github.com/apache/arrow-rs/pull/4573#discussion_r1290551821


##########
object_store/src/azure/mod.rs:
##########
@@ -889,7 +896,17 @@ impl MicrosoftAzureBuilder {
             self.parse_url(&url)?;
         }
 
-        let container = self.container_name.ok_or(Error::MissingContainerName {})?;
+        let use_ok_or = match &self.account_name {
+            Some(account_name) => !account_name.contains("onelake"),
+            None => true,
+        };
+
+        let container = if use_ok_or {
+            self.container_name.ok_or(Error::MissingContainerName {})?
+        } else {
+            self.container_name.unwrap_or_default()
+        };

Review Comment:
   I removed this check. The original intention was to skip the container name check if it was set using the URL scheme "onelake.dfs.fabric.microsoft.com/<workspaceGUID>". This behavior has now been addressed in the 'parse_url' function



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] vmuddassir-msft commented on a diff in pull request #4573: fix: add support for microsoft onelake

Posted by "vmuddassir-msft (via GitHub)" <gi...@apache.org>.
vmuddassir-msft commented on code in PR #4573:
URL: https://github.com/apache/arrow-rs/pull/4573#discussion_r1290543524


##########
object_store/src/azure/mod.rs:
##########
@@ -911,7 +928,13 @@ impl MicrosoftAzureBuilder {
             (true, url, credential, account_name)
         } else {
             let account_name = self.account_name.ok_or(Error::MissingAccount {})?;
-            let account_url = format!("https://{}.blob.core.windows.net", &account_name);
+            let account_url = if account_name.contains("onelake") {

Review Comment:
   Thanks for pointing this out, I have added with_use_fabric to check if the fabric url scheme needs to be set



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] vmuddassir-msft commented on pull request #4573: Add Support for Microsoft Fabric / OneLake

Posted by "vmuddassir-msft (via GitHub)" <gi...@apache.org>.
vmuddassir-msft commented on PR #4573:
URL: https://github.com/apache/arrow-rs/pull/4573#issuecomment-1677081672

   @tustvold  - When will these changes to the crate be published? I noticed that the last build was released on June 6th  https://crates.io/crates/object_store/versions
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] vmuddassir-msft commented on pull request #4573: fix: add support for microsoft onelake

Posted by "vmuddassir-msft (via GitHub)" <gi...@apache.org>.
vmuddassir-msft commented on PR #4573:
URL: https://github.com/apache/arrow-rs/pull/4573#issuecomment-1653727975

   added additional onelake urls to object_store/src/azure/mod.rs for both https and abfss
   
   this is required to fix Issue [1418](https://github.com/delta-io/delta-rs/issues/1418)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] vmuddassir-msft commented on a diff in pull request #4573: fix: add support for microsoft onelake

Posted by "vmuddassir-msft (via GitHub)" <gi...@apache.org>.
vmuddassir-msft commented on code in PR #4573:
URL: https://github.com/apache/arrow-rs/pull/4573#discussion_r1290551821


##########
object_store/src/azure/mod.rs:
##########
@@ -889,7 +896,17 @@ impl MicrosoftAzureBuilder {
             self.parse_url(&url)?;
         }
 
-        let container = self.container_name.ok_or(Error::MissingContainerName {})?;
+        let use_ok_or = match &self.account_name {
+            Some(account_name) => !account_name.contains("onelake"),
+            None => true,
+        };
+
+        let container = if use_ok_or {
+            self.container_name.ok_or(Error::MissingContainerName {})?
+        } else {
+            self.container_name.unwrap_or_default()
+        };

Review Comment:
   Removed this check , the intention was if to skip container name check if this was set using the url https://onelake.dfs.fabric.microsoft.com/<workspaceGUID>,this has now been handled in parse_url



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] vmuddassir-msft commented on a diff in pull request #4573: fix: add support for microsoft onelake

Posted by "vmuddassir-msft (via GitHub)" <gi...@apache.org>.
vmuddassir-msft commented on code in PR #4573:
URL: https://github.com/apache/arrow-rs/pull/4573#discussion_r1290842815


##########
object_store/tests/azure_object_store.rs:
##########
@@ -0,0 +1,114 @@
+use async_trait::async_trait;
+use bytes::Bytes;
+use futures::stream::BoxStream;
+use object_store::azure::{MicrosoftAzureBuilder, MicrosoftAzure};
+use object_store::path::Path;
+use object_store::{
+    GetOptions, GetResult, ListResult, MultipartId, ObjectMeta, ObjectStore,
+};
+use std::fmt::Formatter;
+use tokio::io::AsyncWrite;
+
+#[derive(Debug)]
+struct AzureStore(MicrosoftAzure);
+
+impl std::fmt::Display for AzureStore {
+    fn fmt(&self, _: &mut Formatter<'_>) -> std::fmt::Result {
+        todo!()
+    }
+}
+
+#[async_trait]
+impl ObjectStore for AzureStore {
+    async fn put(&self, path: &Path, data: Bytes) -> object_store::Result<()> {
+        self.0.put(path, data).await
+    }
+
+    async fn put_multipart(
+        &self,
+        _: &Path,
+    ) -> object_store::Result<(MultipartId, Box<dyn AsyncWrite + Unpin + Send>)> {
+        todo!()
+    }
+
+    async fn abort_multipart(
+        &self,
+        _: &Path,
+        _: &MultipartId,
+    ) -> object_store::Result<()> {
+        todo!()
+    }
+
+    async fn get_opts(
+        &self,
+        location: &Path,
+        options: GetOptions,
+    ) -> object_store::Result<GetResult> {
+        self.0.get_opts(location, options).await
+    }
+
+    async fn head(&self, _: &Path) -> object_store::Result<ObjectMeta> {
+        todo!()
+    }
+
+    async fn delete(&self, _: &Path) -> object_store::Result<()> {
+        todo!()
+    }
+
+    async fn list(
+        &self,
+        _: Option<&Path>,
+    ) -> object_store::Result<BoxStream<'_, object_store::Result<ObjectMeta>>> {
+        todo!()
+    }
+
+    async fn list_with_delimiter(
+        &self,
+        _: Option<&Path>,
+    ) -> object_store::Result<ListResult> {
+        todo!()
+    }
+
+    async fn copy(&self, _: &Path, _: &Path) -> object_store::Result<()> {
+        todo!()
+    }
+
+    async fn copy_if_not_exists(&self, _: &Path, _: &Path) -> object_store::Result<()> {
+        todo!()
+    }
+}
+
+
+#[tokio::test]
+async fn test_fabric() {        
+    //Format:https://onelake.dfs.fabric.microsoft.com/<workspaceGUID>/<itemGUID>/Files/test.csv
+    //Example:https://onelake.dfs.fabric.microsoft.com/86bc63cf-5086-42e0-b16d-6bc580d1dc87/17d3977c-d46e-4bae-8fed-ff467e674aed/Files/SampleCustomerList.csv
+    //Account Name : onelake
+    //Container Name : workspaceGUID
+
+    let daily_store = AzureStore(
+        MicrosoftAzureBuilder::new()
+        .with_container_name("86bc63cf-5086-42e0-b16d-6bc580d1dc87")
+        .with_account("onelake")
+        .with_bearer_token_authorization("jwt-token")
+        .build()
+        .unwrap());
+    
+    let path = Path::from("17d3977c-d46e-4bae-8fed-ff467e674aed/Files/SampleCustomerList.csv");

Review Comment:
   Correct  ,when Performing a List on 17d3977c-d46e-4bae-8fed-ff467e674aed/Files/,  ObjectMeta returned will contain the full path 
    ObjectMeta { location: Path { raw: "17d3977c-d46e-4bae-8fed-ff467e674aed/Files/SampleCustomerList.csv" }, last_modified: 2023-08-10T18:31:52Z, size: 11, e_tag: Some("0x8DB99D01129EA9D") }
   
   <img width="809" alt="image" src="https://github.com/apache/arrow-rs/assets/140655500/d7c0a3fc-6e09-444d-abd1-57bfc01a3684">
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] vmuddassir-msft commented on pull request #4573: fix: add support for microsoft onelake

Posted by "vmuddassir-msft (via GitHub)" <gi...@apache.org>.
vmuddassir-msft commented on PR #4573:
URL: https://github.com/apache/arrow-rs/pull/4573#issuecomment-1669331523

   Unit tests added to verify url schemes for onelake fabric url's  
   
   https://account.blob.fabric.microsoft.com/
   https://account.dfs.fabric.microsoft.com/
   abfs://account.dsf.fabric.microsoft.com/
   
   Link to [Documentation](https://learn.microsoft.com/en-us/fabric/onelake/onelake-access-api)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] vmuddassir-msft commented on pull request #4573: Add Support for Microsoft Fabric / OneLake

Posted by "vmuddassir-msft (via GitHub)" <gi...@apache.org>.
vmuddassir-msft commented on PR #4573:
URL: https://github.com/apache/arrow-rs/pull/4573#issuecomment-1677108273

   > I intend to cut a RC today or tomorrow, with a view to publishing a release later this week
   
   Thanks - I have another PR on delta-rs that needs to be verified with the latest changes to object store crate dependency referenced in delta-rs, This is required to fix https://github.com/delta-io/delta-rs/issues/1418.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4573: fix: add support for microsoft onelake

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4573:
URL: https://github.com/apache/arrow-rs/pull/4573#discussion_r1287393293


##########
object_store/src/azure/mod.rs:
##########
@@ -911,7 +928,13 @@ impl MicrosoftAzureBuilder {
             (true, url, credential, account_name)
         } else {
             let account_name = self.account_name.ok_or(Error::MissingAccount {})?;
-            let account_url = format!("https://{}.blob.core.windows.net", &account_name);
+            let account_url = if account_name.contains("onelake") {

Review Comment:
   This feels rather fragile...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] vmuddassir-msft commented on a diff in pull request #4573: fix: add support for microsoft onelake

Posted by "vmuddassir-msft (via GitHub)" <gi...@apache.org>.
vmuddassir-msft commented on code in PR #4573:
URL: https://github.com/apache/arrow-rs/pull/4573#discussion_r1290551821


##########
object_store/src/azure/mod.rs:
##########
@@ -889,7 +896,17 @@ impl MicrosoftAzureBuilder {
             self.parse_url(&url)?;
         }
 
-        let container = self.container_name.ok_or(Error::MissingContainerName {})?;
+        let use_ok_or = match &self.account_name {
+            Some(account_name) => !account_name.contains("onelake"),
+            None => true,
+        };
+
+        let container = if use_ok_or {
+            self.container_name.ok_or(Error::MissingContainerName {})?
+        } else {
+            self.container_name.unwrap_or_default()
+        };

Review Comment:
   I removed this check. The original intention was to skip the container name check if it was set using the URL scheme "https://onelake.dfs.fabric.microsoft.com/<workspaceGUID>". This behavior has now been addressed in the 'parse_url' function



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] vmuddassir-msft commented on a diff in pull request #4573: fix: add support for microsoft onelake

Posted by "vmuddassir-msft (via GitHub)" <gi...@apache.org>.
vmuddassir-msft commented on code in PR #4573:
URL: https://github.com/apache/arrow-rs/pull/4573#discussion_r1290551821


##########
object_store/src/azure/mod.rs:
##########
@@ -889,7 +896,17 @@ impl MicrosoftAzureBuilder {
             self.parse_url(&url)?;
         }
 
-        let container = self.container_name.ok_or(Error::MissingContainerName {})?;
+        let use_ok_or = match &self.account_name {
+            Some(account_name) => !account_name.contains("onelake"),
+            None => true,
+        };
+
+        let container = if use_ok_or {
+            self.container_name.ok_or(Error::MissingContainerName {})?
+        } else {
+            self.container_name.unwrap_or_default()
+        };

Review Comment:
   I removed this check. The intention was to skip the container name check if this was set using the URL scheme "onelake.dfs.fabric.microsoft.com/workspaceGUID". This behavior has now been addressed in the '[parse_url](https://github.com/apache/arrow-rs/pull/4573/files#diff-36f8b096521bf44c962a023c2914a4725d58cceac21d818229be2366e4c890f8R760)' function 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] vmuddassir-msft commented on a diff in pull request #4573: fix: add support for microsoft onelake

Posted by "vmuddassir-msft (via GitHub)" <gi...@apache.org>.
vmuddassir-msft commented on code in PR #4573:
URL: https://github.com/apache/arrow-rs/pull/4573#discussion_r1290551821


##########
object_store/src/azure/mod.rs:
##########
@@ -889,7 +896,17 @@ impl MicrosoftAzureBuilder {
             self.parse_url(&url)?;
         }
 
-        let container = self.container_name.ok_or(Error::MissingContainerName {})?;
+        let use_ok_or = match &self.account_name {
+            Some(account_name) => !account_name.contains("onelake"),
+            None => true,
+        };
+
+        let container = if use_ok_or {
+            self.container_name.ok_or(Error::MissingContainerName {})?
+        } else {
+            self.container_name.unwrap_or_default()
+        };

Review Comment:
   I removed this check. The original intention was to skip the container name check if it was set using the URL scheme 'https://onelake.dfs.fabric.microsoft.com/<workspaceGUID>'. This behavior has now been addressed in the 'parse_url' function



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org