You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/02/02 23:57:28 UTC

[GitHub] [incubator-pinot] rkanumul opened a new pull request #6531: Update ADLSGen2PinotFS auth; Introduce unit tests

rkanumul opened a new pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531


   ## Description
   Linkedin plans to use ADLSGen2PinotFS but requires
   1) Service principal based authentication to do that for better ACL control. The current accesskey based auth would be harder to maintain with multiple applications.   https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-directory-file-acl-java#connect-by-using-azure-active-directory-azure-ad
   2) The current implementation fails in case the blob container(file system) is not created beforehand. This adds additional friction when on-boarding new use cases or standing up pinot on a new environment.
   3) Introduces unit tests to ADLSGen2PinotFS to improve craftsmanship. 
   
   ## Upgrade Notes
   Improves ADLSGen2PinotFS with service principal based auth, auto create container on initial run.
   Is backwards compatible with key based auth. But also optionally understands clientId, tenantId, clientSecret configs.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] snleee commented on a change in pull request #6531: Update ADLSGen2PinotFS auth; Introduce unit tests

Posted by GitBox <gi...@apache.org>.
snleee commented on a change in pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531#discussion_r573430961



##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/ADLSGen2PinotFS.java
##########
@@ -106,24 +118,75 @@ public void init(PinotConfiguration config) {
     // TODO: consider to add the encryption of the following config
     String accessKey = config.getProperty(ACCESS_KEY);
     String fileSystemName = config.getProperty(FILE_SYSTEM_NAME);
+    String clientId = config.getProperty(CLIENT_ID);
+    String clientSecret = config.getProperty(CLIENT_SECRET);
+    String tenantId = config.getProperty(TENANT_ID);
 
     String dfsServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_STORAGE_DNS_SUFFIX;
     String blobServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_BLOB_DNS_SUFFIX;
 
-    StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+    DataLakeServiceClientBuilder dataLakeServiceClientBuilder = new DataLakeServiceClientBuilder().endpoint(dfsServiceEndpointUrl);
+    BlobServiceClientBuilder blobServiceClientBuilder = new BlobServiceClientBuilder().endpoint(blobServiceEndpointUrl);
+
+    if (accountName!= null && accessKey != null) {
+      LOGGER.info("Authenticating using the access key to the account.");
+      StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+      dataLakeServiceClientBuilder.credential(sharedKeyCredential);
+      blobServiceClientBuilder.credential(sharedKeyCredential);
+    } else if (clientId != null && clientSecret != null && tenantId != null) {
+      LOGGER.info("Authenticating using Azure Active Directory");
+      ClientSecretCredential clientSecretCredential = new ClientSecretCredentialBuilder()
+          .clientId(clientId)
+          .clientSecret(clientSecret)
+          .tenantId(tenantId)
+          .build();
+      dataLakeServiceClientBuilder.credential(clientSecretCredential);
+      blobServiceClientBuilder.credential(clientSecretCredential);
+    } else {
+      // Error out as at least one mode of auth info needed
+      throw new IllegalArgumentException("Expecting either (accountName, accessKey) or (clientId, clientSecret, tenantId)");
+    }
 
-    DataLakeServiceClient serviceClient = new DataLakeServiceClientBuilder().credential(sharedKeyCredential)
-        .endpoint(dfsServiceEndpointUrl)
-        .buildClient();
+    _blobServiceClient = blobServiceClientBuilder.buildClient();
+    DataLakeServiceClient serviceClient = dataLakeServiceClientBuilder.buildClient();
+    _fileSystemClient = getOrCreateClientWithFileSystem(serviceClient, fileSystemName);
 
-    _blobServiceClient =
-        new BlobServiceClientBuilder().credential(sharedKeyCredential).endpoint(blobServiceEndpointUrl).buildClient();
-    _fileSystemClient = serviceClient.getFileSystemClient(fileSystemName);
     LOGGER.info("ADLSGen2PinotFS is initialized (accountName={}, fileSystemName={}, dfsServiceEndpointUrl={}, "
             + "blobServiceEndpointUrl={}, enableChecksum={})", accountName, fileSystemName, dfsServiceEndpointUrl,
         blobServiceEndpointUrl, _enableChecksum);
   }
 
+  /**
+   * Returns the DataLakeFileSystemClient to the specified file system creating if it doesn't exist.
+   *
+   * @param serviceClient authenticated data lake service client to an account
+   * @param fileSystemName name of the file system (blob container)
+   * @return DataLakeFileSystemClient with the specified fileSystemName.
+   */
+  @VisibleForTesting
+  public DataLakeFileSystemClient getOrCreateClientWithFileSystem(DataLakeServiceClient serviceClient,

Review comment:
       @rkanumul If Gen1 code automatically creates the top-level blob container, I think that we should change that. I still think that our code should not create the top-level container and we should assume that they are pre-generated. Pinot should be able to generate any files/directories within the container. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] rkanumul commented on a change in pull request #6531: Update ADLSGen2PinotFS auth; Introduce unit tests

Posted by GitBox <gi...@apache.org>.
rkanumul commented on a change in pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531#discussion_r572494741



##########
File path: pom.xml
##########
@@ -142,7 +142,7 @@
     <dropwizard-metrics.version>4.1.2</dropwizard-metrics.version>
     <snappy-java.version>1.1.1.7</snappy-java.version>
     <log4j.version>2.11.2</log4j.version>
-    <netty.version>4.1.42.Final</netty.version>
+    <netty.version>4.1.54.Final</netty.version>

Review comment:
       No, can't do that. I tried to and failed.
   I would say that is what the travis checks are for.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] snleee commented on a change in pull request #6531: Update ADLSGen2PinotFS auth; Introduce unit tests

Posted by GitBox <gi...@apache.org>.
snleee commented on a change in pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531#discussion_r572267855



##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/test/java/org/apache/pinot/plugin/filesystem/test/AzurePinotFSTest.java
##########
@@ -74,8 +72,6 @@ public void testFS()
     Assert.assertTrue(azurePinotFS.exists(new URI(_adlLocation)));
 
     File file = new File(_adlLocation, "testfile2");
-    MockADLFileInputStream adlFileInputStream =

Review comment:
       Can you check the usage of `MockADLFileInputStream` ? If there's no usage, we can remove this class.

##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/ADLSGen2PinotFS.java
##########
@@ -106,24 +118,75 @@ public void init(PinotConfiguration config) {
     // TODO: consider to add the encryption of the following config
     String accessKey = config.getProperty(ACCESS_KEY);
     String fileSystemName = config.getProperty(FILE_SYSTEM_NAME);
+    String clientId = config.getProperty(CLIENT_ID);
+    String clientSecret = config.getProperty(CLIENT_SECRET);
+    String tenantId = config.getProperty(TENANT_ID);
 
     String dfsServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_STORAGE_DNS_SUFFIX;
     String blobServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_BLOB_DNS_SUFFIX;
 
-    StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+    DataLakeServiceClientBuilder dataLakeServiceClientBuilder = new DataLakeServiceClientBuilder().endpoint(dfsServiceEndpointUrl);
+    BlobServiceClientBuilder blobServiceClientBuilder = new BlobServiceClientBuilder().endpoint(blobServiceEndpointUrl);
+
+    if (accountName!= null && accessKey != null) {
+      LOGGER.info("Authenticating using the access key to the account.");
+      StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+      dataLakeServiceClientBuilder.credential(sharedKeyCredential);
+      blobServiceClientBuilder.credential(sharedKeyCredential);
+    } else if (clientId != null && clientSecret != null && tenantId != null) {
+      LOGGER.info("Authenticating using Azure Active Directory");
+      ClientSecretCredential clientSecretCredential = new ClientSecretCredentialBuilder()
+          .clientId(clientId)
+          .clientSecret(clientSecret)
+          .tenantId(tenantId)
+          .build();
+      dataLakeServiceClientBuilder.credential(clientSecretCredential);
+      blobServiceClientBuilder.credential(clientSecretCredential);
+    } else {
+      // Error out as at least one mode of auth info needed
+      throw new IllegalArgumentException("Expecting either (accountName, accessKey) or (clientId, clientSecret, tenantId)");
+    }
 
-    DataLakeServiceClient serviceClient = new DataLakeServiceClientBuilder().credential(sharedKeyCredential)
-        .endpoint(dfsServiceEndpointUrl)
-        .buildClient();
+    _blobServiceClient = blobServiceClientBuilder.buildClient();
+    DataLakeServiceClient serviceClient = dataLakeServiceClientBuilder.buildClient();
+    _fileSystemClient = getOrCreateClientWithFileSystem(serviceClient, fileSystemName);
 
-    _blobServiceClient =
-        new BlobServiceClientBuilder().credential(sharedKeyCredential).endpoint(blobServiceEndpointUrl).buildClient();
-    _fileSystemClient = serviceClient.getFileSystemClient(fileSystemName);
     LOGGER.info("ADLSGen2PinotFS is initialized (accountName={}, fileSystemName={}, dfsServiceEndpointUrl={}, "
             + "blobServiceEndpointUrl={}, enableChecksum={})", accountName, fileSystemName, dfsServiceEndpointUrl,
         blobServiceEndpointUrl, _enableChecksum);
   }
 
+  /**
+   * Returns the DataLakeFileSystemClient to the specified file system creating if it doesn't exist.
+   *
+   * @param serviceClient authenticated data lake service client to an account
+   * @param fileSystemName name of the file system (blob container)
+   * @return DataLakeFileSystemClient with the specified fileSystemName.
+   */
+  @VisibleForTesting
+  public DataLakeFileSystemClient getOrCreateClientWithFileSystem(DataLakeServiceClient serviceClient,

Review comment:
       I think that we should throw the exception instead of creating the file system in Azure. We can assume that the blob container is created beforehand because this file system may need to be configured differently. Auto-managing containers will be handy if we need to use multiple containers but in our case, we just need a single container with ADLS Gen2 feature enabled.
   
   Please double check Google Cloud Storage and AWS S3 implementation. We should keep them in sync.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] codecov-io commented on pull request #6531: Update ADLSGen2PinotFS auth; Introduce unit tests

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531#issuecomment-773109022


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6531?src=pr&el=h1) Report
   > Merging [#6531](https://codecov.io/gh/apache/incubator-pinot/pull/6531?src=pr&el=desc) (b864bd8) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/1beaab59b73f26c4e35f3b9bc856b03806cddf5a?el=desc) (1beaab5) will **decrease** coverage by `1.49%`.
   > The diff coverage is `59.94%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/6531/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz)](https://codecov.io/gh/apache/incubator-pinot/pull/6531?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #6531      +/-   ##
   ==========================================
   - Coverage   66.44%   64.95%   -1.50%     
   ==========================================
     Files        1075     1334     +259     
     Lines       54773    65805   +11032     
     Branches     8168     9608    +1440     
   ==========================================
   + Hits        36396    42741    +6345     
   - Misses      15700    20002    +4302     
   - Partials     2677     3062     +385     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | unittests | `64.95% <59.94%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/6531?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...e/pinot/broker/api/resources/PinotBrokerDebug.java](https://codecov.io/gh/apache/incubator-pinot/pull/6531/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYXBpL3Jlc291cmNlcy9QaW5vdEJyb2tlckRlYnVnLmphdmE=) | `0.00% <0.00%> (-79.32%)` | :arrow_down: |
   | [...ot/broker/broker/AllowAllAccessControlFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6531/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL0FsbG93QWxsQWNjZXNzQ29udHJvbEZhY3RvcnkuamF2YQ==) | `71.42% <ø> (-28.58%)` | :arrow_down: |
   | [.../helix/BrokerUserDefinedMessageHandlerFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6531/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL2hlbGl4L0Jyb2tlclVzZXJEZWZpbmVkTWVzc2FnZUhhbmRsZXJGYWN0b3J5LmphdmE=) | `33.96% <0.00%> (-32.71%)` | :arrow_down: |
   | [...ker/routing/instanceselector/InstanceSelector.java](https://codecov.io/gh/apache/incubator-pinot/pull/6531/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcm91dGluZy9pbnN0YW5jZXNlbGVjdG9yL0luc3RhbmNlU2VsZWN0b3IuamF2YQ==) | `100.00% <ø> (ø)` | |
   | [...ava/org/apache/pinot/client/AbstractResultSet.java](https://codecov.io/gh/apache/incubator-pinot/pull/6531/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Fic3RyYWN0UmVzdWx0U2V0LmphdmE=) | `66.66% <ø> (+9.52%)` | :arrow_up: |
   | [...n/java/org/apache/pinot/client/BrokerResponse.java](https://codecov.io/gh/apache/incubator-pinot/pull/6531/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Jyb2tlclJlc3BvbnNlLmphdmE=) | `100.00% <ø> (ø)` | |
   | [.../main/java/org/apache/pinot/client/Connection.java](https://codecov.io/gh/apache/incubator-pinot/pull/6531/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Nvbm5lY3Rpb24uamF2YQ==) | `35.55% <ø> (-13.29%)` | :arrow_down: |
   | [...n/java/org/apache/pinot/client/ExecutionStats.java](https://codecov.io/gh/apache/incubator-pinot/pull/6531/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0V4ZWN1dGlvblN0YXRzLmphdmE=) | `15.55% <ø> (ø)` | |
   | [...inot/client/JsonAsyncHttpPinotClientTransport.java](https://codecov.io/gh/apache/incubator-pinot/pull/6531/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0pzb25Bc3luY0h0dHBQaW5vdENsaWVudFRyYW5zcG9ydC5qYXZh) | `10.90% <ø> (-51.10%)` | :arrow_down: |
   | [...n/java/org/apache/pinot/client/ResultSetGroup.java](https://codecov.io/gh/apache/incubator-pinot/pull/6531/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L1Jlc3VsdFNldEdyb3VwLmphdmE=) | `65.38% <ø> (+0.16%)` | :arrow_up: |
   | ... and [1190 more](https://codecov.io/gh/apache/incubator-pinot/pull/6531/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6531?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6531?src=pr&el=footer). Last update [12ee45c...b864bd8](https://codecov.io/gh/apache/incubator-pinot/pull/6531?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] rkanumul commented on a change in pull request #6531: Update ADLSGen2PinotFS auth; Introduce unit tests

Posted by GitBox <gi...@apache.org>.
rkanumul commented on a change in pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531#discussion_r572463407



##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/test/java/org/apache/pinot/plugin/filesystem/test/AzurePinotFSTest.java
##########
@@ -74,8 +72,6 @@ public void testFS()
     Assert.assertTrue(azurePinotFS.exists(new URI(_adlLocation)));
 
     File file = new File(_adlLocation, "testfile2");
-    MockADLFileInputStream adlFileInputStream =

Review comment:
       True. I verified its not used and removed




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] codecov-io commented on pull request #6531: Update ADLSGen2PinotFS auth; Introduce unit tests

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531#issuecomment-773109022


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6531?src=pr&el=h1) Report
   > Merging [#6531](https://codecov.io/gh/apache/incubator-pinot/pull/6531?src=pr&el=desc) (b864bd8) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/1beaab59b73f26c4e35f3b9bc856b03806cddf5a?el=desc) (1beaab5) will **decrease** coverage by `1.49%`.
   > The diff coverage is `59.94%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/6531/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz)](https://codecov.io/gh/apache/incubator-pinot/pull/6531?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #6531      +/-   ##
   ==========================================
   - Coverage   66.44%   64.95%   -1.50%     
   ==========================================
     Files        1075     1334     +259     
     Lines       54773    65805   +11032     
     Branches     8168     9608    +1440     
   ==========================================
   + Hits        36396    42741    +6345     
   - Misses      15700    20002    +4302     
   - Partials     2677     3062     +385     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | unittests | `64.95% <59.94%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/6531?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...e/pinot/broker/api/resources/PinotBrokerDebug.java](https://codecov.io/gh/apache/incubator-pinot/pull/6531/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYXBpL3Jlc291cmNlcy9QaW5vdEJyb2tlckRlYnVnLmphdmE=) | `0.00% <0.00%> (-79.32%)` | :arrow_down: |
   | [...ot/broker/broker/AllowAllAccessControlFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6531/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL0FsbG93QWxsQWNjZXNzQ29udHJvbEZhY3RvcnkuamF2YQ==) | `71.42% <ø> (-28.58%)` | :arrow_down: |
   | [.../helix/BrokerUserDefinedMessageHandlerFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6531/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL2hlbGl4L0Jyb2tlclVzZXJEZWZpbmVkTWVzc2FnZUhhbmRsZXJGYWN0b3J5LmphdmE=) | `33.96% <0.00%> (-32.71%)` | :arrow_down: |
   | [...ker/routing/instanceselector/InstanceSelector.java](https://codecov.io/gh/apache/incubator-pinot/pull/6531/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcm91dGluZy9pbnN0YW5jZXNlbGVjdG9yL0luc3RhbmNlU2VsZWN0b3IuamF2YQ==) | `100.00% <ø> (ø)` | |
   | [...ava/org/apache/pinot/client/AbstractResultSet.java](https://codecov.io/gh/apache/incubator-pinot/pull/6531/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Fic3RyYWN0UmVzdWx0U2V0LmphdmE=) | `66.66% <ø> (+9.52%)` | :arrow_up: |
   | [...n/java/org/apache/pinot/client/BrokerResponse.java](https://codecov.io/gh/apache/incubator-pinot/pull/6531/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Jyb2tlclJlc3BvbnNlLmphdmE=) | `100.00% <ø> (ø)` | |
   | [.../main/java/org/apache/pinot/client/Connection.java](https://codecov.io/gh/apache/incubator-pinot/pull/6531/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Nvbm5lY3Rpb24uamF2YQ==) | `35.55% <ø> (-13.29%)` | :arrow_down: |
   | [...n/java/org/apache/pinot/client/ExecutionStats.java](https://codecov.io/gh/apache/incubator-pinot/pull/6531/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0V4ZWN1dGlvblN0YXRzLmphdmE=) | `15.55% <ø> (ø)` | |
   | [...inot/client/JsonAsyncHttpPinotClientTransport.java](https://codecov.io/gh/apache/incubator-pinot/pull/6531/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0pzb25Bc3luY0h0dHBQaW5vdENsaWVudFRyYW5zcG9ydC5qYXZh) | `10.90% <ø> (-51.10%)` | :arrow_down: |
   | [...n/java/org/apache/pinot/client/ResultSetGroup.java](https://codecov.io/gh/apache/incubator-pinot/pull/6531/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L1Jlc3VsdFNldEdyb3VwLmphdmE=) | `65.38% <ø> (+0.16%)` | :arrow_up: |
   | ... and [1190 more](https://codecov.io/gh/apache/incubator-pinot/pull/6531/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6531?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6531?src=pr&el=footer). Last update [12ee45c...b864bd8](https://codecov.io/gh/apache/incubator-pinot/pull/6531?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] snleee commented on a change in pull request #6531: Update ADLSGen2PinotFS auth; Introduce unit tests

Posted by GitBox <gi...@apache.org>.
snleee commented on a change in pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531#discussion_r574148874



##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/ADLSGen2PinotFS.java
##########
@@ -106,24 +118,75 @@ public void init(PinotConfiguration config) {
     // TODO: consider to add the encryption of the following config
     String accessKey = config.getProperty(ACCESS_KEY);
     String fileSystemName = config.getProperty(FILE_SYSTEM_NAME);
+    String clientId = config.getProperty(CLIENT_ID);
+    String clientSecret = config.getProperty(CLIENT_SECRET);
+    String tenantId = config.getProperty(TENANT_ID);
 
     String dfsServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_STORAGE_DNS_SUFFIX;
     String blobServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_BLOB_DNS_SUFFIX;
 
-    StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+    DataLakeServiceClientBuilder dataLakeServiceClientBuilder = new DataLakeServiceClientBuilder().endpoint(dfsServiceEndpointUrl);
+    BlobServiceClientBuilder blobServiceClientBuilder = new BlobServiceClientBuilder().endpoint(blobServiceEndpointUrl);
+
+    if (accountName!= null && accessKey != null) {
+      LOGGER.info("Authenticating using the access key to the account.");
+      StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+      dataLakeServiceClientBuilder.credential(sharedKeyCredential);
+      blobServiceClientBuilder.credential(sharedKeyCredential);
+    } else if (clientId != null && clientSecret != null && tenantId != null) {
+      LOGGER.info("Authenticating using Azure Active Directory");
+      ClientSecretCredential clientSecretCredential = new ClientSecretCredentialBuilder()
+          .clientId(clientId)
+          .clientSecret(clientSecret)
+          .tenantId(tenantId)
+          .build();
+      dataLakeServiceClientBuilder.credential(clientSecretCredential);
+      blobServiceClientBuilder.credential(clientSecretCredential);
+    } else {
+      // Error out as at least one mode of auth info needed
+      throw new IllegalArgumentException("Expecting either (accountName, accessKey) or (clientId, clientSecret, tenantId)");
+    }
 
-    DataLakeServiceClient serviceClient = new DataLakeServiceClientBuilder().credential(sharedKeyCredential)
-        .endpoint(dfsServiceEndpointUrl)
-        .buildClient();
+    _blobServiceClient = blobServiceClientBuilder.buildClient();
+    DataLakeServiceClient serviceClient = dataLakeServiceClientBuilder.buildClient();
+    _fileSystemClient = getOrCreateClientWithFileSystem(serviceClient, fileSystemName);
 
-    _blobServiceClient =
-        new BlobServiceClientBuilder().credential(sharedKeyCredential).endpoint(blobServiceEndpointUrl).buildClient();
-    _fileSystemClient = serviceClient.getFileSystemClient(fileSystemName);
     LOGGER.info("ADLSGen2PinotFS is initialized (accountName={}, fileSystemName={}, dfsServiceEndpointUrl={}, "
             + "blobServiceEndpointUrl={}, enableChecksum={})", accountName, fileSystemName, dfsServiceEndpointUrl,
         blobServiceEndpointUrl, _enableChecksum);
   }
 
+  /**
+   * Returns the DataLakeFileSystemClient to the specified file system creating if it doesn't exist.
+   *
+   * @param serviceClient authenticated data lake service client to an account
+   * @param fileSystemName name of the file system (blob container)
+   * @return DataLakeFileSystemClient with the specified fileSystemName.
+   */
+  @VisibleForTesting
+  public DataLakeFileSystemClient getOrCreateClientWithFileSystem(DataLakeServiceClient serviceClient,

Review comment:
       @rkanumul I went over the Azure portal. I actually got confused between `storage account` and `container`.  `storage account` looks to be the top-level storage resource concept in Azure while `bucket` is the top level concept in AWS s3. I thought that `container` in Azure is an equivalent concept to `bucket` in AWS. 
   
   So yes, I actually agree with you that we can create the container here in the code. It's equivalent to creating a directory within the root path.
   
   Sorry for the misunderstanding.

##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/ADLSGen2PinotFS.java
##########
@@ -106,24 +118,75 @@ public void init(PinotConfiguration config) {
     // TODO: consider to add the encryption of the following config
     String accessKey = config.getProperty(ACCESS_KEY);
     String fileSystemName = config.getProperty(FILE_SYSTEM_NAME);
+    String clientId = config.getProperty(CLIENT_ID);
+    String clientSecret = config.getProperty(CLIENT_SECRET);
+    String tenantId = config.getProperty(TENANT_ID);
 
     String dfsServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_STORAGE_DNS_SUFFIX;
     String blobServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_BLOB_DNS_SUFFIX;
 
-    StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+    DataLakeServiceClientBuilder dataLakeServiceClientBuilder = new DataLakeServiceClientBuilder().endpoint(dfsServiceEndpointUrl);
+    BlobServiceClientBuilder blobServiceClientBuilder = new BlobServiceClientBuilder().endpoint(blobServiceEndpointUrl);
+
+    if (accountName!= null && accessKey != null) {
+      LOGGER.info("Authenticating using the access key to the account.");
+      StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+      dataLakeServiceClientBuilder.credential(sharedKeyCredential);
+      blobServiceClientBuilder.credential(sharedKeyCredential);
+    } else if (clientId != null && clientSecret != null && tenantId != null) {
+      LOGGER.info("Authenticating using Azure Active Directory");
+      ClientSecretCredential clientSecretCredential = new ClientSecretCredentialBuilder()
+          .clientId(clientId)
+          .clientSecret(clientSecret)
+          .tenantId(tenantId)
+          .build();
+      dataLakeServiceClientBuilder.credential(clientSecretCredential);
+      blobServiceClientBuilder.credential(clientSecretCredential);
+    } else {
+      // Error out as at least one mode of auth info needed
+      throw new IllegalArgumentException("Expecting either (accountName, accessKey) or (clientId, clientSecret, tenantId)");
+    }
 
-    DataLakeServiceClient serviceClient = new DataLakeServiceClientBuilder().credential(sharedKeyCredential)
-        .endpoint(dfsServiceEndpointUrl)
-        .buildClient();
+    _blobServiceClient = blobServiceClientBuilder.buildClient();
+    DataLakeServiceClient serviceClient = dataLakeServiceClientBuilder.buildClient();
+    _fileSystemClient = getOrCreateClientWithFileSystem(serviceClient, fileSystemName);
 
-    _blobServiceClient =
-        new BlobServiceClientBuilder().credential(sharedKeyCredential).endpoint(blobServiceEndpointUrl).buildClient();
-    _fileSystemClient = serviceClient.getFileSystemClient(fileSystemName);
     LOGGER.info("ADLSGen2PinotFS is initialized (accountName={}, fileSystemName={}, dfsServiceEndpointUrl={}, "
             + "blobServiceEndpointUrl={}, enableChecksum={})", accountName, fileSystemName, dfsServiceEndpointUrl,
         blobServiceEndpointUrl, _enableChecksum);
   }
 
+  /**
+   * Returns the DataLakeFileSystemClient to the specified file system creating if it doesn't exist.
+   *
+   * @param serviceClient authenticated data lake service client to an account
+   * @param fileSystemName name of the file system (blob container)
+   * @return DataLakeFileSystemClient with the specified fileSystemName.
+   */
+  @VisibleForTesting
+  public DataLakeFileSystemClient getOrCreateClientWithFileSystem(DataLakeServiceClient serviceClient,

Review comment:
       @rkanumul I went over the Azure portal. I actually got confused between `storage account` and `container` concept in Azure.  `storage account` looks to be the top-level storage resource concept in Azure while `bucket` is the top level concept in AWS s3. I thought that `container` in Azure is an equivalent concept to `bucket` in AWS. 
   
   So yes, I actually agree with you that we can create the container here in the code. It's equivalent to creating a directory within the root path.
   
   Sorry for the misunderstanding.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] snleee commented on a change in pull request #6531: Update ADLSGen2PinotFS auth; Introduce unit tests

Posted by GitBox <gi...@apache.org>.
snleee commented on a change in pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531#discussion_r573383802



##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/ADLSGen2PinotFS.java
##########
@@ -106,24 +118,75 @@ public void init(PinotConfiguration config) {
     // TODO: consider to add the encryption of the following config
     String accessKey = config.getProperty(ACCESS_KEY);
     String fileSystemName = config.getProperty(FILE_SYSTEM_NAME);
+    String clientId = config.getProperty(CLIENT_ID);
+    String clientSecret = config.getProperty(CLIENT_SECRET);
+    String tenantId = config.getProperty(TENANT_ID);
 
     String dfsServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_STORAGE_DNS_SUFFIX;
     String blobServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_BLOB_DNS_SUFFIX;
 
-    StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+    DataLakeServiceClientBuilder dataLakeServiceClientBuilder = new DataLakeServiceClientBuilder().endpoint(dfsServiceEndpointUrl);
+    BlobServiceClientBuilder blobServiceClientBuilder = new BlobServiceClientBuilder().endpoint(blobServiceEndpointUrl);
+
+    if (accountName!= null && accessKey != null) {
+      LOGGER.info("Authenticating using the access key to the account.");
+      StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+      dataLakeServiceClientBuilder.credential(sharedKeyCredential);
+      blobServiceClientBuilder.credential(sharedKeyCredential);
+    } else if (clientId != null && clientSecret != null && tenantId != null) {
+      LOGGER.info("Authenticating using Azure Active Directory");
+      ClientSecretCredential clientSecretCredential = new ClientSecretCredentialBuilder()
+          .clientId(clientId)
+          .clientSecret(clientSecret)
+          .tenantId(tenantId)
+          .build();
+      dataLakeServiceClientBuilder.credential(clientSecretCredential);
+      blobServiceClientBuilder.credential(clientSecretCredential);
+    } else {
+      // Error out as at least one mode of auth info needed
+      throw new IllegalArgumentException("Expecting either (accountName, accessKey) or (clientId, clientSecret, tenantId)");
+    }
 
-    DataLakeServiceClient serviceClient = new DataLakeServiceClientBuilder().credential(sharedKeyCredential)
-        .endpoint(dfsServiceEndpointUrl)
-        .buildClient();
+    _blobServiceClient = blobServiceClientBuilder.buildClient();
+    DataLakeServiceClient serviceClient = dataLakeServiceClientBuilder.buildClient();
+    _fileSystemClient = getOrCreateClientWithFileSystem(serviceClient, fileSystemName);
 
-    _blobServiceClient =
-        new BlobServiceClientBuilder().credential(sharedKeyCredential).endpoint(blobServiceEndpointUrl).buildClient();
-    _fileSystemClient = serviceClient.getFileSystemClient(fileSystemName);
     LOGGER.info("ADLSGen2PinotFS is initialized (accountName={}, fileSystemName={}, dfsServiceEndpointUrl={}, "
             + "blobServiceEndpointUrl={}, enableChecksum={})", accountName, fileSystemName, dfsServiceEndpointUrl,
         blobServiceEndpointUrl, _enableChecksum);
   }
 
+  /**
+   * Returns the DataLakeFileSystemClient to the specified file system creating if it doesn't exist.
+   *
+   * @param serviceClient authenticated data lake service client to an account
+   * @param fileSystemName name of the file system (blob container)
+   * @return DataLakeFileSystemClient with the specified fileSystemName.
+   */
+  @VisibleForTesting
+  public DataLakeFileSystemClient getOrCreateClientWithFileSystem(DataLakeServiceClient serviceClient,

Review comment:
       @rkanumul  I'm not sure if this is consistent with gen1 implementation. Your implementation would create the new file system resource for the Azure account and this is not what's happening in Gen1 implementation. Am I missing something?
   
   I am against of Pinot code creating the resources on the Azure account. ADLS Gen2 needs to be pre-created and hooked with the Pinot.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] rkanumul commented on a change in pull request #6531: Update ADLSGen2PinotFS auth; Introduce unit tests

Posted by GitBox <gi...@apache.org>.
rkanumul commented on a change in pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531#discussion_r572493996



##########
File path: pinot-plugins/pinot-file-system/pinot-adls/pom.xml
##########
@@ -39,12 +39,46 @@
     <dependency>
       <groupId>com.microsoft.azure</groupId>
       <artifactId>azure-data-lake-store-sdk</artifactId>
-      <version>2.1.5</version>
+      <version>2.3.9</version>
     </dependency>
     <dependency>
       <groupId>com.azure</groupId>
       <artifactId>azure-storage-file-datalake</artifactId>
-      <version>12.0.0-beta.12</version>
+      <version>12.4.0</version>
+    </dependency>
+    <dependency>
+      <groupId>com.azure</groupId>
+      <artifactId>azure-identity</artifactId>
+      <version>1.2.2</version>
     </dependency>
   </dependencies>
+  <dependencyManagement>
+    <dependencies>
+      <dependency>
+        <groupId>org.codehaus.woodstox</groupId>

Review comment:
       I need all of these as. azure-identity pulls in and will lead to convergence error




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] rkanumul commented on a change in pull request #6531: Update ADLSGen2PinotFS auth; Introduce unit tests

Posted by GitBox <gi...@apache.org>.
rkanumul commented on a change in pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531#discussion_r569882275



##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/ADLSGen2PinotFS.java
##########
@@ -106,24 +115,63 @@ public void init(PinotConfiguration config) {
     // TODO: consider to add the encryption of the following config
     String accessKey = config.getProperty(ACCESS_KEY);
     String fileSystemName = config.getProperty(FILE_SYSTEM_NAME);
+    String clientId = config.getProperty(CLIENT_ID);
+    String clientSecret = config.getProperty(CLIENT_SECRET);
+    String tenantId = config.getProperty(TENANT_ID);
 
     String dfsServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_STORAGE_DNS_SUFFIX;
     String blobServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_BLOB_DNS_SUFFIX;
 
-    StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+    DataLakeServiceClientBuilder dataLakeServiceClientBuilder = new DataLakeServiceClientBuilder().endpoint(dfsServiceEndpointUrl);
+    BlobServiceClientBuilder blobServiceClientBuilder = new BlobServiceClientBuilder().endpoint(blobServiceEndpointUrl);
+
+    if (accountName!= null && accessKey != null) {
+      // Authenticate using accessKey
+      StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);

Review comment:
       Updated




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] rkanumul commented on a change in pull request #6531: Update ADLSGen2PinotFS auth; Introduce unit tests

Posted by GitBox <gi...@apache.org>.
rkanumul commented on a change in pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531#discussion_r569958802



##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/ADLSGen2PinotFS.java
##########
@@ -96,6 +99,12 @@
   // However, there's some overhead in computing hash. (Adds roughly 3 seconds for 1GB file)
   private boolean _enableChecksum;
 
+  public ADLSGen2PinotFS(DataLakeFileSystemClient fileSystemClient,

Review comment:
       There is no more details to add here, its a constructor and the signature is self documenting. Also looking at the other classes, doesn't seem to be a convention for the code base.
   But I'm guessing you meant for the other methods that are not in this change. So added docs for all of them.

##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/test/java/org/apache/pinot/plugin/filesystem/test/ADLSGen2PinotFSTest.java
##########
@@ -0,0 +1,395 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.plugin.filesystem.test;
+
+import com.azure.core.http.rest.PagedIterable;
+import com.azure.core.http.rest.SimpleResponse;
+import com.azure.core.util.Context;
+import com.azure.storage.blob.BlobClient;
+import com.azure.storage.blob.BlobContainerClient;
+import com.azure.storage.blob.BlobServiceClient;
+import com.azure.storage.blob.specialized.BlobInputStream;
+import com.azure.storage.file.datalake.DataLakeDirectoryClient;
+import com.azure.storage.file.datalake.DataLakeFileClient;
+import com.azure.storage.file.datalake.DataLakeFileSystemClient;
+import com.azure.storage.file.datalake.DataLakeServiceClient;
+import com.azure.storage.file.datalake.models.DataLakeStorageException;
+import com.azure.storage.file.datalake.models.PathItem;
+import com.azure.storage.file.datalake.models.PathProperties;
+import java.io.IOException;
+import java.io.InputStream;
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.util.Collections;
+import java.util.HashMap;
+import org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS;
+import org.apache.pinot.spi.env.PinotConfiguration;
+import org.mockito.Mock;
+import org.mockito.MockitoAnnotations;
+import org.testng.Assert;
+import org.testng.annotations.AfterMethod;
+import org.testng.annotations.BeforeMethod;
+import org.testng.annotations.Test;
+
+import static org.mockito.Mockito.*;
+
+
+/**
+ * Tests the Azure implementation of ADLSGen2PinotFS
+ */
+public class ADLSGen2PinotFSTest {
+
+  @Mock
+  private DataLakeFileSystemClient _mockFileSystemClient;
+  @Mock
+  private DataLakeDirectoryClient _mockDirectoryClient;
+  @Mock
+  private BlobServiceClient _mockBlobServiceClient;
+  @Mock
+  private BlobContainerClient _mockBlobContainerClient;
+  @Mock
+  private BlobClient _mockBlobClient;
+  @Mock
+  private BlobInputStream _mockBlobInputStream;
+  @Mock
+  private DataLakeServiceClient _mockServiceClient;
+  @Mock
+  private DataLakeFileClient _mockFileClient;
+  @Mock
+  private DataLakeStorageException _mockDataLakeStorageException;
+  @Mock
+  private SimpleResponse _mockSimpleResponse;
+  @Mock
+  private PathProperties _mockPathProperties;
+  @Mock
+  private PagedIterable _mockPagedIterable;
+  @Mock
+  private PathItem _mockPathItem;
+
+  private URI _mockURI;
+  private ADLSGen2PinotFS _adlsGen2PinotFsUnderTest;
+
+  private final static String MOCK_FILE_SYSTEM_NAME = "fileSystemName";
+
+  @BeforeMethod
+  public void setup() throws URISyntaxException {
+    MockitoAnnotations.initMocks(this);
+    _adlsGen2PinotFsUnderTest = new ADLSGen2PinotFS(_mockFileSystemClient, _mockBlobServiceClient);
+    _mockURI = new URI("mock://mock");
+  }
+
+  @AfterMethod
+  public void tearDown() {
+    verifyNoMoreInteractions(_mockDataLakeStorageException, _mockServiceClient, _mockFileSystemClient,
+        _mockSimpleResponse, _mockDirectoryClient, _mockPathItem, _mockPagedIterable, _mockPathProperties,
+        _mockFileClient, _mockBlobContainerClient, _mockBlobClient, _mockBlobServiceClient, _mockBlobInputStream);
+  }
+
+  @Test(expectedExceptions = IllegalArgumentException.class)
+  public void testInitNoAuth() {
+    PinotConfiguration pinotConfiguration = new PinotConfiguration();
+    _adlsGen2PinotFsUnderTest.init(pinotConfiguration);
+  }
+
+  @Test
+  public void testGetOrCreateClientWithFileSystemGet() {
+    when(_mockServiceClient.getFileSystemClient(MOCK_FILE_SYSTEM_NAME)).thenReturn(_mockFileSystemClient);

Review comment:
       Can you elaborate ?
   It is indented just like other classes in the code base.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] snleee commented on a change in pull request #6531: Update ADLSGen2PinotFS auth; Introduce unit tests

Posted by GitBox <gi...@apache.org>.
snleee commented on a change in pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531#discussion_r573430961



##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/ADLSGen2PinotFS.java
##########
@@ -106,24 +118,75 @@ public void init(PinotConfiguration config) {
     // TODO: consider to add the encryption of the following config
     String accessKey = config.getProperty(ACCESS_KEY);
     String fileSystemName = config.getProperty(FILE_SYSTEM_NAME);
+    String clientId = config.getProperty(CLIENT_ID);
+    String clientSecret = config.getProperty(CLIENT_SECRET);
+    String tenantId = config.getProperty(TENANT_ID);
 
     String dfsServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_STORAGE_DNS_SUFFIX;
     String blobServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_BLOB_DNS_SUFFIX;
 
-    StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+    DataLakeServiceClientBuilder dataLakeServiceClientBuilder = new DataLakeServiceClientBuilder().endpoint(dfsServiceEndpointUrl);
+    BlobServiceClientBuilder blobServiceClientBuilder = new BlobServiceClientBuilder().endpoint(blobServiceEndpointUrl);
+
+    if (accountName!= null && accessKey != null) {
+      LOGGER.info("Authenticating using the access key to the account.");
+      StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+      dataLakeServiceClientBuilder.credential(sharedKeyCredential);
+      blobServiceClientBuilder.credential(sharedKeyCredential);
+    } else if (clientId != null && clientSecret != null && tenantId != null) {
+      LOGGER.info("Authenticating using Azure Active Directory");
+      ClientSecretCredential clientSecretCredential = new ClientSecretCredentialBuilder()
+          .clientId(clientId)
+          .clientSecret(clientSecret)
+          .tenantId(tenantId)
+          .build();
+      dataLakeServiceClientBuilder.credential(clientSecretCredential);
+      blobServiceClientBuilder.credential(clientSecretCredential);
+    } else {
+      // Error out as at least one mode of auth info needed
+      throw new IllegalArgumentException("Expecting either (accountName, accessKey) or (clientId, clientSecret, tenantId)");
+    }
 
-    DataLakeServiceClient serviceClient = new DataLakeServiceClientBuilder().credential(sharedKeyCredential)
-        .endpoint(dfsServiceEndpointUrl)
-        .buildClient();
+    _blobServiceClient = blobServiceClientBuilder.buildClient();
+    DataLakeServiceClient serviceClient = dataLakeServiceClientBuilder.buildClient();
+    _fileSystemClient = getOrCreateClientWithFileSystem(serviceClient, fileSystemName);
 
-    _blobServiceClient =
-        new BlobServiceClientBuilder().credential(sharedKeyCredential).endpoint(blobServiceEndpointUrl).buildClient();
-    _fileSystemClient = serviceClient.getFileSystemClient(fileSystemName);
     LOGGER.info("ADLSGen2PinotFS is initialized (accountName={}, fileSystemName={}, dfsServiceEndpointUrl={}, "
             + "blobServiceEndpointUrl={}, enableChecksum={})", accountName, fileSystemName, dfsServiceEndpointUrl,
         blobServiceEndpointUrl, _enableChecksum);
   }
 
+  /**
+   * Returns the DataLakeFileSystemClient to the specified file system creating if it doesn't exist.
+   *
+   * @param serviceClient authenticated data lake service client to an account
+   * @param fileSystemName name of the file system (blob container)
+   * @return DataLakeFileSystemClient with the specified fileSystemName.
+   */
+  @VisibleForTesting
+  public DataLakeFileSystemClient getOrCreateClientWithFileSystem(DataLakeServiceClient serviceClient,

Review comment:
       @rkanumul If Gen1 code automatically creates the top-level blob container, I think that we should change that. I still think that our code should not create the top-level container and we should assume that they are pre-generated. Pinot should be able to generate any files/directories within the container. 
   
   As I mentioned above, each customer would want to configure this container differently so that it makes sense for customers to prepare the ADLS Gen1/Gen2 before hooking to the Pinot. I don't see much benefit of auto-creating the container with some default settings.
   
   The clients will also need to clean this auto-generated container if it gets created unintentionally. It's better to throw the exception that the file system does not exist in the configured Azure account.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mayankshriv commented on a change in pull request #6531: Update ADLSGen2PinotFS auth; Introduce unit tests

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on a change in pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531#discussion_r569893377



##########
File path: pinot-connectors/pinot-spark-connector/pom.xml
##########
@@ -36,7 +36,7 @@
         <spark.version>2.4.5</spark.version>
         <circe.version>0.13.0</circe.version>
         <paranameter.version>2.8</paranameter.version>
-        <jackson.module.scala.version>2.9.8</jackson.module.scala.version>
+        <jackson.module.scala.version>2.10.0</jackson.module.scala.version>

Review comment:
       Could you add info in the PR description on why all these pom file changes are needed?

##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/test/java/org/apache/pinot/plugin/filesystem/test/ADLSGen2PinotFSTest.java
##########
@@ -0,0 +1,395 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.plugin.filesystem.test;
+
+import com.azure.core.http.rest.PagedIterable;
+import com.azure.core.http.rest.SimpleResponse;
+import com.azure.core.util.Context;
+import com.azure.storage.blob.BlobClient;
+import com.azure.storage.blob.BlobContainerClient;
+import com.azure.storage.blob.BlobServiceClient;
+import com.azure.storage.blob.specialized.BlobInputStream;
+import com.azure.storage.file.datalake.DataLakeDirectoryClient;
+import com.azure.storage.file.datalake.DataLakeFileClient;
+import com.azure.storage.file.datalake.DataLakeFileSystemClient;
+import com.azure.storage.file.datalake.DataLakeServiceClient;
+import com.azure.storage.file.datalake.models.DataLakeStorageException;
+import com.azure.storage.file.datalake.models.PathItem;
+import com.azure.storage.file.datalake.models.PathProperties;
+import java.io.IOException;
+import java.io.InputStream;
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.util.Collections;
+import java.util.HashMap;
+import org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS;
+import org.apache.pinot.spi.env.PinotConfiguration;
+import org.mockito.Mock;
+import org.mockito.MockitoAnnotations;
+import org.testng.Assert;
+import org.testng.annotations.AfterMethod;
+import org.testng.annotations.BeforeMethod;
+import org.testng.annotations.Test;
+
+import static org.mockito.Mockito.*;
+
+
+/**
+ * Tests the Azure implementation of ADLSGen2PinotFS
+ */
+public class ADLSGen2PinotFSTest {
+
+  @Mock
+  private DataLakeFileSystemClient _mockFileSystemClient;
+  @Mock
+  private DataLakeDirectoryClient _mockDirectoryClient;
+  @Mock
+  private BlobServiceClient _mockBlobServiceClient;
+  @Mock
+  private BlobContainerClient _mockBlobContainerClient;
+  @Mock
+  private BlobClient _mockBlobClient;
+  @Mock
+  private BlobInputStream _mockBlobInputStream;
+  @Mock
+  private DataLakeServiceClient _mockServiceClient;
+  @Mock
+  private DataLakeFileClient _mockFileClient;
+  @Mock
+  private DataLakeStorageException _mockDataLakeStorageException;
+  @Mock
+  private SimpleResponse _mockSimpleResponse;
+  @Mock
+  private PathProperties _mockPathProperties;
+  @Mock
+  private PagedIterable _mockPagedIterable;
+  @Mock
+  private PathItem _mockPathItem;
+
+  private URI _mockURI;
+  private ADLSGen2PinotFS _adlsGen2PinotFsUnderTest;
+
+  private final static String MOCK_FILE_SYSTEM_NAME = "fileSystemName";
+
+  @BeforeMethod
+  public void setup() throws URISyntaxException {
+    MockitoAnnotations.initMocks(this);
+    _adlsGen2PinotFsUnderTest = new ADLSGen2PinotFS(_mockFileSystemClient, _mockBlobServiceClient);
+    _mockURI = new URI("mock://mock");
+  }
+
+  @AfterMethod
+  public void tearDown() {
+    verifyNoMoreInteractions(_mockDataLakeStorageException, _mockServiceClient, _mockFileSystemClient,
+        _mockSimpleResponse, _mockDirectoryClient, _mockPathItem, _mockPagedIterable, _mockPathProperties,
+        _mockFileClient, _mockBlobContainerClient, _mockBlobClient, _mockBlobServiceClient, _mockBlobInputStream);
+  }
+
+  @Test(expectedExceptions = IllegalArgumentException.class)
+  public void testInitNoAuth() {
+    PinotConfiguration pinotConfiguration = new PinotConfiguration();
+    _adlsGen2PinotFsUnderTest.init(pinotConfiguration);
+  }
+
+  @Test
+  public void testGetOrCreateClientWithFileSystemGet() {
+    when(_mockServiceClient.getFileSystemClient(MOCK_FILE_SYSTEM_NAME)).thenReturn(_mockFileSystemClient);

Review comment:
       Indentation?

##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/ADLSGen2PinotFS.java
##########
@@ -96,6 +99,12 @@
   // However, there's some overhead in computing hash. (Adds roughly 3 seconds for 1GB file)
   private boolean _enableChecksum;
 
+  public ADLSGen2PinotFS(DataLakeFileSystemClient fileSystemClient,

Review comment:
       Please add javadoc on all public methods/apis.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] rkanumul commented on a change in pull request #6531: Update ADLSGen2PinotFS auth; Introduce unit tests

Posted by GitBox <gi...@apache.org>.
rkanumul commented on a change in pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531#discussion_r572463563



##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/ADLSGen2PinotFS.java
##########
@@ -106,24 +118,75 @@ public void init(PinotConfiguration config) {
     // TODO: consider to add the encryption of the following config
     String accessKey = config.getProperty(ACCESS_KEY);
     String fileSystemName = config.getProperty(FILE_SYSTEM_NAME);
+    String clientId = config.getProperty(CLIENT_ID);
+    String clientSecret = config.getProperty(CLIENT_SECRET);
+    String tenantId = config.getProperty(TENANT_ID);
 
     String dfsServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_STORAGE_DNS_SUFFIX;
     String blobServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_BLOB_DNS_SUFFIX;
 
-    StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+    DataLakeServiceClientBuilder dataLakeServiceClientBuilder = new DataLakeServiceClientBuilder().endpoint(dfsServiceEndpointUrl);
+    BlobServiceClientBuilder blobServiceClientBuilder = new BlobServiceClientBuilder().endpoint(blobServiceEndpointUrl);
+
+    if (accountName!= null && accessKey != null) {
+      LOGGER.info("Authenticating using the access key to the account.");
+      StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+      dataLakeServiceClientBuilder.credential(sharedKeyCredential);
+      blobServiceClientBuilder.credential(sharedKeyCredential);
+    } else if (clientId != null && clientSecret != null && tenantId != null) {
+      LOGGER.info("Authenticating using Azure Active Directory");
+      ClientSecretCredential clientSecretCredential = new ClientSecretCredentialBuilder()
+          .clientId(clientId)
+          .clientSecret(clientSecret)
+          .tenantId(tenantId)
+          .build();
+      dataLakeServiceClientBuilder.credential(clientSecretCredential);
+      blobServiceClientBuilder.credential(clientSecretCredential);
+    } else {
+      // Error out as at least one mode of auth info needed
+      throw new IllegalArgumentException("Expecting either (accountName, accessKey) or (clientId, clientSecret, tenantId)");
+    }
 
-    DataLakeServiceClient serviceClient = new DataLakeServiceClientBuilder().credential(sharedKeyCredential)
-        .endpoint(dfsServiceEndpointUrl)
-        .buildClient();
+    _blobServiceClient = blobServiceClientBuilder.buildClient();
+    DataLakeServiceClient serviceClient = dataLakeServiceClientBuilder.buildClient();
+    _fileSystemClient = getOrCreateClientWithFileSystem(serviceClient, fileSystemName);
 
-    _blobServiceClient =
-        new BlobServiceClientBuilder().credential(sharedKeyCredential).endpoint(blobServiceEndpointUrl).buildClient();
-    _fileSystemClient = serviceClient.getFileSystemClient(fileSystemName);
     LOGGER.info("ADLSGen2PinotFS is initialized (accountName={}, fileSystemName={}, dfsServiceEndpointUrl={}, "
             + "blobServiceEndpointUrl={}, enableChecksum={})", accountName, fileSystemName, dfsServiceEndpointUrl,
         blobServiceEndpointUrl, _enableChecksum);
   }
 
+  /**
+   * Returns the DataLakeFileSystemClient to the specified file system creating if it doesn't exist.
+   *
+   * @param serviceClient authenticated data lake service client to an account
+   * @param fileSystemName name of the file system (blob container)
+   * @return DataLakeFileSystemClient with the specified fileSystemName.
+   */
+  @VisibleForTesting
+  public DataLakeFileSystemClient getOrCreateClientWithFileSystem(DataLakeServiceClient serviceClient,

Review comment:
       Its consistent with the gen1 implementation as the blob is an inclusion in gen2. the file paths get created if there are none, and this is following the same. Auto create makes the blob concept transparent to user (like gen1)
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] snleee commented on a change in pull request #6531: Update ADLSGen2PinotFS auth; Introduce unit tests

Posted by GitBox <gi...@apache.org>.
snleee commented on a change in pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531#discussion_r572273541



##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/ADLSGen2PinotFS.java
##########
@@ -106,24 +118,75 @@ public void init(PinotConfiguration config) {
     // TODO: consider to add the encryption of the following config
     String accessKey = config.getProperty(ACCESS_KEY);
     String fileSystemName = config.getProperty(FILE_SYSTEM_NAME);
+    String clientId = config.getProperty(CLIENT_ID);
+    String clientSecret = config.getProperty(CLIENT_SECRET);
+    String tenantId = config.getProperty(TENANT_ID);
 
     String dfsServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_STORAGE_DNS_SUFFIX;
     String blobServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_BLOB_DNS_SUFFIX;
 
-    StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+    DataLakeServiceClientBuilder dataLakeServiceClientBuilder = new DataLakeServiceClientBuilder().endpoint(dfsServiceEndpointUrl);
+    BlobServiceClientBuilder blobServiceClientBuilder = new BlobServiceClientBuilder().endpoint(blobServiceEndpointUrl);
+
+    if (accountName!= null && accessKey != null) {
+      LOGGER.info("Authenticating using the access key to the account.");
+      StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+      dataLakeServiceClientBuilder.credential(sharedKeyCredential);
+      blobServiceClientBuilder.credential(sharedKeyCredential);
+    } else if (clientId != null && clientSecret != null && tenantId != null) {
+      LOGGER.info("Authenticating using Azure Active Directory");
+      ClientSecretCredential clientSecretCredential = new ClientSecretCredentialBuilder()
+          .clientId(clientId)
+          .clientSecret(clientSecret)
+          .tenantId(tenantId)
+          .build();
+      dataLakeServiceClientBuilder.credential(clientSecretCredential);
+      blobServiceClientBuilder.credential(clientSecretCredential);
+    } else {
+      // Error out as at least one mode of auth info needed
+      throw new IllegalArgumentException("Expecting either (accountName, accessKey) or (clientId, clientSecret, tenantId)");
+    }
 
-    DataLakeServiceClient serviceClient = new DataLakeServiceClientBuilder().credential(sharedKeyCredential)
-        .endpoint(dfsServiceEndpointUrl)
-        .buildClient();
+    _blobServiceClient = blobServiceClientBuilder.buildClient();
+    DataLakeServiceClient serviceClient = dataLakeServiceClientBuilder.buildClient();
+    _fileSystemClient = getOrCreateClientWithFileSystem(serviceClient, fileSystemName);
 
-    _blobServiceClient =
-        new BlobServiceClientBuilder().credential(sharedKeyCredential).endpoint(blobServiceEndpointUrl).buildClient();
-    _fileSystemClient = serviceClient.getFileSystemClient(fileSystemName);
     LOGGER.info("ADLSGen2PinotFS is initialized (accountName={}, fileSystemName={}, dfsServiceEndpointUrl={}, "
             + "blobServiceEndpointUrl={}, enableChecksum={})", accountName, fileSystemName, dfsServiceEndpointUrl,
         blobServiceEndpointUrl, _enableChecksum);
   }
 
+  /**
+   * Returns the DataLakeFileSystemClient to the specified file system creating if it doesn't exist.
+   *
+   * @param serviceClient authenticated data lake service client to an account
+   * @param fileSystemName name of the file system (blob container)
+   * @return DataLakeFileSystemClient with the specified fileSystemName.
+   */
+  @VisibleForTesting
+  public DataLakeFileSystemClient getOrCreateClientWithFileSystem(DataLakeServiceClient serviceClient,

Review comment:
       I think that we should throw the exception instead of creating the file system in Azure. We can assume that the blob container is created beforehand because this file system may need to be configured differently depending on the requirements of Pinot users. Auto-managing containers will be handy if we need to use multiple containers but in our case, we just need a single container with ADLS Gen2 feature enabled.
   
   Please double check Google Cloud Storage and AWS S3 implementation. We should keep the behavior the same.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] jackjlli commented on pull request #6531: Update ADLSGen2PinotFS auth; Introduce unit tests

Posted by GitBox <gi...@apache.org>.
jackjlli commented on pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531#issuecomment-777232076


   Thanks for taking this up! I'm merging this PR now.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] jackjlli merged pull request #6531: Update ADLSGen2PinotFS auth; Introduce unit tests

Posted by GitBox <gi...@apache.org>.
jackjlli merged pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] jackjlli commented on a change in pull request #6531: Update ADLSGen2PinotFS auth; Introduce unit tests

Posted by GitBox <gi...@apache.org>.
jackjlli commented on a change in pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531#discussion_r572487116



##########
File path: pom.xml
##########
@@ -142,7 +142,7 @@
     <dropwizard-metrics.version>4.1.2</dropwizard-metrics.version>
     <snappy-java.version>1.1.1.7</snappy-java.version>
     <log4j.version>2.11.2</log4j.version>
-    <netty.version>4.1.42.Final</netty.version>
+    <netty.version>4.1.54.Final</netty.version>

Review comment:
       Not sure whether it is possible that we stick to the current netty version and jackson version here, as any of these version bump could make big effect on the whole pinot cluster; it could bring in backward incompatibility between different pinot components during deployment.

##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/ADLSGen2PinotFS.java
##########
@@ -106,24 +118,75 @@ public void init(PinotConfiguration config) {
     // TODO: consider to add the encryption of the following config
     String accessKey = config.getProperty(ACCESS_KEY);
     String fileSystemName = config.getProperty(FILE_SYSTEM_NAME);
+    String clientId = config.getProperty(CLIENT_ID);
+    String clientSecret = config.getProperty(CLIENT_SECRET);
+    String tenantId = config.getProperty(TENANT_ID);
 
     String dfsServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_STORAGE_DNS_SUFFIX;
     String blobServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_BLOB_DNS_SUFFIX;
 
-    StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+    DataLakeServiceClientBuilder dataLakeServiceClientBuilder = new DataLakeServiceClientBuilder().endpoint(dfsServiceEndpointUrl);
+    BlobServiceClientBuilder blobServiceClientBuilder = new BlobServiceClientBuilder().endpoint(blobServiceEndpointUrl);
+
+    if (accountName!= null && accessKey != null) {
+      LOGGER.info("Authenticating using the access key to the account.");
+      StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+      dataLakeServiceClientBuilder.credential(sharedKeyCredential);
+      blobServiceClientBuilder.credential(sharedKeyCredential);
+    } else if (clientId != null && clientSecret != null && tenantId != null) {
+      LOGGER.info("Authenticating using Azure Active Directory");
+      ClientSecretCredential clientSecretCredential = new ClientSecretCredentialBuilder()
+          .clientId(clientId)
+          .clientSecret(clientSecret)
+          .tenantId(tenantId)
+          .build();
+      dataLakeServiceClientBuilder.credential(clientSecretCredential);
+      blobServiceClientBuilder.credential(clientSecretCredential);
+    } else {
+      // Error out as at least one mode of auth info needed
+      throw new IllegalArgumentException("Expecting either (accountName, accessKey) or (clientId, clientSecret, tenantId)");
+    }
 
-    DataLakeServiceClient serviceClient = new DataLakeServiceClientBuilder().credential(sharedKeyCredential)
-        .endpoint(dfsServiceEndpointUrl)
-        .buildClient();
+    _blobServiceClient = blobServiceClientBuilder.buildClient();
+    DataLakeServiceClient serviceClient = dataLakeServiceClientBuilder.buildClient();
+    _fileSystemClient = getOrCreateClientWithFileSystem(serviceClient, fileSystemName);
 
-    _blobServiceClient =
-        new BlobServiceClientBuilder().credential(sharedKeyCredential).endpoint(blobServiceEndpointUrl).buildClient();
-    _fileSystemClient = serviceClient.getFileSystemClient(fileSystemName);
     LOGGER.info("ADLSGen2PinotFS is initialized (accountName={}, fileSystemName={}, dfsServiceEndpointUrl={}, "
             + "blobServiceEndpointUrl={}, enableChecksum={})", accountName, fileSystemName, dfsServiceEndpointUrl,
         blobServiceEndpointUrl, _enableChecksum);
   }
 
+  /**
+   * Returns the DataLakeFileSystemClient to the specified file system creating if it doesn't exist.
+   *
+   * @param serviceClient authenticated data lake service client to an account
+   * @param fileSystemName name of the file system (blob container)
+   * @return DataLakeFileSystemClient with the specified fileSystemName.
+   */
+  @VisibleForTesting
+  public DataLakeFileSystemClient getOrCreateClientWithFileSystem(DataLakeServiceClient serviceClient,
+      String fileSystemName) {
+    try {
+      DataLakeFileSystemClient fileSystemClient = serviceClient.getFileSystemClient(fileSystemName);
+      // The return value is irrelevant. This is to test if the filesystem exists.
+      fileSystemClient.getProperties();
+      return fileSystemClient;
+    } catch (DataLakeStorageException e) {
+      if (e.getStatusCode() == NOT_FOUND_STATUS_CODE && e.getErrorCode().equals(FILE_SYSTEM_NOT_FOUND_ERROR_CODE)) {
+        LOGGER.info("FileSystem with name {} does not exist. Creating one with the same name.", fileSystemName);
+        return serviceClient.createFileSystem(fileSystemName);
+      } else {
+        throw e;

Review comment:
       It'd be good to log one more message here saying we've tried but also failed.

##########
File path: pinot-plugins/pinot-file-system/pinot-adls/pom.xml
##########
@@ -39,12 +39,46 @@
     <dependency>
       <groupId>com.microsoft.azure</groupId>
       <artifactId>azure-data-lake-store-sdk</artifactId>
-      <version>2.1.5</version>
+      <version>2.3.9</version>
     </dependency>
     <dependency>
       <groupId>com.azure</groupId>
       <artifactId>azure-storage-file-datalake</artifactId>
-      <version>12.0.0-beta.12</version>
+      <version>12.4.0</version>
+    </dependency>
+    <dependency>
+      <groupId>com.azure</groupId>
+      <artifactId>azure-identity</artifactId>
+      <version>1.2.2</version>
     </dependency>
   </dependencies>
+  <dependencyManagement>
+    <dependencies>
+      <dependency>
+        <groupId>org.codehaus.woodstox</groupId>

Review comment:
       What's the minimal do you think it's needed for your change? Are the following dependencies going to be used in the real code, or they are used in the test? If latter, maybe we can restrict the scope of those dependencies.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] snleee commented on a change in pull request #6531: Update ADLSGen2PinotFS auth; Introduce unit tests

Posted by GitBox <gi...@apache.org>.
snleee commented on a change in pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531#discussion_r573383802



##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/ADLSGen2PinotFS.java
##########
@@ -106,24 +118,75 @@ public void init(PinotConfiguration config) {
     // TODO: consider to add the encryption of the following config
     String accessKey = config.getProperty(ACCESS_KEY);
     String fileSystemName = config.getProperty(FILE_SYSTEM_NAME);
+    String clientId = config.getProperty(CLIENT_ID);
+    String clientSecret = config.getProperty(CLIENT_SECRET);
+    String tenantId = config.getProperty(TENANT_ID);
 
     String dfsServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_STORAGE_DNS_SUFFIX;
     String blobServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_BLOB_DNS_SUFFIX;
 
-    StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+    DataLakeServiceClientBuilder dataLakeServiceClientBuilder = new DataLakeServiceClientBuilder().endpoint(dfsServiceEndpointUrl);
+    BlobServiceClientBuilder blobServiceClientBuilder = new BlobServiceClientBuilder().endpoint(blobServiceEndpointUrl);
+
+    if (accountName!= null && accessKey != null) {
+      LOGGER.info("Authenticating using the access key to the account.");
+      StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+      dataLakeServiceClientBuilder.credential(sharedKeyCredential);
+      blobServiceClientBuilder.credential(sharedKeyCredential);
+    } else if (clientId != null && clientSecret != null && tenantId != null) {
+      LOGGER.info("Authenticating using Azure Active Directory");
+      ClientSecretCredential clientSecretCredential = new ClientSecretCredentialBuilder()
+          .clientId(clientId)
+          .clientSecret(clientSecret)
+          .tenantId(tenantId)
+          .build();
+      dataLakeServiceClientBuilder.credential(clientSecretCredential);
+      blobServiceClientBuilder.credential(clientSecretCredential);
+    } else {
+      // Error out as at least one mode of auth info needed
+      throw new IllegalArgumentException("Expecting either (accountName, accessKey) or (clientId, clientSecret, tenantId)");
+    }
 
-    DataLakeServiceClient serviceClient = new DataLakeServiceClientBuilder().credential(sharedKeyCredential)
-        .endpoint(dfsServiceEndpointUrl)
-        .buildClient();
+    _blobServiceClient = blobServiceClientBuilder.buildClient();
+    DataLakeServiceClient serviceClient = dataLakeServiceClientBuilder.buildClient();
+    _fileSystemClient = getOrCreateClientWithFileSystem(serviceClient, fileSystemName);
 
-    _blobServiceClient =
-        new BlobServiceClientBuilder().credential(sharedKeyCredential).endpoint(blobServiceEndpointUrl).buildClient();
-    _fileSystemClient = serviceClient.getFileSystemClient(fileSystemName);
     LOGGER.info("ADLSGen2PinotFS is initialized (accountName={}, fileSystemName={}, dfsServiceEndpointUrl={}, "
             + "blobServiceEndpointUrl={}, enableChecksum={})", accountName, fileSystemName, dfsServiceEndpointUrl,
         blobServiceEndpointUrl, _enableChecksum);
   }
 
+  /**
+   * Returns the DataLakeFileSystemClient to the specified file system creating if it doesn't exist.
+   *
+   * @param serviceClient authenticated data lake service client to an account
+   * @param fileSystemName name of the file system (blob container)
+   * @return DataLakeFileSystemClient with the specified fileSystemName.
+   */
+  @VisibleForTesting
+  public DataLakeFileSystemClient getOrCreateClientWithFileSystem(DataLakeServiceClient serviceClient,

Review comment:
       @rkanumul  I'm not sure if this is consistent with gen1 implementation. Your implementation would create the new file system resource for the Azure account and this is not what's happening in Gen1 implementation. Am I missing something?
   
   I am against of Pinot code creating the resources on the Azure account. I think that the correct step is that ADLS Gen2 needs to be pre-created and hooked with the Pinot.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] rkanumul commented on a change in pull request #6531: Update ADLSGen2PinotFS auth; Introduce unit tests

Posted by GitBox <gi...@apache.org>.
rkanumul commented on a change in pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531#discussion_r569958877



##########
File path: pinot-connectors/pinot-spark-connector/pom.xml
##########
@@ -36,7 +36,7 @@
         <spark.version>2.4.5</spark.version>
         <circe.version>0.13.0</circe.version>
         <paranameter.version>2.8</paranameter.version>
-        <jackson.module.scala.version>2.9.8</jackson.module.scala.version>
+        <jackson.module.scala.version>2.10.0</jackson.module.scala.version>

Review comment:
       updated




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] snleee commented on a change in pull request #6531: Update ADLSGen2PinotFS auth; Introduce unit tests

Posted by GitBox <gi...@apache.org>.
snleee commented on a change in pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531#discussion_r572273541



##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/ADLSGen2PinotFS.java
##########
@@ -106,24 +118,75 @@ public void init(PinotConfiguration config) {
     // TODO: consider to add the encryption of the following config
     String accessKey = config.getProperty(ACCESS_KEY);
     String fileSystemName = config.getProperty(FILE_SYSTEM_NAME);
+    String clientId = config.getProperty(CLIENT_ID);
+    String clientSecret = config.getProperty(CLIENT_SECRET);
+    String tenantId = config.getProperty(TENANT_ID);
 
     String dfsServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_STORAGE_DNS_SUFFIX;
     String blobServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_BLOB_DNS_SUFFIX;
 
-    StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+    DataLakeServiceClientBuilder dataLakeServiceClientBuilder = new DataLakeServiceClientBuilder().endpoint(dfsServiceEndpointUrl);
+    BlobServiceClientBuilder blobServiceClientBuilder = new BlobServiceClientBuilder().endpoint(blobServiceEndpointUrl);
+
+    if (accountName!= null && accessKey != null) {
+      LOGGER.info("Authenticating using the access key to the account.");
+      StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+      dataLakeServiceClientBuilder.credential(sharedKeyCredential);
+      blobServiceClientBuilder.credential(sharedKeyCredential);
+    } else if (clientId != null && clientSecret != null && tenantId != null) {
+      LOGGER.info("Authenticating using Azure Active Directory");
+      ClientSecretCredential clientSecretCredential = new ClientSecretCredentialBuilder()
+          .clientId(clientId)
+          .clientSecret(clientSecret)
+          .tenantId(tenantId)
+          .build();
+      dataLakeServiceClientBuilder.credential(clientSecretCredential);
+      blobServiceClientBuilder.credential(clientSecretCredential);
+    } else {
+      // Error out as at least one mode of auth info needed
+      throw new IllegalArgumentException("Expecting either (accountName, accessKey) or (clientId, clientSecret, tenantId)");
+    }
 
-    DataLakeServiceClient serviceClient = new DataLakeServiceClientBuilder().credential(sharedKeyCredential)
-        .endpoint(dfsServiceEndpointUrl)
-        .buildClient();
+    _blobServiceClient = blobServiceClientBuilder.buildClient();
+    DataLakeServiceClient serviceClient = dataLakeServiceClientBuilder.buildClient();
+    _fileSystemClient = getOrCreateClientWithFileSystem(serviceClient, fileSystemName);
 
-    _blobServiceClient =
-        new BlobServiceClientBuilder().credential(sharedKeyCredential).endpoint(blobServiceEndpointUrl).buildClient();
-    _fileSystemClient = serviceClient.getFileSystemClient(fileSystemName);
     LOGGER.info("ADLSGen2PinotFS is initialized (accountName={}, fileSystemName={}, dfsServiceEndpointUrl={}, "
             + "blobServiceEndpointUrl={}, enableChecksum={})", accountName, fileSystemName, dfsServiceEndpointUrl,
         blobServiceEndpointUrl, _enableChecksum);
   }
 
+  /**
+   * Returns the DataLakeFileSystemClient to the specified file system creating if it doesn't exist.
+   *
+   * @param serviceClient authenticated data lake service client to an account
+   * @param fileSystemName name of the file system (blob container)
+   * @return DataLakeFileSystemClient with the specified fileSystemName.
+   */
+  @VisibleForTesting
+  public DataLakeFileSystemClient getOrCreateClientWithFileSystem(DataLakeServiceClient serviceClient,

Review comment:
       I think that we should throw the exception instead of creating the file system in Azure. We can assume that the blob container is created beforehand because this file system may need to be configured differently depending on the requirements of Pinot users. Also, I don't think that Pinot should handle the life cycle of the deep storage layer and it should simply read/write data based on the configuration.
   
   Auto-managing containers will be handy if we need to use multiple containers but in our case, we just need a single container with ADLS Gen2 feature enabled. Also, please double-check with Google Cloud Storage and AWS S3 implementation. We should keep the behavior the same.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] jackjlli commented on a change in pull request #6531: Update ADLSGen2PinotFS auth; Introduce unit tests

Posted by GitBox <gi...@apache.org>.
jackjlli commented on a change in pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531#discussion_r569868574



##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/ADLSGen2PinotFS.java
##########
@@ -106,24 +115,63 @@ public void init(PinotConfiguration config) {
     // TODO: consider to add the encryption of the following config
     String accessKey = config.getProperty(ACCESS_KEY);
     String fileSystemName = config.getProperty(FILE_SYSTEM_NAME);
+    String clientId = config.getProperty(CLIENT_ID);
+    String clientSecret = config.getProperty(CLIENT_SECRET);
+    String tenantId = config.getProperty(TENANT_ID);
 
     String dfsServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_STORAGE_DNS_SUFFIX;
     String blobServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_BLOB_DNS_SUFFIX;
 
-    StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+    DataLakeServiceClientBuilder dataLakeServiceClientBuilder = new DataLakeServiceClientBuilder().endpoint(dfsServiceEndpointUrl);
+    BlobServiceClientBuilder blobServiceClientBuilder = new BlobServiceClientBuilder().endpoint(blobServiceEndpointUrl);
+
+    if (accountName!= null && accessKey != null) {
+      // Authenticate using accessKey
+      StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);

Review comment:
       It'd be good to print out a message to show which way is currently being used here. Same for the else block.

##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/ADLSGen2PinotFS.java
##########
@@ -106,24 +115,63 @@ public void init(PinotConfiguration config) {
     // TODO: consider to add the encryption of the following config
     String accessKey = config.getProperty(ACCESS_KEY);
     String fileSystemName = config.getProperty(FILE_SYSTEM_NAME);
+    String clientId = config.getProperty(CLIENT_ID);
+    String clientSecret = config.getProperty(CLIENT_SECRET);
+    String tenantId = config.getProperty(TENANT_ID);
 
     String dfsServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_STORAGE_DNS_SUFFIX;
     String blobServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_BLOB_DNS_SUFFIX;
 
-    StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+    DataLakeServiceClientBuilder dataLakeServiceClientBuilder = new DataLakeServiceClientBuilder().endpoint(dfsServiceEndpointUrl);
+    BlobServiceClientBuilder blobServiceClientBuilder = new BlobServiceClientBuilder().endpoint(blobServiceEndpointUrl);
+
+    if (accountName!= null && accessKey != null) {
+      // Authenticate using accessKey
+      StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+      dataLakeServiceClientBuilder.credential(sharedKeyCredential);
+      blobServiceClientBuilder.credential(sharedKeyCredential);
+    } else if (clientId != null && clientSecret != null && tenantId != null) {
+      // Authenticate using AAD
+      ClientSecretCredential clientSecretCredential = new ClientSecretCredentialBuilder()
+          .clientId(clientId)
+          .clientSecret(clientSecret)
+          .tenantId(tenantId)
+          .build();
+      dataLakeServiceClientBuilder.credential(clientSecretCredential);
+      blobServiceClientBuilder.credential(clientSecretCredential);
+    } else {
+      // Error out as at least one mode of auth info needed
+      throw new IllegalArgumentException("Expecting either (accountName, accessKey) or (clientId, clientSecret, tenantId)");
+    }
 
-    DataLakeServiceClient serviceClient = new DataLakeServiceClientBuilder().credential(sharedKeyCredential)
-        .endpoint(dfsServiceEndpointUrl)
-        .buildClient();
+    _blobServiceClient = blobServiceClientBuilder.buildClient();
+    DataLakeServiceClient serviceClient = dataLakeServiceClientBuilder.buildClient();
+    _fileSystemClient = getOrCreateClientWithFileSystem(serviceClient, fileSystemName);
 
-    _blobServiceClient =
-        new BlobServiceClientBuilder().credential(sharedKeyCredential).endpoint(blobServiceEndpointUrl).buildClient();
-    _fileSystemClient = serviceClient.getFileSystemClient(fileSystemName);
     LOGGER.info("ADLSGen2PinotFS is initialized (accountName={}, fileSystemName={}, dfsServiceEndpointUrl={}, "
             + "blobServiceEndpointUrl={}, enableChecksum={})", accountName, fileSystemName, dfsServiceEndpointUrl,
         blobServiceEndpointUrl, _enableChecksum);
   }
 
+  @VisibleForTesting
+  public DataLakeFileSystemClient getOrCreateClientWithFileSystem(DataLakeServiceClient serviceClient,

Review comment:
       Could you add some description on this method. This helps better understand what we're doing here.

##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/test/resources/mockito-extensions/org.mockito.plugins.MockMaker
##########
@@ -0,0 +1,4 @@
+# This class defines Mockito plugins to be used during testing
+
+# mock-maker inline enables mocking of 'final' classes in Mockito v2
+mock-maker-inline

Review comment:
       Missing an empty line at the end of the file

##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/test/resources/mockito-extensions/org.mockito.plugins.MockMaker
##########
@@ -0,0 +1,4 @@
+# This class defines Mockito plugins to be used during testing
+
+# mock-maker inline enables mocking of 'final' classes in Mockito v2
+mock-maker-inline

Review comment:
       Also why is this file needed?

##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/test/java/org/apache/pinot/plugin/filesystem/test/ADLSGen2PinotFSTest.java
##########
@@ -0,0 +1,398 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.plugin.filesystem.test;
+
+import com.azure.core.http.rest.PagedIterable;
+import com.azure.core.http.rest.SimpleResponse;
+import com.azure.core.util.Context;
+import com.azure.storage.blob.BlobClient;
+import com.azure.storage.blob.BlobContainerClient;
+import com.azure.storage.blob.BlobServiceClient;
+import com.azure.storage.blob.specialized.BlobInputStream;
+import com.azure.storage.file.datalake.DataLakeDirectoryClient;
+import com.azure.storage.file.datalake.DataLakeFileClient;
+import com.azure.storage.file.datalake.DataLakeFileSystemClient;
+import com.azure.storage.file.datalake.DataLakeServiceClient;
+import com.azure.storage.file.datalake.models.DataLakeStorageException;
+import com.azure.storage.file.datalake.models.PathItem;
+import com.azure.storage.file.datalake.models.PathProperties;
+import java.io.IOException;
+import java.io.InputStream;
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.time.Instant;
+import java.time.OffsetDateTime;
+import java.time.ZoneId;
+import java.util.Collections;
+import java.util.HashMap;
+import org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS;
+import org.apache.pinot.spi.env.PinotConfiguration;
+import org.mockito.Mock;
+import org.mockito.MockitoAnnotations;
+import org.testng.Assert;
+import org.testng.annotations.AfterMethod;
+import org.testng.annotations.BeforeMethod;
+import org.testng.annotations.Test;
+
+import static org.mockito.Mockito.*;
+
+
+/**
+ * Tests the Azure implementation of ADLSGen2PinotFS
+ */
+public class ADLSGen2PinotFSTest {
+
+  @Mock
+  private DataLakeFileSystemClient _mockFileSystemClient;
+  @Mock
+  private DataLakeDirectoryClient _mockDirectoryClient;
+  @Mock
+  private BlobServiceClient _mockBlobServiceClient;
+  @Mock
+  private BlobContainerClient _mockBlobContainerClient;
+  @Mock
+  private BlobClient _mockBlobClient;
+  @Mock
+  private BlobInputStream _mockBlobInputStream;
+  @Mock
+  private DataLakeServiceClient _mockServiceClient;
+  @Mock
+  private DataLakeFileClient _mockFileClient;
+  @Mock
+  private DataLakeStorageException _mockDataLakeStorageException;
+  @Mock
+  private SimpleResponse _mockSimpleResponse;
+  @Mock
+  private PathProperties _mockPathProperties;
+  @Mock
+  private PagedIterable _mockPagedIterable;
+  @Mock
+  private PathItem _mockPathItem;
+
+  private URI _mockURI;
+  private ADLSGen2PinotFS _underTest;

Review comment:
       Rename it to sth like `_ ADLSGen2PinotFSUnderTest`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] rkanumul commented on a change in pull request #6531: Update ADLSGen2PinotFS auth; Introduce unit tests

Posted by GitBox <gi...@apache.org>.
rkanumul commented on a change in pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531#discussion_r573550197



##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/ADLSGen2PinotFS.java
##########
@@ -106,24 +118,75 @@ public void init(PinotConfiguration config) {
     // TODO: consider to add the encryption of the following config
     String accessKey = config.getProperty(ACCESS_KEY);
     String fileSystemName = config.getProperty(FILE_SYSTEM_NAME);
+    String clientId = config.getProperty(CLIENT_ID);
+    String clientSecret = config.getProperty(CLIENT_SECRET);
+    String tenantId = config.getProperty(TENANT_ID);
 
     String dfsServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_STORAGE_DNS_SUFFIX;
     String blobServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_BLOB_DNS_SUFFIX;
 
-    StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+    DataLakeServiceClientBuilder dataLakeServiceClientBuilder = new DataLakeServiceClientBuilder().endpoint(dfsServiceEndpointUrl);
+    BlobServiceClientBuilder blobServiceClientBuilder = new BlobServiceClientBuilder().endpoint(blobServiceEndpointUrl);
+
+    if (accountName!= null && accessKey != null) {
+      LOGGER.info("Authenticating using the access key to the account.");
+      StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+      dataLakeServiceClientBuilder.credential(sharedKeyCredential);
+      blobServiceClientBuilder.credential(sharedKeyCredential);
+    } else if (clientId != null && clientSecret != null && tenantId != null) {
+      LOGGER.info("Authenticating using Azure Active Directory");
+      ClientSecretCredential clientSecretCredential = new ClientSecretCredentialBuilder()
+          .clientId(clientId)
+          .clientSecret(clientSecret)
+          .tenantId(tenantId)
+          .build();
+      dataLakeServiceClientBuilder.credential(clientSecretCredential);
+      blobServiceClientBuilder.credential(clientSecretCredential);
+    } else {
+      // Error out as at least one mode of auth info needed
+      throw new IllegalArgumentException("Expecting either (accountName, accessKey) or (clientId, clientSecret, tenantId)");
+    }
 
-    DataLakeServiceClient serviceClient = new DataLakeServiceClientBuilder().credential(sharedKeyCredential)
-        .endpoint(dfsServiceEndpointUrl)
-        .buildClient();
+    _blobServiceClient = blobServiceClientBuilder.buildClient();
+    DataLakeServiceClient serviceClient = dataLakeServiceClientBuilder.buildClient();
+    _fileSystemClient = getOrCreateClientWithFileSystem(serviceClient, fileSystemName);
 
-    _blobServiceClient =
-        new BlobServiceClientBuilder().credential(sharedKeyCredential).endpoint(blobServiceEndpointUrl).buildClient();
-    _fileSystemClient = serviceClient.getFileSystemClient(fileSystemName);
     LOGGER.info("ADLSGen2PinotFS is initialized (accountName={}, fileSystemName={}, dfsServiceEndpointUrl={}, "
             + "blobServiceEndpointUrl={}, enableChecksum={})", accountName, fileSystemName, dfsServiceEndpointUrl,
         blobServiceEndpointUrl, _enableChecksum);
   }
 
+  /**
+   * Returns the DataLakeFileSystemClient to the specified file system creating if it doesn't exist.
+   *
+   * @param serviceClient authenticated data lake service client to an account
+   * @param fileSystemName name of the file system (blob container)
+   * @return DataLakeFileSystemClient with the specified fileSystemName.
+   */
+  @VisibleForTesting
+  public DataLakeFileSystemClient getOrCreateClientWithFileSystem(DataLakeServiceClient serviceClient,

Review comment:
       There is some misunderstanding here, gen1 doesn't have the concept of blob store. I was equating gen2 blob with creating the directories in gen1. eg how we create pinot_segment/<cluster-name> automatically even though they are not pre created.
   I see what you are saying but don't quite agree with that. Sure, they can create/update it if they have particular need, but we provide a default.
   Don't agree with the "clean unintentionally created" part, that can happen to even directories with clustername or some other auto created path.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] codecov-io edited a comment on pull request #6531: Update ADLSGen2PinotFS auth; Introduce unit tests

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531#issuecomment-773109022


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6531?src=pr&el=h1) Report
   > Merging [#6531](https://codecov.io/gh/apache/incubator-pinot/pull/6531?src=pr&el=desc) (7124378) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/1beaab59b73f26c4e35f3b9bc856b03806cddf5a?el=desc) (1beaab5) will **decrease** coverage by `22.56%`.
   > The diff coverage is `41.98%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/6531/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz)](https://codecov.io/gh/apache/incubator-pinot/pull/6531?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff             @@
   ##           master    #6531       +/-   ##
   ===========================================
   - Coverage   66.44%   43.87%   -22.57%     
   ===========================================
     Files        1075     1339      +264     
     Lines       54773    65865    +11092     
     Branches     8168     9611     +1443     
   ===========================================
   - Hits        36396    28901     -7495     
   - Misses      15700    34534    +18834     
   + Partials     2677     2430      -247     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration | `43.87% <41.98%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/6531?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...ot/broker/broker/AllowAllAccessControlFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6531/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL0FsbG93QWxsQWNjZXNzQ29udHJvbEZhY3RvcnkuamF2YQ==) | `100.00% <ø> (ø)` | |
   | [.../helix/BrokerUserDefinedMessageHandlerFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6531/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL2hlbGl4L0Jyb2tlclVzZXJEZWZpbmVkTWVzc2FnZUhhbmRsZXJGYWN0b3J5LmphdmE=) | `52.83% <0.00%> (-13.84%)` | :arrow_down: |
   | [...org/apache/pinot/broker/queryquota/HitCounter.java](https://codecov.io/gh/apache/incubator-pinot/pull/6531/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcXVlcnlxdW90YS9IaXRDb3VudGVyLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...che/pinot/broker/queryquota/MaxHitRateTracker.java](https://codecov.io/gh/apache/incubator-pinot/pull/6531/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcXVlcnlxdW90YS9NYXhIaXRSYXRlVHJhY2tlci5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...ache/pinot/broker/queryquota/QueryQuotaEntity.java](https://codecov.io/gh/apache/incubator-pinot/pull/6531/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcXVlcnlxdW90YS9RdWVyeVF1b3RhRW50aXR5LmphdmE=) | `0.00% <0.00%> (-50.00%)` | :arrow_down: |
   | [...ker/routing/instanceselector/InstanceSelector.java](https://codecov.io/gh/apache/incubator-pinot/pull/6531/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcm91dGluZy9pbnN0YW5jZXNlbGVjdG9yL0luc3RhbmNlU2VsZWN0b3IuamF2YQ==) | `100.00% <ø> (ø)` | |
   | [...ceselector/StrictReplicaGroupInstanceSelector.java](https://codecov.io/gh/apache/incubator-pinot/pull/6531/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcm91dGluZy9pbnN0YW5jZXNlbGVjdG9yL1N0cmljdFJlcGxpY2FHcm91cEluc3RhbmNlU2VsZWN0b3IuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...roker/routing/segmentpruner/TimeSegmentPruner.java](https://codecov.io/gh/apache/incubator-pinot/pull/6531/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcm91dGluZy9zZWdtZW50cHJ1bmVyL1RpbWVTZWdtZW50UHJ1bmVyLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...roker/routing/segmentpruner/interval/Interval.java](https://codecov.io/gh/apache/incubator-pinot/pull/6531/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcm91dGluZy9zZWdtZW50cHJ1bmVyL2ludGVydmFsL0ludGVydmFsLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...r/routing/segmentpruner/interval/IntervalTree.java](https://codecov.io/gh/apache/incubator-pinot/pull/6531/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcm91dGluZy9zZWdtZW50cHJ1bmVyL2ludGVydmFsL0ludGVydmFsVHJlZS5qYXZh) | `0.00% <0.00%> (ø)` | |
   | ... and [1347 more](https://codecov.io/gh/apache/incubator-pinot/pull/6531/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6531?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6531?src=pr&el=footer). Last update [12ee45c...7124378](https://codecov.io/gh/apache/incubator-pinot/pull/6531?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] rkanumul commented on a change in pull request #6531: Update ADLSGen2PinotFS auth; Introduce unit tests

Posted by GitBox <gi...@apache.org>.
rkanumul commented on a change in pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531#discussion_r574149794



##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/ADLSGen2PinotFS.java
##########
@@ -106,24 +118,75 @@ public void init(PinotConfiguration config) {
     // TODO: consider to add the encryption of the following config
     String accessKey = config.getProperty(ACCESS_KEY);
     String fileSystemName = config.getProperty(FILE_SYSTEM_NAME);
+    String clientId = config.getProperty(CLIENT_ID);
+    String clientSecret = config.getProperty(CLIENT_SECRET);
+    String tenantId = config.getProperty(TENANT_ID);
 
     String dfsServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_STORAGE_DNS_SUFFIX;
     String blobServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_BLOB_DNS_SUFFIX;
 
-    StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+    DataLakeServiceClientBuilder dataLakeServiceClientBuilder = new DataLakeServiceClientBuilder().endpoint(dfsServiceEndpointUrl);
+    BlobServiceClientBuilder blobServiceClientBuilder = new BlobServiceClientBuilder().endpoint(blobServiceEndpointUrl);
+
+    if (accountName!= null && accessKey != null) {
+      LOGGER.info("Authenticating using the access key to the account.");
+      StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+      dataLakeServiceClientBuilder.credential(sharedKeyCredential);
+      blobServiceClientBuilder.credential(sharedKeyCredential);
+    } else if (clientId != null && clientSecret != null && tenantId != null) {
+      LOGGER.info("Authenticating using Azure Active Directory");
+      ClientSecretCredential clientSecretCredential = new ClientSecretCredentialBuilder()
+          .clientId(clientId)
+          .clientSecret(clientSecret)
+          .tenantId(tenantId)
+          .build();
+      dataLakeServiceClientBuilder.credential(clientSecretCredential);
+      blobServiceClientBuilder.credential(clientSecretCredential);
+    } else {
+      // Error out as at least one mode of auth info needed
+      throw new IllegalArgumentException("Expecting either (accountName, accessKey) or (clientId, clientSecret, tenantId)");
+    }
 
-    DataLakeServiceClient serviceClient = new DataLakeServiceClientBuilder().credential(sharedKeyCredential)
-        .endpoint(dfsServiceEndpointUrl)
-        .buildClient();
+    _blobServiceClient = blobServiceClientBuilder.buildClient();
+    DataLakeServiceClient serviceClient = dataLakeServiceClientBuilder.buildClient();
+    _fileSystemClient = getOrCreateClientWithFileSystem(serviceClient, fileSystemName);
 
-    _blobServiceClient =
-        new BlobServiceClientBuilder().credential(sharedKeyCredential).endpoint(blobServiceEndpointUrl).buildClient();
-    _fileSystemClient = serviceClient.getFileSystemClient(fileSystemName);
     LOGGER.info("ADLSGen2PinotFS is initialized (accountName={}, fileSystemName={}, dfsServiceEndpointUrl={}, "
             + "blobServiceEndpointUrl={}, enableChecksum={})", accountName, fileSystemName, dfsServiceEndpointUrl,
         blobServiceEndpointUrl, _enableChecksum);
   }
 
+  /**
+   * Returns the DataLakeFileSystemClient to the specified file system creating if it doesn't exist.
+   *
+   * @param serviceClient authenticated data lake service client to an account
+   * @param fileSystemName name of the file system (blob container)
+   * @return DataLakeFileSystemClient with the specified fileSystemName.
+   */
+  @VisibleForTesting
+  public DataLakeFileSystemClient getOrCreateClientWithFileSystem(DataLakeServiceClient serviceClient,

Review comment:
       yes, that is what I thought.
   No worries, I appreciate the detail in review. :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] rkanumul commented on a change in pull request #6531: Update ADLSGen2PinotFS auth; Introduce unit tests

Posted by GitBox <gi...@apache.org>.
rkanumul commented on a change in pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531#discussion_r573410217



##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/ADLSGen2PinotFS.java
##########
@@ -106,24 +118,75 @@ public void init(PinotConfiguration config) {
     // TODO: consider to add the encryption of the following config
     String accessKey = config.getProperty(ACCESS_KEY);
     String fileSystemName = config.getProperty(FILE_SYSTEM_NAME);
+    String clientId = config.getProperty(CLIENT_ID);
+    String clientSecret = config.getProperty(CLIENT_SECRET);
+    String tenantId = config.getProperty(TENANT_ID);
 
     String dfsServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_STORAGE_DNS_SUFFIX;
     String blobServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_BLOB_DNS_SUFFIX;
 
-    StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+    DataLakeServiceClientBuilder dataLakeServiceClientBuilder = new DataLakeServiceClientBuilder().endpoint(dfsServiceEndpointUrl);
+    BlobServiceClientBuilder blobServiceClientBuilder = new BlobServiceClientBuilder().endpoint(blobServiceEndpointUrl);
+
+    if (accountName!= null && accessKey != null) {
+      LOGGER.info("Authenticating using the access key to the account.");
+      StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+      dataLakeServiceClientBuilder.credential(sharedKeyCredential);
+      blobServiceClientBuilder.credential(sharedKeyCredential);
+    } else if (clientId != null && clientSecret != null && tenantId != null) {
+      LOGGER.info("Authenticating using Azure Active Directory");
+      ClientSecretCredential clientSecretCredential = new ClientSecretCredentialBuilder()
+          .clientId(clientId)
+          .clientSecret(clientSecret)
+          .tenantId(tenantId)
+          .build();
+      dataLakeServiceClientBuilder.credential(clientSecretCredential);
+      blobServiceClientBuilder.credential(clientSecretCredential);
+    } else {
+      // Error out as at least one mode of auth info needed
+      throw new IllegalArgumentException("Expecting either (accountName, accessKey) or (clientId, clientSecret, tenantId)");
+    }
 
-    DataLakeServiceClient serviceClient = new DataLakeServiceClientBuilder().credential(sharedKeyCredential)
-        .endpoint(dfsServiceEndpointUrl)
-        .buildClient();
+    _blobServiceClient = blobServiceClientBuilder.buildClient();
+    DataLakeServiceClient serviceClient = dataLakeServiceClientBuilder.buildClient();
+    _fileSystemClient = getOrCreateClientWithFileSystem(serviceClient, fileSystemName);
 
-    _blobServiceClient =
-        new BlobServiceClientBuilder().credential(sharedKeyCredential).endpoint(blobServiceEndpointUrl).buildClient();
-    _fileSystemClient = serviceClient.getFileSystemClient(fileSystemName);
     LOGGER.info("ADLSGen2PinotFS is initialized (accountName={}, fileSystemName={}, dfsServiceEndpointUrl={}, "
             + "blobServiceEndpointUrl={}, enableChecksum={})", accountName, fileSystemName, dfsServiceEndpointUrl,
         blobServiceEndpointUrl, _enableChecksum);
   }
 
+  /**
+   * Returns the DataLakeFileSystemClient to the specified file system creating if it doesn't exist.
+   *
+   * @param serviceClient authenticated data lake service client to an account
+   * @param fileSystemName name of the file system (blob container)
+   * @return DataLakeFileSystemClient with the specified fileSystemName.
+   */
+  @VisibleForTesting
+  public DataLakeFileSystemClient getOrCreateClientWithFileSystem(DataLakeServiceClient serviceClient,

Review comment:
       gen1 doesn't have the concept of "filesystem" which is nothing but blob container.
   So in gen1 one would just create a top level directory.
   That is an implementation detail for this resource and is not to be treated as separate from filepath itself.
   **It is similar to create a directory path if it doesn't exist in gen1.** ans should not be treated similar to account creation but file creation.
   
   The reason they implemented it in this way is they had blob store with all the file system semantics already. they just converged them.
   https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction
   
   https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-namespace




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] rkanumul commented on a change in pull request #6531: Update ADLSGen2PinotFS auth; Introduce unit tests

Posted by GitBox <gi...@apache.org>.
rkanumul commented on a change in pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531#discussion_r569882429



##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/test/resources/mockito-extensions/org.mockito.plugins.MockMaker
##########
@@ -0,0 +1,4 @@
+# This class defines Mockito plugins to be used during testing
+
+# mock-maker inline enables mocking of 'final' classes in Mockito v2
+mock-maker-inline

Review comment:
       Most of the classes in the azure sdk are 'final', they cant be mocked without this declaration.
   
   I updated the empty line in the code base, github ignores it for some reason.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] snleee commented on a change in pull request #6531: Update ADLSGen2PinotFS auth; Introduce unit tests

Posted by GitBox <gi...@apache.org>.
snleee commented on a change in pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531#discussion_r573430961



##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/ADLSGen2PinotFS.java
##########
@@ -106,24 +118,75 @@ public void init(PinotConfiguration config) {
     // TODO: consider to add the encryption of the following config
     String accessKey = config.getProperty(ACCESS_KEY);
     String fileSystemName = config.getProperty(FILE_SYSTEM_NAME);
+    String clientId = config.getProperty(CLIENT_ID);
+    String clientSecret = config.getProperty(CLIENT_SECRET);
+    String tenantId = config.getProperty(TENANT_ID);
 
     String dfsServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_STORAGE_DNS_SUFFIX;
     String blobServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_BLOB_DNS_SUFFIX;
 
-    StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+    DataLakeServiceClientBuilder dataLakeServiceClientBuilder = new DataLakeServiceClientBuilder().endpoint(dfsServiceEndpointUrl);
+    BlobServiceClientBuilder blobServiceClientBuilder = new BlobServiceClientBuilder().endpoint(blobServiceEndpointUrl);
+
+    if (accountName!= null && accessKey != null) {
+      LOGGER.info("Authenticating using the access key to the account.");
+      StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+      dataLakeServiceClientBuilder.credential(sharedKeyCredential);
+      blobServiceClientBuilder.credential(sharedKeyCredential);
+    } else if (clientId != null && clientSecret != null && tenantId != null) {
+      LOGGER.info("Authenticating using Azure Active Directory");
+      ClientSecretCredential clientSecretCredential = new ClientSecretCredentialBuilder()
+          .clientId(clientId)
+          .clientSecret(clientSecret)
+          .tenantId(tenantId)
+          .build();
+      dataLakeServiceClientBuilder.credential(clientSecretCredential);
+      blobServiceClientBuilder.credential(clientSecretCredential);
+    } else {
+      // Error out as at least one mode of auth info needed
+      throw new IllegalArgumentException("Expecting either (accountName, accessKey) or (clientId, clientSecret, tenantId)");
+    }
 
-    DataLakeServiceClient serviceClient = new DataLakeServiceClientBuilder().credential(sharedKeyCredential)
-        .endpoint(dfsServiceEndpointUrl)
-        .buildClient();
+    _blobServiceClient = blobServiceClientBuilder.buildClient();
+    DataLakeServiceClient serviceClient = dataLakeServiceClientBuilder.buildClient();
+    _fileSystemClient = getOrCreateClientWithFileSystem(serviceClient, fileSystemName);
 
-    _blobServiceClient =
-        new BlobServiceClientBuilder().credential(sharedKeyCredential).endpoint(blobServiceEndpointUrl).buildClient();
-    _fileSystemClient = serviceClient.getFileSystemClient(fileSystemName);
     LOGGER.info("ADLSGen2PinotFS is initialized (accountName={}, fileSystemName={}, dfsServiceEndpointUrl={}, "
             + "blobServiceEndpointUrl={}, enableChecksum={})", accountName, fileSystemName, dfsServiceEndpointUrl,
         blobServiceEndpointUrl, _enableChecksum);
   }
 
+  /**
+   * Returns the DataLakeFileSystemClient to the specified file system creating if it doesn't exist.
+   *
+   * @param serviceClient authenticated data lake service client to an account
+   * @param fileSystemName name of the file system (blob container)
+   * @return DataLakeFileSystemClient with the specified fileSystemName.
+   */
+  @VisibleForTesting
+  public DataLakeFileSystemClient getOrCreateClientWithFileSystem(DataLakeServiceClient serviceClient,

Review comment:
       @rkanumul If Gen1 code automatically creates the top-level blob container, I think that we should change that. I still think that our code should not create the top-level container and we should assume that they are pre-generated. Pinot should be able to generate any files/directories within the container. 
   
   As I mentioned above, each customer would want to configure this container differently so that it makes sense for customers to prepare the ADLS Gen1/Gen2 before hooking to the Pinot. I don't see much benefit of auto-creating the container with some default settings.
   
   The clients will also need to clean this auto-generated container if it gets created unintentionally. It's better to throw the exception that the configured file system does not exist in the configured Azure account.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] rkanumul commented on a change in pull request #6531: Update ADLSGen2PinotFS auth; Introduce unit tests

Posted by GitBox <gi...@apache.org>.
rkanumul commented on a change in pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531#discussion_r572494234



##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/ADLSGen2PinotFS.java
##########
@@ -106,24 +118,75 @@ public void init(PinotConfiguration config) {
     // TODO: consider to add the encryption of the following config
     String accessKey = config.getProperty(ACCESS_KEY);
     String fileSystemName = config.getProperty(FILE_SYSTEM_NAME);
+    String clientId = config.getProperty(CLIENT_ID);
+    String clientSecret = config.getProperty(CLIENT_SECRET);
+    String tenantId = config.getProperty(TENANT_ID);
 
     String dfsServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_STORAGE_DNS_SUFFIX;
     String blobServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_BLOB_DNS_SUFFIX;
 
-    StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+    DataLakeServiceClientBuilder dataLakeServiceClientBuilder = new DataLakeServiceClientBuilder().endpoint(dfsServiceEndpointUrl);
+    BlobServiceClientBuilder blobServiceClientBuilder = new BlobServiceClientBuilder().endpoint(blobServiceEndpointUrl);
+
+    if (accountName!= null && accessKey != null) {
+      LOGGER.info("Authenticating using the access key to the account.");
+      StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+      dataLakeServiceClientBuilder.credential(sharedKeyCredential);
+      blobServiceClientBuilder.credential(sharedKeyCredential);
+    } else if (clientId != null && clientSecret != null && tenantId != null) {
+      LOGGER.info("Authenticating using Azure Active Directory");
+      ClientSecretCredential clientSecretCredential = new ClientSecretCredentialBuilder()
+          .clientId(clientId)
+          .clientSecret(clientSecret)
+          .tenantId(tenantId)
+          .build();
+      dataLakeServiceClientBuilder.credential(clientSecretCredential);
+      blobServiceClientBuilder.credential(clientSecretCredential);
+    } else {
+      // Error out as at least one mode of auth info needed
+      throw new IllegalArgumentException("Expecting either (accountName, accessKey) or (clientId, clientSecret, tenantId)");
+    }
 
-    DataLakeServiceClient serviceClient = new DataLakeServiceClientBuilder().credential(sharedKeyCredential)
-        .endpoint(dfsServiceEndpointUrl)
-        .buildClient();
+    _blobServiceClient = blobServiceClientBuilder.buildClient();
+    DataLakeServiceClient serviceClient = dataLakeServiceClientBuilder.buildClient();
+    _fileSystemClient = getOrCreateClientWithFileSystem(serviceClient, fileSystemName);
 
-    _blobServiceClient =
-        new BlobServiceClientBuilder().credential(sharedKeyCredential).endpoint(blobServiceEndpointUrl).buildClient();
-    _fileSystemClient = serviceClient.getFileSystemClient(fileSystemName);
     LOGGER.info("ADLSGen2PinotFS is initialized (accountName={}, fileSystemName={}, dfsServiceEndpointUrl={}, "
             + "blobServiceEndpointUrl={}, enableChecksum={})", accountName, fileSystemName, dfsServiceEndpointUrl,
         blobServiceEndpointUrl, _enableChecksum);
   }
 
+  /**
+   * Returns the DataLakeFileSystemClient to the specified file system creating if it doesn't exist.
+   *
+   * @param serviceClient authenticated data lake service client to an account
+   * @param fileSystemName name of the file system (blob container)
+   * @return DataLakeFileSystemClient with the specified fileSystemName.
+   */
+  @VisibleForTesting
+  public DataLakeFileSystemClient getOrCreateClientWithFileSystem(DataLakeServiceClient serviceClient,
+      String fileSystemName) {
+    try {
+      DataLakeFileSystemClient fileSystemClient = serviceClient.getFileSystemClient(fileSystemName);
+      // The return value is irrelevant. This is to test if the filesystem exists.
+      fileSystemClient.getProperties();
+      return fileSystemClient;
+    } catch (DataLakeStorageException e) {
+      if (e.getStatusCode() == NOT_FOUND_STATUS_CODE && e.getErrorCode().equals(FILE_SYSTEM_NOT_FOUND_ERROR_CODE)) {
+        LOGGER.info("FileSystem with name {} does not exist. Creating one with the same name.", fileSystemName);
+        return serviceClient.createFileSystem(fileSystemName);
+      } else {
+        throw e;

Review comment:
       the exception gets printed ultimately.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] rkanumul commented on a change in pull request #6531: Update ADLSGen2PinotFS auth; Introduce unit tests

Posted by GitBox <gi...@apache.org>.
rkanumul commented on a change in pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531#discussion_r569882472



##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/ADLSGen2PinotFS.java
##########
@@ -106,24 +115,63 @@ public void init(PinotConfiguration config) {
     // TODO: consider to add the encryption of the following config
     String accessKey = config.getProperty(ACCESS_KEY);
     String fileSystemName = config.getProperty(FILE_SYSTEM_NAME);
+    String clientId = config.getProperty(CLIENT_ID);
+    String clientSecret = config.getProperty(CLIENT_SECRET);
+    String tenantId = config.getProperty(TENANT_ID);
 
     String dfsServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_STORAGE_DNS_SUFFIX;
     String blobServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_BLOB_DNS_SUFFIX;
 
-    StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+    DataLakeServiceClientBuilder dataLakeServiceClientBuilder = new DataLakeServiceClientBuilder().endpoint(dfsServiceEndpointUrl);
+    BlobServiceClientBuilder blobServiceClientBuilder = new BlobServiceClientBuilder().endpoint(blobServiceEndpointUrl);
+
+    if (accountName!= null && accessKey != null) {
+      // Authenticate using accessKey
+      StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+      dataLakeServiceClientBuilder.credential(sharedKeyCredential);
+      blobServiceClientBuilder.credential(sharedKeyCredential);
+    } else if (clientId != null && clientSecret != null && tenantId != null) {
+      // Authenticate using AAD
+      ClientSecretCredential clientSecretCredential = new ClientSecretCredentialBuilder()
+          .clientId(clientId)
+          .clientSecret(clientSecret)
+          .tenantId(tenantId)
+          .build();
+      dataLakeServiceClientBuilder.credential(clientSecretCredential);
+      blobServiceClientBuilder.credential(clientSecretCredential);
+    } else {
+      // Error out as at least one mode of auth info needed
+      throw new IllegalArgumentException("Expecting either (accountName, accessKey) or (clientId, clientSecret, tenantId)");
+    }
 
-    DataLakeServiceClient serviceClient = new DataLakeServiceClientBuilder().credential(sharedKeyCredential)
-        .endpoint(dfsServiceEndpointUrl)
-        .buildClient();
+    _blobServiceClient = blobServiceClientBuilder.buildClient();
+    DataLakeServiceClient serviceClient = dataLakeServiceClientBuilder.buildClient();
+    _fileSystemClient = getOrCreateClientWithFileSystem(serviceClient, fileSystemName);
 
-    _blobServiceClient =
-        new BlobServiceClientBuilder().credential(sharedKeyCredential).endpoint(blobServiceEndpointUrl).buildClient();
-    _fileSystemClient = serviceClient.getFileSystemClient(fileSystemName);
     LOGGER.info("ADLSGen2PinotFS is initialized (accountName={}, fileSystemName={}, dfsServiceEndpointUrl={}, "
             + "blobServiceEndpointUrl={}, enableChecksum={})", accountName, fileSystemName, dfsServiceEndpointUrl,
         blobServiceEndpointUrl, _enableChecksum);
   }
 
+  @VisibleForTesting
+  public DataLakeFileSystemClient getOrCreateClientWithFileSystem(DataLakeServiceClient serviceClient,

Review comment:
       updated

##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/test/java/org/apache/pinot/plugin/filesystem/test/ADLSGen2PinotFSTest.java
##########
@@ -0,0 +1,398 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.plugin.filesystem.test;
+
+import com.azure.core.http.rest.PagedIterable;
+import com.azure.core.http.rest.SimpleResponse;
+import com.azure.core.util.Context;
+import com.azure.storage.blob.BlobClient;
+import com.azure.storage.blob.BlobContainerClient;
+import com.azure.storage.blob.BlobServiceClient;
+import com.azure.storage.blob.specialized.BlobInputStream;
+import com.azure.storage.file.datalake.DataLakeDirectoryClient;
+import com.azure.storage.file.datalake.DataLakeFileClient;
+import com.azure.storage.file.datalake.DataLakeFileSystemClient;
+import com.azure.storage.file.datalake.DataLakeServiceClient;
+import com.azure.storage.file.datalake.models.DataLakeStorageException;
+import com.azure.storage.file.datalake.models.PathItem;
+import com.azure.storage.file.datalake.models.PathProperties;
+import java.io.IOException;
+import java.io.InputStream;
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.time.Instant;
+import java.time.OffsetDateTime;
+import java.time.ZoneId;
+import java.util.Collections;
+import java.util.HashMap;
+import org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS;
+import org.apache.pinot.spi.env.PinotConfiguration;
+import org.mockito.Mock;
+import org.mockito.MockitoAnnotations;
+import org.testng.Assert;
+import org.testng.annotations.AfterMethod;
+import org.testng.annotations.BeforeMethod;
+import org.testng.annotations.Test;
+
+import static org.mockito.Mockito.*;
+
+
+/**
+ * Tests the Azure implementation of ADLSGen2PinotFS
+ */
+public class ADLSGen2PinotFSTest {
+
+  @Mock
+  private DataLakeFileSystemClient _mockFileSystemClient;
+  @Mock
+  private DataLakeDirectoryClient _mockDirectoryClient;
+  @Mock
+  private BlobServiceClient _mockBlobServiceClient;
+  @Mock
+  private BlobContainerClient _mockBlobContainerClient;
+  @Mock
+  private BlobClient _mockBlobClient;
+  @Mock
+  private BlobInputStream _mockBlobInputStream;
+  @Mock
+  private DataLakeServiceClient _mockServiceClient;
+  @Mock
+  private DataLakeFileClient _mockFileClient;
+  @Mock
+  private DataLakeStorageException _mockDataLakeStorageException;
+  @Mock
+  private SimpleResponse _mockSimpleResponse;
+  @Mock
+  private PathProperties _mockPathProperties;
+  @Mock
+  private PagedIterable _mockPagedIterable;
+  @Mock
+  private PathItem _mockPathItem;
+
+  private URI _mockURI;
+  private ADLSGen2PinotFS _underTest;

Review comment:
       renamed




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] snleee commented on a change in pull request #6531: Update ADLSGen2PinotFS auth; Introduce unit tests

Posted by GitBox <gi...@apache.org>.
snleee commented on a change in pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531#discussion_r573430961



##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/ADLSGen2PinotFS.java
##########
@@ -106,24 +118,75 @@ public void init(PinotConfiguration config) {
     // TODO: consider to add the encryption of the following config
     String accessKey = config.getProperty(ACCESS_KEY);
     String fileSystemName = config.getProperty(FILE_SYSTEM_NAME);
+    String clientId = config.getProperty(CLIENT_ID);
+    String clientSecret = config.getProperty(CLIENT_SECRET);
+    String tenantId = config.getProperty(TENANT_ID);
 
     String dfsServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_STORAGE_DNS_SUFFIX;
     String blobServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_BLOB_DNS_SUFFIX;
 
-    StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+    DataLakeServiceClientBuilder dataLakeServiceClientBuilder = new DataLakeServiceClientBuilder().endpoint(dfsServiceEndpointUrl);
+    BlobServiceClientBuilder blobServiceClientBuilder = new BlobServiceClientBuilder().endpoint(blobServiceEndpointUrl);
+
+    if (accountName!= null && accessKey != null) {
+      LOGGER.info("Authenticating using the access key to the account.");
+      StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+      dataLakeServiceClientBuilder.credential(sharedKeyCredential);
+      blobServiceClientBuilder.credential(sharedKeyCredential);
+    } else if (clientId != null && clientSecret != null && tenantId != null) {
+      LOGGER.info("Authenticating using Azure Active Directory");
+      ClientSecretCredential clientSecretCredential = new ClientSecretCredentialBuilder()
+          .clientId(clientId)
+          .clientSecret(clientSecret)
+          .tenantId(tenantId)
+          .build();
+      dataLakeServiceClientBuilder.credential(clientSecretCredential);
+      blobServiceClientBuilder.credential(clientSecretCredential);
+    } else {
+      // Error out as at least one mode of auth info needed
+      throw new IllegalArgumentException("Expecting either (accountName, accessKey) or (clientId, clientSecret, tenantId)");
+    }
 
-    DataLakeServiceClient serviceClient = new DataLakeServiceClientBuilder().credential(sharedKeyCredential)
-        .endpoint(dfsServiceEndpointUrl)
-        .buildClient();
+    _blobServiceClient = blobServiceClientBuilder.buildClient();
+    DataLakeServiceClient serviceClient = dataLakeServiceClientBuilder.buildClient();
+    _fileSystemClient = getOrCreateClientWithFileSystem(serviceClient, fileSystemName);
 
-    _blobServiceClient =
-        new BlobServiceClientBuilder().credential(sharedKeyCredential).endpoint(blobServiceEndpointUrl).buildClient();
-    _fileSystemClient = serviceClient.getFileSystemClient(fileSystemName);
     LOGGER.info("ADLSGen2PinotFS is initialized (accountName={}, fileSystemName={}, dfsServiceEndpointUrl={}, "
             + "blobServiceEndpointUrl={}, enableChecksum={})", accountName, fileSystemName, dfsServiceEndpointUrl,
         blobServiceEndpointUrl, _enableChecksum);
   }
 
+  /**
+   * Returns the DataLakeFileSystemClient to the specified file system creating if it doesn't exist.
+   *
+   * @param serviceClient authenticated data lake service client to an account
+   * @param fileSystemName name of the file system (blob container)
+   * @return DataLakeFileSystemClient with the specified fileSystemName.
+   */
+  @VisibleForTesting
+  public DataLakeFileSystemClient getOrCreateClientWithFileSystem(DataLakeServiceClient serviceClient,

Review comment:
       @rkanumul If Gen1 code automatically creates the top-level blob container, I think that we should change that. I still think that our code should not create the top-level container and we should assume that they are pre-generated. Pinot should be able to generate any files/directories within the container. 
   
   As I mentioned above, each customer would want to configure this container differently so that it makes sense for customers to prepare the ADLS Gen1/Gen2 before hooking to the Pinot. I don't see much benefit of auto-creating the container with some default settings.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] snleee commented on a change in pull request #6531: Update ADLSGen2PinotFS auth; Introduce unit tests

Posted by GitBox <gi...@apache.org>.
snleee commented on a change in pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531#discussion_r572273541



##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/ADLSGen2PinotFS.java
##########
@@ -106,24 +118,75 @@ public void init(PinotConfiguration config) {
     // TODO: consider to add the encryption of the following config
     String accessKey = config.getProperty(ACCESS_KEY);
     String fileSystemName = config.getProperty(FILE_SYSTEM_NAME);
+    String clientId = config.getProperty(CLIENT_ID);
+    String clientSecret = config.getProperty(CLIENT_SECRET);
+    String tenantId = config.getProperty(TENANT_ID);
 
     String dfsServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_STORAGE_DNS_SUFFIX;
     String blobServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_BLOB_DNS_SUFFIX;
 
-    StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+    DataLakeServiceClientBuilder dataLakeServiceClientBuilder = new DataLakeServiceClientBuilder().endpoint(dfsServiceEndpointUrl);
+    BlobServiceClientBuilder blobServiceClientBuilder = new BlobServiceClientBuilder().endpoint(blobServiceEndpointUrl);
+
+    if (accountName!= null && accessKey != null) {
+      LOGGER.info("Authenticating using the access key to the account.");
+      StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+      dataLakeServiceClientBuilder.credential(sharedKeyCredential);
+      blobServiceClientBuilder.credential(sharedKeyCredential);
+    } else if (clientId != null && clientSecret != null && tenantId != null) {
+      LOGGER.info("Authenticating using Azure Active Directory");
+      ClientSecretCredential clientSecretCredential = new ClientSecretCredentialBuilder()
+          .clientId(clientId)
+          .clientSecret(clientSecret)
+          .tenantId(tenantId)
+          .build();
+      dataLakeServiceClientBuilder.credential(clientSecretCredential);
+      blobServiceClientBuilder.credential(clientSecretCredential);
+    } else {
+      // Error out as at least one mode of auth info needed
+      throw new IllegalArgumentException("Expecting either (accountName, accessKey) or (clientId, clientSecret, tenantId)");
+    }
 
-    DataLakeServiceClient serviceClient = new DataLakeServiceClientBuilder().credential(sharedKeyCredential)
-        .endpoint(dfsServiceEndpointUrl)
-        .buildClient();
+    _blobServiceClient = blobServiceClientBuilder.buildClient();
+    DataLakeServiceClient serviceClient = dataLakeServiceClientBuilder.buildClient();
+    _fileSystemClient = getOrCreateClientWithFileSystem(serviceClient, fileSystemName);
 
-    _blobServiceClient =
-        new BlobServiceClientBuilder().credential(sharedKeyCredential).endpoint(blobServiceEndpointUrl).buildClient();
-    _fileSystemClient = serviceClient.getFileSystemClient(fileSystemName);
     LOGGER.info("ADLSGen2PinotFS is initialized (accountName={}, fileSystemName={}, dfsServiceEndpointUrl={}, "
             + "blobServiceEndpointUrl={}, enableChecksum={})", accountName, fileSystemName, dfsServiceEndpointUrl,
         blobServiceEndpointUrl, _enableChecksum);
   }
 
+  /**
+   * Returns the DataLakeFileSystemClient to the specified file system creating if it doesn't exist.
+   *
+   * @param serviceClient authenticated data lake service client to an account
+   * @param fileSystemName name of the file system (blob container)
+   * @return DataLakeFileSystemClient with the specified fileSystemName.
+   */
+  @VisibleForTesting
+  public DataLakeFileSystemClient getOrCreateClientWithFileSystem(DataLakeServiceClient serviceClient,

Review comment:
       I think that we should throw the exception instead of creating the file system in Azure. We can assume that the blob container is created beforehand because this file system may need to be configured differently depending on the requirements of Pinot users. Auto-managing containers will be handy if we need to use multiple containers but in our case, we just need a single container with ADLS Gen2 feature enabled.
   
   Please double check Google Cloud Storage and AWS S3 implementation. We should keep them in sync.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] snleee commented on a change in pull request #6531: Update ADLSGen2PinotFS auth; Introduce unit tests

Posted by GitBox <gi...@apache.org>.
snleee commented on a change in pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531#discussion_r573430961



##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/ADLSGen2PinotFS.java
##########
@@ -106,24 +118,75 @@ public void init(PinotConfiguration config) {
     // TODO: consider to add the encryption of the following config
     String accessKey = config.getProperty(ACCESS_KEY);
     String fileSystemName = config.getProperty(FILE_SYSTEM_NAME);
+    String clientId = config.getProperty(CLIENT_ID);
+    String clientSecret = config.getProperty(CLIENT_SECRET);
+    String tenantId = config.getProperty(TENANT_ID);
 
     String dfsServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_STORAGE_DNS_SUFFIX;
     String blobServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_BLOB_DNS_SUFFIX;
 
-    StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+    DataLakeServiceClientBuilder dataLakeServiceClientBuilder = new DataLakeServiceClientBuilder().endpoint(dfsServiceEndpointUrl);
+    BlobServiceClientBuilder blobServiceClientBuilder = new BlobServiceClientBuilder().endpoint(blobServiceEndpointUrl);
+
+    if (accountName!= null && accessKey != null) {
+      LOGGER.info("Authenticating using the access key to the account.");
+      StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+      dataLakeServiceClientBuilder.credential(sharedKeyCredential);
+      blobServiceClientBuilder.credential(sharedKeyCredential);
+    } else if (clientId != null && clientSecret != null && tenantId != null) {
+      LOGGER.info("Authenticating using Azure Active Directory");
+      ClientSecretCredential clientSecretCredential = new ClientSecretCredentialBuilder()
+          .clientId(clientId)
+          .clientSecret(clientSecret)
+          .tenantId(tenantId)
+          .build();
+      dataLakeServiceClientBuilder.credential(clientSecretCredential);
+      blobServiceClientBuilder.credential(clientSecretCredential);
+    } else {
+      // Error out as at least one mode of auth info needed
+      throw new IllegalArgumentException("Expecting either (accountName, accessKey) or (clientId, clientSecret, tenantId)");
+    }
 
-    DataLakeServiceClient serviceClient = new DataLakeServiceClientBuilder().credential(sharedKeyCredential)
-        .endpoint(dfsServiceEndpointUrl)
-        .buildClient();
+    _blobServiceClient = blobServiceClientBuilder.buildClient();
+    DataLakeServiceClient serviceClient = dataLakeServiceClientBuilder.buildClient();
+    _fileSystemClient = getOrCreateClientWithFileSystem(serviceClient, fileSystemName);
 
-    _blobServiceClient =
-        new BlobServiceClientBuilder().credential(sharedKeyCredential).endpoint(blobServiceEndpointUrl).buildClient();
-    _fileSystemClient = serviceClient.getFileSystemClient(fileSystemName);
     LOGGER.info("ADLSGen2PinotFS is initialized (accountName={}, fileSystemName={}, dfsServiceEndpointUrl={}, "
             + "blobServiceEndpointUrl={}, enableChecksum={})", accountName, fileSystemName, dfsServiceEndpointUrl,
         blobServiceEndpointUrl, _enableChecksum);
   }
 
+  /**
+   * Returns the DataLakeFileSystemClient to the specified file system creating if it doesn't exist.
+   *
+   * @param serviceClient authenticated data lake service client to an account
+   * @param fileSystemName name of the file system (blob container)
+   * @return DataLakeFileSystemClient with the specified fileSystemName.
+   */
+  @VisibleForTesting
+  public DataLakeFileSystemClient getOrCreateClientWithFileSystem(DataLakeServiceClient serviceClient,

Review comment:
       @rkanumul If Gen1 code automatically creates the top-level blob container, I think that we should change that. I still think that our code should not create the top-level container and we should assume that they are pre-generated. Pinot should be able to generate any files/directories within the container.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org