You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/02/08 18:38:24 UTC

[GitHub] [incubator-pinot] snleee commented on a change in pull request #6531: Update ADLSGen2PinotFS auth; Introduce unit tests

snleee commented on a change in pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531#discussion_r572267855



##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/test/java/org/apache/pinot/plugin/filesystem/test/AzurePinotFSTest.java
##########
@@ -74,8 +72,6 @@ public void testFS()
     Assert.assertTrue(azurePinotFS.exists(new URI(_adlLocation)));
 
     File file = new File(_adlLocation, "testfile2");
-    MockADLFileInputStream adlFileInputStream =

Review comment:
       Can you check the usage of `MockADLFileInputStream` ? If there's no usage, we can remove this class.

##########
File path: pinot-plugins/pinot-file-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/ADLSGen2PinotFS.java
##########
@@ -106,24 +118,75 @@ public void init(PinotConfiguration config) {
     // TODO: consider to add the encryption of the following config
     String accessKey = config.getProperty(ACCESS_KEY);
     String fileSystemName = config.getProperty(FILE_SYSTEM_NAME);
+    String clientId = config.getProperty(CLIENT_ID);
+    String clientSecret = config.getProperty(CLIENT_SECRET);
+    String tenantId = config.getProperty(TENANT_ID);
 
     String dfsServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_STORAGE_DNS_SUFFIX;
     String blobServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_BLOB_DNS_SUFFIX;
 
-    StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+    DataLakeServiceClientBuilder dataLakeServiceClientBuilder = new DataLakeServiceClientBuilder().endpoint(dfsServiceEndpointUrl);
+    BlobServiceClientBuilder blobServiceClientBuilder = new BlobServiceClientBuilder().endpoint(blobServiceEndpointUrl);
+
+    if (accountName!= null && accessKey != null) {
+      LOGGER.info("Authenticating using the access key to the account.");
+      StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+      dataLakeServiceClientBuilder.credential(sharedKeyCredential);
+      blobServiceClientBuilder.credential(sharedKeyCredential);
+    } else if (clientId != null && clientSecret != null && tenantId != null) {
+      LOGGER.info("Authenticating using Azure Active Directory");
+      ClientSecretCredential clientSecretCredential = new ClientSecretCredentialBuilder()
+          .clientId(clientId)
+          .clientSecret(clientSecret)
+          .tenantId(tenantId)
+          .build();
+      dataLakeServiceClientBuilder.credential(clientSecretCredential);
+      blobServiceClientBuilder.credential(clientSecretCredential);
+    } else {
+      // Error out as at least one mode of auth info needed
+      throw new IllegalArgumentException("Expecting either (accountName, accessKey) or (clientId, clientSecret, tenantId)");
+    }
 
-    DataLakeServiceClient serviceClient = new DataLakeServiceClientBuilder().credential(sharedKeyCredential)
-        .endpoint(dfsServiceEndpointUrl)
-        .buildClient();
+    _blobServiceClient = blobServiceClientBuilder.buildClient();
+    DataLakeServiceClient serviceClient = dataLakeServiceClientBuilder.buildClient();
+    _fileSystemClient = getOrCreateClientWithFileSystem(serviceClient, fileSystemName);
 
-    _blobServiceClient =
-        new BlobServiceClientBuilder().credential(sharedKeyCredential).endpoint(blobServiceEndpointUrl).buildClient();
-    _fileSystemClient = serviceClient.getFileSystemClient(fileSystemName);
     LOGGER.info("ADLSGen2PinotFS is initialized (accountName={}, fileSystemName={}, dfsServiceEndpointUrl={}, "
             + "blobServiceEndpointUrl={}, enableChecksum={})", accountName, fileSystemName, dfsServiceEndpointUrl,
         blobServiceEndpointUrl, _enableChecksum);
   }
 
+  /**
+   * Returns the DataLakeFileSystemClient to the specified file system creating if it doesn't exist.
+   *
+   * @param serviceClient authenticated data lake service client to an account
+   * @param fileSystemName name of the file system (blob container)
+   * @return DataLakeFileSystemClient with the specified fileSystemName.
+   */
+  @VisibleForTesting
+  public DataLakeFileSystemClient getOrCreateClientWithFileSystem(DataLakeServiceClient serviceClient,

Review comment:
       I think that we should throw the exception instead of creating the file system in Azure. We can assume that the blob container is created beforehand because this file system may need to be configured differently. Auto-managing containers will be handy if we need to use multiple containers but in our case, we just need a single container with ADLS Gen2 feature enabled.
   
   Please double check Google Cloud Storage and AWS S3 implementation. We should keep them in sync.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org