Posted to issues@nifi.apache.org by GitBox <gi...@apache.org> on 2020/05/14 16:22:26 UTC

[GitHub] [nifi] pgyori opened a new pull request #4273: NIFI-7446: Fail when the specified path is a directory in FetchAzureDataLakeStorage

pgyori opened a new pull request #4273:
URL: https://github.com/apache/nifi/pull/4273


   https://issues.apache.org/jira/browse/NIFI-7446
   
   #### Description of PR
   
   The FetchAzureDataLakeStorage processor now throws an exception when the specified path points to a directory.
   The azure-storage-file-datalake dependency is upgraded to a newer version (12.1.1).
   
   ### For all changes:
   - [ ] Is there a JIRA ticket associated with this PR? Is it referenced 
        in the commit message?
   
   - [ ] Does your PR title start with **NIFI-XXXX** where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
   
   - [ ] Has your PR been rebased against the latest commit within the target branch (typically `master`)?
   
   - [ ] Is your initial contribution a single, squashed commit? _Additional commits in response to PR reviewer feedback should be made on this branch and pushed to allow change tracking. Do not `squash` or use `--force` when pushing to allow for clean monitoring of changes._
   
   ### For code changes:
   - [ ] Have you ensured that the full suite of tests is executed via `mvn -Pcontrib-check clean install` at the root `nifi` folder?
   - [ ] Have you written or updated unit tests to verify your changes?
   - [ ] Have you verified that the full build is successful on both JDK 8 and JDK 11?
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? 
   - [ ] If applicable, have you updated the `LICENSE` file, including the main `LICENSE` file under `nifi-assembly`?
   - [ ] If applicable, have you updated the `NOTICE` file, including the main `NOTICE` file found under `nifi-assembly`?
   - [ ] If adding new Properties, have you added `.displayName` in addition to .name (programmatic access) for each of the new properties?
   
   ### For documentation related changes:
   - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
   
   ### Note:
   Please ensure that once the PR is submitted, you check GitHub Actions CI for build issues and submit an update to your PR as soon as possible.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nifi] turcsanyip commented on a change in pull request #4273: NIFI-7446: Fail when the specified path is a directory in FetchAzureDataLakeStorage

Posted by GitBox <gi...@apache.org>.
turcsanyip commented on a change in pull request #4273:
URL: https://github.com/apache/nifi/pull/4273#discussion_r425910853



##########
File path: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/azure/storage/FetchAzureDataLakeStorage.java
##########
@@ -67,6 +67,10 @@ public void onTrigger(ProcessContext context, ProcessSession session) throws Pro
             final DataLakeDirectoryClient directoryClient = dataLakeFileSystemClient.getDirectoryClient(directory);
             final DataLakeFileClient fileClient = directoryClient.getFileClient(fileName);
 
+            if (fileClient.getProperties().isDirectory()) {

Review comment:
       I can see a possible optimization here.
   According to @MuazmaZ's comment (https://github.com/apache/nifi/pull/4257#discussion_r423930566), `get***Client()` does not involve a network call but `getProperties()` does.
   In most cases this call would not be needed, because when the returned entity has content, it is definitely a file and not a directory. So I think this check could be moved after `session.write()` (line 74) and performed only when the resulting flowfile is empty.
   @pgyori, @MuazmaZ: what is your opinion?
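
The trade-off in this proposal can be sketched by counting round trips. The example below is only an illustration under stated assumptions: `FakeStore` is a hypothetical in-memory stand-in (not the Azure SDK), `remoteCalls` simulates network calls, and reading a directory path is assumed to yield empty content, as the comment above implies.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class RemoteCallCountSketch {

    /** Hypothetical in-memory stand-in for the remote store; not the Azure SDK. */
    static final class FakeStore {
        private final Map<String, byte[]> files = new HashMap<>();
        private final Set<String> directories = new HashSet<>();
        int remoteCalls = 0; // incremented by every simulated network round trip

        void putFile(String path, String content) {
            files.put(path, content.getBytes(StandardCharsets.UTF_8));
        }

        void putDirectory(String path) {
            directories.add(path);
        }

        // Simulates the properties lookup, which costs one round trip.
        boolean isDirectory(String path) {
            remoteCalls++;
            return directories.contains(path);
        }

        // Simulates the content download. Assumption taken from the thread:
        // reading a directory path yields empty content rather than an error.
        byte[] read(String path) {
            remoteCalls++;
            return files.getOrDefault(path, new byte[0]);
        }
    }

    // Eager variant, as in the PR: always check, then read (two round trips).
    static byte[] fetchEager(FakeStore store, String path) {
        if (store.isDirectory(path)) {
            throw new IllegalStateException(path + " points to a directory");
        }
        return store.read(path);
    }

    // Deferred variant, as proposed: read first, check only if nothing came back.
    static byte[] fetchDeferred(FakeStore store, String path) {
        final byte[] content = store.read(path);
        if (content.length == 0 && store.isDirectory(path)) {
            throw new IllegalStateException(path + " points to a directory");
        }
        return content;
    }
}
```

In this model the deferred variant saves one round trip for every non-empty file and costs the same as the eager variant for empty files and directories. It does not model what happens to the input flowfile when the check fails after the write.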







[GitHub] [nifi] MuazmaZ commented on a change in pull request #4273: NIFI-7446: Fail when the specified path is a directory in FetchAzureDataLakeStorage

Posted by GitBox <gi...@apache.org>.
MuazmaZ commented on a change in pull request #4273:
URL: https://github.com/apache/nifi/pull/4273#discussion_r426854519



##########
File path: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/azure/storage/FetchAzureDataLakeStorage.java
##########
@@ -67,6 +67,10 @@ public void onTrigger(ProcessContext context, ProcessSession session) throws Pro
             final DataLakeDirectoryClient directoryClient = dataLakeFileSystemClient.getDirectoryClient(directory);
             final DataLakeFileClient fileClient = directoryClient.getFileClient(fileName);
 
+            if (fileClient.getProperties().isDirectory()) {

Review comment:
       @turcsanyip re-testing it right now. I will update shortly.







[GitHub] [nifi] pgyori commented on a change in pull request #4273: NIFI-7446: Fail when the specified path is a directory in FetchAzureDataLakeStorage

Posted by GitBox <gi...@apache.org>.
pgyori commented on a change in pull request #4273:
URL: https://github.com/apache/nifi/pull/4273#discussion_r425817343



##########
File path: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/azure/storage/FetchAzureDataLakeStorage.java
##########
@@ -67,6 +67,10 @@ public void onTrigger(ProcessContext context, ProcessSession session) throws Pro
             final DataLakeDirectoryClient directoryClient = dataLakeFileSystemClient.getDirectoryClient(directory);
             final DataLakeFileClient fileClient = directoryClient.getFileClient(fileName);
 
+            if (fileClient.getProperties().isDirectory()) {
+                throw new ProcessException(FILE.getDisplayName() + " (" + fileName + ") points to a directory. Full path: " + fileClient.getFilePath());

Review comment:
       It looks like switching to version 12.1.1 of azure-storage-file-datalake instead of 12.0.1 did not take effect in your build. Can you please verify that you rebuilt the entire nifi-azure-bundle and used the newly generated nifi-azure-nar in your running instance of NiFi (after restart)?







[GitHub] [nifi] MuazmaZ commented on a change in pull request #4273: NIFI-7446: Fail when the specified path is a directory in FetchAzureDataLakeStorage

Posted by GitBox <gi...@apache.org>.
MuazmaZ commented on a change in pull request #4273:
URL: https://github.com/apache/nifi/pull/4273#discussion_r425486369



##########
File path: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/azure/storage/FetchAzureDataLakeStorage.java
##########
@@ -67,6 +67,10 @@ public void onTrigger(ProcessContext context, ProcessSession session) throws Pro
             final DataLakeDirectoryClient directoryClient = dataLakeFileSystemClient.getDirectoryClient(directory);
             final DataLakeFileClient fileClient = directoryClient.getFileClient(fileName);
 
+            if (fileClient.getProperties().isDirectory()) {
+                throw new ProcessException(FILE.getDisplayName() + " (" + fileName + ") points to a directory. Full path: " + fileClient.getFilePath());

Review comment:
       The exception I am getting is:
   
   FetchAzureDataLakeStorage[id=xxxxx] failed to process session due to com.azure.storage.file.datalake.models.PathProperties.isDirectory()Ljava/lang/Boolean;; Processor Administratively Yielded for 1 sec: java.lang.NoSuchMethodError: com.azure.storage.file.datalake.models.PathProperties.isDirectory()Ljava/lang/Boolean;
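
A `NoSuchMethodError` like this one usually means the class loaded at runtime differs from the one the code was compiled against. The JVM links methods by name plus descriptor, which is why the message includes `()Ljava/lang/Boolean;`. The following is a minimal diagnostic sketch, not part of the PR: the class name `ClasspathDiagnosticSketch` and its helpers are hypothetical, and to be meaningful it would have to run in the same classloader as the processor. It reports where a class was loaded from and whether it exposes a public no-arg method with an exact return type.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

class ClasspathDiagnosticSketch {

    // Reports where a class was loaded from, for example a jar inside an unpacked NAR.
    static String locationOf(String className) {
        try {
            final CodeSource source = Class.forName(className).getProtectionDomain().getCodeSource();
            return (source == null || source.getLocation() == null)
                    ? "<bootstrap or unknown>"
                    : source.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "<class not found>";
        }
    }

    // True only if the class has a public no-arg method with this exact return type.
    static boolean hasMethod(String className, String methodName, Class<?> returnType) {
        try {
            final Method method = Class.forName(className).getMethod(methodName);
            return method.getReturnType().equals(returnType);
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults use a JDK class so the sketch runs without the Azure SDK on the classpath.
        final String type = args.length > 0 ? args[0] : "java.lang.String";
        final String method = args.length > 1 ? args[1] : "isEmpty";
        System.out.println(type + " loaded from " + locationOf(type));
        System.out.println(method + "() returning Boolean present: "
                + hasMethod(type, method, Boolean.class));
    }
}
```

For the error above, the arguments would be `com.azure.storage.file.datalake.models.PathProperties` and `isDirectory`. If the check reports `false`, or the reported location points at an older jar, the running instance is most likely still using a stale build.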







[GitHub] [nifi] asfgit closed pull request #4273: NIFI-7446: Fail when the specified path is a directory in FetchAzureDataLakeStorage

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #4273:
URL: https://github.com/apache/nifi/pull/4273


   





[GitHub] [nifi] pgyori commented on a change in pull request #4273: NIFI-7446: Fail when the specified path is a directory in FetchAzureDataLakeStorage

Posted by GitBox <gi...@apache.org>.
pgyori commented on a change in pull request #4273:
URL: https://github.com/apache/nifi/pull/4273#discussion_r426788175



##########
File path: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/azure/storage/FetchAzureDataLakeStorage.java
##########
@@ -67,6 +67,10 @@ public void onTrigger(ProcessContext context, ProcessSession session) throws Pro
             final DataLakeDirectoryClient directoryClient = dataLakeFileSystemClient.getDirectoryClient(directory);
             final DataLakeFileClient fileClient = directoryClient.getFileClient(fileName);
 
+            if (fileClient.getProperties().isDirectory()) {

Review comment:
       Unfortunately, things get quite complicated in that case. If we call isDirectory() after session.write(), the flowfile has already been overwritten with the empty content, which means that after the ProcessException is thrown, the flowfile that goes to the error output has no content (instead of the content of the original input flowfile). To avoid losing the content of the input flowfile, we would need to copy and store its content before calling session.write(), and if isDirectory() returns true, we would need to restore this content to the original flowfile before throwing the exception. This would result in higher memory consumption for large input flowfiles.
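
The ordering problem described here can be illustrated with a small model. This is a sketch under assumptions, not NiFi code: `FakeFlowFile` is a hypothetical mutable content holder standing in for the flowfile, and the `fetched` argument stands in for whatever the remote read returned.

```java
import java.nio.charset.StandardCharsets;

class WriteOrderSketch {

    /** Hypothetical stand-in for a flowfile: a mutable content holder. */
    static final class FakeFlowFile {
        byte[] content;

        FakeFlowFile(String text) {
            this.content = text.getBytes(StandardCharsets.UTF_8);
        }

        String text() {
            return new String(content, StandardCharsets.UTF_8);
        }
    }

    // Check first: on failure the input content is untouched.
    static void checkThenWrite(FakeFlowFile flowFile, boolean pathIsDirectory, byte[] fetched) {
        if (pathIsDirectory) {
            throw new IllegalStateException("path points to a directory");
        }
        flowFile.content = fetched;
    }

    // Write first: on failure the input content has already been replaced.
    static void writeThenCheck(FakeFlowFile flowFile, boolean pathIsDirectory, byte[] fetched) {
        flowFile.content = fetched;
        if (fetched.length == 0 && pathIsDirectory) {
            throw new IllegalStateException("path points to a directory");
        }
    }

    // Write first with a backup: content survives, at the cost of an extra copy.
    static void writeThenCheckWithBackup(FakeFlowFile flowFile, boolean pathIsDirectory, byte[] fetched) {
        final byte[] backup = flowFile.content.clone(); // extra memory proportional to input size
        flowFile.content = fetched;
        if (fetched.length == 0 && pathIsDirectory) {
            flowFile.content = backup;
            throw new IllegalStateException("path points to a directory");
        }
    }
}
```

Only the first and third variants leave the original content available on the failure path, and the third pays for it with a full copy of the input, which is the memory concern raised above.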







[GitHub] [nifi] turcsanyip commented on a change in pull request #4273: NIFI-7446: Fail when the specified path is a directory in FetchAzureDataLakeStorage

Posted by GitBox <gi...@apache.org>.
turcsanyip commented on a change in pull request #4273:
URL: https://github.com/apache/nifi/pull/4273#discussion_r426910774



##########
File path: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/azure/storage/FetchAzureDataLakeStorage.java
##########
@@ -67,6 +67,10 @@ public void onTrigger(ProcessContext context, ProcessSession session) throws Pro
             final DataLakeDirectoryClient directoryClient = dataLakeFileSystemClient.getDirectoryClient(directory);
             final DataLakeFileClient fileClient = directoryClient.getFileClient(fileName);
 
+            if (fileClient.getProperties().isDirectory()) {

Review comment:
       Thanks, merging to master.







[GitHub] [nifi] turcsanyip commented on a change in pull request #4273: NIFI-7446: Fail when the specified path is a directory in FetchAzureDataLakeStorage

Posted by GitBox <gi...@apache.org>.
turcsanyip commented on a change in pull request #4273:
URL: https://github.com/apache/nifi/pull/4273#discussion_r425843940



##########
File path: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/azure/storage/FetchAzureDataLakeStorage.java
##########
@@ -67,6 +67,10 @@ public void onTrigger(ProcessContext context, ProcessSession session) throws Pro
             final DataLakeDirectoryClient directoryClient = dataLakeFileSystemClient.getDirectoryClient(directory);
             final DataLakeFileClient fileClient = directoryClient.getFileClient(fileName);
 
+            if (fileClient.getProperties().isDirectory()) {
+                throw new ProcessException(FILE.getDisplayName() + " (" + fileName + ") points to a directory. Full path: " + fileClient.getFilePath());

Review comment:
       It works properly for me too. I also updated the nifi-azure-services-api-nar (copied it to NiFi's lib folder), though that might not be necessary; updating only the nifi-azure-nar may be enough.







[GitHub] [nifi] MuazmaZ commented on a change in pull request #4273: NIFI-7446: Fail when the specified path is a directory in FetchAzureDataLakeStorage

Posted by GitBox <gi...@apache.org>.
MuazmaZ commented on a change in pull request #4273:
URL: https://github.com/apache/nifi/pull/4273#discussion_r426801050



##########
File path: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/azure/storage/FetchAzureDataLakeStorage.java
##########
@@ -67,6 +67,10 @@ public void onTrigger(ProcessContext context, ProcessSession session) throws Pro
             final DataLakeDirectoryClient directoryClient = dataLakeFileSystemClient.getDirectoryClient(directory);
             final DataLakeFileClient fileClient = directoryClient.getFileClient(fileName);
 
+            if (fileClient.getProperties().isDirectory()) {

Review comment:
       I agree with @pgyori about the flow; for large flowfiles this would mean higher memory consumption. Also, a valid flowfile with empty content can be a real scenario: I have seen customers whose processes sometimes generate empty files to trigger other flows.







[GitHub] [nifi] MuazmaZ commented on a change in pull request #4273: NIFI-7446: Fail when the specified path is a directory in FetchAzureDataLakeStorage

Posted by GitBox <gi...@apache.org>.
MuazmaZ commented on a change in pull request #4273:
URL: https://github.com/apache/nifi/pull/4273#discussion_r426909105



##########
File path: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/azure/storage/FetchAzureDataLakeStorage.java
##########
@@ -67,6 +67,10 @@ public void onTrigger(ProcessContext context, ProcessSession session) throws Pro
             final DataLakeDirectoryClient directoryClient = dataLakeFileSystemClient.getDirectoryClient(directory);
             final DataLakeFileClient fileClient = directoryClient.getFileClient(fileName);
 
+            if (fileClient.getProperties().isDirectory()) {

Review comment:
       Looks good to me after rebuild. +1
   
   Failure to fetch file from Azure Data Lake Storage: org.apache.nifi.processor.exception.ProcessException: File Name (xyz) points to a directory. Full path: xyz/xyz
   







[GitHub] [nifi] turcsanyip commented on a change in pull request #4273: NIFI-7446: Fail when the specified path is a directory in FetchAzureDataLakeStorage

Posted by GitBox <gi...@apache.org>.
turcsanyip commented on a change in pull request #4273:
URL: https://github.com/apache/nifi/pull/4273#discussion_r426847987



##########
File path: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/azure/storage/FetchAzureDataLakeStorage.java
##########
@@ -67,6 +67,10 @@ public void onTrigger(ProcessContext context, ProcessSession session) throws Pro
             final DataLakeDirectoryClient directoryClient = dataLakeFileSystemClient.getDirectoryClient(directory);
             final DataLakeFileClient fileClient = directoryClient.getFileClient(fileName);
 
+            if (fileClient.getProperties().isDirectory()) {

Review comment:
       @pgyori, @MuazmaZ Thanks for the clarification and the feedback. Then it is fine as it is now.
   LGTM from my side.
   @MuazmaZ Did you manage to make it work on your side, or do you still get the NoSuchMethodError?



