Posted to issues@nifi.apache.org by GitBox <gi...@apache.org> on 2022/12/16 20:48:27 UTC

[GitHub] [nifi] nandorsoma opened a new pull request, #6792: NIFI-10884 Conflict resolution in PutAzureDataLakeStorage should log the target filename

nandorsoma opened a new pull request, #6792:
URL: https://github.com/apache/nifi/pull/6792

   <!-- Licensed to the Apache Software Foundation (ASF) under one or more -->
   <!-- contributor license agreements.  See the NOTICE file distributed with -->
   <!-- this work for additional information regarding copyright ownership. -->
   <!-- The ASF licenses this file to You under the Apache License, Version 2.0 -->
   <!-- (the "License"); you may not use this file except in compliance with -->
   <!-- the License.  You may obtain a copy of the License at -->
   <!--     http://www.apache.org/licenses/LICENSE-2.0 -->
   <!-- Unless required by applicable law or agreed to in writing, software -->
   <!-- distributed under the License is distributed on an "AS IS" BASIS, -->
   <!-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -->
   <!-- See the License for the specific language governing permissions and -->
   <!-- limitations under the License. -->
   
   # Summary
   
   [NIFI-10884](https://issues.apache.org/jira/browse/NIFI-10884)
   
   # Tracking
   
   Please complete the following tracking steps prior to pull request creation.
   
   ### Issue Tracking
   
   - [x] [Apache NiFi Jira](https://issues.apache.org/jira/browse/NIFI) issue created
   
   ### Pull Request Tracking
   
   - [x] Pull Request title starts with Apache NiFi Jira issue number, such as `NIFI-00000`
   - [x] Pull Request commit message starts with Apache NiFi Jira issue number, such as `NIFI-00000`
   
   ### Pull Request Formatting
   
   - [x] Pull Request based on current revision of the `main` branch
   - [x] Pull Request refers to a feature branch with one commit containing changes
   
   # Verification
   
   Please indicate the verification steps performed prior to pull request creation.
   
   ### Build
   
   - [x] Build completed using `mvn clean install -P contrib-check`
     - [x] JDK 8
     - [ ] JDK 11
     - [ ] JDK 17
   
   ### Licensing
   
   - [ ] New dependencies are compatible with the [Apache License 2.0](https://apache.org/licenses/LICENSE-2.0) according to the [License Policy](https://www.apache.org/legal/resolved.html)
   - [ ] New dependencies are documented in applicable `LICENSE` and `NOTICE` files
   
   ### Documentation
   
   - [ ] Documentation formatting appears as expected in rendered files
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org




[GitHub] [nifi] turcsanyip commented on a diff in pull request #6792: NIFI-10884 Conflict resolution in PutAzureDataLakeStorage should log the target filename

Posted by GitBox <gi...@apache.org>.
turcsanyip commented on code in PR #6792:
URL: https://github.com/apache/nifi/pull/6792#discussion_r1051492496


##########
nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/azure/storage/PutAzureDataLakeStorage.java:
##########
@@ -212,25 +212,38 @@ static void uploadContent(DataLakeFileClient fileClient, InputStream in, long le
         fileClient.flush(length, true);
     }
 
-    //Visible for testing
+    /**
+     * This method serves as a "commit" for the upload process. To support various Conflict Resolution Strategies the processor uploads
+     * the content of the FlowFile to a temporary file with a unique name, then attempts to rename it. It is not an efficient approach,
+     * especially for large files, but it is needed because of the issue (azure-sdk-for-java/issues/31248) linked above.

Review Comment:
   The temporary file + rename was needed because the "put" in ADLS is not atomic. You create the file first (0-byte file), then append the payload. So the work-in-progress file would be available for readers before the full upload is finished.
   So it is not strictly related to conflict resolution and has nothing to do with issue 31248 (which is about chunked uploading of large files).
   
   Could you please correct the documentation accordingly?
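
The "temporary file + rename" pattern described in this comment can be sketched on a local filesystem. This is only an analogy under stated assumptions, not the processor's code and not the Azure SDK API: the class `TempThenRename` and the helper `writeThenRename` are hypothetical names, and `java.nio.file` stands in for ADLS. The idea it shows is that in-progress data lives only under a unique temporary name, and the final name appears in a single rename step once the content is complete.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.UUID;

public class TempThenRename {

    // Hypothetical helper (not part of NiFi or the Azure SDK): writes the content under a
    // unique temporary name, then renames it to the target name as the "commit" step.
    static Path writeThenRename(Path directory, String targetName, byte[] content) throws IOException {
        Path tempFile = directory.resolve(UUID.randomUUID() + ".tmp");
        Path target = directory.resolve(targetName);
        try {
            // Step 1: create and fill the file. Readers looking for the target name
            // cannot see this work-in-progress data.
            Files.write(tempFile, content);

            // Step 2: publish the finished file under its final name in one move.
            // With ATOMIC_MOVE, whether an existing target is replaced or an
            // IOException is thrown is implementation specific.
            return Files.move(tempFile, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            // If the move failed, do not leave the temporary file behind.
            Files.deleteIfExists(tempFile);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("upload-demo");
        Path result = writeThenRename(dir, "data.txt", "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(Files.readAllBytes(result), StandardCharsets.UTF_8));
    }
}
```

In this sketch a conflict resolution strategy would be applied around the rename step, which is why the rename is also the natural place to log the target filename.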





[GitHub] [nifi] nandorsoma commented on a diff in pull request #6792: NIFI-10884 Conflict resolution in PutAzureDataLakeStorage should log the target filename

Posted by GitBox <gi...@apache.org>.
nandorsoma commented on code in PR #6792:
URL: https://github.com/apache/nifi/pull/6792#discussion_r1051498998


##########
nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/azure/storage/PutAzureDataLakeStorage.java:
##########
@@ -212,25 +212,38 @@ static void uploadContent(DataLakeFileClient fileClient, InputStream in, long le
         fileClient.flush(length, true);
     }
 
-    //Visible for testing
+    /**
+     * This method serves as a "commit" for the upload process. To support various Conflict Resolution Strategies the processor uploads
+     * the content of the FlowFile to a temporary file with a unique name, then attempts to rename it. It is not an efficient approach,
+     * especially for large files, but it is needed because of the issue (azure-sdk-for-java/issues/31248) linked above.

Review Comment:
   Hey @turcsanyip!
   Thanks for the review. It seems like I misinterpreted one of the comments in the linked issue. I've fixed the doc!





[GitHub] [nifi] asfgit closed pull request #6792: NIFI-10884 Conflict resolution in PutAzureDataLakeStorage should log the target filename

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #6792: NIFI-10884 Conflict resolution in PutAzureDataLakeStorage should log the target filename
URL: https://github.com/apache/nifi/pull/6792

