Posted to issues@nifi.apache.org by "MormonJesus69420 (via GitHub)" <gi...@apache.org> on 2023/04/20 10:52:38 UTC

[GitHub] [nifi] MormonJesus69420 opened a new pull request, #7184: NIFI-11472 Make PutFTP processor more multithread friendly

MormonJesus69420 opened a new pull request, #7184:
URL: https://github.com/apache/nifi/pull/7184

   Add an extra check during directory creation to see whether the directory was already created by another thread.
   
   From Issue:
   
   The problem occurs when PutFTP is set to run several concurrent tasks and two (or more) FlowFiles come in that both need to create the same directory. One task creates the directory and succeeds immediately. The other also tries to create it, fails because it already exists, and throws an error; its FlowFile is then penalized and succeeds on the second run.
   
   This is not a major error, since the files are transferred in the end, but the bulletins and errors are annoying, especially in a production environment where you don't want unnecessary errors.
   
   We found that the solution involves a simple change to the FTPTransfer.java class in:
   nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/util/FTPTransfer.java
   On line 398, in the ensureDirectoryExists method, you can add another check that verifies whether the directory exists when creating it fails.
   ```java
   final boolean cdSuccessful = setWorkingDirectory(remoteDirectory);
   
   if (!cdSuccessful) {
       if (client.makeDirectory(remoteDirectory)) {
           logger.debug("Remote Directory not found: created directory [{}]", remoteDirectory);
       } else if (!setWorkingDirectory(remoteDirectory)) {
           // Double check that the directory exists, as another thread might have created it;
           // fail only if it is still missing
           throw new IOException("Failed to create remote directory " + remoteDirectory);
       }
   }
   ```
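
A minimal, self-contained sketch of the same check-create-recheck pattern, runnable without an FTP server. Everything here is hypothetical and for illustration only: the `RemoteFs` interface and the `RacyFs` and `ReadOnlyFs` fakes are not NiFi or Apache Commons Net types, and `RacyFs` simulates a competing task by creating the directory right after the first existence check.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

public class EnsureDirectorySketch {

    /** Hypothetical stand-in for the two remote calls used in the snippet above. */
    interface RemoteFs {
        boolean setWorkingDirectory(String path) throws IOException;
        boolean makeDirectory(String path) throws IOException;
    }

    /** Fake server where a competing task creates the directory right after our first check. */
    static final class RacyFs implements RemoteFs {
        private final Set<String> dirs = new HashSet<>();
        private boolean competitorHasRun = false;

        @Override
        public boolean setWorkingDirectory(String path) {
            boolean found = dirs.contains(path);
            if (!competitorHasRun) {
                competitorHasRun = true;
                dirs.add(path); // the other task wins the race here
            }
            return found;
        }

        @Override
        public boolean makeDirectory(String path) {
            return dirs.add(path); // false when the directory already exists
        }

        boolean exists(String path) {
            return dirs.contains(path);
        }
    }

    /** Fake server on which the directory can never be created. */
    static final class ReadOnlyFs implements RemoteFs {
        @Override
        public boolean setWorkingDirectory(String path) {
            return false;
        }

        @Override
        public boolean makeDirectory(String path) {
            return false;
        }
    }

    static void ensureDirectoryExists(RemoteFs fs, String dir) throws IOException {
        if (fs.setWorkingDirectory(dir)) {
            return; // directory was already there
        }
        if (fs.makeDirectory(dir)) {
            return; // we created it
        }
        // Creation failed: re-check before treating it as an error,
        // because another task may have created the directory in the meantime.
        if (!fs.setWorkingDirectory(dir)) {
            throw new IOException("Failed to create remote directory " + dir);
        }
    }

    public static void main(String[] args) throws IOException {
        RacyFs racy = new RacyFs();
        ensureDirectoryExists(racy, "/upload/2023"); // lost the race, but no exception
        System.out.println("race tolerated: " + racy.exists("/upload/2023"));

        try {
            ensureDirectoryExists(new ReadOnlyFs(), "/upload/2023");
        } catch (IOException e) {
            System.out.println("genuine failure: " + e.getMessage());
        }
    }
}
```

The re-check only suppresses the error when the directory really exists afterwards; a genuine failure (for example, missing permissions) still surfaces as an `IOException`.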
   
   
   # Summary
   
   [NIFI-11472](https://issues.apache.org/jira/browse/NIFI-11472)
   
   # Tracking
   
   Please complete the following tracking steps prior to pull request creation.
   
   ### Issue Tracking
   
   - [x] [Apache NiFi Jira](https://issues.apache.org/jira/browse/NIFI) issue created
   
   ### Pull Request Tracking
   
   - [x] Pull Request title starts with Apache NiFi Jira issue number, such as `NIFI-00000`
   - [x] Pull Request commit message starts with Apache NiFi Jira issue number, as such `NIFI-00000`
   
   ### Pull Request Formatting
   
   - [x] Pull Request based on current revision of the `main` branch
   - [x] Pull Request refers to a feature branch with one commit containing changes
   
   # Verification
   
   Please indicate the verification steps performed prior to pull request creation.
   
   ### Build
   
   - [x] Build completed using `mvn clean install -P contrib-check`
     - [x] JDK 11
     - [ ] JDK 17
   
   ### Licensing
   
   - [x] New dependencies are compatible with the [Apache License 2.0](https://apache.org/licenses/LICENSE-2.0) according to the [License Policy](https://www.apache.org/legal/resolved.html)
   - [x] New dependencies are documented in applicable `LICENSE` and `NOTICE` files
   
   ### Documentation
   
   - [x] Documentation formatting appears as expected in rendered files
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nifi] MormonJesus69420 commented on pull request #7184: NIFI-11472 Make PutFTP processor more multithread friendly

Posted by "MormonJesus69420 (via GitHub)" <gi...@apache.org>.
MormonJesus69420 commented on PR #7184:
URL: https://github.com/apache/nifi/pull/7184#issuecomment-1517320690

   Strange, I don't understand why it failed on the Windows action. I can't test it on Windows either, since we use Linux for development at work. Also, my branch is based on the latest nifi/main branch.




[GitHub] [nifi] arpadboda commented on pull request #7184: NIFI-11472 Make PutFTP processor more multithread friendly

Posted by "arpadboda (via GitHub)" <gi...@apache.org>.
arpadboda commented on PR #7184:
URL: https://github.com/apache/nifi/pull/7184#issuecomment-1516193929

   Hello @MormonJesus69420, thanks for your contribution!
   I'm not sure that using multiple threads to transfer to the same FTP server makes sense, since this operation is limited by bandwidth anyway.
   So I would prefer to restrict it to single-threaded usage. However, the current implementation allows uploading to multiple hosts based on FlowFile attributes, in which case multiple threads might make sense, so I'm not against this change.
   Weak +1, I'm OK to merge with a second approval.




[GitHub] [nifi] MormonJesus69420 commented on pull request #7184: NIFI-11472 Make PutFTP processor more multithread friendly

Posted by "MormonJesus69420 (via GitHub)" <gi...@apache.org>.
MormonJesus69420 commented on PR #7184:
URL: https://github.com/apache/nifi/pull/7184#issuecomment-1516261028

   I am sorry, I managed to make the most basic mistake in such a small change: I forgot to add a `!` to the `setWorkingDirectory(remoteDirectory)` method call.




[GitHub] [nifi] MormonJesus69420 commented on pull request #7184: NIFI-11472 Make PutFTP processor more multithread friendly

Posted by "MormonJesus69420 (via GitHub)" <gi...@apache.org>.
MormonJesus69420 commented on PR #7184:
URL: https://github.com/apache/nifi/pull/7184#issuecomment-1521368398

   Hi, I see that the PR is already closed, but I thought I would add an example from our tests of running the PutFTP processor with 1 and 2 concurrent tasks. As long as we were able to supply data to send, the upload speed almost doubled. So the bottleneck was how fast we could supply data, not the sending itself.
   ![Comparison in MB/s transfer speeds between 1 and 2 concurrent tasks in PutFTP](https://user-images.githubusercontent.com/10923336/234216962-a7d10cf6-f60a-438c-aa66-6c0bdf4335ba.png)
   




[GitHub] [nifi] MormonJesus69420 commented on pull request #7184: NIFI-11472 Make PutFTP processor more multithread friendly

Posted by "MormonJesus69420 (via GitHub)" <gi...@apache.org>.
MormonJesus69420 commented on PR #7184:
URL: https://github.com/apache/nifi/pull/7184#issuecomment-1516223097

   Hi @arpadboda, I might not have described it very well, but the issue we face occurs when we configure the processor to run several concurrent tasks.
   
   ![Concurrent tasks setting in PutFTP processor configuration](https://user-images.githubusercontent.com/10923336/233361444-7aa16eb8-4595-44e7-a4c6-c475cb42ed22.png)
   When we change the number to two or more, we start receiving "errors" about the processor being unable to create a directory because it already exists. While the issue resolves itself, it is rather distracting to see, since it's not a "real" error.
   
   We have noticed a significant performance boost when using a PutFTP processor with more than one concurrent task. I don't have the numbers on me at the moment, but I think that switching to two or three concurrent tasks significantly reduced the transfer time.




[GitHub] [nifi] exceptionfactory closed pull request #7184: NIFI-11472 Make PutFTP processor more multithread friendly

Posted by "exceptionfactory (via GitHub)" <gi...@apache.org>.
exceptionfactory closed pull request #7184: NIFI-11472 Make PutFTP processor more multithread friendly
URL: https://github.com/apache/nifi/pull/7184

