You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@nifi.apache.org by GitBox <gi...@apache.org> on 2021/11/27 16:09:58 UTC

[GitHub] [nifi] VibAruna opened a new pull request #5554: NIFI-8605 Adding a new property to enable/disable auto committing

VibAruna opened a new pull request #5554:
URL: https://github.com/apache/nifi/pull/5554


   
   
   <!--
     Licensed to the Apache Software Foundation (ASF) under one or more
     contributor license agreements.  See the NOTICE file distributed with
     this work for additional information regarding copyright ownership.
     The ASF licenses this file to You under the Apache License, Version 2.0
     (the "License"); you may not use this file except in compliance with
     the License.  You may obtain a copy of the License at
         http://www.apache.org/licenses/LICENSE-2.0
     Unless required by applicable law or agreed to in writing, software
     distributed under the License is distributed on an "AS IS" BASIS,
     WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
     See the License for the specific language governing permissions and
     limitations under the License.
   -->
   
   
   #### Description of PR
   
   This fixes the bug of NIFI consuming large heap space when it queries large data sets (millions or billions of records) through postgresql JDBC driver. 
   
   According to the PostgreSQL JDBC driver document; https://jdbc.postgresql.org//documentation/head/query.html, driver collects all the results for the query at once by default. This behavior leads the ExecuteSQL and ExeccuteSQLRecord processors for a large heap usage. Caching a small amount of records (defined by fetch size) in client side of the connation and retrieving the next data block when exhausted is possible only if auto committing is set to false. 
   
   ![image](https://user-images.githubusercontent.com/32428342/143688260-8ba6defa-ea15-4c62-a163-f15024009ad0.png)
   
   This PR adds a new property for above mentioned two processors to enable/disable auto-committing in the connection.
   
   ### For all changes:
   - [x] Is there a JIRA ticket associated with this PR? Is it referenced 
        in the commit message?
   
   - [x] Does your PR title start with **NIFI-XXXX** where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
   
   - [x] Has your PR been rebased against the latest commit within the target branch (typically `main`)?
   
   - [x] Is your initial contribution a single, squashed commit? _Additional commits in response to PR reviewer feedback should be made on this branch and pushed to allow change tracking. Do not `squash` or use `--force` when pushing to allow for clean monitoring of changes._
   
   ### For code changes:
   - [ ] Have you ensured that the full suite of tests is executed via `mvn -Pcontrib-check clean install` at the root `nifi` folder?
   - [ ] Have you written or updated unit tests to verify your changes?
   - [ ] Have you verified that the full build is successful on JDK 8?
   - [ ] Have you verified that the full build is successful on JDK 11?
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE` file, including the main `LICENSE` file under `nifi-assembly`?
   - [ ] If applicable, have you updated the `NOTICE` file, including the main `NOTICE` file found under `nifi-assembly`?
   - [ ] If adding new Properties, have you added `.displayName` in addition to .name (programmatic access) for each of the new properties?
   
   ### For documentation related changes:
   - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
   
   ### Note:
   Please ensure that once the PR is submitted, you check GitHub Actions CI for build issues and submit an update to your PR as soon as possible.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nifi] mattyb149 closed pull request #5554: NIFI-8605 Adding a new property to enable/disable auto committing

Posted by GitBox <gi...@apache.org>.
mattyb149 closed pull request #5554:
URL: https://github.com/apache/nifi/pull/5554


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nifi] mattyb149 commented on a change in pull request #5554: NIFI-8605 Adding a new property to enable/disable auto committing

Posted by GitBox <gi...@apache.org>.
mattyb149 commented on a change in pull request #5554:
URL: https://github.com/apache/nifi/pull/5554#discussion_r772424012



##########
File path: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractExecuteSQL.java
##########
@@ -236,195 +246,197 @@ public void onTrigger(final ProcessContext context, final ProcessSession session
         }
 
         int resultCount = 0;
-        try (final Connection con = dbcpService.getConnection(fileToProcess == null ? Collections.emptyMap() : fileToProcess.getAttributes());
-             final PreparedStatement st = con.prepareStatement(selectQuery)) {
-            if (fetchSize != null && fetchSize > 0) {
-                try {
-                    st.setFetchSize(fetchSize);
-                } catch (SQLException se) {
-                    // Not all drivers support this, just log the error (at debug level) and move on
-                    logger.debug("Cannot set fetch size to {} due to {}", new Object[]{fetchSize, se.getLocalizedMessage()}, se);
+        try (final Connection con = dbcpService.getConnection(fileToProcess == null ? Collections.emptyMap() : fileToProcess.getAttributes())) {
+            con.setAutoCommit(context.getProperty(AUTO_COMMIT).asBoolean());

Review comment:
       If autocommit is set to false, then commit() must be called explicitly on the connection and I don't see that here. However some drivers throw an exception if commit() is called on an autocommit connection, so I recommend here that we save off the current value of autocommit, then call setAutoCommit(), then at the end of processing check autocommit and call commit() if it is false. Then we can restore the original value of autocommit as good housekeeping.
   
   Also for the PostgreSQL case, what is expected of the code if any error occurs during processing? Usually for non-autocommit connections we'd call rollback() rather than commit(). Perhaps we do something similar to the above and check for autocommit in a `catch()` clause for the try that creates the connection, and if false call rollback().

##########
File path: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractExecuteSQL.java
##########
@@ -169,6 +169,16 @@
             .expressionLanguageSupported(ExpressionLanguageScope.FLOWFILE_ATTRIBUTES)
             .build();
 
+    public static final PropertyDescriptor AUTO_COMMIT = new PropertyDescriptor.Builder()
+            .name("auto commit")
+            .displayName("Set Auto Commit")
+            .description("Enables or disables the auto commit functionality of the DB connection. For some JDBC drivers, it is required to disable the auto committing "

Review comment:
       I'd like to see more documentation around this property to inform the user of the behavior they can expect. For your PostgreSQL example, you could use the doc you already have here, but in general you could describe it using the information here: https://docs.oracle.com/javase/tutorial/jdbc/basics/transactions.html or whatever. 

##########
File path: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractExecuteSQL.java
##########
@@ -169,6 +169,16 @@
             .expressionLanguageSupported(ExpressionLanguageScope.FLOWFILE_ATTRIBUTES)
             .build();
 
+    public static final PropertyDescriptor AUTO_COMMIT = new PropertyDescriptor.Builder()
+            .name("auto commit")

Review comment:
       For consistency can we name this `esql-auto-commit`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nifi] VibAruna commented on a change in pull request #5554: NIFI-8605 Adding a new property to enable/disable auto committing

Posted by GitBox <gi...@apache.org>.
VibAruna commented on a change in pull request #5554:
URL: https://github.com/apache/nifi/pull/5554#discussion_r775249310



##########
File path: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractExecuteSQL.java
##########
@@ -169,6 +169,16 @@
             .expressionLanguageSupported(ExpressionLanguageScope.FLOWFILE_ATTRIBUTES)
             .build();
 
+    public static final PropertyDescriptor AUTO_COMMIT = new PropertyDescriptor.Builder()
+            .name("auto commit")
+            .displayName("Set Auto Commit")
+            .description("Enables or disables the auto commit functionality of the DB connection. For some JDBC drivers, it is required to disable the auto committing "

Review comment:
       Updated the description. Please review.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nifi] VibAruna commented on a change in pull request #5554: NIFI-8605 Adding a new property to enable/disable auto committing

Posted by GitBox <gi...@apache.org>.
VibAruna commented on a change in pull request #5554:
URL: https://github.com/apache/nifi/pull/5554#discussion_r772518916



##########
File path: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractExecuteSQL.java
##########
@@ -236,195 +246,197 @@ public void onTrigger(final ProcessContext context, final ProcessSession session
         }
 
         int resultCount = 0;
-        try (final Connection con = dbcpService.getConnection(fileToProcess == null ? Collections.emptyMap() : fileToProcess.getAttributes());
-             final PreparedStatement st = con.prepareStatement(selectQuery)) {
-            if (fetchSize != null && fetchSize > 0) {
-                try {
-                    st.setFetchSize(fetchSize);
-                } catch (SQLException se) {
-                    // Not all drivers support this, just log the error (at debug level) and move on
-                    logger.debug("Cannot set fetch size to {} due to {}", new Object[]{fetchSize, se.getLocalizedMessage()}, se);
+        try (final Connection con = dbcpService.getConnection(fileToProcess == null ? Collections.emptyMap() : fileToProcess.getAttributes())) {
+            con.setAutoCommit(context.getProperty(AUTO_COMMIT).asBoolean());

Review comment:
       Hi @mattyb149,
   
   The change I have done is only affects only to '**ExecuteSqlRecord**' and '**ExecuteSql**' processors which are used to execute '**Select sqls**'. This new feature to set or unset auto-commit is required to avoid these processors from consuming large heap space when they query large data sets (There may be some other drivers with this same behavior). Since no data writing happens from these processors, I believe we don't need to call commit() or rollback().




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nifi] VibAruna commented on a change in pull request #5554: NIFI-8605 Adding a new property to enable/disable auto committing

Posted by GitBox <gi...@apache.org>.
VibAruna commented on a change in pull request #5554:
URL: https://github.com/apache/nifi/pull/5554#discussion_r772518916



##########
File path: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractExecuteSQL.java
##########
@@ -236,195 +246,197 @@ public void onTrigger(final ProcessContext context, final ProcessSession session
         }
 
         int resultCount = 0;
-        try (final Connection con = dbcpService.getConnection(fileToProcess == null ? Collections.emptyMap() : fileToProcess.getAttributes());
-             final PreparedStatement st = con.prepareStatement(selectQuery)) {
-            if (fetchSize != null && fetchSize > 0) {
-                try {
-                    st.setFetchSize(fetchSize);
-                } catch (SQLException se) {
-                    // Not all drivers support this, just log the error (at debug level) and move on
-                    logger.debug("Cannot set fetch size to {} due to {}", new Object[]{fetchSize, se.getLocalizedMessage()}, se);
+        try (final Connection con = dbcpService.getConnection(fileToProcess == null ? Collections.emptyMap() : fileToProcess.getAttributes())) {
+            con.setAutoCommit(context.getProperty(AUTO_COMMIT).asBoolean());

Review comment:
       Hi @mattyb149,
   
   The change I have done only affects to '**ExecuteSqlRecord**' and '**ExecuteSql**' processors which are used to execute '**Select sqls**'. This new feature to set or unset auto-commit is required to avoid these processors from consuming large heap space when they query large data sets (There may be some other drivers with this same behavior). Since no data writing happens from these processors, I believe we don't need to call commit() or rollback().




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nifi] VibAruna commented on a change in pull request #5554: NIFI-8605 Adding a new property to enable/disable auto committing

Posted by GitBox <gi...@apache.org>.
VibAruna commented on a change in pull request #5554:
URL: https://github.com/apache/nifi/pull/5554#discussion_r772518916



##########
File path: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractExecuteSQL.java
##########
@@ -236,195 +246,197 @@ public void onTrigger(final ProcessContext context, final ProcessSession session
         }
 
         int resultCount = 0;
-        try (final Connection con = dbcpService.getConnection(fileToProcess == null ? Collections.emptyMap() : fileToProcess.getAttributes());
-             final PreparedStatement st = con.prepareStatement(selectQuery)) {
-            if (fetchSize != null && fetchSize > 0) {
-                try {
-                    st.setFetchSize(fetchSize);
-                } catch (SQLException se) {
-                    // Not all drivers support this, just log the error (at debug level) and move on
-                    logger.debug("Cannot set fetch size to {} due to {}", new Object[]{fetchSize, se.getLocalizedMessage()}, se);
+        try (final Connection con = dbcpService.getConnection(fileToProcess == null ? Collections.emptyMap() : fileToProcess.getAttributes())) {
+            con.setAutoCommit(context.getProperty(AUTO_COMMIT).asBoolean());

Review comment:
       Hi @mattyb149,
   
   The change I have done only affects only to '**ExecuteSqlRecord**' and '**ExecuteSql**' processors which are used to execute '**Select sqls**'. This new feature to set or unset auto-commit is required to avoid these processors from consuming large heap space when they query large data sets (There may be some other drivers with this same behavior). Since no data writing happens from these processors, I believe we don't need to call commit() or rollback().




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nifi] mattyb149 commented on pull request #5554: NIFI-8605 Adding a new property to enable/disable auto committing

Posted by GitBox <gi...@apache.org>.
mattyb149 commented on pull request #5554:
URL: https://github.com/apache/nifi/pull/5554#issuecomment-1003213215


   +1 LGTM, tested with PostgreSQL, MySQL, and Oracle, verified expected behavior. Thanks for the improvement! Merging to main


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nifi] mattyb149 commented on a change in pull request #5554: NIFI-8605 Adding a new property to enable/disable auto committing

Posted by GitBox <gi...@apache.org>.
mattyb149 commented on a change in pull request #5554:
URL: https://github.com/apache/nifi/pull/5554#discussion_r773310541



##########
File path: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractExecuteSQL.java
##########
@@ -236,195 +246,197 @@ public void onTrigger(final ProcessContext context, final ProcessSession session
         }
 
         int resultCount = 0;
-        try (final Connection con = dbcpService.getConnection(fileToProcess == null ? Collections.emptyMap() : fileToProcess.getAttributes());
-             final PreparedStatement st = con.prepareStatement(selectQuery)) {
-            if (fetchSize != null && fetchSize > 0) {
-                try {
-                    st.setFetchSize(fetchSize);
-                } catch (SQLException se) {
-                    // Not all drivers support this, just log the error (at debug level) and move on
-                    logger.debug("Cannot set fetch size to {} due to {}", new Object[]{fetchSize, se.getLocalizedMessage()}, se);
+        try (final Connection con = dbcpService.getConnection(fileToProcess == null ? Collections.emptyMap() : fileToProcess.getAttributes())) {
+            con.setAutoCommit(context.getProperty(AUTO_COMMIT).asBoolean());

Review comment:
       Perhaps we don't **need** to call those, but we shouldn't need to set autocommit to do a SELECT anyway, PostgreSQL does that for its own transactional system. Having said that, I don't like the idea of leaving the transaction "open" at the end of processing. Does calling commit() and/or rollback() cause an issue with PostgreSQL and/or Derby (the latter for unit testing)? I would hope not, and if it isn't an issue I'd like to see the call to commit() or rollback(), just for consistency. I'll test with MySQL and Oracle to make sure there's nothing weird there.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nifi] VibAruna commented on a change in pull request #5554: NIFI-8605 Adding a new property to enable/disable auto committing

Posted by GitBox <gi...@apache.org>.
VibAruna commented on a change in pull request #5554:
URL: https://github.com/apache/nifi/pull/5554#discussion_r775249223



##########
File path: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractExecuteSQL.java
##########
@@ -236,195 +246,197 @@ public void onTrigger(final ProcessContext context, final ProcessSession session
         }
 
         int resultCount = 0;
-        try (final Connection con = dbcpService.getConnection(fileToProcess == null ? Collections.emptyMap() : fileToProcess.getAttributes());
-             final PreparedStatement st = con.prepareStatement(selectQuery)) {
-            if (fetchSize != null && fetchSize > 0) {
-                try {
-                    st.setFetchSize(fetchSize);
-                } catch (SQLException se) {
-                    // Not all drivers support this, just log the error (at debug level) and move on
-                    logger.debug("Cannot set fetch size to {} due to {}", new Object[]{fetchSize, se.getLocalizedMessage()}, se);
+        try (final Connection con = dbcpService.getConnection(fileToProcess == null ? Collections.emptyMap() : fileToProcess.getAttributes())) {
+            con.setAutoCommit(context.getProperty(AUTO_COMMIT).asBoolean());

Review comment:
       Hi @mattyb149,
   
   I have updated the pull request with calling commit and added unit tests. Could you please review.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nifi] VibAruna commented on a change in pull request #5554: NIFI-8605 Adding a new property to enable/disable auto committing

Posted by GitBox <gi...@apache.org>.
VibAruna commented on a change in pull request #5554:
URL: https://github.com/apache/nifi/pull/5554#discussion_r775249442



##########
File path: nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractExecuteSQL.java
##########
@@ -169,6 +169,16 @@
             .expressionLanguageSupported(ExpressionLanguageScope.FLOWFILE_ATTRIBUTES)
             .build();
 
+    public static final PropertyDescriptor AUTO_COMMIT = new PropertyDescriptor.Builder()
+            .name("auto commit")

Review comment:
       Updated the name. Please review.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org