Posted to issues@nifi.apache.org by "Matt Burgess (Jira)" <ji...@apache.org> on 2020/01/09 03:16:00 UTC

[jira] [Commented] (NIFI-3623) PutSQL can't insert multiple records in postgres if one causes an error

    [ https://issues.apache.org/jira/browse/NIFI-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011360#comment-17011360 ] 

Matt Burgess commented on NIFI-3623:
------------------------------------

I believe the expected (and intended) behavior is to move the entire batch to failure, since the batch is treated as a single transaction. You may be looking for an UPSERT capability, which we don't have yet. Many databases offer this via the SQL-standard MERGE operator, but MERGE usually operates on two existing tables rather than acting as a strict UPSERT (the latter is supported on some databases, such as Phoenix).
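
For reference, Postgres itself (9.5 and later) supports a strict UPSERT via INSERT ... ON CONFLICT, which would sidestep the duplicate-key rollback described in the issue below. A minimal JDBC sketch, assuming the table "A"(date, id, value) with primary key (date, id) from the description, and purely hypothetical connection details and column types:

    import java.sql.Connection;
    import java.sql.Date;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class UpsertSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical URL/credentials; requires the Postgres JDBC driver.
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost:5432/mydb", "user", "pass");
                 PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO A (date, id, value) VALUES (?, ?, ?) "
                     + "ON CONFLICT (date, id) DO UPDATE SET value = EXCLUDED.value")) {
                ps.setDate(1, Date.valueOf("2020-01-09"));
                ps.setInt(2, 1);
                ps.setString(3, "snapshot-value"); // assumes "value" is a text column
                ps.executeUpdate(); // inserts, or overwrites the row on a key clash
            }
        }
    }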

We would likely have to add a property for the behavior you describe, since some folks will still want to treat a batch of (or fragmented) flowfiles as a single transaction: if they expect all-or-nothing and we deliver success-and-failure instead, the statements in a batch can end up executed out of order, missing expected rows, and so on. We would also need additional logic to weed out the failed inserts, route them somewhere else, and then re-issue the batch we know works. IMO retry is not the right relationship for flow files we know will always fail, so we should still use failure for those.
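
For context on why weeding out the failed inserts is awkward: the only per-statement signal JDBC gives us is BatchUpdateException.getUpdateCounts(), and the spec allows a driver either to keep executing after a failure (marking bad entries with Statement.EXECUTE_FAILED) or to stop at the first one. The Postgres driver stops, and the enclosing transaction is aborted anyway, so there is no trustworthy list of "good" statements to re-issue. A rough sketch of inspecting the counts, again with hypothetical connection details:

    import java.sql.BatchUpdateException;
    import java.sql.Connection;
    import java.sql.Date;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.Statement;

    public class BatchInspectSketch {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/mydb", "user", "pass")) {
                conn.setAutoCommit(false);
                try (PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO A (date, id, value) VALUES (?, ?, ?)")) {
                    for (int id = 0; id < 100; id++) {
                        ps.setDate(1, Date.valueOf("2020-01-09"));
                        ps.setInt(2, id);
                        ps.setString(3, "v" + id);
                        ps.addBatch();
                    }
                    ps.executeBatch();
                    conn.commit();
                } catch (BatchUpdateException e) {
                    // Some drivers mark failures with Statement.EXECUTE_FAILED and
                    // keep going; the Postgres driver stops instead, so this array
                    // covers at most the statements attempted before the failure.
                    int failed = 0;
                    for (int count : e.getUpdateCounts()) {
                        if (count == Statement.EXECUTE_FAILED) {
                            failed++;
                        }
                    }
                    System.out.println(failed + " statements flagged as failed");
                    // The transaction is aborted either way, so even the rows that
                    // "succeeded" are discarded here.
                    conn.rollback();
                }
            }
        }
    }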



> PutSQL can't insert multiple records in postgres if one causes an error
> -----------------------------------------------------------------------
>
>                 Key: NIFI-3623
>                 URL: https://issues.apache.org/jira/browse/NIFI-3623
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.2.0
>            Reporter: Rostislav Koryakin
>            Priority: Major
>
> PutSQL configured as follows:
> Concurrent tasks: 1
> Batch size: 100
> Obtain generated keys: false
> A flow is configured to take a snapshot of a particular database table and put it into Postgres twice a day.
> Assume there is a Postgres table "A" with fields date, id, and value, and primary key (date, id).
> Sometimes, due to network issues or a restart of Postgres, not all entries are inserted.
> Example: "A" contains entries with id 1, 50, and 99.
> PutSQL consumes 100 flowfiles with ids from 0 to 99 and tries to insert the entries. It starts a transaction, but the transaction is rolled back due to constraint violations for keys 1, 50, and 99. The next run repeats the same situation.
> According to the PutSQL implementation, it expects the driver to return a list of succeeded and failed statements. But for Postgres all statements fail, and all flowfiles go to failure instead of retry (a minimal reproduction of this behaviour is sketched after the quoted description).
> There are several ways to solve it:
> 1) Use batch size 1 (looks bad in terms of performance).
> 2) Use obtain generated keys = true (there is no need for them).
> 3) Address the issue somehow and move 97 of the 100 flowfiles to "retry" so they can be processed again.
> Is the expected behaviour in this situation to get 97 files in retry? Or is it normal that all go to "failure"?
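
To make the quoted failure mode concrete: once any statement inside a Postgres transaction fails, Postgres rejects every later statement in that transaction with "current transaction is aborted" until a rollback, so the driver cannot report partial success even for rows that would have inserted cleanly. A minimal sketch of that behaviour, assuming an initially empty table "A" and hypothetical connection details:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class AbortedTxSketch {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/mydb", "user", "pass");
            conn.setAutoCommit(false);
            try (Statement st = conn.createStatement()) {
                st.executeUpdate(
                        "INSERT INTO A (date, id, value) VALUES ('2020-01-09', 1, 'x')");
                // Same (date, id) again -> primary key violation:
                st.executeUpdate(
                        "INSERT INTO A (date, id, value) VALUES ('2020-01-09', 1, 'y')");
            } catch (SQLException duplicate) {
                try (Statement st = conn.createStatement()) {
                    // Even a brand-new, valid row is now rejected with
                    // "current transaction is aborted, commands ignored until
                    // end of transaction block".
                    st.executeUpdate(
                            "INSERT INTO A (date, id, value) VALUES ('2020-01-09', 2, 'z')");
                } catch (SQLException aborted) {
                    conn.rollback(); // only a rollback clears the aborted state
                }
            } finally {
                conn.close();
            }
        }
    }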


