You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@nifi.apache.org by "Nathan Gough (Jira)" <ji...@apache.org> on 2022/11/09 04:14:00 UTC

[jira] [Assigned] (NIFI-10685) PutDatabaseRecord Upsert does not quote columns in conflict clause (postgres)

     [ https://issues.apache.org/jira/browse/NIFI-10685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nathan Gough reassigned NIFI-10685:
-----------------------------------

    Assignee: Nathan Gough

> PutDatabaseRecord Upsert does not quote columns in conflict clause (postgres)
> -----------------------------------------------------------------------------
>
>                 Key: NIFI-10685
>                 URL: https://issues.apache.org/jira/browse/NIFI-10685
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: 1.18.0
>            Reporter: mgerbig
>            Assignee: Nathan Gough
>            Priority: Major
>         Attachments: putdatabase-test.avro
>
>
> Nifi's PutDatabaseRecord does not quote columns in the conflict clause when using PostgreSQLDatabaseAdapter. This prevents the usage of upsert (and possibly insert ignore) statements as soon as at least one column of the primary key contains at least one uppercase character.
> h2. Root Cause
> Postgres lowercases all unquoted keys, as described in the manual, which causes this bug:
> {quote}Quoting an identifier also makes it case-sensitive, whereas unquoted names are always folded to lower case. For example, the identifiers {{{}FOO{}}}, {{{}foo{}}}, and {{"foo"}} are considered the same by PostgreSQL, but {{"Foo"}} and {{"FOO"}} are different from these three and each other. (The folding of unquoted names to lower case in PostgreSQL is incompatible with the SQL standard, which says that unquoted names should be folded to upper case. Thus, {{foo}} should be equivalent to {{"FOO"}} not {{"foo"}} according to the standard. If you want to write portable applications you are advised to always quote a particular name or never quote it.)
> {quote}
> see [https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS]
> h2. Reproduce behaviour
> Create a table in your *PostgreSQL* database
> {code:java}
> CREATE TABLE test_table
> (
>     "TIME_PERIOD" date NOT NULL,
>     "R0102" numeric,
>     "R0203" numeric,
>     "R0304" numeric,
>     "R0405" numeric,
>     "R0506" numeric,
>     "R0607" numeric,
>     "R0708" numeric,
>     "R0809" numeric,
>     "R0910" numeric,
>     modified_at_utc date DEFAULT timezone('UTC'::text, CURRENT_TIMESTAMP(0)),
>     CONSTRAINT test_table PRIMARY KEY ("TIME_PERIOD")
> ) {code}
> Use attached avro to write to the table using a PutDatabaseRecord Processor configured with UPSERT.
> Nifi will throw following Exception in the Bulletin Board
> {code:java}
> PutDatabaseRecord[id=...] Failed to put Records to database for FlowFile[filename=redacted.avro]. Routing to failure.: java.sql.BatchUpdateException: Batch entry 0 INSERT INTO "public"."test_table"("TIME_PERIOD", "R0102", "R0203", "R0304", "R0405", "R0506", "R0607", "R0708", "R0809", "R0910") VALUES ('2022-10-21 +00', '2.97'::numeric, '3.2'::numeric, '3.19'::numeric, '3.19'::numeric, '3.22'::numeric, '3.33'::numeric, '3.33'::numeric, '3.4'::numeric, '3.38'::numeric) ON CONFLICT (TIME_PERIOD) DO UPDATE SET ("TIME_PERIOD", "R0102", "R0203", "R0304", "R0405", "R0506", "R0607", "R0708", "R0809", "R0910") = (EXCLUDED."TIME_PERIOD", EXCLUDED."R0102", EXCLUDED."R0203", EXCLUDED."R0304", EXCLUDED."R0405", EXCLUDED."R0506", EXCLUDED."R0607", EXCLUDED."R0708", EXCLUDED."R0809", EXCLUDED."R0910") was aborted: ERROR: column "time_period" does not exist
>   Position: 347  Call getNextException to see other errors in the batch.
> - Caused by: org.postgresql.util.PSQLException: ERROR: column "time_period" does not exist
>   Position: 347 {code}
> h3. Generated SQL Statement
> Nifi generates and issues following SQL Statement to the database:
> {code:java}
>  INSERT INTO "public"."test_table"("TIME_PERIOD", "R0102", "R0203", "R0304", "R0405", "R0506", "R0607", "R0708", "R0809", "R0910") VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?) ON CONFLICT (TIME_PERIOD) DO UPDATE SET ("TIME_PERIOD", "R0102", "R0203", "R0304", "R0405", "R0506", "R0607", "R0708", "R0809", "R0910") = (EXCLUDED."TIME_PERIOD", EXCLUDED."R0102", EXCLUDED."R0203", EXCLUDED."R0304", EXCLUDED."R0405", EXCLUDED."R0506", EXCLUDED."R0607", EXCLUDED."R0708", EXCLUDED."R0809", EXCLUDED."R0910") {code}
> This statement only works with quoted _TIME_PERIOD_ in the _ON CONFLICT_ clause due to the behaviour of postgres described in Section _Root Cause_
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)