Posted to issues@nifi.apache.org by "Nandor Soma Abonyi (Jira)" <ji...@apache.org> on 2023/06/22 10:21:00 UTC

[jira] [Commented] (NIFI-11449) add autocommit property to PutDatabaseRecord processor

    [ https://issues.apache.org/jira/browse/NIFI-11449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17736060#comment-17736060 ] 

Nandor Soma Abonyi commented on NIFI-11449:
-------------------------------------------

Hi [~Abdelrahimk]!
We have a dedicated processor called PutIceberg ([https://github.com/apache/nifi/blob/adb8420b484a971103d4d5e5017cab228c5c56de/nifi-nar-bundles/nifi-iceberg-bundle/nifi-iceberg-processors/src/main/java/org/apache/nifi/processors/iceberg/PutIceberg.java#L79]). Any chance that you've tried it already?

> add autocommit property to PutDatabaseRecord processor
> ------------------------------------------------------
>
>                 Key: NIFI-11449
>                 URL: https://issues.apache.org/jira/browse/NIFI-11449
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>    Affects Versions: 1.21.0
>         Environment: Any NiFi deployment
>            Reporter: Abdelrahim Ahmad
>            Priority: Blocker
>              Labels: Trino, autocommit, database, iceberg, putdatabaserecord
>
> The issue is with the {{PutDatabaseRecord}} processor in Apache NiFi. When the processor is used with the Trino JDBC driver or the Dremio JDBC driver to write to an Iceberg catalog, it disables autocommit on the JDBC connection. This leads to errors such as "{*}Catalog only supports writes using autocommit: iceberg{*}".
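> To illustrate the failure mode, here is a minimal, self-contained sketch (the coordinator URL, catalog, schema, and table names are hypothetical):
> {code:java}
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.Statement;
>
> public class TrinoAutocommitDemo {
>     public static void main(String[] args) throws Exception {
>         // Hypothetical Trino coordinator with an Iceberg catalog
>         Connection conn = DriverManager.getConnection(
>                 "jdbc:trino://trino-host:8080/iceberg/demo_schema", "user", null);
>
>         // PutDatabaseRecord turns autocommit off so it can manage the
>         // transaction itself:
>         conn.setAutoCommit(false);
>
>         // Any write through the Iceberg connector then fails with:
>         //   "Catalog only supports writes using autocommit: iceberg"
>         try (Statement stmt = conn.createStatement()) {
>             stmt.executeUpdate("INSERT INTO demo_table VALUES (1, 'a')");
>             conn.commit();
>         }
>     }
> }
> {code}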
> An autocommit property should be added to the processor so that the feature can be enabled or disabled (a sketch of such a property follows below).
> Enabling autocommit in the NiFi {{PutDatabaseRecord}} processor is important for Delta Lake, Iceberg, and Hudi, as it ensures data consistency and integrity by allowing atomic writes in the underlying database. It would also allow the processor to be used with a wider range of databases.
> _Improving this processor will allow NiFi to become the main tool for ingesting data into these new technologies, so we don't have to deal with another tool to do so._
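> A sketch of what such a property could look like, following NiFi's usual builder pattern (the name, description, and default here are hypothetical, not a committed design):
> {code:java}
> import org.apache.nifi.components.PropertyDescriptor;
>
> // Hypothetical property; the final name and default would be decided in the PR.
> static final PropertyDescriptor AUTO_COMMIT = new PropertyDescriptor.Builder()
>         .name("database-session-autocommit")
>         .displayName("Database Session AutoCommit")
>         .description("Whether to enable autocommit on the JDBC connection. "
>                 + "Some catalogs (e.g. Iceberg via Trino) only accept writes "
>                 + "when autocommit is enabled.")
>         .allowableValues("true", "false")
>         .defaultValue("false")
>         .required(true)
>         .build();
> {code}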
> +*_{color:#de350b}BUT:{color}_*+
> I have reviewed the {{PutDatabaseRecord}} processor in NiFi. It inserts records one by one into the database using a prepared statement and commits the transaction at the end of the loop that processes the records, as sketched below. This approach can be inefficient and slow when inserting large volumes of data into tables that are optimized for bulk ingestion, such as Delta Lake, Iceberg, and Hudi tables.
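> Roughly, the current behavior corresponds to this pattern (a simplified sketch, not NiFi's actual code; {{Row}} stands in for NiFi's record abstraction and the table is hypothetical):
> {code:java}
> import java.sql.Connection;
> import java.sql.PreparedStatement;
> import java.sql.SQLException;
> import java.util.List;
>
> class OneByOneInsert {
>     record Row(long id, String name) {}
>
>     static void insert(Connection conn, List<Row> rows) throws SQLException {
>         conn.setAutoCommit(false);            // transaction managed manually
>         try (PreparedStatement ps = conn.prepareStatement(
>                 "INSERT INTO target_table (id, name) VALUES (?, ?)")) {
>             for (Row r : rows) {
>                 ps.setLong(1, r.id());
>                 ps.setString(2, r.name());
>                 ps.executeUpdate();           // one round trip per record
>             }
>         }
>         conn.commit();                        // single commit at the end
>     }
> }
> {code}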
> These tables use various techniques to optimize the performance of bulk ingestion, such as partitioning, clustering, and indexing. Inserting records one by one using a prepared statement can bypass these optimizations, leading to poor performance and potentially causing issues such as excessive disk usage, increased memory consumption, and decreased query performance.
> To avoid these issues, it is recommended to add a bulk-insert mode with an autocommit option, either as a new processor or as a feature of the current one, for inserting large volumes of data into Delta Lake, Iceberg, and Hudi tables; a sketch of such an approach follows below.
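> For comparison, a hypothetical batched alternative with autocommit enabled might look like this (the table name and batch size are illustrative only):
> {code:java}
> import java.sql.Connection;
> import java.sql.PreparedStatement;
> import java.sql.SQLException;
> import java.util.List;
>
> class BatchedAutocommitInsert {
>     record Row(long id, String name) {}
>
>     static void insert(Connection conn, List<Row> rows) throws SQLException {
>         conn.setAutoCommit(true);             // required by e.g. Trino's Iceberg catalog
>         try (PreparedStatement ps = conn.prepareStatement(
>                 "INSERT INTO target_table (id, name) VALUES (?, ?)")) {
>             int pending = 0;
>             for (Row r : rows) {
>                 ps.setLong(1, r.id());
>                 ps.setString(2, r.name());
>                 ps.addBatch();
>                 if (++pending == 1000) {      // flush in chunks
>                     ps.executeBatch();
>                     pending = 0;
>                 }
>             }
>             ps.executeBatch();                // flush the remainder
>         }
>     }
> }
> {code}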
>  
> P.S.: Using {{PutSQL}} is not an alternative; it does offer autocommit, but it has the same performance problem described above.
> Thanks and best regards :)
> Abdelrahim Ahmad



--
This message was sent by Atlassian Jira
(v8.20.10#820010)