Posted to issues@nifi.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/09/01 15:30:01 UTC

[jira] [Commented] (NIFI-7740) Add Records Per Transaction and Transactions Per Batch to PutHive3Streaming

    [ https://issues.apache.org/jira/browse/NIFI-7740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17188559#comment-17188559 ] 

ASF subversion and git services commented on NIFI-7740:
-------------------------------------------------------

Commit 45470b0984ab83750155e9c7a540c79bfe862817 in nifi's branch refs/heads/main from Matt Burgess
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=45470b0 ]

NIFI-7740: Add Records Per Transaction and Transactions Per Batch properties to PutHive3Streaming

NIFI-7740: Incorporated review comments

NIFI-7740: Restore RecordsEOFException superclass to SerializationError

This closes #4489.

Signed-off-by: Peter Turcsanyi <tu...@apache.org>


> Add Records Per Transaction and Transactions Per Batch to PutHive3Streaming
> ---------------------------------------------------------------------------
>
>                 Key: NIFI-7740
>                 URL: https://issues.apache.org/jira/browse/NIFI-7740
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Matt Burgess
>            Assignee: Matt Burgess
>            Priority: Major
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The original PutHiveStreaming (for Hive 1.2.x) exposed properties to the user for tuning the number of records in an individual Hive Streaming transaction, as well as the number of transactions to be batched together (for performance).
> These properties should be exposed in the PutHive3Streaming processor so that its performance can be tuned. The default values should preserve the current behavior: a setting of zero for Records Per Transaction puts all records into a single transaction, and a setting of 1 for Transactions Per Batch results in a single transaction in each batch.
> For large files, Records Per Transaction should be set to something more manageable, such as 100K, and Transactions Per Batch to a value such as 10. As a rule, the product of the two numbers should be larger than the largest expected number of records in the flow file(s); this ensures that any failed transaction batch causes a full rollback. The documentation for these properties should include this guidance.
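
The sizing rule in the description above comes down to simple arithmetic, which the following sketch tries to make concrete. It is only an illustration, not the processor's implementation: the function names `plan` and `fits_in_one_batch` are hypothetical, and the only behavior assumed is what the issue text states (zero for Records Per Transaction means one transaction for all records, and Transactions Per Batch limits how many transactions are grouped together).

```python
def plan(num_records, records_per_txn, txns_per_batch):
    """Return (transactions, batches) needed to write num_records.

    Hypothetical helper for illustration only. records_per_txn == 0 is
    treated as "all records in a single transaction", per the issue text.
    """
    if txns_per_batch < 1:
        raise ValueError("txns_per_batch must be at least 1")
    if num_records <= 0:
        return (0, 0)
    if records_per_txn == 0:
        transactions = 1
    else:
        # Ceiling division using integers only.
        transactions = -(-num_records // records_per_txn)
    batches = -(-transactions // txns_per_batch)
    return (transactions, batches)


def fits_in_one_batch(num_records, records_per_txn, txns_per_batch):
    """True when every record lands in the same transaction batch."""
    _, batches = plan(num_records, records_per_txn, txns_per_batch)
    return batches <= 1


if __name__ == "__main__":
    # Defaults from the issue: one transaction, one batch.
    print(plan(500, 0, 1))
    # Suggested settings for large files: 100K records, 10 transactions.
    print(plan(250_000, 100_000, 10))
    # A flow file larger than the product spills into more batches.
    print(plan(2_500_000, 100_000, 10))
```

With the suggested settings, a flow file of 250,000 records needs 3 transactions in 1 batch, while one of 2,500,000 records needs 25 transactions spread over 3 batches. The second case is the one the rule is meant to avoid, since the records no longer share a single batch.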



--
This message was sent by Atlassian Jira
(v8.3.4#803005)