Posted to issues@nifi.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2024/02/08 18:50:00 UTC

[jira] [Commented] (NIFI-12745) AvroReader silently drops record if it's malformed

    [ https://issues.apache.org/jira/browse/NIFI-12745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17815790#comment-17815790 ] 

ASF subversion and git services commented on NIFI-12745:
--------------------------------------------------------

Commit 85dc637a9601cf584374f8ea8c07214ac9ece576 in nifi's branch refs/heads/main from Rajmund Takacs
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=85dc637a96 ]

NIFI-12745: Fix AvroReader silently dropping malformed records

This closes #8361.

Signed-off-by: Tamas Palfy <tp...@apache.org>


> AvroReader silently drops record if it's malformed
> --------------------------------------------------
>
>                 Key: NIFI-12745
>                 URL: https://issues.apache.org/jira/browse/NIFI-12745
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 2.0.0-M1, 1.18.0, 1.19.0, 1.20.0, 1.19.1, 1.21.0, 1.22.0, 1.23.0, 1.24.0, 1.23.1, 1.23.2, 1.25.0, 2.0.0-M2
>            Reporter: Rajmund Takacs
>            Assignee: Rajmund Takacs
>            Priority: Major
>         Attachments: ValidateRecord.json
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> See the attached example flow. It reproduces the issue very reliably.
> {{GenerateFlowFile}} is set to generate the following JSON:
> {code:json}
> [{
>   "field_1" : 123456789,
>   "field_2" : "44",
>   "field_3" : 5
> }] 
> {code}
> This input is converted to Avro format using the {{ConvertRecord}} processor. The 'Schema Write Strategy' of {{AvroRecordSetWriter}} is set to anything other than 'Embed Avro Schema'.
> The resulting flow file is then routed to a processor that uses an {{AvroReader}} to work on the records. The reader is set to use a predefined, fixed schema that does not match the input Avro file: it contains at least one extra field. It does not matter whether that field has a default value.
> {code:json}
> {
>   "type":"record",
>   "name":"message_name",
>   "namespace":"message_namespace",
>   "fields":[
>     {
>       "name":"field_1",
>       "type":["long"]
>     },
>     {
>       "name":"field_2",
>       "type":["string"]
>     },
>     {
>       "name":"field_3",
>       "type":["int"]
>     },
>     {
>       "name":"extra_field",
>       "type":["string"],
>       "default":"empty"
>     }
>   ]
> }
> {code}
> When this processor consumes the input, the reader silently drops the record without even logging an error. At the processor level, this is equivalent to having no records to process, so nothing happens. The user won't notice the misconfiguration until they start noticing the missing flow files.
> The expected behavior would be for the processor to route the malformed input flow file to its failure relationship and report an error on its bulletin.
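
The failure mode described above can be illustrated with a small, self-contained sketch. It is not NiFi's or Avro's code. It assumes (the report does not state the root cause) that the record is lost because the reader runs out of bytes while decoding the extra field and treats that end-of-input as "no more records". The class MalformedAvroSketch and all of its methods are hypothetical, and the encoding is simplified: it uses Avro-style zigzag varints and length-prefixed strings, but ignores the single-branch union wrappers from the schema above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the NiFi AvroReader implementation.
public class MalformedAvroSketch {

    // Avro-style zigzag varint encoding, used for both int and long.
    static void writeLong(ByteArrayOutputStream out, long value) {
        long z = (value << 1) ^ (value >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, bytes.length);
        out.write(bytes, 0, bytes.length);
    }

    // Encodes one schema-less record: field_1 (long), field_2 (string), field_3 (int).
    public static byte[] encodeThreeFields(long f1, String f2, int f3) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, f1);
        writeString(out, f2);
        writeLong(out, f3);
        return out.toByteArray();
    }

    static long readLong(ByteArrayInputStream in) throws EOFException {
        long z = 0;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b < 0) {
                throw new EOFException();
            }
            z |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }
        return (z >>> 1) ^ -(z & 1);
    }

    static String readString(ByteArrayInputStream in) throws EOFException {
        int len = (int) readLong(in);
        if (len < 0 || in.available() < len) {
            throw new EOFException();
        }
        byte[] buf = new byte[len];
        in.read(buf, 0, len);
        return new String(buf, StandardCharsets.UTF_8);
    }

    // Reads long, string, int, then `extraStrings` additional string fields.
    static void readRecord(ByteArrayInputStream in, int extraStrings) throws EOFException {
        readLong(in);
        readString(in);
        readLong(in);
        for (int i = 0; i < extraStrings; i++) {
            readString(in);
        }
    }

    // Lenient reader: every EOF means "no more records", even in mid-record.
    public static int countLenient(byte[] data, int extraStrings) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        int count = 0;
        try {
            while (true) {
                readRecord(in, extraStrings);
                count++;
            }
        } catch (EOFException e) {
            return count;
        }
    }

    // Strict reader: EOF is only acceptable on a record boundary.
    public static int countStrict(byte[] data, int extraStrings) throws IOException {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        int count = 0;
        while (in.available() > 0) {
            try {
                readRecord(in, extraStrings);
            } catch (EOFException e) {
                throw new IOException("Malformed record: input ended inside record " + count, e);
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = encodeThreeFields(123456789L, "44", 5);
        System.out.println("lenient, matching schema: " + countLenient(data, 0));
        System.out.println("lenient, extra field: " + countLenient(data, 1));
        try {
            countStrict(data, 1);
        } catch (IOException e) {
            System.out.println("strict, extra field: " + e.getMessage());
        }
    }
}
```

In this sketch the lenient reader reports zero records for the mismatched layout without raising anything, which mirrors the silent drop in the report. The strict reader raises an error that a caller could use to route the input to a failure path.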



--
This message was sent by Atlassian Jira
(v8.20.10#820010)