Posted to users@nifi.apache.org by "Oliveira, Emanuel" <Em...@fmr.com> on 2019/12/05 12:06:48 UTC

NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?

Hi all,

I have been struggling to find a way for ValidateRecord, using an Avro schema, to make the presence of an array in a JSON payload mandatory. The problem is that if the array “Records” is missing, ValidateRecord still considers the FlowFile (FF) valid ☹.
--objective - Mandatory to have the "Records" array with at least "eventVersion"
- using ValidateRecord > Allow Extra Fields
- the problem I'm facing is that NiFi doesn't flag payload BAD 1 as invalid!!

How can I make the Records array mandatory? Is it possible?

I know I could use SplitJson with JsonPath Expression=$.Records to get rid of the ARRAY, and also to fail if the array "Records" is not present. But I would like a clean solution using just the Avro schema. Is this possible?



--OK - payload GOOD
{
   "Service": "sssssss",
   "Event": "eeeee",
   "Time": "2019-11-25T16:21:53.280Z",
   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
   "RequestId": "RRRRRRRRRRRRRRRRRR",
   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
   "Records": [{
         "eventVersion": "aaa"
      }
   ]
}

--NOK - payload BAD 1 - missing "Records" array --> BUT VALIDATERECORD/AVROSCHEMA SENDS THE FF TO “valid”!! I want it to be sent to “invalid”, since it does not comply with my Avro schema, which requires the array “Records” with element “eventVersion” as 2 mandatory things.
{
   "Service": "sssssss",
   "Event": "eeeee",
   "Time": "2019-11-25T16:21:53.280Z",
   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
   "RequestId": "RRRRRRRRRRRRRRRRRR",
   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
   "RecordsXXX": [{
         "eventVersion": "aaa"
      }
   ]
}

--OK - payload BAD 2 - "Records" array present but missing "eventVersion"
{
   "Service": "sssssss",
   "Event": "eeeee",
   "Time": "2019-11-25T16:21:53.280Z",
   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
   "RequestId": "RRRRRRRRRRRRRRRRRR",
   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
   "Records": [{
         "eventVersionXX": "aaa"
      }
   ]
}
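
As a reference for what the three payloads above should produce, here is a minimal Python sketch of the intended rule, checked outside NiFi. It is not how ValidateRecord works internally; the function name `is_valid_payload` is made up for this illustration, and the payloads are trimmed to the fields that matter.

```python
import json


def is_valid_payload(text):
    """Return True only if "Records" is a non-empty array and every
    element is an object carrying a string "eventVersion".

    Hypothetical helper for illustration; extra fields are allowed,
    mirroring Allow Extra Fields = true.
    """
    payload = json.loads(text)
    if not isinstance(payload, dict):
        return False  # root must be a JSON object
    records = payload.get("Records")
    if not isinstance(records, list) or not records:
        return False  # array missing, wrong type, or empty
    return all(
        isinstance(item, dict) and isinstance(item.get("eventVersion"), str)
        for item in records
    )


good = '{"Service": "sssssss", "Records": [{"eventVersion": "aaa"}]}'
bad_1 = '{"Service": "sssssss", "RecordsXXX": [{"eventVersion": "aaa"}]}'
bad_2 = '{"Service": "sssssss", "Records": [{"eventVersionXX": "aaa"}]}'

for name, text in [("GOOD", good), ("BAD 1", bad_1), ("BAD 2", bad_2)]:
    print(name, "->", "valid" if is_valid_payload(text) else "invalid")
```

Only GOOD should come out as valid; BAD 1 is the case that ValidateRecord 1.8.0 lets through.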

It's a very simple test flow (I attached the XML template ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml), just using ValidateRecord with a JSON reader/writer:
[image: image001.png]


Here's the ValidateRecord processor + reader/writer controllers:

  *   Avro schema with just the array “Records” and “eventVersion” as the minimum tag on each array element.
  *   Using Allow Extra Fields true:
     *   So I'm OK with having other fields in the root side by side with the array “Records”, and also OK with having extra fields inside each array element.
     *   FYI: the real use case is validating an AWS SQS message (S3 trigger) where I will be interested in several fields, but I crafted this simpler example just to ask whether it's possible to force an array to be mandatory and to have at least 1 element?
==========================================================

--ValidateRecord 1.8.0
Record Reader                           JsonTreeReader
Record Writer                           JsonRecordSetWriter
Record Writer for Invalid Records
Schema Access Strategy                  Use Reader's Schema
Schema Registry                         No value set
Schema Name                             ${schema.name}
Schema Text                             ${avro.schema}
Allow Extra Fields                      true
Strict Type Checking                    true

--JsonTreeReader 1.8.0 - MANDATORY TO HAVE "Records" ARRAY + "eventVersion" on each ARRAY element
Schema Access Strategy                  Use 'Schema Text' Property
Schema Registry
Schema Name                             ${schema.name}
Schema Version
Schema Branch
Schema Text
                                        {
                                           "name": "MyName",
                                           "type": "record",
                                           "namespace": "aa.bb.cc",
                                           "fields": [{
                                                 "name": "Records",
                                                 "type": {
                                                    "type": "array",
                                                    "items": {
                                                       "name": "Records_record",
                                                       "type": "record",
                                                       "fields": [{
                                                             "name": "eventVersion",
                                                             "type": "string"
                                                          }
                                                       ]
                                                    }
                                                 }
                                              }
                                           ]
                                        }
Date Format
Time Format
Timestamp Format

--JsonRecordSetWriter 1.8.0
Schema Write Strategy                   Do Not Write Schema
Schema Access Strategy                  Inherit Record Schema
Schema Registry
Schema Name                             ${schema.name}
Schema Version
Schema Branch
Schema Text                             { "name": "eventVersion", "type": "string" }
Date Format
Time Format
Timestamp Format
Pretty Print JSON                       true
Suppress Null Values                    Never Suppress
Output Grouping                         Array

Thanks in advance,
Emanuel Oliveira


Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?

Posted by Emanuel Oliveira <em...@gmail.com>.
Thanks Pierre!


Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?

Posted by Pierre Villard <pi...@gmail.com>.
Hi Emanuel,

The PR is currently under review so that would not be included in NiFi
1.10.0 (which is already released). We recently discussed releasing a
new NiFi version (1.10.1 or 1.11.0) and if the PR is merged before such a
release, it would certainly be included in that version.

Hope it makes sense,
Pierre


On Mon, 6 Jan 2020 at 22:08, Oliveira, Emanuel <Em...@fmr.com>
wrote:

> Thanks Matt and Mark!
> We are still on version
> 1.8.0
> 10/22/2018 23:48:30 EDT
> Tagged nifi-1.8.0-RC3
>
> The current version is 1.10.
>
> Out of curiosity, when could we expect this fix to be available? Would it
> mean we upgrade to 1.10? Thanks.
>
> Thanks//Regards,
> Emanuel Oliveira
>
>
>
> -----Original Message-----
> From: Matt Burgess <ma...@apache.org>
> Sent: Friday 20 December 2019 17:52
> To: users@nifi.apache.org
> Subject: Re: NiFi ValidateRecord - unable to handle missing mandatory
> ARRAY ?
>
> This email is from an external source - exercise caution regarding links
> and attachments.
>
>
> Mark is spot-on with the diagnosis, a default empty array is being created
> for the missing field even if no default value is specified in the schema.
> All it needs is an extra null check in order to return null as the default
> value, then the record is marked invalid as expected.
>
> I have written up NIFI-6963 [1] to cover this, and issued a PR to fix it
> [2]. Mark, would you kindly do the honors of a review? Please and thanks!
>
> -Matt
>
> [1] https://issues.apache.org/jira/browse/NIFI-6963
> [2] https://github.com/apache/nifi/pull/3948
>
> On Wed, Dec 11, 2019 at 10:25 AM Mark Payne <ma...@hotmail.com> wrote:
> >
> > Emanuel,
> >
> > I looked into this a week or so ago, but haven't had a chance to resolve
> > the issue yet. It does appear to be a bug. Specifically, I believe the bug
> > is here [1]. When we create a RecordSchema from the Avro Schema, we set
> > the default value for the array to an empty array, instead of null. Because
> > of this, when the JSON is parsed, we end up creating a Record with an empty
> > array for the "Records" field instead of a null. As a result, the Record is
> > considered valid because it does have an array (it's just empty). I think
> > it *should* be a null value instead.
> >
> > It looks like this was introduced in NIFI-4893 [2]. We can easily change
> > it to just return a null value for the default, but that does result in two
> > of the unit tests added in NIFI-4893 failing. It may be that those unit
> > tests need to be fixed, or it may be that such a change does break
> > something. I just haven't had a chance yet to dig that far into it.
> >
> > If you're someone who is comfortable digging into the code and making
> > the updates, then please do and I'm happy to review a PR as soon as I'm
> > able.
> >
> > Thanks
> > -Mark
> >
> >
> > [1]
> > https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-avro-record-utils/src/main/java/org/apache/nifi/avro/AvroTypeUtil.java#L629-L631
> >
> > [2] https://issues.apache.org/jira/browse/NIFI-4893
> >
> >
> >
> > On Dec 11, 2019, at 8:02 AM, Oliveira, Emanuel <Em...@fmr.com> wrote:
> >
> > Can anyone knowledgeable on Avro schemas please confirm/suggest whether
> > this inability to invalidate a JSON payload missing an array in the root,
> > when Allow Extra Fields=true, is normal?
> >
> > There are 2 options:
> >
> > ValidateRecord.Allow Extra Fields=false --> need to supply the full schema
> > ValidateRecord.Allow Extra Fields=true --> this is what I have been
> > testing/want, a way to supply a schema with only the mandatory fields.
> >
> >
> > I want 2 mandatory fields, an array with at least 1 element having
> > eventVersion, so the minimal JSON should be:
> > { (..)
> >    "Records": [{
> >          "eventVersion": "aaa"
> >          (..)
> >       }
> >    ]
> >    (..)
> > }
> >
> > The problem is that ValidateRecord considers the FF valid if the “Records”
> > array is missing from the root!!!!
> > {
> >    "Service": "sssssss",
> >    "Event": "eeeee",
> >    "Time": "2019-11-25T16:21:53.280Z",
> >    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
> >    "RequestId": "RRRRRRRRRRRRRRRRRR",
> >    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
> > }
> >
> > If I supply the array “Records”, then the schema correctly validates that I
> > need at least eventVersion on the array element record.
> >
> >
> > So… maybe my question can be tuned to “is it possible in Avro schema
> > syntax to specify cardinalities like in a DB E/R diagram, where a relation
> > can be one of the following:
> > 0..n
> > 1..0
> > 1 and only 1 ?
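
For the cardinality question above, a rough mapping onto Avro types can be sketched in Python by building the schema fragments as plain dicts. As far as I know, the Avro specification has no attribute for a minimum or maximum number of array items, so "at least 1 element" would need a check outside the schema. The field name `f` is a placeholder, not something from the thread.

```python
import json

# Exactly one: a plain, non-nullable type.
exactly_one = {"name": "f", "type": "string"}

# 0..1: a union with null, defaulting to null when absent.
zero_or_one = {"name": "f", "type": ["null", "string"], "default": None}

# 0..n: an array; the schema itself cannot demand a minimum length.
zero_to_many = {"name": "f", "type": {"type": "array", "items": "string"}}

for label, field in [
    ("1", exactly_one),
    ("0..1", zero_or_one),
    ("0..n", zero_to_many),
]:
    print(label, json.dumps(field))
```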
> >
> >
> > Thanks//Regards,
> > Emanuel Oliveira
> > Senior Oracle/Data Engineer | CTG | Galway TEL ext: 353 – (0)91-74
> > 4971 | int: 8-737 4971 |  who's who
> >
> > From: Oliveira, Emanuel <Em...@fmr.com>
> > Sent: Friday 6 December 2019 10:15
> > To: users@nifi.apache.org
> > Subject: RE: NiFi ValidateRecord - unable to handle missing mandatory
> ARRAY ?
> >
> > Hi Mark, forgot to share the NiFi version we are using:
> > 1.8.0
> > 10/22/2018 23:48:30 EDT
> > Tagged nifi-1.8.0-RC3
> >
> >
> > Thanks//Regards,
> > Emanuel Oliveira
> > Senior Oracle/Data Engineer | CTG | Galway TEL ext: 353 – (0)91-74
> > 4971 | int: 8-737 4971 |  who's who
> >
> > From: Emanuel Oliveira <em...@gmail.com>
> > Sent: Thursday 5 December 2019 22:42
> > To: users@nifi.apache.org
> > Subject: Re: NiFi ValidateRecord - unable to handle missing mandatory
> ARRAY ?
> >
> > Hi Mark, be sure you copy-paste "NOK - payload BAD 1 - " into
> > GenerateFlowFile, as this is the problem.
> >
> > Cheers,
> > Emanuel
> >
> > On Thu 5 Dec 2019, 22:03 Mark Payne, <ma...@hotmail.com> wrote:
> >
> > Emanuel,
> >
> > What version of NiFi are you using?
> >
> > I just tested the attached template against the latest, and the FlowFile
> > was routed to 'invalid' with the explanation:
> >
> > Records in this FlowFile were invalid for the following reasons: The
> > following 1 fields were missing: [[0]/Records/eventVersion]
> >
> >
> >
> >
> > Thanks
> > -Mark
> >
> >
> >
> >

RE: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?

Posted by "Oliveira, Emanuel" <Em...@fmr.com>.
Thanks Matt and Mark!
We are still on version
1.8.0
10/22/2018 23:48:30 EDT
Tagged nifi-1.8.0-RC3

The current version is 1.10.

Out of curiosity, when could we expect this fix to be available? Would it mean we upgrade to 1.10? Thanks.

Thanks//Regards,
Emanuel Oliveira



-----Original Message-----
From: Matt Burgess <ma...@apache.org> 
Sent: Friday 20 December 2019 17:52
To: users@nifi.apache.org
Subject: Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?


Mark is spot-on with the diagnosis, a default empty array is being created for the missing field even if no default value is specified in the schema. All it needs is an extra null check in order to return null as the default value, then the record is marked invalid as expected.

I have written up NIFI-6963 [1] to cover this, and issued a PR to fix it [2]. Mark, would you kindly do the honors of a review? Please and thanks!

-Matt

[1] https://issues.apache.org/jira/browse/NIFI-6963
[2] https://github.com/apache/nifi/pull/3948
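
To make the diagnosis above concrete, here is a toy Python model of the behaviour. It is not NiFi code: `read_record` and `missing_fields` are hypothetical stand-ins for the reader and the validator, and `array_default` stands for the default that the schema conversion assigns to an array field that declares none.

```python
def read_record(payload, field_names, array_default):
    """Build a record, filling absent fields with array_default.

    Toy stand-in for the reader; every schema field here is an array.
    """
    return {name: payload.get(name, array_default) for name in field_names}


def missing_fields(record):
    """Toy validator: a required field counts as missing only when null."""
    return [name for name, value in record.items() if value is None]


payload = {"Service": "sssssss"}  # no "Records" key at all

# Behaviour described in the diagnosis: the default is an empty array,
# so the field looks present and nothing is reported.
before = read_record(payload, ["Records"], array_default=[])
print(missing_fields(before))  # []

# With a null default, the absent field is reported as missing.
after = read_record(payload, ["Records"], array_default=None)
print(missing_fields(after))  # ['Records']
```

The only difference between the two runs is the default handed to the absent field, which is the point of the extra null check described above.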

On Wed, Dec 11, 2019 at 10:25 AM Mark Payne <ma...@hotmail.com> wrote:
>
> Emanuel,
>
> I looked into this a week or so ago, but haven't had a chance to resolve the issue yet. It does appear to be a bug. Specifically, I believe the bug is here [1]. When we create a RecordSchema from the Avro Schema, we set the default value for the array to an empty array, instead of null. Because of this, when the JSON is parsed, we end up creating a Record with an empty array for the "Records" field instead of a null. As a result, the Record is considered valid because it does have an array (it's just empty). I think it *should* be a null value instead.
>
> It looks like this was introduced in NIFI-4893 [2]. We can easily change it to just return a null value for the default, but that does result in two of the unit tests added in NIFI-4893 failing. It may be that those unit tests need to be fixed, or it may be that such a change does break something. I just haven't had a chance yet to dig that far into it.
>
> If you're someone who is comfortable digging into the code and making the updates, then please do and I'm happy to review a PR as soon as I'm able.
>
> Thanks
> -Mark
>
>
> [1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-avro-record-utils/src/main/java/org/apache/nifi/avro/AvroTypeUtil.java#L629-L631
>
> [2] https://issues.apache.org/jira/browse/NIFI-4893
>
>
>
> On Dec 11, 2019, at 8:02 AM, Oliveira, Emanuel <Em...@fmr.com> wrote:
>
> Can anyone knowledgeable about Avro schemas please confirm whether this inability to invalidate a JSON payload that is missing the array in the root, when Allow Extra Fields=true, is normal?
>
> There are 2 options:
>
> ValidateRecord.Allow Extra Fields=false --> need to supply the full schema
> ValidateRecord.Allow Extra Fields=true --> this is what I have been testing and what I want: a way to supply a schema with only the mandatory fields.
>
>
> I want 2 mandatory fields, an array with at least 1 element having eventVersion, so minimal json should be:
> { (..)
>    "Records": [{
>          "eventVersion": "aaa"
>          (..)
>       }
>    ]
>    (..)
> }
>
> Problem is ValidateRecord considers FF valid if missing “Records” array in the root!!!!
> {
>    "Service": "sssssss",
>    "Event": "eeeee",
>    "Time": "2019-11-25T16:21:53.280Z",
>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>    "RequestId": "RRRRRRRRRRRRRRRRRR",
>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
> }
>
> IF I supply the array “Records” then the schema correctly validates I need at least eventVersion on the array element record.
>
>
> So maybe my question can be tuned to: is it possible in Avro schema syntax to specify cardinalities, as in a DB E/R diagram, where a relation can be one of the following:
> 0..n
> 1..n
> 1 and only 1 ?
>
>
> Thanks//Regards,
> Emanuel Oliveira
> Senior Oracle/Data Engineer | CTG | Galway
> TEL ext: 353 – (0)91-74  4971 | int: 8-737 4971 |  who's who
>
> From: Oliveira, Emanuel <Em...@fmr.com>
> Sent: Friday 6 December 2019 10:15
> To: users@nifi.apache.org
> Subject: RE: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?
>
> Hi Mark, forgot to share the NiFi version we using:
> 1.8.0
> 10/22/2018 23:48:30 EDT
> Tagged nifi-1.8.0-RC3
>
>
> Thanks//Regards,
> Emanuel Oliveira
> Senior Oracle/Data Engineer | CTG | Galway
> TEL ext: 353 – (0)91-74  4971 | int: 8-737 4971 |  who's who
>
> From: Emanuel Oliveira <em...@gmail.com>
> Sent: Thursday 5 December 2019 22:42
> To: users@nifi.apache.org
> Subject: Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?
>
> Hi Mark, be sure you copy paste "NOK - payload BAD 1 - " into GenerateFlowfile as this is the problem.
>
> Cheers,
> Emanuel
>
> On Thu 5 Dec 2019, 22:03 Mark Payne, <ma...@hotmail.com> wrote:
>
> Emanuel,
>
> What version of NiFi are you using?
>
> I just tested the attached template against the latest, and the FlowFile was routed to 'invalid' with the explanation:
>
> Records in this FlowFile were invalid for the following reasons: The following 1 fields were missing: [[0]/Records/eventVersion]
>
>
>
>
> Thanks
> -Mark
>
>
>
>
> On Dec 5, 2019, at 7:06 AM, Oliveira, Emanuel <Em...@fmr.com> wrote:
>
> Hi all,
>
> I been struggling to find a way for ValidateRecord using Avro Schema to force mandatory the presence of an array on json payload, problem is if array “records” is missing Validate is considering FF valid ☹.
> --objective - Mandatory to have "Records array" with at least "eventVersion"
> - using ValidateRecord > Allow Extra Fields
> - problem im facing is nifi dont trigger payload BAD 1 as invalid!!
>
> How can I make mandatory the Records array ? Is it possible ?
>
> I know I can eventually use a SplitJson JsonPath Expression=$.Records to rid off the ARRAY, and also to fial if array "Records" not present.. But I would like to have a clean solution using just avro schema, is this possible ?
>
>
>
> --OK - payload GOOD
> {
>    "Service": "sssssss",
>    "Event": "eeeee",
>    "Time": "2019-11-25T16:21:53.280Z",
>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>    "RequestId": "RRRRRRRRRRRRRRRRRR",
>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>    "Records": [{
>          "eventVersion": "aaa"
>       }
>    ]
> }
>
> --NOK - payload BAD 1 - missing "Records" array --> BUT VALIDATERECORD/AVROSCHEMA SENDS FF TO “valid”!! I want it to be sent “invalid” since is not compliant to my avro schema which needs array “Records” with element “eventVersion” as 2 mandatory things.
> {
>    "Service": "sssssss",
>    "Event": "eeeee",
>    "Time": "2019-11-25T16:21:53.280Z",
>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>    "RequestId": "RRRRRRRRRRRRRRRRRR",
>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>    "RecordsXXX": [{
>          "eventVersion": "aaa"
>       }
>    ]
> }
>
> --OK - payload BAD 2 - "Records" array present but missing "eventVersion"
> {
>    "Service": "sssssss",
>    "Event": "eeeee",
>    "Time": "2019-11-25T16:21:53.280Z",
>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>    "RequestId": "RRRRRRRRRRRRRRRRRR",
>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>    "Records": [{
>          "eventVersionXX": "aaa"
>       }
>    ]
> }
>
> Its very simple test flow (attachmed the xml template ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml) just using ValidateRecord with JsonReader/Json Writer:
> <image001.png>
>
>
> Heres ValidateRecord processor + reader/writer controllers:
>
> Avro schema with just array “Records” and “eventVersion” as min tag on array element.
> Using Allow Extra Fields true:
>
> So im ok having other fields on the root side by side with the array “Records”, and also ok to have extra elements inside each array.
> FYI: the real use case im trying to validate AWS SQS message (s3 trigger) where I will be interested on several fields, but crafted this simpler example just to ask if its possible to force array to be mandatory and with at least 1 element ?
>
> ==========================================================
>
> --ValidateRecord 1.8.0
> Record Reader                           JsonTreeReader
> Record Writer                           JsonRecordSetWriter
> Record Writer for Invalid Records
> Schema Access Strategy                  Use Reader's Schema
> Schema Registry                         No value set
> Schema Name                             ${schema.name}
> Schema Text                             ${avro.schema}
> Allow Extra Fields                      true
> Strict Type Checking                    true
>
> --JsonTreeReader 1.8.0 - MANDATORY TO HAVE "Records" ARRAY + "eventVersion" on each ARRAY element
> Schema Access Strategy                  Use 'Schema Text' Property
> Schema Registry
> Schema Name                             ${schema.name}
> Schema Version
> Schema Branch
> Schema Text
>                                         {
>                                            "name": "MyName",
>                                            "type": "record",
>                                            "namespace": "aa.bb.cc",
>                                            "fields": [{
>                                                  "name": "Records",
>                                                  "type": {
>                                                     "type": "array",
>                                                     "items": {
>                                                        "name": "Records_record",
>                                                        "type": "record",
>                                                        "fields": [{
>                                                              "name": "eventVersion",
>                                                              "type": "string"
>                                                           }
>                                                        ]
>                                                     }
>                                                  }
>                                               }
>                                            ]
>                                         }
> Date Format
> Time Format
> Timestamp Format
>
> --JsonRecordSetWriter 1.8.0
> Schema Write Strategy                   Do Not Write Schema
> Schema Access Strategy                  Inherit Record Schema
> Schema Registry
> Schema Name                             ${schema.name}
> Schema Version
> Schema Branch
> Schema Text                             { "name": "eventVersion", "type": "string" }
> Date Format
> Time Format
> Timestamp Format
> Pretty Print JSON                       true
> Suppress Null Values                    Never Suppress
> Output Grouping                         Array
>
> Thanks in advance,
> Emanuel Oliveira
>
> <ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml>
>
>

Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?

Posted by Matt Burgess <ma...@apache.org>.
Mark is spot-on with the diagnosis, a default empty array is being
created for the missing field even if no default value is specified in
the schema. All it needs is an extra null check in order to return
null as the default value, then the record is marked invalid as
expected.
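
The effect of that null check can be illustrated with a small, self-contained model. This is only a sketch in Python and is not the NiFi code: the names `default_for`, `is_valid` and the `null_check` flag are hypothetical, and the schema is reduced to a plain dict. It shows why a missing array field passes validation when an empty array is silently used as its default, and fails once the default is null.

```python
from typing import Any, Dict

# Sentinel meaning "nothing was supplied" (distinct from an explicit None).
MISSING = object()


def default_for(field_type: str, declared_default: Any, null_check: bool) -> Any:
    """Pick the value used when a field is absent from the record.

    null_check=False models the behaviour described above: an array field
    with no declared default still gets an empty array.
    null_check=True models the proposed behaviour: no declared default
    means the default is None.
    """
    if declared_default is not MISSING:
        return declared_default
    if null_check:
        return None
    return [] if field_type == "array" else None


def is_valid(record: Dict[str, Any], schema: Dict[str, Dict[str, Any]],
             null_check: bool) -> bool:
    """A record is valid only if every schema field ends up non-null."""
    for name, spec in schema.items():
        value = record.get(name, MISSING)
        if value is MISSING:
            value = default_for(spec["type"], spec.get("default", MISSING), null_check)
        if value is None:
            return False
    return True


# Hypothetical reduced schema: one mandatory array, no default declared.
schema = {"Records": {"type": "array"}}
bad_payload = {"Service": "sssssss", "RecordsXXX": [{"eventVersion": "aaa"}]}

print(is_valid(bad_payload, schema, null_check=False))  # empty-array default
print(is_valid(bad_payload, schema, null_check=True))   # null default
```

With the empty-array default the record is accepted even though "Records" never appeared in the payload; with the null default it is rejected.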

I have written up NIFI-6963 [1] to cover this, and issued a PR to fix
it [2]. Mark, would you kindly do the honors of a review? Please and
thanks!

-Matt

[1] https://issues.apache.org/jira/browse/NIFI-6963
[2] https://github.com/apache/nifi/pull/3948

On Wed, Dec 11, 2019 at 10:25 AM Mark Payne <ma...@hotmail.com> wrote:
>
> Emanuel,
>
> I looked into this a week or so ago, but haven't had a chance to resolve the issue yet. It does appear to be a bug. Specifically, I believe the bug is here [1]. When we create a RecordSchema from the Avro Schema, we set the default value for the array to an empty array, instead of null. Because of this, when the JSON is parsed, we end up creating a Record with an empty array for the "Records" field instead of a null. As a result, the Record is considered valid because it does have an array (it's just empty). I think it *should* be a null value instead.
>
> It looks like this was introduced in NIFI-4893 [2]. We can easily change it to just return a null value for the default, but that does result in two of the unit tests added in NIFI-4893 failing. It may be that those unit tests need to be fixed, or it may be that such a change does break something. I just haven't had a chance yet to dig that far into it.
>
> If you're someone who is comfortable digging into the code and making the updates, then please do and I'm happy to review a PR as soon as I'm able.
>
> Thanks
> -Mark
>
>
> [1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-avro-record-utils/src/main/java/org/apache/nifi/avro/AvroTypeUtil.java#L629-L631
>
> [2] https://issues.apache.org/jira/browse/NIFI-4893
>
>
>
> On Dec 11, 2019, at 8:02 AM, Oliveira, Emanuel <Em...@fmr.com> wrote:
>
> Can anyone knowledgeable about Avro schemas please confirm whether this inability to invalidate a JSON payload that is missing the array in the root, when Allow Extra Fields=true, is normal?
>
> There are 2 options:
>
> ValidateRecord.Allow Extra Fields=false --> need to supply the full schema
> ValidateRecord.Allow Extra Fields=true --> this is what I have been testing and what I want: a way to supply a schema with only the mandatory fields.
>
>
> I want 2 mandatory fields, an array with at least 1 element having eventVersion, so minimal json should be:
> { (..)
>    "Records": [{
>          "eventVersion": "aaa"
>          (..)
>       }
>    ]
>    (..)
> }
>
> Problem is ValidateRecord considers FF valid if missing “Records” array in the root!!!!
> {
>    "Service": "sssssss",
>    "Event": "eeeee",
>    "Time": "2019-11-25T16:21:53.280Z",
>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>    "RequestId": "RRRRRRRRRRRRRRRRRR",
>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
> }
>
> IF I supply the array “Records” then the schema correctly validates I need at least eventVersion on the array element record.
>
>
> So maybe my question can be tuned to: is it possible in Avro schema syntax to specify cardinalities, as in a DB E/R diagram, where a relation can be one of the following:
> 0..n
> 1..n
> 1 and only 1 ?
>
>
> Thanks//Regards,
> Emanuel Oliveira
> Senior Oracle/Data Engineer | CTG | Galway
> TEL ext: 353 – (0)91-74  4971 | int: 8-737 4971 |  who's who
>
> From: Oliveira, Emanuel <Em...@fmr.com>
> Sent: Friday 6 December 2019 10:15
> To: users@nifi.apache.org
> Subject: RE: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?
>
> Hi Mark, forgot to share the NiFi version we using:
> 1.8.0
> 10/22/2018 23:48:30 EDT
> Tagged nifi-1.8.0-RC3
>
>
> Thanks//Regards,
> Emanuel Oliveira
> Senior Oracle/Data Engineer | CTG | Galway
> TEL ext: 353 – (0)91-74  4971 | int: 8-737 4971 |  who's who
>
> From: Emanuel Oliveira <em...@gmail.com>
> Sent: Thursday 5 December 2019 22:42
> To: users@nifi.apache.org
> Subject: Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?
>
> Hi Mark, be sure you copy paste "NOK - payload BAD 1 - " into GenerateFlowfile as this is the problem.
>
> Cheers,
> Emanuel
>
> On Thu 5 Dec 2019, 22:03 Mark Payne, <ma...@hotmail.com> wrote:
>
> Emanuel,
>
> What version of NiFi are you using?
>
> I just tested the attached template against the latest, and the FlowFile was routed to 'invalid' with the explanation:
>
> Records in this FlowFile were invalid for the following reasons: The following 1 fields were missing: [[0]/Records/eventVersion]
>
>
>
>
> Thanks
> -Mark
>
>
>
>
> On Dec 5, 2019, at 7:06 AM, Oliveira, Emanuel <Em...@fmr.com> wrote:
>
> Hi all,
>
> I been struggling to find a way for ValidateRecord using Avro Schema to force mandatory the presence of an array on json payload, problem is if array “records” is missing Validate is considering FF valid ☹.
> --objective - Mandatory to have "Records array" with at least "eventVersion"
> - using ValidateRecord > Allow Extra Fields
> - problem im facing is nifi dont trigger payload BAD 1 as invalid!!
>
> How can I make mandatory the Records array ? Is it possible ?
>
> I know I can eventually use a SplitJson JsonPath Expression=$.Records to rid off the ARRAY, and also to fial if array "Records" not present.. But I would like to have a clean solution using just avro schema, is this possible ?
>
>
>
> --OK - payload GOOD
> {
>    "Service": "sssssss",
>    "Event": "eeeee",
>    "Time": "2019-11-25T16:21:53.280Z",
>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>    "RequestId": "RRRRRRRRRRRRRRRRRR",
>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>    "Records": [{
>          "eventVersion": "aaa"
>       }
>    ]
> }
>
> --NOK - payload BAD 1 - missing "Records" array --> BUT VALIDATERECORD/AVROSCHEMA SENDS FF TO “valid”!! I want it to be sent “invalid” since is not compliant to my avro schema which needs array “Records” with element “eventVersion” as 2 mandatory things.
> {
>    "Service": "sssssss",
>    "Event": "eeeee",
>    "Time": "2019-11-25T16:21:53.280Z",
>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>    "RequestId": "RRRRRRRRRRRRRRRRRR",
>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>    "RecordsXXX": [{
>          "eventVersion": "aaa"
>       }
>    ]
> }
>
> --OK - payload BAD 2 - "Records" array present but missing "eventVersion"
> {
>    "Service": "sssssss",
>    "Event": "eeeee",
>    "Time": "2019-11-25T16:21:53.280Z",
>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>    "RequestId": "RRRRRRRRRRRRRRRRRR",
>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>    "Records": [{
>          "eventVersionXX": "aaa"
>       }
>    ]
> }
>
> Its very simple test flow (attachmed the xml template ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml) just using ValidateRecord with JsonReader/Json Writer:
> <image001.png>
>
>
> Heres ValidateRecord processor + reader/writer controllers:
>
> Avro schema with just array “Records” and “eventVersion” as min tag on array element.
> Using Allow Extra Fields true:
>
> So im ok having other fields on the root side by side with the array “Records”, and also ok to have extra elements inside each array.
> FYI: the real use case im trying to validate AWS SQS message (s3 trigger) where I will be interested on several fields, but crafted this simpler example just to ask if its possible to force array to be mandatory and with at least 1 element ?
>
> ==========================================================
>
> --ValidateRecord 1.8.0
> Record Reader                           JsonTreeReader
> Record Writer                           JsonRecordSetWriter
> Record Writer for Invalid Records
> Schema Access Strategy                  Use Reader's Schema
> Schema Registry                         No value set
> Schema Name                             ${schema.name}
> Schema Text                             ${avro.schema}
> Allow Extra Fields                      true
> Strict Type Checking                    true
>
> --JsonTreeReader 1.8.0 - MANDATORY TO HAVE "Records" ARRAY + "eventVersion" on each ARRAY element
> Schema Access Strategy                  Use 'Schema Text' Property
> Schema Registry
> Schema Name                             ${schema.name}
> Schema Version
> Schema Branch
> Schema Text
>                                         {
>                                            "name": "MyName",
>                                            "type": "record",
>                                            "namespace": "aa.bb.cc",
>                                            "fields": [{
>                                                  "name": "Records",
>                                                  "type": {
>                                                     "type": "array",
>                                                     "items": {
>                                                        "name": "Records_record",
>                                                        "type": "record",
>                                                        "fields": [{
>                                                              "name": "eventVersion",
>                                                              "type": "string"
>                                                           }
>                                                        ]
>                                                     }
>                                                  }
>                                               }
>                                            ]
>                                         }
> Date Format
> Time Format
> Timestamp Format
>
> --JsonRecordSetWriter 1.8.0
> Schema Write Strategy                   Do Not Write Schema
> Schema Access Strategy                  Inherit Record Schema
> Schema Registry
> Schema Name                             ${schema.name}
> Schema Version
> Schema Branch
> Schema Text                             { "name": "eventVersion", "type": "string" }
> Date Format
> Time Format
> Timestamp Format
> Pretty Print JSON                       true
> Suppress Null Values                    Never Suppress
> Output Grouping                         Array
>
> Thanks in advance,
> Emanuel Oliveira
>
> <ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml>
>
>

Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?

Posted by Mark Payne <ma...@hotmail.com>.
Emanuel,

Unfortunately, I don't believe this is something that Avro schema supports. Avro schema is kept reasonably simple and doesn't provide much in the way of validation. It's really intended more to instruct serializers/deserializers how to work with the bytes.

I would love to get to the point where we are able to use XML Schemas (XSD) to define schemas, because XSD is very rich in its validation capabilities. That's a lot of work, though, and we're just not there yet.
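
Since the schema itself cannot express "at least 1 element", a cardinality rule like this has to be checked outside the schema. Below is a minimal sketch of such a check in plain Python, using the payload shape from this thread. The function name `check_records` is hypothetical, and this is not a NiFi processor or API, just an illustration of the rule being asked for.

```python
import json
from typing import List


def check_records(payload: str) -> List[str]:
    """Return a list of problems; an empty list means the payload passes.

    Rules (taken from the thread): the root must contain an array named
    "Records" with at least 1 element, and every element must carry
    "eventVersion". Extra fields are allowed everywhere.
    """
    doc = json.loads(payload)
    records = doc.get("Records")
    if not isinstance(records, list):
        return ['missing mandatory array "Records"']
    if len(records) < 1:
        return ['"Records" must contain at least 1 element']
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict) or "eventVersion" not in rec:
            problems.append('Records[%d] is missing "eventVersion"' % i)
    return problems


good = '{"Service": "sssssss", "Records": [{"eventVersion": "aaa"}]}'
bad1 = '{"Service": "sssssss", "RecordsXXX": [{"eventVersion": "aaa"}]}'
bad2 = '{"Service": "sssssss", "Records": [{"eventVersionXX": "aaa"}]}'

for name, payload in (("GOOD", good), ("BAD 1", bad1), ("BAD 2", bad2)):
    print(name, check_records(payload))
```

The three sample payloads mirror the GOOD, BAD 1 and BAD 2 cases from the original question, shortened to the fields that matter for the rule.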

Thanks
-Mark


> On Dec 19, 2019, at 9:16 AM, Emanuel Oliveira <em...@gmail.com> wrote:
> 
> Just an additional thought on this: I'm not sure if it is part of the Avro schema specification, but it would be nice to be able to state cardinalities in the schema.
> For example, by default specified records or fields must exist (cardinality 1..1), but for arrays it would be nice to be able to specify a cardinality like:
> - 0..n -- can be empty (whether the array tag itself must exist in this case is TBD).
> - 1..n -- at least 1 element needed
> - 1 and only 1 element in the array (i.e. [0]).
> 
> Best Regards,
> Emanuel Oliveira
> 
> 
> 
> On Thu, Dec 12, 2019 at 11:23 AM Oliveira, Emanuel <Emanuel.Oliveira@fmr.com> wrote:
> Hi Juan and others,
> 
>  
> 
> Attaching reproducible test flow for your convenience.
> 
>  
> 
> Once again, the objective is to have 2 mandatory things in the JSON:
> 
> 1. An array “Records” in the root.
> 2. Each element must have the attribute eventVersion.
> 
> 
> There are 3 GenerateFlowFile processors to test the 3 different scenarios:
> 
> - Problem | missing array, and the FF still validates.
> - OK | array “Records” present but missing eventVersion. Invalid, as expected.
> - OK | both mandatory things present: array “Records” + “eventVersion”.
>  
> 
>  
> 
>  
> 
> Thanks//Regards,
> 
> Emanuel Oliveira
> 
> Senior Oracle/Data Engineer | CTG | Galway
> TEL ext: 353 – (0)91-74  4971 | int: 8-737 4971 |  who's who <http://fidelitycentral.fmr.com/ww/a639704>  
> 
>  
> 
> From: Juan Pablo Gardella <gardellajuanpablo@gmail.com>
> Sent: Wednesday 11 December 2019 16:18
> To: users@nifi.apache.org
> Subject: Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?
> 
>  
> 
> The bug https://issues.apache.org/jira/browse/NIFI-4893 was detected by myself. Do you have a reproducible flow to validate it?
> 
>  
> 
> On Wed, 11 Dec 2019 at 12:54, Oliveira, Emanuel <Emanuel.Oliveira@fmr.com> wrote:
> 
> Oh I see, your analysis makes sense, but sorry, I last did Java 20 years ago; nowadays I'm mostly a data engineer (Oracle DB, ETL tools, custom migrations, Snowflake and lately NiFi). So count on me to detect opportunities to improve things, but I'm not able to change the base code/tests.
> 
>  
> 
> Thanks so much for your time and analysis; let's wait for the community to step up to do the fix and update/run the unit tests 😊
> 
>  
> 
> Thanks//Regards,
> 
> Emanuel Oliveira
> 
> Senior Oracle/Data Engineer | CTG | Galway
> TEL ext: 353 – (0)91-74  4971 | int: 8-737 4971 |  who's who <http://fidelitycentral.fmr.com/ww/a639704>  
> 
>  
> 
> From: Mark Payne <markap14@hotmail.com>
> Sent: Wednesday 11 December 2019 15:25
> To: users@nifi.apache.org
> Subject: Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?
> 
>  
> 
> Emanuel,
> 
>  
> 
> I looked into this a week or so ago, but haven't had a chance to resolve the issue yet. It does appear to be a bug. Specifically, I believe the bug is here [1]. When we create a RecordSchema from the Avro Schema, we set the default value for the array to an empty array, instead of null. Because of this, when the JSON is parsed, we end up creating a Record with an empty array for the "Records" field instead of a null. As a result, the Record is considered valid because it does have an array (it's just empty). I think it *should* be a null value instead.
> 
>  
> 
> It looks like this was introduced in NIFI-4893 [2]. We can easily change it to just return a null value for the default, but that does result in two of the unit tests added in NIFI-4893 failing. It may be that those unit tests need to be fixed, or it may be that such a change does break something. I just haven't had a chance yet to dig that far into it.
> 
>  
> 
> If you're someone who is comfortable digging into the code and making the updates, then please do and I'm happy to review a PR as soon as I'm able. 
> 
>  
> 
> Thanks
> 
> -Mark
> 
>  
> 
>  
> 
> [1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-avro-record-utils/src/main/java/org/apache/nifi/avro/AvroTypeUtil.java#L629-L631
> 
> [2] https://issues.apache.org/jira/browse/NIFI-4893
>  
> 
>  
> 
>  
> 
> On Dec 11, 2019, at 8:02 AM, Oliveira, Emanuel <Emanuel.Oliveira@fmr.com> wrote:
> 
>  
> 
> Can anyone knowledgeable about Avro schemas please confirm whether this inability to invalidate a JSON payload that is missing the array in the root, when Allow Extra Fields=true, is normal?
> 
> There are 2 options:
> 
> - ValidateRecord.Allow Extra Fields=false --> need to supply the full schema
> - ValidateRecord.Allow Extra Fields=true --> this is what I have been testing and what I want: a way to supply a schema with only the mandatory fields.
>  
> 
> I want 2 mandatory fields, an array with at least 1 element having eventVersion, so minimal json should be:
> 
> { (..)
> 
>    "Records": [{
> 
>          "eventVersion": "aaa"
> 
>          (..)
> 
>       }
> 
>    ]
> 
>    (..)
> 
> }
> 
>  
> 
> Problem is ValidateRecord considers FF valid if missing “Records” array in the root!!!!
> 
> {
> 
>    "Service": "sssssss",
> 
>    "Event": "eeeee",
> 
>    "Time": "2019-11-25T16:21:53.280Z",
> 
>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
> 
>    "RequestId": "RRRRRRRRRRRRRRRRRR",
> 
>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
> 
> }
> 
>  
> 
> IF I supply the array “Records” then the schema correctly validates I need at least eventVersion on the array element record.
> 
>  
> 
>  
> 
> So maybe my question can be tuned to: is it possible in Avro schema syntax to specify cardinalities, as in a DB E/R diagram, where a relation can be one of the following:
> 
> 0..n
> 
> 1..n
> 
> 1 and only 1 ?
> 
>  
> 
>  
> 
> Thanks//Regards,
> 
> Emanuel Oliveira
> 
> Senior Oracle/Data Engineer | CTG | Galway
> TEL ext: 353 – (0)91-74  4971 | int: 8-737 4971 |  who's who <http://fidelitycentral.fmr.com/ww/a639704>  
> 
>  
> 
> From: Oliveira, Emanuel <Emanuel.Oliveira@fmr.com>
> Sent: Friday 6 December 2019 10:15
> To: users@nifi.apache.org
> Subject: RE: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?
> 
>  
> 
> Hi Mark, forgot to share the NiFi version we using:
> 
> 1.8.0
> 
> 10/22/2018 23:48:30 EDT
> 
> Tagged nifi-1.8.0-RC3
> 
>  
> 
>  
> 
> Thanks//Regards,
> 
> Emanuel Oliveira
> 
> Senior Oracle/Data Engineer | CTG | Galway
> TEL ext: 353 – (0)91-74  4971 | int: 8-737 4971 |  who's who <http://fidelitycentral.fmr.com/ww/a639704>  
> 
>  
> 
> From: Emanuel Oliveira <emanueol@gmail.com>
> Sent: Thursday 5 December 2019 22:42
> To: users@nifi.apache.org
> Subject: Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?
> 
>  
> 
> Hi Mark, be sure you copy paste "NOK - payload BAD 1 - " into GenerateFlowfile as this is the problem.
> 
>  
> 
> Cheers,
> 
> Emanuel 
> 
>  
> 
> On Thu 5 Dec 2019, 22:03 Mark Payne, <markap14@hotmail.com> wrote:
> 
> Emanuel, 
> 
>  
> 
> What version of NiFi are you using?
> 
>  
> 
> I just tested the attached template against the latest, and the FlowFile was routed to 'invalid' with the explanation:
> 
>  
> 
> Records in this FlowFile were invalid for the following reasons: The following 1 fields were missing: [[0]/Records/eventVersion]
> 
>  
> 
>  
> 
>  
> 
>  
> 
> Thanks
> 
> -Mark
> 
>  
> 
>  
> 
> On Dec 5, 2019, at 7:06 AM, Oliveira, Emanuel <Emanuel.Oliveira@fmr.com> wrote:
> 
>  
> 
> Hi all,
> 
>  
> 
> I been struggling to find a way for ValidateRecord using Avro Schema to force mandatory the presence of an array on json payload, problem is if array “records” is missing Validate is considering FF valid ☹.
> 
> --objective - Mandatory to have "Records array" with at least "eventVersion"
> 
> - using ValidateRecord > Allow Extra Fields
> 
> - problem im facing is nifi dont trigger payload BAD 1 as invalid!!
> 
>  
> 
> How can I make mandatory the Records array ? Is it possible ?
> 
>  
> 
> I know I can eventually use a SplitJson JsonPath Expression=$.Records to rid off the ARRAY, and also to fial if array "Records" not present.. But I would like to have a clean solution using just avro schema, is this possible ?
> 
>  
> 
>  
> 
>  
> 
> --OK - payload GOOD
> 
> {
> 
>    "Service": "sssssss",
> 
>    "Event": "eeeee",
> 
>    "Time": "2019-11-25T16:21:53.280Z",
> 
>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
> 
>    "RequestId": "RRRRRRRRRRRRRRRRRR",
> 
>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
> 
>    "Records": [{
> 
>          "eventVersion": "aaa"
> 
>       }
> 
>    ]
> 
> }
> 
>  
> 
> --NOK - payload BAD 1 - missing "Records" array --> BUT VALIDATERECORD/AVROSCHEMA SENDS FF TO “valid”!! I want it to be sent “invalid” since is not compliant to my avro schema which needs array “Records” with element “eventVersion” as 2 mandatory things.
> 
> {
> 
>    "Service": "sssssss",
> 
>    "Event": "eeeee",
> 
>    "Time": "2019-11-25T16:21:53.280Z",
> 
>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
> 
>    "RequestId": "RRRRRRRRRRRRRRRRRR",
> 
>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
> 
>    "RecordsXXX": [{
> 
>          "eventVersion": "aaa"
> 
>       }
> 
>    ]
> 
> }
> 
>  
> 
> --OK - payload BAD 2 - "Records" array present but missing "eventVersion"
> 
> {
> 
>    "Service": "sssssss",
> 
>    "Event": "eeeee",
> 
>    "Time": "2019-11-25T16:21:53.280Z",
> 
>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
> 
>    "RequestId": "RRRRRRRRRRRRRRRRRR",
> 
>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
> 
>    "Records": [{
> 
>          "eventVersionXX": "aaa"
> 
>       }
> 
>    ]
> 
> }
> 
>  
> 
> Its very simple test flow (attachmed the xml template ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml) just using ValidateRecord with JsonReader/Json Writer:
> 
> <image001.png>
> 
>  
> 
>  
> 
> Heres ValidateRecord processor + reader/writer controllers:
> 
> Avro schema with just array “Records” and “eventVersion” as min tag on array element.
> Using Allow Extra Fields true:
> So im ok having other fields on the root side by side with the array “Records”, and also ok to have extra elements inside each array.
> FYI: the real use case im trying to validate AWS SQS message (s3 trigger) where I will be interested on several fields, but crafted this simpler example just to ask if its possible to force array to be mandatory and with at least 1 element ?
> ==========================================================
> 
>  
> 
> --ValidateRecord 1.8.0
> 
> Record Reader                           JsonTreeReader
> 
> Record Writer                           JsonRecordSetWriter
> 
> Record Writer for Invalid Records      
> 
> Schema Access Strategy                  Use Reader's Schema
> 
> Schema Registry                         No value set
> 
> Schema Name                             ${schema.name}
> 
> Schema Text                             ${avro.schema}
> 
> Allow Extra Fields                      true
> 
> Strict Type Checking                    true
> 
>  
> 
> --JsonTreeReader 1.8.0 - MANDATORY TO HAVE "Records" ARRAY + "eventVersion" on each ARRAY element
> 
> Schema Access Strategy                  Use 'Schema Text' Property
> 
> Schema Registry                        
> 
> Schema Name                             ${schema.name}
> 
> Schema Version                         
> 
> Schema Branch                          
> 
> Schema Text
> 
>                                         {
> 
>                                            "name": "MyName",
> 
>                                            "type": "record",
> 
>                                            "namespace": "aa.bb.cc",
> 
>                                            "fields": [{
> 
>                                                  "name": "Records",
> 
>                                                  "type": {
> 
>                                                     "type": "array",
> 
>                                                     "items": {
> 
>                                                        "name": "Records_record",
> 
>                                                        "type": "record",
> 
>                                                        "fields": [{
> 
>                                                              "name": "eventVersion",
> 
>                                                              "type": "string"
> 
>                                                           }
> 
>                                                        ]
> 
>                                                     }
> 
>                                                  }
> 
>                                               }
> 
>                                            ]
> 
>                                         }
> 
> Date Format                            
> 
> Time Format
> 
> Timestamp Format
> 
>  
> 
> --JsonRecordSetWriter 1.8.0
> 
> Schema Write Strategy                   Do Not Write Schema
> 
> Schema Access Strategy                  Inherit Record Schema
> 
> Schema Registry                        
> 
> Schema Name                             ${schema.name}
> 
> Schema Version
> 
> Schema Branch
> 
> Schema Text                             { "name": "eventVersion", "type": "string" }
> 
> Date Format
> 
> Time Format
> 
> Timestamp Format
> 
> Pretty Print JSON                       true
> 
> Suppress Null Values                    Never Suppress
> 
> Output Grouping                         Array
> 
>  
> 
> Thanks in advance,
> 
> Emanuel Oliveira
> 
>  
> 
> <ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml>
> 
>  
> 


Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?

Posted by Emanuel Oliveira <em...@gmail.com>.
Just an additional thought on this: I'm not sure whether it is part of the Avro
schema specification, but it would be nice to be able to declare cardinalities
in the schema.
For example, by default the specified records or fields must exist (cardinality
1..1), but for arrays it would be nice to be able to specify a cardinality like:
- 0..n -- can be empty (whether the array tag itself must be present in this
case is TBD).
- 1..n -- at least 1 element needed.
- 1..1 -- one and only one element in the array (i.e. just [0]).
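
A cardinality rule like the ones above could be checked outside the schema; as far as I can tell, an Avro array type declares only its item type, not a minimum or maximum length. Below is a minimal Python sketch of such a check. The name check_cardinality and its parameters are hypothetical, chosen for illustration, and are not part of NiFi or Avro. The sketch also assumes the array tag must be present even for 0..n, which the list above leaves open.

```python
from typing import Any, Optional


def check_cardinality(payload: dict, field: str,
                      min_items: int = 0,
                      max_items: Optional[int] = None) -> Optional[str]:
    """Return an error message, or None when the array satisfies min..max.

    Assumption: a missing field is always an error, even when min_items is 0.
    """
    value: Any = payload.get(field)
    if not isinstance(value, list):
        return f"{field}: missing or not an array"
    if len(value) < min_items:
        return f"{field}: expected at least {min_items} element(s), got {len(value)}"
    if max_items is not None and len(value) > max_items:
        return f"{field}: expected at most {max_items} element(s), got {len(value)}"
    return None


# 1..n: at least one element required
print(check_cardinality({"Records": [{"eventVersion": "aaa"}]}, "Records", min_items=1))
print(check_cardinality({"Records": []}, "Records", min_items=1))
print(check_cardinality({"RecordsXXX": []}, "Records", min_items=1))
```

Passing min_items=1, max_items=1 would express the "one and only one element" case.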

Best Regards,
*Emanuel Oliveira*



On Thu, Dec 12, 2019 at 11:23 AM Oliveira, Emanuel <Em...@fmr.com>
wrote:

> Hi Juan and others,
>
>
>
> Attaching reproducible test flow for your convenience.
>
>
>
> Once again objective is to have 2 mandatory things on json:
>
>    - 1 array “Records” in the root.
>    - and each element must have attribute eventVersion.
>
>
>
> Theres 3 generateFlowfiles to test the 3 different scenarios:
>
>    - problem | missing array and FF still validates.
>    - Ok | array “Records” present but missing eventVersion. Invalid as
>    expected.
>    - Ok | both mandatory things present array “Records” + “eventVersion”.
>
>
>
>
>
>
>
> Thanks//Regards,
>
> *Emanuel Oliveira*
>
> Senior Oracle/Data Engineer | CTG | Galway
> TEL ext: 353 – (0)91-74  4971 | int: 8-737 4971 *|*  who's who
> <http://fidelitycentral.fmr.com/ww/a639704>
>
>
>
> *From:* Juan Pablo Gardella <ga...@gmail.com>
> *Sent:* Wednesday 11 December 2019 16:18
> *To:* users@nifi.apache.org
> *Subject:* Re: NiFi ValidateRecord - unable to handle missing mandatory
> ARRAY ?
>
>
>
> *This email is from an external source - **exercise caution regarding
> links and attachments. *
>
>
>
> The bug https://issues.apache.org/jira/browse/NIFI-4893 was detected by
> myself. Do you have a reproducible flow to validate it?
>
>
>
> On Wed, 11 Dec 2019 at 12:54, Oliveira, Emanuel <Em...@fmr.com>
> wrote:
>
> Oh I see, your analysis makes sense, but sorry, I last did Java 20 years
> ago; nowadays I'm mostly a data engineer (Oracle DB, ETL tools, custom
> migrations, Snowflake and lately NiFi). So count on me to detect
> opportunities to improve things, but I'm not able to change base code/tests.
>
>
>
> Thanks so much for your time and analysis, lets wait for community to step
> up to do the fix and update/run the unit tests 😊
>
>
>
> Thanks//Regards,
>
> *Emanuel Oliveira*
>
> Senior Oracle/Data Engineer | CTG | Galway
> TEL ext: 353 – (0)91-74  4971 | int: 8-737 4971 *|*  who's who
> <http://fidelitycentral.fmr.com/ww/a639704>
>
>
>
> *From:* Mark Payne <ma...@hotmail.com>
> *Sent:* Wednesday 11 December 2019 15:25
> *To:* users@nifi.apache.org
> *Subject:* Re: NiFi ValidateRecord - unable to handle missing mandatory
> ARRAY ?
>
>
>
>
>
>
> Emanuel,
>
>
>
> I looked into this a week or so ago, but haven't had a chance to resolve
> the issue yet. It does appear to be a bug. Specifically, I believe the bug
> is here [1].  When we create a RecordSchema from the Avro Schema, we set
> the default value for the array to an empty array, instead of null. Because
> of this, when the JSON is parsed, we end up creating a Record with an empty
> array for the "Records" field instead of a null. As a result, the Record is
> considered valid because it does have an array (it's just empty). I think
> it *should* be a null value instead.
>
>
>
> It looks like this was introduced in NIFI-4893 [2]. We can easily change
> it to just return a null value for the default, but that does result in two
> of the unit tests added in NIFI-4893 failing. It may be that those unit
> tests need to be fixed, or it may be that such a change does break
> something. I just haven't had a chance yet to dig that far into it.
>
>
>
> If you're someone who is comfortable digging into the code and making the
> updates, then please do and I'm happy to review a PR as soon as I'm able.
>
>
>
> Thanks
>
> -Mark
>
>
>
>
>
> [1]
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-avro-record-utils/src/main/java/org/apache/nifi/avro/AvroTypeUtil.java#L629-L631
>
>
>
> [2] https://issues.apache.org/jira/browse/NIFI-4893
>
>
>
>
>
>
>
> On Dec 11, 2019, at 8:02 AM, Oliveira, Emanuel <Em...@fmr.com>
> wrote:
>
>
>
> Anyone knowledgeable on Avro schemas, can you please confirm/suggest if this
> inability to invalidate a JSON payload missing the array in the root when
> Allow Extra Fields=true is normal?
>
>
>
> There’s 2 options with:
>
> ·         ValidateRecord.Allow Extra Fields=false --> need to supply full
> schema
>
> ·         ValidateRecord.Allow Extra Fields=true --> this is what I been
> testing/want, a way to supply schema with only mandatory fields.
>
>
>
> I want 2 mandatory fields, an array with at least 1 element having
> eventVersion, so minimal json should be:
>
> { (..)
>
>    "Records": [{
>
>          "eventVersion": "aaa"
>
>          (..)
>
>       }
>
>    ]
>
>    (..)
>
> }
>
>
>
> The problem is that ValidateRecord considers the FF valid if the “Records”
> array is missing from the root!
>
> {
>
>    "Service": "sssssss",
>
>    "Event": "eeeee",
>
>    "Time": "2019-11-25T16:21:53.280Z",
>
>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>
>    "RequestId": "RRRRRRRRRRRRRRRRRR",
>
>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh"
>
> }
>
>
>
> IF I supply the array “Records” then the schema correctly validates I need
> at least eventVersion on the array element record.
>
>
>
>
>
> So… maybe my question can be tuned to “is it possible on avro schema
> syntax to specify cardinalities like in a db e/r diagram where a relation
> can be one of the following:
>
> 0..n
>
> 1..n
>
> 1 and only 1 ?
>
>
>
>
>
> Thanks//Regards,
>
> *Emanuel Oliveira*
>
> Senior Oracle/Data Engineer | CTG | Galway
> TEL ext: 353 – (0)91-74  4971 | int: 8-737 4971 *|*  who's who
> <http://fidelitycentral.fmr.com/ww/a639704>
>
>
>
> *From:* Oliveira, Emanuel <Em...@fmr.com>
> *Sent:* Friday 6 December 2019 10:15
> *To:* users@nifi.apache.org
> *Subject:* RE: NiFi ValidateRecord - unable to handle missing mandatory
> ARRAY ?
>
>
>
> Hi Mark, forgot to share the NiFi version we using:
>
> 1.8.0
>
> 10/22/2018 23:48:30 EDT
>
> Tagged nifi-1.8.0-RC3
>
>
>
>
>
> Thanks//Regards,
>
> *Emanuel Oliveira*
>
> Senior Oracle/Data Engineer | CTG | Galway
> TEL ext: 353 – (0)91-74  4971 | int: 8-737 4971 *|*  who's who
> <http://fidelitycentral.fmr.com/ww/a639704>
>
>
>
> *From:* Emanuel Oliveira <em...@gmail.com>
> *Sent:* Thursday 5 December 2019 22:42
> *To:* users@nifi.apache.org
> *Subject:* Re: NiFi ValidateRecord - unable to handle missing mandatory
> ARRAY ?
>
>
>
>
>
>
> Hi Mark, be sure you copy paste "NOK - payload BAD 1 - " into
> GenerateFlowfile as this is the problem.
>
>
>
> Cheers,
>
> Emanuel
>
>
>
> On Thu 5 Dec 2019, 22:03 Mark Payne, <ma...@hotmail.com> wrote:
>
> Emanuel,
>
>
>
> What version of NiFi are you using?
>
>
>
> I just tested the attached template against the latest, and the FlowFile
> was routed to 'invalid' with the explanation:
>
>
>
> Records in this FlowFile were invalid for the following reasons: The
> following 1 fields were missing: [[0]/Records/eventVersion]
>
>
>
>
>
>
>
>
>
> Thanks
>
> -Mark
>
>
>
>
>
> On Dec 5, 2019, at 7:06 AM, Oliveira, Emanuel <Em...@fmr.com>
> wrote:
>
>
>
> Hi all,
>
>
>
> I been struggling to find a way for ValidateRecord using Avro Schema to
> force mandatory the presence of an array on json payload, problem is if
> array “records” is missing Validate is considering FF valid ☹.
>
> --objective - Mandatory to have "Records array" with at least
> "eventVersion"
>
> - using ValidateRecord > Allow Extra Fields
>
> - problem im facing is nifi dont trigger payload BAD 1 as invalid!!
>
>
>
> How can I make mandatory the Records array ? Is it possible ?
>
>
>
> I know I can eventually use SplitJson with JsonPath Expression=$.Records to
> get rid of the array, and also to fail if the array "Records" is not present.
> But I would like a clean solution using just the Avro schema; is this
> possible?
>
>
>
>
>
>
>
> --OK - payload GOOD
>
> {
>
>    "Service": "sssssss",
>
>    "Event": "eeeee",
>
>    "Time": "2019-11-25T16:21:53.280Z",
>
>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>
>    "RequestId": "RRRRRRRRRRRRRRRRRR",
>
>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>
>    "Records": [{
>
>          "eventVersion": "aaa"
>
>       }
>
>    ]
>
> }
>
>
>
> --NOK - payload BAD 1 - missing "Records" array --> BUT
> VALIDATERECORD/AVROSCHEMA SENDS FF TO “valid”!! I want it to be sent
> “invalid” since is not compliant to my avro schema which needs array
> “Records” with element “eventVersion” as 2 mandatory things.
>
> {
>
>    "Service": "sssssss",
>
>    "Event": "eeeee",
>
>    "Time": "2019-11-25T16:21:53.280Z",
>
>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>
>    "RequestId": "RRRRRRRRRRRRRRRRRR",
>
>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>
>    "RecordsXXX": [{
>
>          "eventVersion": "aaa"
>
>       }
>
>    ]
>
> }
>
>
>
> --OK - payload BAD 2 - "Records" array present but missing "eventVersion"
>
> {
>
>    "Service": "sssssss",
>
>    "Event": "eeeee",
>
>    "Time": "2019-11-25T16:21:53.280Z",
>
>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>
>    "RequestId": "RRRRRRRRRRRRRRRRRR",
>
>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>
>    "Records": [{
>
>          "eventVersionXX": "aaa"
>
>       }
>
>    ]
>
> }
>
>
>
> It's a very simple test flow (attached the XML template
> ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml) just using
> ValidateRecord with JsonTreeReader/JsonRecordSetWriter:
>
> <image001.png>
>
>
>
>
>
> Heres ValidateRecord processor + reader/writer controllers:
>
>    - Avro schema with just array “Records” and “eventVersion” as min tag
>    on array element.
>    - Using Allow Extra Fields true:
>
>
>    - So im ok having other fields on the root side by side with the array
>       “Records”, and also ok to have extra elements inside each array.
>       - FYI: the real use case im trying to validate AWS SQS message (s3
>       trigger) where I will be interested on several fields, but crafted this
>       simpler example just to ask if its possible to force array to be mandatory
>       and with at least 1 element ?
>
> ==========================================================
>
>
>
> --ValidateRecord 1.8.0
>
> Record Reader                           JsonTreeReader
>
> Record Writer                           JsonRecordSetWriter
>
> Record Writer for Invalid Records
>
> Schema Access Strategy                  Use Reader's Schema
>
> Schema Registry                         No value set
>
> Schema Name                             ${schema.name}
>
> Schema Text                             ${avro.schema}
>
> Allow Extra Fields                      true
>
> Strict Type Checking                    true
>
>
>
> --JsonTreeReader 1.8.0 - MANDATORY TO HAVE "Records" ARRAY +
> "eventVersion" on each ARRAY element
>
> Schema Access Strategy                  Use 'Schema Text' Property
>
> Schema Registry
>
> Schema Name                             ${schema.name}
>
> Schema Version
>
> Schema Branch
>
> Schema Text
>
>                                         {
>
>                                            "name": "MyName",
>
>                                            "type": "record",
>
>                                            "namespace": "aa.bb.cc",
>
>                                            "fields": [{
>
>                                                  "name": "Records",
>
>                                                  "type": {
>
>                                                     "type": "array",
>
>                                                     "items": {
>
>                                                        "name":
> "Records_record",
>
>                                                        "type": "record",
>
>                                                        "fields": [{
>
>                                                              "name":
> "eventVersion",
>
>                                                              "type":
> "string"
>
>                                                           }
>
>                                                        ]
>
>                                                     }
>
>                                                  }
>
>                                               }
>
>                                            ]
>
>                                         }
>
> Date Format
>
> Time Format
>
> Timestamp Format
>
>
>
> --JsonRecordSetWriter 1.8.0
>
> Schema Write Strategy                   Do Not Write Schema
>
> Schema Access Strategy                  Inherit Record Schema
>
> Schema Registry
>
> Schema Name                             ${schema.name}
>
> Schema Version
>
> Schema Branch
>
> Schema Text                             { "name": "eventVersion", "type":
> "string" }
>
> Date Format
>
> Time Format
>
> Timestamp Format
>
> Pretty Print JSON                       true
>
> Suppress Null Values                    Never Suppress
>
> Output Grouping                         Array
>
>
>
> Thanks in advance,
>
> Emanuel Oliveira
>
>
>
> <ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml>
>
>
>
>

RE: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?

Posted by "Oliveira, Emanuel" <Em...@fmr.com>.
Hi Juan and others,

Attaching reproducible test flow for your convenience.

Once again, the objective is to require 2 things in the JSON:

  *   1 array “Records” in the root.
  *   and each element must have the attribute eventVersion.

There are 3 GenerateFlowFile processors to test the 3 different scenarios:

  *   Problem | array “Records” missing, and the FF still validates.
  *   OK | array “Records” present but eventVersion missing. Invalid as expected.
  *   OK | both mandatory things present: array “Records” + “eventVersion”.
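
For reference, the intended outcome of the three scenarios can be written down as a small standalone check. This is only a Python sketch of the rule I want, not how ValidateRecord works internally; is_valid is a hypothetical helper and the payloads are trimmed to one extra root field plus the array.

```python
import json


def is_valid(payload: dict) -> bool:
    """Intended rule: root has a non-empty "Records" array and every
    element is an object carrying "eventVersion"."""
    records = payload.get("Records")
    if not isinstance(records, list) or len(records) == 0:
        return False
    return all(isinstance(r, dict) and "eventVersion" in r for r in records)


# Trimmed versions of the three test payloads
scenarios = {
    "missing array": '{"Service": "sssssss", "RecordsXXX": [{"eventVersion": "aaa"}]}',
    "missing eventVersion": '{"Service": "sssssss", "Records": [{"eventVersionXX": "aaa"}]}',
    "both present": '{"Service": "sssssss", "Records": [{"eventVersion": "aaa"}]}',
}

for name, text in scenarios.items():
    outcome = "valid" if is_valid(json.loads(text)) else "invalid"
    print(f"{name}: {outcome}")
```

Under this rule only the last payload is valid; the first one is the case that ValidateRecord currently routes to "valid".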



Thanks//Regards,
Emanuel Oliveira
Senior Oracle/Data Engineer | CTG | Galway
TEL ext: 353 – (0)91-74  4971 | int: 8-737 4971 |  who's who<http://fidelitycentral.fmr.com/ww/a639704> 

From: Juan Pablo Gardella <ga...@gmail.com>
Sent: Wednesday 11 December 2019 16:18
To: users@nifi.apache.org
Subject: Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?


The bug https://issues.apache.org/jira/browse/NIFI-4893 was detected by myself. Do you have a reproducible flow to validate it?

On Wed, 11 Dec 2019 at 12:54, Oliveira, Emanuel <Em...@fmr.com> wrote:
Oh I see, your analysis makes sense, but sorry, I last did Java 20 years ago; nowadays I'm mostly a data engineer (Oracle DB, ETL tools, custom migrations, Snowflake and lately NiFi). So count on me to detect opportunities to improve things, but I'm not able to change base code/tests.

Thanks so much for your time and analysis, lets wait for community to step up to do the fix and update/run the unit tests 😊

Thanks//Regards,
Emanuel Oliveira
Senior Oracle/Data Engineer | CTG | Galway
TEL ext: 353 – (0)91-74  4971 | int: 8-737 4971 |  who's who<http://fidelitycentral.fmr.com/ww/a639704> 

From: Mark Payne <ma...@hotmail.com>
Sent: Wednesday 11 December 2019 15:25
To: users@nifi.apache.org
Subject: Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?


Emanuel,

I looked into this a week or so ago, but haven't had a chance to resolve the issue yet. It does appear to be a bug. Specifically, I believe the bug is here [1]. When we create a RecordSchema from the Avro Schema, we set the default value for the array to an empty array, instead of null. Because of this, when the JSON is parsed, we end up creating a Record with an empty array for the "Records" field instead of a null. As a result, the Record is considered valid because it does have an array (it's just empty). I think it *should* be a null value instead.

It looks like this was introduced in NIFI-4893 [2]. We can easily change it to just return a null value for the default, but that does result in two of the unit tests added in NIFI-4893 failing. It may be that those unit tests need to be fixed, or it may be that such a change does break something. I just haven't had a chance yet to dig that far into it.
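
The effect of the default value described above can be illustrated with a toy model. This is only a sketch in Python and is not NiFi's actual Java code; read_record and missing_fields are hypothetical helpers, and the payload is trimmed. It assumes, as described, that a field absent from the JSON is filled with the schema default and that a field counts as missing only when its value is null.

```python
def read_record(payload, required_fields, default_for_missing):
    """Toy reader: fields absent from the JSON receive the schema default."""
    record = {}
    for name in required_fields:
        record[name] = payload[name] if name in payload else default_for_missing
    return record


def missing_fields(record):
    """Toy validator: a required field is missing only when its value is None."""
    return [name for name, value in record.items() if value is None]


bad_payload = {"Service": "sssssss", "RecordsXXX": [{"eventVersion": "aaa"}]}

# Current behaviour as described: the array defaults to an empty array
with_empty_default = read_record(bad_payload, ["Records"], default_for_missing=[])
# Proposed behaviour: the default is null
with_null_default = read_record(bad_payload, ["Records"], default_for_missing=None)

print(missing_fields(with_empty_default))  # [] -> record looks valid
print(missing_fields(with_null_default))   # ['Records'] -> record is invalid
```

With the empty-array default there are no array elements to inspect either, so the nested eventVersion check never runs.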

If you're someone who is comfortable digging into the code and making the updates, then please do and I'm happy to review a PR as soon as I'm able.

Thanks
-Mark


[1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-avro-record-utils/src/main/java/org/apache/nifi/avro/AvroTypeUtil.java#L629-L631

[2] https://issues.apache.org/jira/browse/NIFI-4893



On Dec 11, 2019, at 8:02 AM, Oliveira, Emanuel <Em...@fmr.com> wrote:

Anyone knowledgeable on Avro schemas, can you please confirm/suggest if this inability to invalidate a JSON payload missing the array in the root when Allow Extra Fields=true is normal?

There’s 2 options with:

•         ValidateRecord.Allow Extra Fields=false --> need to supply full schema

•         ValidateRecord.Allow Extra Fields=true --> this is what I been testing/want, a way to supply schema with only mandatory fields.

I want 2 mandatory fields, an array with at least 1 element having eventVersion, so minimal json should be:
{ (..)
   "Records": [{
         "eventVersion": "aaa"
         (..)
      }
   ]
   (..)
}

The problem is that ValidateRecord considers the FF valid if the “Records” array is missing from the root!
{
   "Service": "sssssss",
   "Event": "eeeee",
   "Time": "2019-11-25T16:21:53.280Z",
   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
   "RequestId": "RRRRRRRRRRRRRRRRRR",
   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh"
}

IF I supply the array “Records” then the schema correctly validates I need at least eventVersion on the array element record.


So… maybe my question can be tuned to “is it possible on avro schema syntax to specify cardinalities like in a db e/r diagram where a relation can be one of the following:
0..n
1..n
1 and only 1 ?


Thanks//Regards,
Emanuel Oliveira
Senior Oracle/Data Engineer | CTG | Galway
TEL ext: 353 – (0)91-74  4971 | int: 8-737 4971 |  who's who<http://fidelitycentral.fmr.com/ww/a639704> 

From: Oliveira, Emanuel <Em...@fmr.com>
Sent: Friday 6 December 2019 10:15
To: users@nifi.apache.org
Subject: RE: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?

Hi Mark, forgot to share the NiFi version we using:
1.8.0
10/22/2018 23:48:30 EDT
Tagged nifi-1.8.0-RC3


Thanks//Regards,
Emanuel Oliveira
Senior Oracle/Data Engineer | CTG | Galway
TEL ext: 353 – (0)91-74  4971 | int: 8-737 4971 |  who's who<http://fidelitycentral.fmr.com/ww/a639704> 

From: Emanuel Oliveira <em...@gmail.com>
Sent: Thursday 5 December 2019 22:42
To: users@nifi.apache.org
Subject: Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?


Hi Mark, be sure you copy paste "NOK - payload BAD 1 - " into GenerateFlowfile as this is the problem.

Cheers,
Emanuel

On Thu 5 Dec 2019, 22:03 Mark Payne, <ma...@hotmail.com> wrote:
Emanuel,

What version of NiFi are you using?

I just tested the attached template against the latest, and the FlowFile was routed to 'invalid' with the explanation:

Records in this FlowFile were invalid for the following reasons: The following 1 fields were missing: [[0]/Records/eventVersion]




Thanks
-Mark


On Dec 5, 2019, at 7:06 AM, Oliveira, Emanuel <Em...@fmr.com> wrote:

Hi all,

I been struggling to find a way for ValidateRecord using Avro Schema to force mandatory the presence of an array on json payload, problem is if array “records” is missing Validate is considering FF valid ☹.
--objective - Mandatory to have "Records array" with at least "eventVersion"
- using ValidateRecord > Allow Extra Fields
- problem im facing is nifi dont trigger payload BAD 1 as invalid!!

How can I make mandatory the Records array ? Is it possible ?

I know I can eventually use SplitJson with JsonPath Expression=$.Records to get rid of the array, and also to fail if the array "Records" is not present. But I would like a clean solution using just the Avro schema; is this possible?



--OK - payload GOOD
{
   "Service": "sssssss",
   "Event": "eeeee",
   "Time": "2019-11-25T16:21:53.280Z",
   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
   "RequestId": "RRRRRRRRRRRRRRRRRR",
   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
   "Records": [{
         "eventVersion": "aaa"
      }
   ]
}

--NOK - payload BAD 1 - missing "Records" array --> BUT VALIDATERECORD/AVROSCHEMA SENDS FF TO “valid”!! I want it to be sent “invalid” since is not compliant to my avro schema which needs array “Records” with element “eventVersion” as 2 mandatory things.
{
   "Service": "sssssss",
   "Event": "eeeee",
   "Time": "2019-11-25T16:21:53.280Z",
   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
   "RequestId": "RRRRRRRRRRRRRRRRRR",
   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
   "RecordsXXX": [{
         "eventVersion": "aaa"
      }
   ]
}

--OK - payload BAD 2 - "Records" array present but missing "eventVersion"
{
   "Service": "sssssss",
   "Event": "eeeee",
   "Time": "2019-11-25T16:21:53.280Z",
   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
   "RequestId": "RRRRRRRRRRRRRRRRRR",
   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
   "Records": [{
         "eventVersionXX": "aaa"
      }
   ]
}

It's a very simple test flow (attached the XML template ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml) just using ValidateRecord with JsonTreeReader/JsonRecordSetWriter:
<image001.png>


Heres ValidateRecord processor + reader/writer controllers:

  *   Avro schema with just array “Records” and “eventVersion” as min tag on array element.
  *   Using Allow Extra Fields true:

     *   So im ok having other fields on the root side by side with the array “Records”, and also ok to have extra elements inside each array.
     *   FYI: the real use case im trying to validate AWS SQS message (s3 trigger) where I will be interested on several fields, but crafted this simpler example just to ask if its possible to force array to be mandatory and with at least 1 element ?
==========================================================

--ValidateRecord 1.8.0
Record Reader                           JsonTreeReader
Record Writer                           JsonRecordSetWriter
Record Writer for Invalid Records
Schema Access Strategy                  Use Reader's Schema
Schema Registry                         No value set
Schema Name                             ${schema.name}
Schema Text                             ${avro.schema}
Allow Extra Fields                      true
Strict Type Checking                    true

--JsonTreeReader 1.8.0 - MANDATORY TO HAVE "Records" ARRAY + "eventVersion" on each ARRAY element
Schema Access Strategy                  Use 'Schema Text' Property
Schema Registry
Schema Name                             ${schema.name}
Schema Version
Schema Branch
Schema Text
                                        {
                                           "name": "MyName",
                                           "type": "record",
                                           "namespace": "aa.bb.cc",
                                           "fields": [{
                                                 "name": "Records",
                                                 "type": {
                                                    "type": "array",
                                                    "items": {
                                                       "name": "Records_record",
                                                       "type": "record",
                                                       "fields": [{
                                                             "name": "eventVersion",
                                                             "type": "string"
                                                          }
                                                       ]
                                                    }
                                                 }
                                              }
                                           ]
                                        }
Date Format
Time Format
Timestamp Format

--JsonRecordSetWriter 1.8.0
Schema Write Strategy                   Do Not Write Schema
Schema Access Strategy                  Inherit Record Schema
Schema Registry
Schema Name                             ${schema.name}
Schema Version
Schema Branch
Schema Text                             { "name": "eventVersion", "type": "string" }
Date Format
Time Format
Timestamp Format
Pretty Print JSON                       true
Suppress Null Values                    Never Suppress
Output Grouping                         Array

Thanks in advance,
Emanuel Oliveira

<ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml>


Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?

Posted by Matt Burgess <ma...@apache.org>.
I wonder if this bug is related to the SO question [1] as well?

[1]
https://stackoverflow.com/questions/58482448/nifi-validate-record-of-nested-json-set-valid-for-missing-array-field


On Wed, Dec 11, 2019 at 11:18 AM Juan Pablo Gardella <
gardellajuanpablo@gmail.com> wrote:

> The bug https://issues.apache.org/jira/browse/NIFI-4893 was detected by
> myself. Do you have a reproducible flow to validate it?
>
> On Wed, 11 Dec 2019 at 12:54, Oliveira, Emanuel <Em...@fmr.com>
> wrote:
>
>> Oh I see, your analysis makes sense, but sorry, I last did Java 20 years
>> ago; nowadays I'm mostly a data engineer (Oracle DB, ETL tools, custom
>> migrations, Snowflake and lately NiFi). So count on me to detect
>> opportunities to improve things, but I'm not able to change base code/tests.
>>
>>
>>
>> Thanks so much for your time and analysis, lets wait for community to
>> step up to do the fix and update/run the unit tests 😊
>>
>>
>>
>> Thanks//Regards,
>>
>> *Emanuel Oliveira*
>>
>> Senior Oracle/Data Engineer | CTG | Galway
>> TEL ext: 353 – (0)91-74  4971 | int: 8-737 4971 *|*  who's who
>> <http://fidelitycentral.fmr.com/ww/a639704>
>>
>>
>>
>> *From:* Mark Payne <ma...@hotmail.com>
>> *Sent:* Wednesday 11 December 2019 15:25
>> *To:* users@nifi.apache.org
>> *Subject:* Re: NiFi ValidateRecord - unable to handle missing mandatory
>> ARRAY ?
>>
>>
>>
>>
>>
>>
>> Emanuel,
>>
>>
>>
>> I looked into this a week or so ago, but haven't had a chance to resolve
>> the issue yet. It does appear to be a bug. Specifically, I believe the bug
>> is here [1].  When we create a RecordSchema from the Avro Schema, we set
>> the default value for the array to an empty array, instead of null. Because
>> of this, when the JSON is parsed, we end up creating a Record with an empty
>> array for the "Records" field instead of a null. As a result, the Record is
>> considered valid because it does have an array (it's just empty). I think
>> it *should* be a null value instead.
>>
>>
>>
>> It looks like this was introduced in NIFI-4893 [2]. We can easily change
>> it to just return a null value for the default, but that does result in two
>> of the unit tests added in NIFI-4893 failing. It may be that those unit
>> tests need to be fixed, or it may be that such a change does break
>> something. I just haven't had a chance yet to dig that far into it.
>>
>>
>>
>> If you're someone who is comfortable digging into the code and making the
>> updates, then please do and I'm happy to review a PR as soon as I'm able.
>>
>>
>>
>> Thanks
>>
>> -Mark
>>
>>
>>
>>
>>
>> [1]
>> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-avro-record-utils/src/main/java/org/apache/nifi/avro/AvroTypeUtil.java#L629-L631
>>
>>
>>
>> [2] https://issues.apache.org/jira/browse/NIFI-4893
>>
>>
>>
>>
>>
>>
>>
>> On Dec 11, 2019, at 8:02 AM, Oliveira, Emanuel <Em...@fmr.com>
>> wrote:
>>
>>
>>
>> Anyway knowledgably on avro schemas can please confirm/suggest if this
>> inability to invalidate json payload missing array in root when allowing
>> extra field-true is normal ?
>>
>>
>>
>> There’s 2 options with:
>>
>> ·         ValidateRecord.Allow Extra Fields=false à need to supply full
>> schema
>>
>> ·         ValidateRecord.Allow Extra Fields=true à this is what I been
>> testing/want, a way to supply schema with only mandatory fields.
>>
>>
>>
>> I want 2 mandatory fields, an array with at least 1 element having
>> eventVersion, so minimal json should be:
>>
>> { (..)
>>
>>    "Records": [{
>>
>>          "eventVersion": "aaa"
>>
>>          (..)
>>
>>       }
>>
>>    ]
>>
>>    (..)
>>
>> }
>>
>>
>>
>> Problem is ValidateRecord considers FF valid if missing “Records” array
>> in the root!!!!
>>
>> {
>>
>>    "Service": "sssssss",
>>
>>    "Event": "eeeee",
>>
>>    "Time": "2019-11-25T16:21:53.280Z",
>>
>>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>>
>>    "RequestId": "RRRRRRRRRRRRRRRRRR",
>>
>>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>>
>> }
>>
>>
>>
>> IF I supply the array “Records” then the schema correctly validates I
>> need at least eventVersion on the array element record.
>>
>>
>>
>>
>>
>> So… maybe my question can be tuned to “is it possible on avro schema
>> syntax to specify cardinalities like in a db e/r diagram where a relation
>> can be one of the following:
>>
>> 0..n
>>
>> 1..0
>>
>> 1 and only 1 ?
>>
>>
>>
>>
>>
>> Thanks//Regards,
>>
>> *Emanuel Oliveira*
>>
>> Senior Oracle/Data Engineer | CTG | Galway
>> TEL ext: 353 – (0)91-74  4971 | int: 8-737 4971 *|*  who's who
>> <http://fidelitycentral.fmr.com/ww/a639704>
>>
>>
>>
>> *From:* Oliveira, Emanuel <Em...@fmr.com>
>> *Sent:* Friday 6 December 2019 10:15
>> *To:* users@nifi.apache.org
>> *Subject:* RE: NiFi ValidateRecord - unable to handle missing mandatory
>> ARRAY ?
>>
>>
>>
>> Hi Mark, forgot to share the NiFi version we using:
>>
>> 1.8.0
>>
>> 10/22/2018 23:48:30 EDT
>>
>> Tagged nifi-1.8.0-RC3
>>
>>
>>
>>
>>
>> Thanks//Regards,
>>
>> *Emanuel Oliveira*
>>
>> Senior Oracle/Data Engineer | CTG | Galway
>> TEL ext: 353 – (0)91-74  4971 | int: 8-737 4971 *|*  who's who
>> <http://fidelitycentral.fmr.com/ww/a639704>
>>
>>
>>
>> *From:* Emanuel Oliveira <em...@gmail.com>
>> *Sent:* Thursday 5 December 2019 22:42
>> *To:* users@nifi.apache.org
>> *Subject:* Re: NiFi ValidateRecord - unable to handle missing mandatory
>> ARRAY ?
>>
>>
>>
>> *This email is from an external source - **exercise caution regarding
>> links and attachments.*
>>
>>
>>
>> Hi Mark, be sure you copy paste "NOK - payload BAD 1 - " into
>> GenerateFlowfile as this is the problem.
>>
>>
>>
>> Cheers,
>>
>> Emanuel
>>
>>
>>
>> On Thu 5 Dec 2019, 22:03 Mark Payne, <ma...@hotmail.com> wrote:
>>
>> Emanuel,
>>
>>
>>
>> What version of NiFi are you using?
>>
>>
>>
>> I just tested the attached template against the latest, and the FlowFile
>> was routed to 'invalid' with the explanation:
>>
>>
>>
>> Records in this FlowFile were invalid for the following reasons: The
>> following 1 fields were missing: [[0]/Records/eventVersion]
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Thanks
>>
>> -Mark
>>
>>
>>
>>
>>
>> On Dec 5, 2019, at 7:06 AM, Oliveira, Emanuel <Em...@fmr.com>
>> wrote:
>>
>>
>>
>> Hi all,
>>
>>
>>
>> I been struggling to find a way for ValidateRecord using Avro Schema to
>> force mandatory the presence of an array on json payload, problem is if
>> array “records” is missing Validate is considering FF valid ☹.
>>
>> --objective - Mandatory to have "Records array" with at least
>> "eventVersion"
>>
>> - using ValidateRecord > Allow Extra Fields
>>
>> - problem im facing is nifi dont trigger payload BAD 1 as invalid!!
>>
>>
>>
>> How can I make mandatory the Records array ? Is it possible ?
>>
>>
>>
>> I know I can eventually use a SplitJson JsonPath Expression=$.Records to
>> rid off the ARRAY, and also to fial if array "Records" not present.. But I
>> would like to have a clean solution using just avro schema, is this
>> possible ?
>>
>>
>>
>>
>>
>>
>>
>> --OK - payload GOOD
>>
>> {
>>
>>    "Service": "sssssss",
>>
>>    "Event": "eeeee",
>>
>>    "Time": "2019-11-25T16:21:53.280Z",
>>
>>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>>
>>    "RequestId": "RRRRRRRRRRRRRRRRRR",
>>
>>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>>
>>    "Records": [{
>>
>>          "eventVersion": "aaa"
>>
>>       }
>>
>>    ]
>>
>> }
>>
>>
>>
>> --NOK - payload BAD 1 - missing "Records" array à BUT
>> VALIDATERECORD/AVROSCHEMA SENDS FF TO “valid”!! I want it to be sent
>> “invalid” since is not compliant to my avro schema which needs array
>> “Records” with element “eventVersion” as 2 mandatory things.
>>
>> {
>>
>>    "Service": "sssssss",
>>
>>    "Event": "eeeee",
>>
>>    "Time": "2019-11-25T16:21:53.280Z",
>>
>>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>>
>>    "RequestId": "RRRRRRRRRRRRRRRRRR",
>>
>>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>>
>>    "RecordsXXX": [{
>>
>>          "eventVersion": "aaa"
>>
>>       }
>>
>>    ]
>>
>> }
>>
>>
>>
>> --OK - payload BAD 2 - "Records" array present but missing "eventVersion"
>>
>> {
>>
>>    "Service": "sssssss",
>>
>>    "Event": "eeeee",
>>
>>    "Time": "2019-11-25T16:21:53.280Z",
>>
>>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>>
>>    "RequestId": "RRRRRRRRRRRRRRRRRR",
>>
>>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>>
>>    "Records": [{
>>
>>          "eventVersionXX": "aaa"
>>
>>       }
>>
>>    ]
>>
>> }
>>
>>
>>
>> Its very simple test flow (attachmed the xml template
>> ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml) just using
>> ValidateRecord with JsonReader/Json Writer:
>>
>> <image001.png>
>>
>>
>>
>>
>>
>> Heres ValidateRecord processor + reader/writer controllers:
>>
>>    - Avro schema with just array “Records” and “eventVersion” as min tag
>>    on array element.
>>    - Using Allow Extra Fields true:
>>
>>
>>    - So im ok having other fields on the root side by side with the
>>       array “Records”, and also ok to have extra elements inside each array.
>>       - FYI: the real use case im trying to validate AWS SQS message (s3
>>       trigger) where I will be interested on several fields, but crafted this
>>       simpler example just to ask if its possible to force array to be mandatory
>>       and with at least 1 element ?
>>
>> ==========================================================
>>
>>
>>
>> --ValidateRecord 1.8.0
>>
>> Record Reader                           JsonTreeReader
>>
>> Record Writer                           JsonRecordSetWriter
>>
>> Record Writer for Invalid Records
>>
>> Schema Access Strategy                  Use Reader's Schema
>>
>> Schema Registry                         No value set
>>
>> Schema Name                             ${schema.name}
>>
>> Schema Text                             ${avro.schema}
>>
>> Allow Extra Fields                      true
>>
>> Strict Type Checking                    true
>>
>>
>>
>> --JsonTreeReader 1.8.0 - MANDATORY TO HAVE "Records" ARRAY +
>> "eventVersion" on each ARRAY element
>>
>> Schema Access Strategy                  Use 'Schema Text' Property
>>
>> Schema Registry
>>
>> Schema Name                             ${schema.name}
>>
>> Schema Version
>>
>> Schema Branch
>>
>> Schema Text
>>
>>                                         {
>>
>>                                            "name": "MyName",
>>
>>                                            "type": "record",
>>
>>                                            "namespace": "aa.bb.cc",
>>
>>                                            "fields": [{
>>
>>                                                  "name": "Records",
>>
>>                                                  "type": {
>>
>>                                                     "type": "array",
>>
>>                                                     "items": {
>>
>>                                                        "name":
>> "Records_record",
>>
>>                                                        "type": "record",
>>
>>                                                        "fields": [{
>>
>>                                                              "name":
>> "eventVersion",
>>
>>                                                              "type":
>> "string"
>>
>>                                                           }
>>
>>                                                        ]
>>
>>                                                     }
>>
>>                                                  }
>>
>>                                               }
>>
>>                                            ]
>>
>>                                         }
>>
>> Date Format
>>
>> Time Format
>>
>> Timestamp Format
>>
>>
>>
>> --JsonRecordSetWriter 1.8.0
>>
>> Schema Write Strategy                   Do Not Write Schema
>>
>> Schema Access Strategy                  Inherit Record Schema
>>
>> Schema Registry
>>
>> Schema Name                             ${schema.name}
>>
>> Schema Version
>>
>> Schema Branch
>>
>> Schema Text                             { "name": "eventVersion", "type":
>> "string" }
>>
>> Date Format
>>
>> Time Format
>>
>> Timestamp Format
>>
>> Pretty Print JSON                       true
>>
>> Suppress Null Values                    Never Suppress
>>
>> Output Grouping                         Array
>>
>>
>>
>> Thanks in advance,
>>
>> Emanuel Oliveira
>>
>>
>>
>> <ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml>
>>
>>
>>
>

Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?

Posted by Juan Pablo Gardella <ga...@gmail.com>.
I was the one who reported the bug in https://issues.apache.org/jira/browse/NIFI-4893.
Do you have a reproducible flow to validate it?

RE: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?

Posted by "Oliveira, Emanuel" <Em...@fmr.com>.
Oh I see, your analysis makes sense, but sorry, I last did Java 20 years ago; nowadays I'm mostly a data engineer (Oracle DB, ETL tools, custom migrations, Snowflake and lately NiFi), so count on me to detect opportunities to improve things, but I'm not able to change the base code/tests.

Thanks so much for your time and analysis; let's wait for the community to step up to do the fix and update/run the unit tests 😊

Thanks//Regards,
Emanuel Oliveira
Senior Oracle/Data Engineer | CTG | Galway
TEL ext: 353 – (0)91-74  4971 | int: 8-737 4971 |  who's who<http://fidelitycentral.fmr.com/ww/a639704> 

From: Mark Payne <ma...@hotmail.com>
Sent: Wednesday 11 December 2019 15:25
To: users@nifi.apache.org
Subject: Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?

This email is from an external source - exercise caution regarding links and attachments.

Emanuel,

I looked into this a week or so ago, but haven't had a chance to resolve the issue yet. It does appear to be a bug. Specifically, I believe the bug is here [1]. When we create a RecordSchema from the Avro Schema, we set the default value for the array to an empty array, instead of null. Because of this, when the JSON is parsed, we end up creating a Record with an empty array for the "Records" field instead of a null. As a result, the Record is considered valid because it does have an array (it's just empty). I think it *should* be a null value instead.

It looks like this was introduced in NIFI-4893 [2]. We can easily change it to just return a null value for the default, but that does result in two of the unit tests added in NIFI-4893 failing. It may be that those unit tests need to be fixed, or it may be that such a change does break something. I just haven't had a chance yet to dig that far into it.

If you're someone who is comfortable digging into the code and making the updates, then please do and I'm happy to review a PR as soon as I'm able.

Thanks
-Mark


[1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-avro-record-utils/src/main/java/org/apache/nifi/avro/AvroTypeUtil.java#L629-L631

[2] https://issues.apache.org/jira/browse/NIFI-4893




On Dec 11, 2019, at 8:02 AM, Oliveira, Emanuel <Em...@fmr.com> wrote:

Can anyone knowledgeable about Avro schemas please confirm whether this inability to invalidate a JSON payload that is missing an array in the root, when Allow Extra Fields is true, is normal?

There are 2 options:

- ValidateRecord.Allow Extra Fields=false --> need to supply the full schema

- ValidateRecord.Allow Extra Fields=true --> this is what I have been testing and what I want: a way to supply a schema with only the mandatory fields.

I want 2 mandatory fields: the array, with at least 1 element, and eventVersion on that element, so the minimal JSON should be:
{ (..)
   "Records": [{
         "eventVersion": "aaa"
         (..)
      }
   ]
   (..)
}

The problem is that ValidateRecord considers the FlowFile valid if the “Records” array is missing from the root:
{
   "Service": "sssssss",
   "Event": "eeeee",
   "Time": "2019-11-25T16:21:53.280Z",
   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
   "RequestId": "RRRRRRRRRRRRRRRRRR",
   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
}

If I supply the array “Records”, then the schema correctly validates that I need at least eventVersion on the array element record.


So… maybe my question can be rephrased as: is it possible in Avro schema syntax to specify cardinalities, like in a DB E/R diagram, where a relation can be one of the following:
0..n
1..0
1 and only 1 ?


Thanks//Regards,
Emanuel Oliveira
Senior Oracle/Data Engineer | CTG | Galway
TEL ext: 353 – (0)91-74  4971 | int: 8-737 4971 |  who's who<http://fidelitycentral.fmr.com/ww/a639704> 

From: Oliveira, Emanuel <Em...@fmr.com>
Sent: Friday 6 December 2019 10:15
To: users@nifi.apache.org
Subject: RE: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?

Hi Mark, I forgot to share the NiFi version we are using:
1.8.0
10/22/2018 23:48:30 EDT
Tagged nifi-1.8.0-RC3


Thanks//Regards,
Emanuel Oliveira
Senior Oracle/Data Engineer | CTG | Galway
TEL ext: 353 – (0)91-74  4971 | int: 8-737 4971 |  who's who<http://fidelitycentral.fmr.com/ww/a639704> 

From: Emanuel Oliveira <em...@gmail.com>
Sent: Thursday 5 December 2019 22:42
To: users@nifi.apache.org
Subject: Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?

Hi Mark, be sure you copy-paste "NOK - payload BAD 1 - " into GenerateFlowFile, as this is the problem.

Cheers,
Emanuel

On Thu 5 Dec 2019, 22:03 Mark Payne, <ma...@hotmail.com> wrote:
Emanuel,

What version of NiFi are you using?

I just tested the attached template against the latest, and the FlowFile was routed to 'invalid' with the explanation:

Records in this FlowFile were invalid for the following reasons: The following 1 fields were missing: [[0]/Records/eventVersion]




Thanks
-Mark


On Dec 5, 2019, at 7:06 AM, Oliveira, Emanuel <Em...@fmr.com> wrote:

Hi all,

I have been struggling to find a way for ValidateRecord, using an Avro schema, to make the presence of an array in a JSON payload mandatory. The problem is that if the array “Records” is missing, ValidateRecord considers the FlowFile valid ☹.
--objective - Mandatory to have a "Records" array with at least "eventVersion"
- using ValidateRecord > Allow Extra Fields
- the problem I'm facing is that NiFi doesn't flag payload BAD 1 as invalid!!

How can I make the Records array mandatory? Is it possible?

I know I can eventually use SplitJson with JsonPath Expression=$.Records to get rid of the array, and also to fail if the array "Records" is not present. But I would like a clean solution using just the Avro schema; is this possible?



--OK - payload GOOD
{
   "Service": "sssssss",
   "Event": "eeeee",
   "Time": "2019-11-25T16:21:53.280Z",
   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
   "RequestId": "RRRRRRRRRRRRRRRRRR",
   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
   "Records": [{
         "eventVersion": "aaa"
      }
   ]
}

--NOK - payload BAD 1 - missing "Records" array --> but ValidateRecord with the Avro schema sends the FlowFile to “valid”!! I want it sent to “invalid”, since it does not comply with my Avro schema, which needs the array “Records” with element “eventVersion” as 2 mandatory things.
{
   "Service": "sssssss",
   "Event": "eeeee",
   "Time": "2019-11-25T16:21:53.280Z",
   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
   "RequestId": "RRRRRRRRRRRRRRRRRR",
   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
   "RecordsXXX": [{
         "eventVersion": "aaa"
      }
   ]
}

--OK - payload BAD 2 - "Records" array present but missing "eventVersion"
{
   "Service": "sssssss",
   "Event": "eeeee",
   "Time": "2019-11-25T16:21:53.280Z",
   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
   "RequestId": "RRRRRRRRRRRRRRRRRR",
   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
   "Records": [{
         "eventVersionXX": "aaa"
      }
   ]
}
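
As a stopgap outside the Avro schema, the three payloads above can be checked with a few lines of Python. This is only a sketch of the rule being asked for (a non-empty "Records" array whose elements all carry "eventVersion"); has_mandatory_records is a hypothetical helper, not part of NiFi, and the payloads are trimmed to the fields that matter.

```python
import json

# Hypothetical helper for illustration only; it is not a NiFi API.
def has_mandatory_records(payload_text):
    """Return True when "Records" is a non-empty array and every element has "eventVersion"."""
    payload = json.loads(payload_text)
    records = payload.get("Records")
    # A missing key, a non-array value, or an empty array all count as invalid.
    if not isinstance(records, list) or not records:
        return False
    return all(isinstance(item, dict) and "eventVersion" in item for item in records)

# Payloads trimmed from the GOOD / BAD 1 / BAD 2 examples in this message.
good = '{"Service": "sssssss", "Records": [{"eventVersion": "aaa"}]}'
bad_1 = '{"Service": "sssssss", "RecordsXXX": [{"eventVersion": "aaa"}]}'
bad_2 = '{"Service": "sssssss", "Records": [{"eventVersionXX": "aaa"}]}'

for name, text in [("GOOD", good), ("BAD 1", bad_1), ("BAD 2", bad_2)]:
    print(name, "valid" if has_mandatory_records(text) else "invalid")
```

Such a check could sit in front of ValidateRecord, but how it would be wired into a flow is left open here.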

It's a very simple test flow (attached as the XML template ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml), just using ValidateRecord with a JsonTreeReader/JsonRecordSetWriter:
<image001.png>


Here is the ValidateRecord processor plus the reader/writer controllers:

  *   Avro schema with just the array “Records” and “eventVersion” as the minimum field on each array element.
  *   Using Allow Extra Fields = true:

     *   So I'm OK with having other fields in the root side by side with the array “Records”, and also OK with extra fields inside each array element.
     *   FYI: the real use case is validating an AWS SQS message (S3 trigger) where I will be interested in several fields, but I crafted this simpler example just to ask whether it's possible to force the array to be mandatory and to have at least 1 element.
==========================================================

--ValidateRecord 1.8.0
Record Reader                           JsonTreeReader
Record Writer                           JsonRecordSetWriter
Record Writer for Invalid Records
Schema Access Strategy                  Use Reader's Schema
Schema Registry                         No value set
Schema Name                             ${schema.name}
Schema Text                             ${avro.schema}
Allow Extra Fields                      true
Strict Type Checking                    true

--JsonTreeReader 1.8.0 - MANDATORY TO HAVE "Records" ARRAY + "eventVersion" on each ARRAY element
Schema Access Strategy                  Use 'Schema Text' Property
Schema Registry
Schema Name                             ${schema.name}
Schema Version
Schema Branch
Schema Text
                                        {
                                           "name": "MyName",
                                           "type": "record",
                                           "namespace": "aa.bb.cc",
                                           "fields": [{
                                                 "name": "Records",
                                                 "type": {
                                                    "type": "array",
                                                    "items": {
                                                       "name": "Records_record",
                                                       "type": "record",
                                                       "fields": [{
                                                             "name": "eventVersion",
                                                             "type": "string"
                                                          }
                                                       ]
                                                    }
                                                 }
                                              }
                                           ]
                                        }
Date Format
Time Format
Timestamp Format

--JsonRecordSetWriter 1.8.0
Schema Write Strategy                   Do Not Write Schema
Schema Access Strategy                  Inherit Record Schema
Schema Registry
Schema Name                             ${schema.name}
Schema Version
Schema Branch
Schema Text                             { "name": "eventVersion", "type": "string" }
Date Format
Time Format
Timestamp Format
Pretty Print JSON                       true
Suppress Null Values                    Never Suppress
Output Grouping                         Array

Thanks in advance,
Emanuel Oliveira

<ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml>


Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?

Posted by Mark Payne <ma...@hotmail.com>.
Emanuel,

I looked into this a week or so ago, but haven't had a chance to resolve the issue yet. It does appear to be a bug. Specifically, I believe the bug is here [1]. When we create a RecordSchema from the Avro Schema, we set the default value for the array to an empty array, instead of null. Because of this, when the JSON is parsed, we end up creating a Record with an empty array for the "Records" field instead of a null. As a result, the Record is considered valid because it does have an array (it's just empty). I think it *should* be a null value instead.
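
The mechanism described above can be illustrated with a minimal Python sketch. This is not NiFi code: `build_record`, `is_valid`, and the schema dictionaries are hypothetical stand-ins for the reader and the validator, and they assume only what is stated above (a missing field is filled from the schema default, and validation rejects a null but accepts an empty array).

```python
# Hypothetical simulation of the behaviour described above; not NiFi's API.

def build_record(payload, schema):
    """Mimic a reader: copy each schema field, falling back to its default."""
    return {
        name: payload.get(name, spec["default"])
        for name, spec in schema.items()
    }


def is_valid(record, schema):
    """Mimic a validator: a required field must not be null (None)."""
    return all(
        record[name] is not None
        for name, spec in schema.items()
        if spec["required"]
    )


# Payload "BAD 1": the "Records" array is absent.
bad_payload = {"Service": "sssssss", "RecordsXXX": [{"eventVersion": "aaa"}]}

# Default of an empty array: the missing field looks like a present, empty array.
schema_empty_default = {"Records": {"required": True, "default": []}}
# Default of null: the missing field stays null and fails validation.
schema_null_default = {"Records": {"required": True, "default": None}}

valid_with_empty_default = is_valid(
    build_record(bad_payload, schema_empty_default), schema_empty_default
)
valid_with_null_default = is_valid(
    build_record(bad_payload, schema_null_default), schema_null_default
)
print(valid_with_empty_default, valid_with_null_default)
```

With the empty-array default the payload without "Records" passes the check; with a null default it is rejected, which is the behaviour the original poster expected.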

It looks like this was introduced in NIFI-4893 [2]. We can easily change it to just return a null value for the default, but that does result in two of the unit tests added in NIFI-4893 failing. It may be that those unit tests need to be fixed, or it may be that such a change does break something. I just haven't had a chance yet to dig that far into it.

If you're someone who is comfortable digging into the code and making the updates, then please do and I'm happy to review a PR as soon as I'm able. 

Thanks
-Mark


[1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-avro-record-utils/src/main/java/org/apache/nifi/avro/AvroTypeUtil.java#L629-L631

[2] https://issues.apache.org/jira/browse/NIFI-4893



> On Dec 11, 2019, at 8:02 AM, Oliveira, Emanuel <Em...@fmr.com> wrote:
> 
> Can anyone knowledgeable about Avro schemas please confirm whether this inability to invalidate a JSON payload that is missing the root-level array, when Allow Extra Fields=true, is normal?
>  
> There are 2 options:
> ValidateRecord.Allow Extra Fields=false --> need to supply the full schema
> ValidateRecord.Allow Extra Fields=true --> this is what I have been testing and want: a way to supply a schema with only the mandatory fields.
>  
> I want 2 mandatory fields, an array with at least 1 element having eventVersion, so minimal json should be:
> { (..)
>    "Records": [{
>          "eventVersion": "aaa"
>          (..)
>       }
>    ]
>    (..)
> }
>  
> The problem is that ValidateRecord considers the FlowFile valid when the “Records” array is missing from the root:
> {
>    "Service": "sssssss",
>    "Event": "eeeee",
>    "Time": "2019-11-25T16:21:53.280Z",
>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>    "RequestId": "RRRRRRRRRRRRRRRRRR",
>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh"
> }
>  
> If I supply the “Records” array, the schema correctly enforces that each array element record contains at least eventVersion.
>  
>  
> So maybe my question can be refined to: is it possible in Avro schema syntax to specify cardinalities, as in a DB E/R diagram, where a relation can be one of the following:
> 0..n
> 1..n
> 1 and only 1?
>  
>  
> Thanks//Regards,
> Emanuel Oliveira
> Senior Oracle/Data Engineer | CTG | Galway
> TEL ext: 353 – (0)91-74  4971 | int: 8-737 4971 |  who's who <http://fidelitycentral.fmr.com/ww/a639704>  
>  


RE: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?

Posted by "Oliveira, Emanuel" <Em...@fmr.com>.
Can anyone knowledgeable about Avro schemas please confirm whether this inability to invalidate a JSON payload that is missing the root-level array, when Allow Extra Fields=true, is normal?

There are 2 options:

  *   ValidateRecord.Allow Extra Fields=false --> need to supply the full schema
  *   ValidateRecord.Allow Extra Fields=true --> this is what I have been testing and want: a way to supply a schema with only the mandatory fields.

I want 2 mandatory fields, an array with at least 1 element having eventVersion, so minimal json should be:
{ (..)
   "Records": [{
         "eventVersion": "aaa"
         (..)
      }
   ]
   (..)
}

The problem is that ValidateRecord considers the FlowFile valid when the “Records” array is missing from the root:
{
   "Service": "sssssss",
   "Event": "eeeee",
   "Time": "2019-11-25T16:21:53.280Z",
   "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
   "RequestId": "RRRRRRRRRRRRRRRRRR",
   "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh"
}

If I supply the “Records” array, the schema correctly enforces that each array element record contains at least eventVersion.


So maybe my question can be refined to: is it possible in Avro schema syntax to specify cardinalities, as in a DB E/R diagram, where a relation can be one of the following:
0..n
1..n
1 and only 1?
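
For context, the Avro specification describes an array type only through its `items` schema and, as far as is known here, offers no minimum-length or cardinality constraint, so an "at least 1 element" rule would have to be checked outside the schema. A minimal Python sketch of such a check follows. `has_mandatory_records` is a hypothetical helper, not part of NiFi or Avro, and it only mirrors the two requirements stated above.

```python
import json


def has_mandatory_records(text):
    """Return True if the JSON has a non-empty "Records" array whose
    elements are all objects carrying a string "eventVersion"."""
    payload = json.loads(text)
    records = payload.get("Records")
    if not isinstance(records, list) or not records:
        return False  # missing, wrong type, or empty array
    return all(
        isinstance(item, dict) and isinstance(item.get("eventVersion"), str)
        for item in records
    )


# Shortened versions of the payloads from this thread, plus an empty array.
good = '{"Service": "sssssss", "Records": [{"eventVersion": "aaa"}]}'
bad_1 = '{"Service": "sssssss", "RecordsXXX": [{"eventVersion": "aaa"}]}'
bad_2 = '{"Service": "sssssss", "Records": [{"eventVersionXX": "aaa"}]}'
empty = '{"Service": "sssssss", "Records": []}'

results = [has_mandatory_records(p) for p in (good, bad_1, bad_2, empty)]
print(results)
```

Only the first payload satisfies the check; the missing array, the missing eventVersion, and the empty array are all rejected.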


Thanks//Regards,
Emanuel Oliveira
Senior Oracle/Data Engineer | CTG | Galway
TEL ext: 353 – (0)91-74  4971 | int: 8-737 4971 |  who's who<http://fidelitycentral.fmr.com/ww/a639704> 

From: Oliveira, Emanuel <Em...@fmr.com>
Sent: Friday 6 December 2019 10:15
To: users@nifi.apache.org
Subject: RE: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?

Hi Mark, I forgot to share the NiFi version we are using:
1.8.0
10/22/2018 23:48:30 EDT
Tagged nifi-1.8.0-RC3


Thanks//Regards,
Emanuel Oliveira
Senior Oracle/Data Engineer | CTG | Galway
TEL ext: 353 – (0)91-74  4971 | int: 8-737 4971 |  who's who<http://fidelitycentral.fmr.com/ww/a639704> 


RE: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?

Posted by "Oliveira, Emanuel" <Em...@fmr.com>.
Hi Mark, I forgot to share the NiFi version we are using:
1.8.0
10/22/2018 23:48:30 EDT
Tagged nifi-1.8.0-RC3


Thanks//Regards,
Emanuel Oliveira
Senior Oracle/Data Engineer | CTG | Galway
TEL ext: 353 – (0)91-74  4971 | int: 8-737 4971 |  who's who<http://fidelitycentral.fmr.com/ww/a639704> 

From: Emanuel Oliveira <em...@gmail.com>
Sent: Thursday 5 December 2019 22:42
To: users@nifi.apache.org
Subject: Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?

This email is from an external source - exercise caution regarding links and attachments.

Hi Mark, be sure to copy and paste the "NOK - payload BAD 1" payload into GenerateFlowFile, as this is the one that shows the problem.

Cheers,
Emanuel


Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?

Posted by Emanuel Oliveira <em...@gmail.com>.
Hi Mark, be sure to copy and paste the "NOK - payload BAD 1" payload into
GenerateFlowFile, as this is the one that shows the problem.

Cheers,
Emanuel

On Thu 5 Dec 2019, 22:03 Mark Payne, <ma...@hotmail.com> wrote:

> Emanuel,
>
> What version of NiFi are you using?
>
> I just tested the attached template against the latest, and the FlowFile
> was routed to 'invalid' with the explanation:
>
> Records in this FlowFile were invalid for the following reasons: The
> following 1 fields were missing: [[0]/Records/eventVersion]
>
>
>
>
> Thanks
> -Mark
>
>
> On Dec 5, 2019, at 7:06 AM, Oliveira, Emanuel <Em...@fmr.com>
> wrote:
>
> Hi all,
>
> I been struggling to find a way for ValidateRecord using Avro Schema to
> force mandatory the presence of an array on json payload, problem is if
> array “records” is missing Validate is considering FF valid ☹.
> --objective - Mandatory to have "Records array" with at least
> "eventVersion"
> - using ValidateRecord > Allow Extra Fields
> - problem im facing is nifi dont trigger payload BAD 1 as invalid!!
>
> How can I make mandatory the Records array ? Is it possible ?
>
> I know I can eventually use a SplitJson JsonPath Expression=$.Records to
> rid off the ARRAY, and also to fial if array "Records" not present.. But I
> would like to have a clean solution using just avro schema, is this
> possible ?
>
>
>
> --OK - payload GOOD
> {
>    "Service": "sssssss",
>    "Event": "eeeee",
>    "Time": "2019-11-25T16:21:53.280Z",
>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>    "RequestId": "RRRRRRRRRRRRRRRRRR",
>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>    "Records": [{
>          "eventVersion": "aaa"
>       }
>    ]
> }
>
> --NOK - payload BAD 1 - missing "Records" array à BUT
> VALIDATERECORD/AVROSCHEMA SENDS FF TO “valid”!! I want it to be sent
> “invalid” since is not compliant to my avro schema which needs array
> “Records” with element “eventVersion” as 2 mandatory things.
> {
>    "Service": "sssssss",
>    "Event": "eeeee",
>    "Time": "2019-11-25T16:21:53.280Z",
>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>    "RequestId": "RRRRRRRRRRRRRRRRRR",
>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>    "RecordsXXX": [{
>          "eventVersion": "aaa"
>       }
>    ]
> }
>
> --OK - payload BAD 2 - "Records" array present but missing "eventVersion"
> {
>    "Service": "sssssss",
>    "Event": "eeeee",
>    "Time": "2019-11-25T16:21:53.280Z",
>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>    "RequestId": "RRRRRRRRRRRRRRRRRR",
>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>    "Records": [{
>          "eventVersionXX": "aaa"
>       }
>    ]
> }
>
> Its very simple test flow (attachmed the xml template
> ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml) just using
> ValidateRecord with JsonReader/Json Writer:
> <image001.png>
>
>
> Heres ValidateRecord processor + reader/writer controllers:
>
>    - Avro schema with just array “Records” and “eventVersion” as min tag
>    on array element.
>    - Using Allow Extra Fields true:
>       - So im ok having other fields on the root side by side with the
>       array “Records”, and also ok to have extra elements inside each array.
>       - FYI: the real use case im trying to validate AWS SQS message (s3
>       trigger) where I will be interested on several fields, but crafted this
>       simpler example just to ask if its possible to force array to be mandatory
>       and with at least 1 element ?
>
> ==========================================================
>
> --ValidateRecord 1.8.0
> Record Reader                           JsonTreeReader
> Record Writer                           JsonRecordSetWriter
> Record Writer for Invalid Records
> Schema Access Strategy                  Use Reader's Schema
> Schema Registry                         No value set
> Schema Name                             ${schema.name}
> Schema Text                             ${avro.schema}
> Allow Extra Fields                      true
> Strict Type Checking                    true
>
> --JsonTreeReader 1.8.0 - MANDATORY TO HAVE "Records" ARRAY +
> "eventVersion" on each ARRAY element
> Schema Access Strategy                  Use 'Schema Text' Property
> Schema Registry
> Schema Name                             ${schema.name}
> Schema Version
> Schema Branch
> Schema Text
>                                         {
>                                            "name": "MyName",
>                                            "type": "record",
>                                            "namespace": "aa.bb.cc",
>                                            "fields": [{
>                                                  "name": "Records",
>                                                  "type": {
>                                                     "type": "array",
>                                                     "items": {
>                                                        "name": "Records_record",
>                                                        "type": "record",
>                                                        "fields": [{
>                                                              "name": "eventVersion",
>                                                              "type": "string"
>                                                           }
>                                                        ]
>                                                     }
>                                                  }
>                                               }
>                                            ]
>                                         }
> Date Format
> Time Format
> Timestamp Format
>
> --JsonRecordSetWriter 1.8.0
> Schema Write Strategy                   Do Not Write Schema
> Schema Access Strategy                  Inherit Record Schema
> Schema Registry
> Schema Name                             ${schema.name}
> Schema Version
> Schema Branch
> Schema Text                             { "name": "eventVersion", "type": "string" }
> Date Format
> Time Format
> Timestamp Format
> Pretty Print JSON                       true
> Suppress Null Values                    Never Suppress
> Output Grouping                         Array
>
> Thanks in advance,
> Emanuel Oliveira
>
> <ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml>
>
>
>

Re: NiFi ValidateRecord - unable to handle missing mandatory ARRAY ?

Posted by Mark Payne <ma...@hotmail.com>.
Emanuel,

What version of NiFi are you using?

I just tested the attached template against the latest, and the FlowFile was routed to 'invalid' with the explanation:

Records in this FlowFile were invalid for the following reasons: The following 1 fields were missing: [[0]/Records/eventVersion]




Thanks
-Mark
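
For readers following the thread, the check being discussed can be sketched outside NiFi. This is only an illustrative Python sketch of the intended semantics (a schema field with no default must be present), not NiFi's actual ValidateRecord implementation. The function name missing_fields, the path format, and the shortened payloads are assumptions made for the example; the schema is the one quoted from the JsonTreeReader configuration.

```python
import json

# Same Avro schema as the JsonTreeReader "Schema Text" quoted in this thread.
SCHEMA = json.loads("""
{
  "name": "MyName", "type": "record", "namespace": "aa.bb.cc",
  "fields": [{
    "name": "Records",
    "type": {
      "type": "array",
      "items": {
        "name": "Records_record", "type": "record",
        "fields": [{"name": "eventVersion", "type": "string"}]
      }
    }
  }]
}
""")


def missing_fields(schema, value, path=""):
    """Return paths of schema fields that have no default and are absent.

    Hypothetical helper: only records and arrays are walked. Extra fields
    in the payload are ignored, mirroring "Allow Extra Fields = true".
    """
    kind = schema["type"] if isinstance(schema, dict) else schema
    missing = []
    if kind == "record":
        for field in schema["fields"]:
            child_path = f"{path}/{field['name']}"
            if not isinstance(value, dict) or field["name"] not in value:
                if "default" not in field:
                    missing.append(child_path)
                continue
            missing.extend(
                missing_fields(field["type"], value[field["name"]], child_path)
            )
    elif kind == "array":
        items = value if isinstance(value, list) else []
        for i, item in enumerate(items):
            missing.extend(missing_fields(schema["items"], item, f"{path}[{i}]"))
    elif isinstance(kind, dict):
        # Nested type definition such as {"type": {"type": "array", ...}}.
        missing.extend(missing_fields(kind, value, path))
    return missing


# Shortened versions of the three payloads from the original message.
GOOD = {"Service": "sssssss", "Records": [{"eventVersion": "aaa"}]}
BAD_1 = {"Service": "sssssss", "RecordsXXX": [{"eventVersion": "aaa"}]}
BAD_2 = {"Service": "sssssss", "Records": [{"eventVersionXX": "aaa"}]}

for name, payload in (("GOOD", GOOD), ("BAD 1", BAD_1), ("BAD 2", BAD_2)):
    print(name, missing_fields(SCHEMA, payload))
```

Under these assumptions, payload BAD 1 is reported as missing the "Records" field and BAD 2 as missing "eventVersion" inside the first array element. Note that an Avro array type only declares its "items", so the "at least 1 element" requirement cannot be expressed in the schema itself; an empty "Records" array passes this sketch.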


> On Dec 5, 2019, at 7:06 AM, Oliveira, Emanuel <Em...@fmr.com> wrote:
> 
> Hi all,
>  
> I have been struggling to find a way to make ValidateRecord, using an Avro schema, treat an array in the JSON payload as mandatory. The problem is that if the array “Records” is missing, ValidateRecord considers the FlowFile valid ☹.
> --objective - Mandatory to have "Records array" with at least "eventVersion"
> - using ValidateRecord > Allow Extra Fields
> - the problem I'm facing is that NiFi doesn't route payload BAD 1 to invalid!!
>  
> How can I make the Records array mandatory? Is it possible?
>  
> I know I can eventually use SplitJson with JsonPath Expression=$.Records to get rid of the ARRAY, and also to fail if the array "Records" is not present. But I would like to have a clean solution using just the Avro schema. Is this possible?
>  
>  
>  
> --OK - payload GOOD
> {
>    "Service": "sssssss",
>    "Event": "eeeee",
>    "Time": "2019-11-25T16:21:53.280Z",
>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>    "RequestId": "RRRRRRRRRRRRRRRRRR",
>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>    "Records": [{
>          "eventVersion": "aaa"
>       }
>    ]
> }
>  
> --NOK - payload BAD 1 - missing "Records" array --> BUT VALIDATERECORD/AVROSCHEMA SENDS FF TO “valid”!! I want it to be sent “invalid” since is not compliant to my avro schema which needs array “Records” with element “eventVersion” as 2 mandatory things.
> {
>    "Service": "sssssss",
>    "Event": "eeeee",
>    "Time": "2019-11-25T16:21:53.280Z",
>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>    "RequestId": "RRRRRRRRRRRRRRRRRR",
>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>    "RecordsXXX": [{
>          "eventVersion": "aaa"
>       }
>    ]
> }
>  
> --OK - payload BAD 2 - "Records" array present but missing "eventVersion"
> {
>    "Service": "sssssss",
>    "Event": "eeeee",
>    "Time": "2019-11-25T16:21:53.280Z",
>    "Bucket": "bbb-bbbbb-bbb-bbbbb-bbbbbb",
>    "RequestId": "RRRRRRRRRRRRRRRRRR",
>    "HostId": "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
>    "Records": [{
>          "eventVersionXX": "aaa"
>       }
>    ]
> }
>  
> It's a very simple test flow (attached the XML template ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml) just using ValidateRecord with JsonReader/Json Writer:
> <image001.png>
>  
>  
> Here's the ValidateRecord processor + reader/writer controllers:
>    - Avro schema with just array “Records” and “eventVersion” as min tag on array element.
>    - Using Allow Extra Fields true:
>       - So I'm OK having other fields on the root side by side with the array “Records”, and also OK to have extra elements inside each array.
>       - FYI: in the real use case I'm trying to validate an AWS SQS message (S3 trigger) where I will be interested in several fields, but I crafted this simpler example just to ask if it's possible to force the array to be mandatory and with at least 1 element?
> ==========================================================
>  
> --ValidateRecord 1.8.0
> Record Reader                           JsonTreeReader
> Record Writer                           JsonRecordSetWriter
> Record Writer for Invalid Records      
> Schema Access Strategy                  Use Reader's Schema
> Schema Registry                         No value set
> Schema Name                             ${schema.name}
> Schema Text                             ${avro.schema}
> Allow Extra Fields                      true
> Strict Type Checking                    true
>  
> --JsonTreeReader 1.8.0 - MANDATORY TO HAVE "Records" ARRAY + "eventVersion" on each ARRAY element
> Schema Access Strategy                  Use 'Schema Text' Property
> Schema Registry                        
> Schema Name                             ${schema.name}
> Schema Version                         
> Schema Branch                          
> Schema Text                            
>                                         {
>                                            "name": "MyName",
>                                            "type": "record",
>                                            "namespace": "aa.bb.cc",
>                                            "fields": [{
>                                                  "name": "Records",
>                                                  "type": {
>                                                     "type": "array",
>                                                     "items": {
>                                                        "name": "Records_record",
>                                                        "type": "record",
>                                                        "fields": [{
>                                                              "name": "eventVersion",
>                                                              "type": "string"
>                                                           }
>                                                        ]
>                                                     }
>                                                  }
>                                               }
>                                            ]
>                                         }
> Date Format                            
> Time Format
> Timestamp Format
>  
> --JsonRecordSetWriter 1.8.0
> Schema Write Strategy                   Do Not Write Schema
> Schema Access Strategy                  Inherit Record Schema
> Schema Registry                        
> Schema Name                             ${schema.name}
> Schema Version
> Schema Branch
> Schema Text                             { "name": "eventVersion", "type": "string" }
> Date Format
> Time Format
> Timestamp Format
> Pretty Print JSON                       true
> Suppress Null Values                    Never Suppress
> Output Grouping                         Array
>  
> Thanks in advance,
> Emanuel Oliveira
>  
> <ValidateRecord_missing_mandatory_ARRAY_is_VALID_problem.xml>