You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@nifi.apache.org by "stephen.hindmarch.bt.com via users" <us...@nifi.apache.org> on 2022/07/15 11:12:39 UTC

Update records with literal null, true or false values.

Hi all,

I have been looking at a case where some records have all fields presented as strings, and I need to turn the numeric or boolean values into their native types. I can do most of this with Jolt, but in the case where the value is missing I have a problem.

Say I have these records.

[
  {"latitude":"1.0","longitude":"-1.0","user":{"name":"alice","id":"12345671","has_cover":"true"},"vehicle":{"id":"AB123DE"}},
  {"latitude":"1.0","longitude":"-1.0","user":{"name":"bob","id":"12345672","has_cover":"false"},"vehicle":{"id":"AB123DE"}},
  {"latitude":"","longitude":"","user":{"name":"chuck","id":"","has_cover":"flargh"},"vehicle":{"id":""}}
]

I can use "modify-overwrite" to turn the coordinates into doubles, the Booleans into true/false, and the user ID into a numeric. But this fails for Chuck's record as Jolt ignores the empty string or none-truthy strings. The result I get is like this.

[
  {"latitude":1.0,"longitude":-1.0,"user":{"name":"alice","id":12345671,"has_cover":true},"vehicle":{"id":"AB123DE"}},
  {"latitude":1.0,"longitude":-1.0,"user":{"name":"bob","id":12345672,"has_cover":false},"vehicle":{"id":"AB123DE"}},
  {"latitude":"","longitude":"","user":{"name":"chuck","id":"","has_cover":"flargh"},"vehicle":{"id":""}}
]

But what I really want, in order to conform to my Avro schema, is more like this.

[
  {"latitude":1.0,"longitude":-1.0,"user":{"name":"alice","id":12345671,"has_cover":true},"vehicle":{"id":"AB123DE"}},
  {"latitude":1.0,"longitude":-1.0,"user":{"name":"bob","id":12345672,"has_cover":false},"vehicle":{"id":"AB123DE"}},
  {"latitude":null,"longitude":null,"user":{"name":"chuck","id":null,"has_cover":false},"vehicle":{"id":""}}
]

I looked at UpdateRecord and EvaluteJSONPath, but I cannot see a way to return a literal null, true or false. I have resorted to using some ReplaceTexts which can find and replace some of the errant values, but struggles with distinguishing between the user ID, which has to be numeric, and the vehicle ID, which needs to stay as a string. And global find and replace on text seems like a coarse instrument when the content is already neatly in records.

Can anyone suggest a better solution?

Thanks.

Steve Hindmarch

RE: Update records with literal null, true or false values.

Posted by "stephen.hindmarch.bt.com via users" <us...@nifi.apache.org>.
Thank you Maarten,

That worked perfectly, both with the empty string and with a non-boolean string. I was not aware of that use of the array on the RHS to produce alternative values.

Steve Hindmarch

From: Maarten Smeets <Ma...@AMIS.nl>
Sent: 16 July 2022 10:32
To: users@nifi.apache.org; Hindmarch,SJ,Stephen,VIR R <st...@bt.com>
Subject: RE: Update records with literal null, true or false values.

You don't often get email from maarten.smeets@amis.nl<ma...@amis.nl>. Learn why this is important<https://aka.ms/LearnAboutSenderIdentification>
Hi Stephen,

In my case the empty values were filled with a string like "EMPTY" so I could do something like the following;

{
    "operation": "modify-overwrite-beta",
    "spec": {
      "data": {
        values": {
          "*": {
            "value": ["=toDouble", null]
          }
        }
      }
    }
  }

EMPTY cannot be cast to double so it gets filled with null.

If you have an empty string instead of something like "EMPTY" and the above Jolt transformation does not appear to work, you can use SplitJson, EvaluateJsonPath, save the result in an attribute, UpdateAttribute in which you can check whether the size of the attribute is 0 (something like ${att_name:length():gt(0)}) and if that is the case, replace it with a string like "EMPTY" using ifElse or you can try replaceEmpty. Next you can use the JoltTransform on the record to get the string "EMPTY" in there by using ${att_name} in the Jolt transformation. After that you can use modify-overwrite-beta to replace it with null like shown above. There might be an easier solution though.

Hope this helps.

With kind regards,
Maarten Smeets

From: stephen.hindmarch.bt.com via users <us...@nifi.apache.org>>
Sent: Friday, 15 July 2022 13:13
To: users@nifi.apache.org<ma...@nifi.apache.org>
Subject: Update records with literal null, true or false values.

Hi all,

I have been looking at a case where some records have all fields presented as strings, and I need to turn the numeric or boolean values into their native types. I can do most of this with Jolt, but in the case where the value is missing I have a problem.

Say I have these records.

[
  {"latitude":"1.0","longitude":"-1.0","user":{"name":"alice","id":"12345671","has_cover":"true"},"vehicle":{"id":"AB123DE"}},
  {"latitude":"1.0","longitude":"-1.0","user":{"name":"bob","id":"12345672","has_cover":"false"},"vehicle":{"id":"AB123DE"}},
  {"latitude":"","longitude":"","user":{"name":"chuck","id":"","has_cover":"flargh"},"vehicle":{"id":""}}
]

I can use "modify-overwrite" to turn the coordinates into doubles, the Booleans into true/false, and the user ID into a numeric. But this fails for Chuck's record as Jolt ignores the empty string or none-truthy strings. The result I get is like this.

[
  {"latitude":1.0,"longitude":-1.0,"user":{"name":"alice","id":12345671,"has_cover":true},"vehicle":{"id":"AB123DE"}},
  {"latitude":1.0,"longitude":-1.0,"user":{"name":"bob","id":12345672,"has_cover":false},"vehicle":{"id":"AB123DE"}},
  {"latitude":"","longitude":"","user":{"name":"chuck","id":"","has_cover":"flargh"},"vehicle":{"id":""}}
]

But what I really want, in order to conform to my Avro schema, is more like this.

[
  {"latitude":1.0,"longitude":-1.0,"user":{"name":"alice","id":12345671,"has_cover":true},"vehicle":{"id":"AB123DE"}},
  {"latitude":1.0,"longitude":-1.0,"user":{"name":"bob","id":12345672,"has_cover":false},"vehicle":{"id":"AB123DE"}},
  {"latitude":null,"longitude":null,"user":{"name":"chuck","id":null,"has_cover":false},"vehicle":{"id":""}}
]

I looked at UpdateRecord and EvaluteJSONPath, but I cannot see a way to return a literal null, true or false. I have resorted to using some ReplaceTexts which can find and replace some of the errant values, but struggles with distinguishing between the user ID, which has to be numeric, and the vehicle ID, which needs to stay as a string. And global find and replace on text seems like a coarse instrument when the content is already neatly in records.

Can anyone suggest a better solution?

Thanks.

Steve Hindmarch

RE: Update records with literal null, true or false values.

Posted by Maarten Smeets <Ma...@AMIS.nl>.
Hi Stephen,

In my case the empty values were filled with a string like “EMPTY” so I could do something like the following;

{
    "operation": "modify-overwrite-beta",
    "spec": {
      "data": {
        values": {
          "*": {
            "value": ["=toDouble", null]
          }
        }
      }
    }
  }

EMPTY cannot be cast to double so it gets filled with null.

If you have an empty string instead of something like “EMPTY” and the above Jolt transformation does not appear to work, you can use SplitJson, EvaluateJsonPath, save the result in an attribute, UpdateAttribute in which you can check whether the size of the attribute is 0 (something like ${att_name:length():gt(0)}) and if that is the case, replace it with a string like “EMPTY” using ifElse or you can try replaceEmpty. Next you can use the JoltTransform on the record to get the string “EMPTY” in there by using ${att_name} in the Jolt transformation. After that you can use modify-overwrite-beta to replace it with null like shown above. There might be an easier solution though.

Hope this helps.

With kind regards,
Maarten Smeets

From: stephen.hindmarch.bt.com via users <us...@nifi.apache.org>
Sent: Friday, 15 July 2022 13:13
To: users@nifi.apache.org
Subject: Update records with literal null, true or false values.

Hi all,

I have been looking at a case where some records have all fields presented as strings, and I need to turn the numeric or boolean values into their native types. I can do most of this with Jolt, but in the case where the value is missing I have a problem.

Say I have these records.

[
  {"latitude":"1.0","longitude":"-1.0","user":{"name":"alice","id":"12345671","has_cover":"true"},"vehicle":{"id":"AB123DE"}},
  {"latitude":"1.0","longitude":"-1.0","user":{"name":"bob","id":"12345672","has_cover":"false"},"vehicle":{"id":"AB123DE"}},
  {"latitude":"","longitude":"","user":{"name":"chuck","id":"","has_cover":"flargh"},"vehicle":{"id":""}}
]

I can use “modify-overwrite” to turn the coordinates into doubles, the Booleans into true/false, and the user ID into a numeric. But this fails for Chuck’s record as Jolt ignores the empty string or none-truthy strings. The result I get is like this.

[
  {"latitude":1.0,"longitude":-1.0,"user":{"name":"alice","id":12345671,"has_cover":true},"vehicle":{"id":"AB123DE"}},
  {"latitude":1.0,"longitude":-1.0,"user":{"name":"bob","id":12345672,"has_cover":false},"vehicle":{"id":"AB123DE"}},
  {"latitude":"","longitude":"","user":{"name":"chuck","id":"","has_cover":"flargh"},"vehicle":{"id":""}}
]

But what I really want, in order to conform to my Avro schema, is more like this.

[
  {"latitude":1.0,"longitude":-1.0,"user":{"name":"alice","id":12345671,"has_cover":true},"vehicle":{"id":"AB123DE"}},
  {"latitude":1.0,"longitude":-1.0,"user":{"name":"bob","id":12345672,"has_cover":false},"vehicle":{"id":"AB123DE"}},
  {"latitude":null,"longitude":null,"user":{"name":"chuck","id":null,"has_cover":false},"vehicle":{"id":""}}
]

I looked at UpdateRecord and EvaluteJSONPath, but I cannot see a way to return a literal null, true or false. I have resorted to using some ReplaceTexts which can find and replace some of the errant values, but struggles with distinguishing between the user ID, which has to be numeric, and the vehicle ID, which needs to stay as a string. And global find and replace on text seems like a coarse instrument when the content is already neatly in records.

Can anyone suggest a better solution?

Thanks.

Steve Hindmarch

RE: Update records with literal null, true or false values.

Posted by "stephen.hindmarch.bt.com via users" <us...@nifi.apache.org>.
Thank you Josef,

That works for the empty string values turning to null in the numeric case. I can probably change my use case to allow null as a replacement for false in the Booleans case.

Steve Hindmarch


From: Josef.Zahner1@swisscom.com <Jo...@swisscom.com>
Sent: 18 July 2022 06:44
To: users@nifi.apache.org; Hindmarch,SJ,Stephen,VIR R <st...@bt.com>
Subject: Re: Update records with literal null, true or false values.

Hi Stephen

Not sure whether I understood you correct and if I really remember it correct, but the UpdateRecord processor below should produce a NULL value if the string is “” (empty). We don’t use it anymore, so I’m not 100% whether it still works or does what you want. And yes it’s not straight forward ;-).

[Graphical user interface, text, application, email  Description automatically generated]


Cheers Josef


From: "stephen.hindmarch.bt.com via users" <us...@nifi.apache.org>>
Reply to: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>, "stephen.hindmarch@bt.com<ma...@bt.com>" <st...@bt.com>>
Date: Friday, 15 July 2022 at 13:13
To: "users@nifi.apache.org<ma...@nifi.apache.org>" <us...@nifi.apache.org>>
Subject: Update records with literal null, true or false values.

Hi all,

I have been looking at a case where some records have all fields presented as strings, and I need to turn the numeric or boolean values into their native types. I can do most of this with Jolt, but in the case where the value is missing I have a problem.

Say I have these records.

[
  {"latitude":"1.0","longitude":"-1.0","user":{"name":"alice","id":"12345671","has_cover":"true"},"vehicle":{"id":"AB123DE"}},
  {"latitude":"1.0","longitude":"-1.0","user":{"name":"bob","id":"12345672","has_cover":"false"},"vehicle":{"id":"AB123DE"}},
  {"latitude":"","longitude":"","user":{"name":"chuck","id":"","has_cover":"flargh"},"vehicle":{"id":""}}
]

I can use “modify-overwrite” to turn the coordinates into doubles, the Booleans into true/false, and the user ID into a numeric. But this fails for Chuck’s record as Jolt ignores the empty string or none-truthy strings. The result I get is like this.

[
  {"latitude":1.0,"longitude":-1.0,"user":{"name":"alice","id":12345671,"has_cover":true},"vehicle":{"id":"AB123DE"}},
  {"latitude":1.0,"longitude":-1.0,"user":{"name":"bob","id":12345672,"has_cover":false},"vehicle":{"id":"AB123DE"}},
  {"latitude":"","longitude":"","user":{"name":"chuck","id":"","has_cover":"flargh"},"vehicle":{"id":""}}
]

But what I really want, in order to conform to my Avro schema, is more like this.

[
  {"latitude":1.0,"longitude":-1.0,"user":{"name":"alice","id":12345671,"has_cover":true},"vehicle":{"id":"AB123DE"}},
  {"latitude":1.0,"longitude":-1.0,"user":{"name":"bob","id":12345672,"has_cover":false},"vehicle":{"id":"AB123DE"}},
  {"latitude":null,"longitude":null,"user":{"name":"chuck","id":null,"has_cover":false},"vehicle":{"id":""}}
]

I looked at UpdateRecord and EvaluteJSONPath, but I cannot see a way to return a literal null, true or false. I have resorted to using some ReplaceTexts which can find and replace some of the errant values, but struggles with distinguishing between the user ID, which has to be numeric, and the vehicle ID, which needs to stay as a string. And global find and replace on text seems like a coarse instrument when the content is already neatly in records.

Can anyone suggest a better solution?

Thanks.

Steve Hindmarch

Re: Update records with literal null, true or false values.

Posted by Jo...@swisscom.com.
Hi Stephen

Not sure whether I understood you correct and if I really remember it correct, but the UpdateRecord processor below should produce a NULL value if the string is “” (empty). We don’t use it anymore, so I’m not 100% whether it still works or does what you want. And yes it’s not straight forward ;-).

[Graphical user interface, text, application, email  Description automatically generated]


Cheers Josef


From: "stephen.hindmarch.bt.com via users" <us...@nifi.apache.org>
Reply to: "users@nifi.apache.org" <us...@nifi.apache.org>, "stephen.hindmarch@bt.com" <st...@bt.com>
Date: Friday, 15 July 2022 at 13:13
To: "users@nifi.apache.org" <us...@nifi.apache.org>
Subject: Update records with literal null, true or false values.

Hi all,

I have been looking at a case where some records have all fields presented as strings, and I need to turn the numeric or boolean values into their native types. I can do most of this with Jolt, but in the case where the value is missing I have a problem.

Say I have these records.

[
  {"latitude":"1.0","longitude":"-1.0","user":{"name":"alice","id":"12345671","has_cover":"true"},"vehicle":{"id":"AB123DE"}},
  {"latitude":"1.0","longitude":"-1.0","user":{"name":"bob","id":"12345672","has_cover":"false"},"vehicle":{"id":"AB123DE"}},
  {"latitude":"","longitude":"","user":{"name":"chuck","id":"","has_cover":"flargh"},"vehicle":{"id":""}}
]

I can use “modify-overwrite” to turn the coordinates into doubles, the Booleans into true/false, and the user ID into a numeric. But this fails for Chuck’s record as Jolt ignores the empty string or none-truthy strings. The result I get is like this.

[
  {"latitude":1.0,"longitude":-1.0,"user":{"name":"alice","id":12345671,"has_cover":true},"vehicle":{"id":"AB123DE"}},
  {"latitude":1.0,"longitude":-1.0,"user":{"name":"bob","id":12345672,"has_cover":false},"vehicle":{"id":"AB123DE"}},
  {"latitude":"","longitude":"","user":{"name":"chuck","id":"","has_cover":"flargh"},"vehicle":{"id":""}}
]

But what I really want, in order to conform to my Avro schema, is more like this.

[
  {"latitude":1.0,"longitude":-1.0,"user":{"name":"alice","id":12345671,"has_cover":true},"vehicle":{"id":"AB123DE"}},
  {"latitude":1.0,"longitude":-1.0,"user":{"name":"bob","id":12345672,"has_cover":false},"vehicle":{"id":"AB123DE"}},
  {"latitude":null,"longitude":null,"user":{"name":"chuck","id":null,"has_cover":false},"vehicle":{"id":""}}
]

I looked at UpdateRecord and EvaluteJSONPath, but I cannot see a way to return a literal null, true or false. I have resorted to using some ReplaceTexts which can find and replace some of the errant values, but struggles with distinguishing between the user ID, which has to be numeric, and the vehicle ID, which needs to stay as a string. And global find and replace on text seems like a coarse instrument when the content is already neatly in records.

Can anyone suggest a better solution?

Thanks.

Steve Hindmarch