Posted to users@nifi.apache.org by "Greene (US), Geoffrey N" <ge...@boeing.com> on 2021/03/22 16:39:48 UTC

RecordPath...With INNER records

I'm making good progress learning to use the LookupRecord processor.
I have it working so that I know how to turn
[{"key": "value1"}, {"key":"value2"}, {"key":"value3"}]
Into
[{"key": "value1","enhanced":1}, {"key":"value2","enhanced":2}, {"key":"value3","enhanced":3}]
BUT
What if you have inner records? Is that supported? How?

What if you have
{"foo":"bar", "outerrecord":[{"key": "value1"}, {"key":"value2"}, {"key":"value3"}]}
And want to turn it into
{"foo":"bar", "outerrecord":[{"key": "value1","enhanced":1}, {"key":"value2","enhanced":2}, {"key":"value3","enhanced":3}]}
How do you specify that your lookup key is /outerrecord/key when the array of records is actually located at /outerrecord? Is there some kind of path magic? Or is there some kind of flow magic where you have to separate outerrecord into a separate flow file, and then (somehow?) glue it back in after it's processed?

Thanks!

RE: RecordPath...With INNER records

Posted by "Greene (US), Geoffrey N" <ge...@boeing.com>.

Jira Ticket filed #8366

From: Mark Payne [mailto:markap14@hotmail.com]
Sent: Thursday, March 25, 2021 9:16 AM
To: users@nifi.apache.org
Subject: [EXTERNAL] Re: RecordPath...With INNER records

Geoffrey,

As Jorge mentioned, you can reference specific array indices in the RecordPath like /outer[0]/key, or you could reference all elements such as /outer[*]/key. But at this time, I don’t think LookupRecord will allow for multiple lookups per record. So I don’t believe the enrichment you’re looking for would work the way you want.

You could work around this by using ForkRecord and then re-merging. ForkRecord will allow you to reference an array via RecordPath and split up the incoming Record into multiple Records, one for each value in the array. You could then use LookupRecord. But if you wanted to merge them back together afterward, that is doable but not as straightforward as you’d probably hope. It can be done using JoltTransformJSON (JoltTransformRecord operates per record, whereas JoltTransformJSON operates on the entire FlowFile payload as a single object). But JoltTransformJSON would load the entire contents of the FlowFile into memory, which can be expensive. You could also use a script via ExecuteScript, etc. to merge the data, or a custom processor.

All that being said, I do think that would make for a very good improvement to the LookupRecord processor: allowing multiple hits per record by referencing arrays / map elements. Feel free to file a Jira for that improvement.

Thanks
-Mark

Re: RecordPath...With INNER records

Posted by Mark Payne <ma...@hotmail.com>.

Geoffrey,

As Jorge mentioned, you can reference specific array indices in the RecordPath like /outer[0]/key, or you could reference all elements such as /outer[*]/key. But at this time, I don’t think LookupRecord will allow for multiple lookups per record. So I don’t believe the enrichment you’re looking for would work the way you want.
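
To make the difference between the two path forms concrete, here is a minimal Python sketch. It is not NiFi code and does not use any NiFi API; the helper names select_index and select_all are hypothetical, and the field name outerrecord is taken from the example data in the question (the reply above uses "outer" as shorthand). It only shows that an indexed path yields a single value while a wildcard path yields one value per array element, which is why a single lookup per record is not enough here.

```python
# Sample record from the question in this thread.
record = {
    "foo": "bar",
    "outerrecord": [{"key": "value1"}, {"key": "value2"}, {"key": "value3"}],
}


def select_index(rec, array_field, index, child):
    """Emulate a path like /outerrecord[0]/key: one element, one value."""
    return [rec[array_field][index][child]]


def select_all(rec, array_field, child):
    """Emulate a path like /outerrecord[*]/key: every element, many values."""
    return [element[child] for element in rec[array_field]]


first_only = select_index(record, "outerrecord", 0, "key")
every_key = select_all(record, "outerrecord", "key")

print(first_only)  # ['value1']
print(every_key)   # ['value1', 'value2', 'value3']
```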

You could work around this by using ForkRecord and then re-merging. ForkRecord will allow you to reference an array via RecordPath and split up the incoming Record into multiple Records, one for each value in the array. You could then use LookupRecord. But if you wanted to merge them back together afterward, that is doable but not as straightforward as you’d probably hope. It can be done using JoltTransformJSON (JoltTransformRecord operates per record, whereas JoltTransformJSON operates on the entire FlowFile payload as a single object). But JoltTransformJSON would load the entire contents of the FlowFile into memory, which can be expensive. You could also use a script via ExecuteScript, etc. to merge the data, or a custom processor.
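
The fork, lookup, and merge steps described above can be sketched in plain Python. This is only an illustration of the data flow under stated assumptions, not how ForkRecord, LookupRecord, or any NiFi scripting API actually works: the LOOKUP dictionary stands in for a lookup service, and the functions fork, enrich, and merge are hypothetical names invented for this example.

```python
import json

# Hypothetical lookup table standing in for a lookup service (an assumption).
LOOKUP = {"value1": 1, "value2": 2, "value3": 3}


def fork(record, array_field):
    """Split one record into one child per array element, remembering its position."""
    return [
        {"index": i, "element": dict(element)}
        for i, element in enumerate(record[array_field])
    ]


def enrich(child, key_field="key", result_field="enhanced"):
    """Per-record lookup: add the looked-up value when the key is known."""
    element = dict(child["element"])
    value = LOOKUP.get(element.get(key_field))
    if value is not None:
        element[result_field] = value
    return {"index": child["index"], "element": element}


def merge(record, array_field, children):
    """Rebuild the parent record with the enriched children in their original order."""
    merged = dict(record)
    ordered = sorted(children, key=lambda child: child["index"])
    merged[array_field] = [child["element"] for child in ordered]
    return merged


source = json.loads(
    '{"foo":"bar", "outerrecord":[{"key": "value1"}, {"key":"value2"}, {"key":"value3"}]}'
)
enriched = merge(
    source, "outerrecord", [enrich(child) for child in fork(source, "outerrecord")]
)
print(json.dumps(enriched))
```

Note that this sketch parses the whole payload into memory before merging, which is the same cost mentioned above for JoltTransformJSON.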

All that being said, I do think that would make for a very good improvement to the LookupRecord processor: allowing multiple hits per record by referencing arrays / map elements. Feel free to file a Jira for that improvement.

Thanks
-Mark


On Mar 25, 2021, at 3:05 AM, Jorge Machado <jo...@me.com> wrote:

Hey Greene,

LookupRecord takes a RecordPath as input. Check out these docs: https://nifi.apache.org/docs/nifi-docs/html/record-path-guide.html#structure or https://www.nifi.rocks/record-path-cheat-sheet/

In your case you could enter /outerrecord[0]/key. Something like that; I did not test it, but you get the idea.
Regards
Jorge from Datamesh
www.dmesh.io


Re: RecordPath...With INNER records

Posted by Jorge Machado <jo...@me.com>.

Hey Greene, 

LookupRecord takes a RecordPath as input. Check out these docs: https://nifi.apache.org/docs/nifi-docs/html/record-path-guide.html#structure or https://www.nifi.rocks/record-path-cheat-sheet/

In your case you could enter /outerrecord[0]/key. Something like that; I did not test it, but you get the idea.
Regards
Jorge from Datamesh
www.dmesh.io


> On 24. Mar 2021, at 21:59, Greene (US), Geoffrey N <ge...@boeing.com> wrote:
> 
> Sorry for the resend. Thought I’d try one more time. I’m struggling with LookupRecord.
>
> I’m making good progress learning to use the LookupRecord processor.
> I have it working so that I know how to turn
> [{"key": "value1"}, {"key":"value2"}, {"key":"value3"}]
> Into
> [{"key": "value1","enhanced":1}, {"key":"value2","enhanced":2}, {"key":"value3","enhanced":3}]
> BUT
> What if your array is actually inside an outer record? Is that supported? How?
>
> What if you have
> {"foo":"bar", "outerrecord":[{"key": "value1"}, {"key":"value2"}, {"key":"value3"}]}
> And want to turn it into
> {"foo":"bar", "outerrecord":[{"key": "value1","enhanced":1}, {"key":"value2","enhanced":2}, {"key":"value3","enhanced":3}]}
> How do you specify that your lookup key is /outerrecord/key when the array of records is actually located at /outerrecord? Is there some kind of path magic? Or is there some kind of flow magic where you have to separate outerrecord into a separate flow file, and then (somehow?) glue it back in after it’s processed?
>  
> (If necessary, I could certainly put the whole thing into an array too, of course, if that helps.)
>  
> Thanks!

RecordPath...With INNER records

Posted by "Greene (US), Geoffrey N" <ge...@boeing.com>.

Sorry for the resend. Thought I'd try one more time. I'm struggling with LookupRecord.

I'm making good progress learning to use the LookupRecord processor.
I have it working so that I know how to turn
[{"key": "value1"}, {"key":"value2"}, {"key":"value3"}]
Into
[{"key": "value1","enhanced":1}, {"key":"value2","enhanced":2}, {"key":"value3","enhanced":3}]
BUT
What if your array is actually inside an outer record? Is that supported? How?

What if you have
{"foo":"bar", "outerrecord":[{"key": "value1"}, {"key":"value2"}, {"key":"value3"}]}
And want to turn it into
{"foo":"bar", "outerrecord":[{"key": "value1","enhanced":1}, {"key":"value2","enhanced":2}, {"key":"value3","enhanced":3}]}
How do you specify that your lookup key is /outerrecord/key when the array of records is actually located at /outerrecord? Is there some kind of path magic? Or is there some kind of flow magic where you have to separate outerrecord into a separate flow file, and then (somehow?) glue it back in after it's processed?

(If necessary, I could certainly put the whole thing into an array too, of course, if that helps.)

Thanks!