Posted to users@nifi.apache.org by st...@bt.com on 2022/04/11 13:08:04 UTC

Unexpected Behaviour In LookupRecord With "Route To success" Strategy

Hi all,

I am trying to set up a simple enrichment pipeline where flow records get enriched from a Redis Distributed Map cache, using a sequence of LookupRecord processors to gather the enrichment data. I am using the "Route to success" routing strategy because I would like to avoid fragmenting my flow files. However, the results are not what I expected: if the first record does not match an enrichment record, then no records get enriched.

Here is a simple test case I have created.

1: Create a lookup record processor with the following parameters
Result RecordPath = /mood
Routing Strategy = Route To Success
key = concat('mood/',name)

2: Add these keys to my Redis index.
set mood/fred happy
set mood/charlie sad

3: Send in this flow file
[{"name":"fred"},{"name":"bill"},{"name":"charlie"}]

4: View the result
[{"name":"fred","mood":"happy"},{"name":"bill","mood":null},{"name":"charlie","mood":"sad"}]

That looks OK, every lookup has happened, and I can see that Bill was not matched as the enriched value is null.

5: Now try a different flow file, with Bill first.
[{"name":"bill"},{"name":"fred"},{"name":"charlie"}]

Result
[{"name":"bill"},{"name":"fred"},{"name":"charlie"}]

So because the first record did not match, no matches are made, and it looks as if the processing never happened.

6: Change the routing strategy to "Route to matched/unmatched". The result is
Matched => [{"name":"fred","mood":"happy"},{"name":"charlie","mood":"sad"}]
Unmatched => [{"name":"bill"}]
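
The test case above can also be modelled outside NiFi. The following is a minimal Python sketch, not NiFi code: the CACHE dict stands in for the Redis entries from step 2, and the function names are hypothetical. It shows the output each routing strategy would be expected to give for the flow file in step 5, on the assumption that "Route to success" should treat a Bill-first file the same way it treated the file in step 3.

```python
# Hypothetical stand-in for the Redis entries added in step 2.
CACHE = {"mood/fred": "happy", "mood/charlie": "sad"}


def lookup(record):
    """Mimic key = concat('mood/', name); None means no cache entry."""
    return CACHE.get("mood/" + record["name"])


def route_to_success(records):
    """Expected behaviour: one output, every record gets /mood (None if unmatched)."""
    return [{**r, "mood": lookup(r)} for r in records]


def route_to_matched_unmatched(records):
    """Split records into (matched, unmatched); unmatched records are left as-is."""
    matched, unmatched = [], []
    for r in records:
        mood = lookup(r)
        if mood is None:
            unmatched.append(dict(r))
        else:
            matched.append({**r, "mood": mood})
    return matched, unmatched


bill_first = [{"name": "bill"}, {"name": "fred"}, {"name": "charlie"}]
print(route_to_success(bill_first))
print(route_to_matched_unmatched(bill_first))
```

In this model the order of the records makes no difference to "Route to success", which is the behaviour the test case was expecting from the processor.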

So I have achieved all of my lookups, but the cost is that I have fragmented my flow file. After 4 lookups my original flow file (which in production will have about 1000 records) will have been fragmented into up to 16 separate files, with a consequent impact on performance. Also, the indication that the unmatched record was not matched is lost, and that is a feature I may want to use.
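
The figure of 16 follows from each "Route to matched/unmatched" lookup splitting a flow file into at most two. A small sketch of that arithmetic, assuming both relationships feed the next LookupRecord and every flow file contains at least one matched and one unmatched record (the function name is hypothetical):

```python
def max_fragments(lookups: int) -> int:
    """Upper bound on flow files after `lookups` chained matched/unmatched splits."""
    return 2 ** lookups


for n in range(1, 5):
    print(n, "lookups ->", max_fragments(n), "flow files at most")
```

This is an upper bound: a flow file whose records all match, or all fail to match, is not split at that step.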

So my question is, does this look like expected behaviour or is this an issue?

Thanks
Steve Hindmarch,
BT's Global Division
This email contains information from BT, that might be privileged or confidential. And it's only meant for the person above. If that's not you, we're sorry - we must have sent it to you by mistake. Please email us to let us know, and don't copy or forward it to anyone else. Thanks.
We monitor our email systems and may record all our emails.

British Telecommunications plc., 81 Newgate Street London EC1A 7AJ
Registered in England no: 1800000

Re: Unexpected Behaviour In LookupRecord With "Route To success" Strategy

Posted by Mark Payne <ma...@hotmail.com>.

Yeah, I just created one [1].

Thanks
-Mark


[1] https://issues.apache.org/jira/browse/NIFI-9903

RE: Unexpected Behaviour In LookupRecord With "Route To success" Strategy

Posted by st...@bt.com.

Thanks Mark,

Is there a JIRA open for this?

Regards
Steve


Re: Unexpected Behaviour In LookupRecord With "Route To success" Strategy

Posted by Mark Payne <ma...@hotmail.com>.

Steve,

Thanks for the note. Ironically, I ran into this issue just yesterday. Unfortunately, it’s a bug that will have to be addressed.

In the meantime, if you define the schema for your Record Writer explicitly, it should work as expected. The issue comes down to ordering: the first record is enriched, the schema is then determined from that enriched record, and only then are the rest enriched. If the first record doesn’t have any enrichment data added, the schema for the FlowFile is determined without the enrichment fields. So while the records do get enriched, the schema associated with the FlowFile is missing those fields, which is why they do not appear in the output. Explicitly defining the schema should avoid this.

Thanks
-Mark
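
The explanation above can be illustrated with a toy model. This is a minimal Python sketch under stated assumptions, not NiFi's implementation: enrich and write_records are hypothetical names, the CACHE dict stands in for Redis, and the writer is assumed to emit only the fields named in its schema. The Avro-style schema at the end is one guess at what an explicit schema for this test case could look like; the record name "person" is made up.

```python
import json

# Hypothetical stand-in for the Redis entries in the test case.
CACHE = {"mood/fred": "happy", "mood/charlie": "sad"}


def enrich(record):
    """Add /mood only when the lookup finds a value."""
    mood = CACHE.get("mood/" + record["name"])
    return {**record, "mood": mood} if mood is not None else dict(record)


def write_records(records, schema=None):
    """Toy writer: emits only the fields named in the schema.

    With no explicit schema, the schema is taken from the first record,
    which models the behaviour described in the reply above.
    """
    if schema is None:
        schema = list(records[0]) if records else []
    return [{field: r.get(field) for field in schema} for r in records]


# Assumed explicit schema; "mood" is nullable so unmatched records are valid.
AVRO_SCHEMA = {
    "type": "record",
    "name": "person",  # hypothetical record name
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "mood", "type": ["null", "string"], "default": None},
    ],
}
explicit_fields = [f["name"] for f in AVRO_SCHEMA["fields"]]

bill_first = [enrich({"name": n}) for n in ("bill", "fred", "charlie")]

print(write_records(bill_first))                   # schema from first record
print(write_records(bill_first, explicit_fields))  # explicit schema
print(json.dumps(AVRO_SCHEMA, indent=2))
```

With the schema taken from the first record, the Bill-first file loses every mood value on output even though the lookups succeeded; with the explicit field list, all three records carry the field and Bill's is null.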

