Posted to dev@nifi.apache.org by Mike Thomsen <mi...@gmail.com> on 2018/06/08 16:06:23 UTC

LookupService + flowfile attributes

On the RestLookupService PR I think Koji mentioned the idea of expanding
the lookup capability to include flowfile attributes. That sort of thing
would be immensely useful on two PRs I already have open for lookup service
changes for ES and Mongo. Koji, add your thoughts, but what I'm thinking
would be a new PR that adds:

T lookup(Map<String, String> flowfileAttributes, Map<String, Object>
coordinates);

to the LookupService interface and has the related processors pass in the
flowfile attribute map. Specifically, it would help make the schema access
capabilities really usable with lookup services (see MongoDBLookupService
PR for example; I added a new SchemaRegistryService type for JSON sources).

Re: LookupService + flowfile attributes

Posted by Mike Thomsen <mi...@gmail.com>.
I think the only services that won't break if you merge the variables are
the ones that use only mandatory keys. Ones like MongoDBLookupService will
simply take the entire lookup coordinate map and turn it into a query. I
don't think there's an elegant solution to allow the service to pick and
choose in that case.
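
To illustrate the concern, here is a minimal sketch of a service that blindly turns every coordinate into a query criterion. The class name, the buildQuery helper, the string query format, and the sample keys are all hypothetical; this is not the actual MongoDBLookupService code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class BlindQuerySketch {

    // Hypothetical stand-in for a service that turns every coordinate
    // into a query criterion without picking and choosing.
    static String buildQuery(Map<String, Object> coordinates) {
        StringJoiner criteria = new StringJoiner(" AND ");
        for (Map.Entry<String, Object> entry : coordinates.entrySet()) {
            criteria.add(entry.getKey() + " = '" + entry.getValue() + "'");
        }
        return criteria.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> coordinates = new LinkedHashMap<>();
        coordinates.put("user_id", "42");

        Map<String, String> flowfileAttributes = new LinkedHashMap<>();
        flowfileAttributes.put("filename", "data.json");

        // Coordinates only: the query contains just the intended key.
        System.out.println(buildQuery(coordinates));

        // Merged map: the attribute silently becomes an extra criterion.
        Map<String, Object> merged = new LinkedHashMap<>(flowfileAttributes);
        merged.putAll(coordinates);
        System.out.println(buildQuery(merged));
    }
}
```

With the merged map, the filename attribute ends up in the query even though the user never asked to filter on it, which is why keeping the two maps separate looks safer for services of this kind.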

On Tue, Jun 12, 2018 at 4:43 AM Koji Kawamura <ij...@gmail.com>
wrote:

> Hi Mike,
>
> I'm still not sure which is better: keeping the variables separate or
> putting all values into one map.
> I wrote a comment so that we can keep the discussion there.
> https://github.com/apache/nifi/pull/2777#issuecomment-396512384
>
> Thanks,
> Koji

Re: LookupService + flowfile attributes

Posted by Koji Kawamura <ij...@gmail.com>.
Hi Mike,

I'm still not sure which is better: keeping the variables separate or
putting all values into one map.
I wrote a comment so that we can keep the discussion there.
https://github.com/apache/nifi/pull/2777#issuecomment-396512384

Thanks,
Koji

On Tue, Jun 12, 2018 at 1:56 AM, Mike Thomsen <mi...@gmail.com> wrote:
> Koji,
>
> After reading Mark's comments on GitHub, it occurred to me that the MongoDB
> lookup service and the ES one I have as a PR would be screwed up if we take
> the original approach because they blindly build a query from the total
> coordinates set. So they'd add flowfile attributes as criteria by default.
> I'll update the PR accordingly and make the new method default to the
> existing one in all of the lookup services that are already there.

Re: LookupService + flowfile attributes

Posted by Mike Thomsen <mi...@gmail.com>.
Koji,

After reading Mark's comments on GitHub, it occurred to me that the MongoDB
lookup service and the ES one I have as a PR would be screwed up if we take
the original approach because they blindly build a query from the total
coordinates set. So they'd add flowfile attributes as criteria by default.
I'll update the PR accordingly and make the new method default to the
existing one in all of the lookup services that are already there.
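
A minimal sketch of what "default to the existing one" could look like, assuming a Java 8 default method. The interface below is a simplified stand-in, not the real NiFi LookupService; the Optional return type, the SimpleKeyValueService class, and the sample data are assumptions for illustration only.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class DefaultLookupSketch {

    // Simplified stand-in for the lookup interface (an assumption, not the NiFi API).
    interface LookupService<T> {
        Optional<T> lookup(Map<String, Object> coordinates);

        // New overload: by default it ignores the attributes and delegates
        // to the existing method, so current implementations keep working.
        default Optional<T> lookup(Map<String, String> flowfileAttributes,
                                   Map<String, Object> coordinates) {
            return lookup(coordinates);
        }
    }

    // Hypothetical existing service: implements only the original method.
    static class SimpleKeyValueService implements LookupService<String> {
        private final Map<String, String> data = new HashMap<>();

        {
            data.put("a", "alpha");
        }

        @Override
        public Optional<String> lookup(Map<String, Object> coordinates) {
            return Optional.ofNullable(data.get(String.valueOf(coordinates.get("key"))));
        }
    }

    public static void main(String[] args) {
        LookupService<String> service = new SimpleKeyValueService();
        Map<String, String> attributes = Collections.singletonMap("filename", "data.json");
        Map<String, Object> coordinates = Collections.singletonMap("key", "a");

        // The attributes are passed in but never reach the query.
        System.out.println(service.lookup(attributes, coordinates)); // prints Optional[alpha]
    }
}
```

A service that does need the attributes (for schema access, say) would override the two-argument method, while everything else inherits the default unchanged.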

On Sat, Jun 9, 2018 at 8:44 AM Mike Thomsen <mi...@gmail.com> wrote:

> https://issues.apache.org/jira/browse/NIFI-5287

Re: LookupService + flowfile attributes

Posted by Mike Thomsen <mi...@gmail.com>.
https://issues.apache.org/jira/browse/NIFI-5287

On Sat, Jun 9, 2018 at 1:20 AM Koji Kawamura <ij...@gmail.com> wrote:

> Thanks Mike for starting the discussion.
>
> Yes, I believe that will make LookupService and Schema access strategy
> much easier, reusable, and useful.
>
> What I had imagined is not adding a new method signature, but simply
> copying certain FlowFile attributes into the coordinates map.
> We can add that in LookupRecord.
> Currently LookupAttribute only uses one coordinate value and can be
> left as it is.
>
> Specifically, by adding a new processor property, 'Copy FlowFile
> Attributes into Coordinates', where the user can define a regular
> expression to select which attributes to copy.
> I think it's fine to mix FlowFile attributes and values defined as
> dynamic properties into the same coordinates map.
> The put order should be FlowFile attributes, then dynamic properties,
> so that the user can overwrite attribute values when necessary.
>
> Koji

Re: LookupService + flowfile attributes

Posted by Koji Kawamura <ij...@gmail.com>.
Thanks Mike for starting the discussion.

Yes, I believe that will make LookupService and Schema access strategy
much easier, reusable, and useful.

What I had imagined is not adding a new method signature, but simply
copying certain FlowFile attributes into the coordinates map.
We can add that in LookupRecord.
Currently LookupAttribute only uses one coordinate value and can be
left as it is.

Specifically, by adding a new processor property, 'Copy FlowFile
Attributes into Coordinates', where the user can define a regular
expression to select which attributes to copy.
I think it's fine to mix FlowFile attributes and values defined as
dynamic properties into the same coordinates map.
The put order should be FlowFile attributes, then dynamic properties,
so that the user can overwrite attribute values when necessary.

Koji
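
The merge described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name, the buildCoordinates helper, and the sample attribute names are hypothetical, and it is not LookupRecord's actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class CoordinateMergeSketch {

    // Builds the coordinates map: matching FlowFile attributes go in first,
    // then dynamic properties, so a property overwrites an attribute with the same key.
    static Map<String, Object> buildCoordinates(Map<String, String> flowfileAttributes,
                                                Pattern attributesToCopy,
                                                Map<String, Object> dynamicProperties) {
        Map<String, Object> coordinates = new LinkedHashMap<>();
        if (attributesToCopy != null) {
            for (Map.Entry<String, String> entry : flowfileAttributes.entrySet()) {
                // Copy only attributes whose full name matches the configured expression.
                if (attributesToCopy.matcher(entry.getKey()).matches()) {
                    coordinates.put(entry.getKey(), entry.getValue());
                }
            }
        }
        coordinates.putAll(dynamicProperties);
        return coordinates;
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("schema.name", "users");
        attributes.put("filename", "data.json");
        attributes.put("key", "from-attribute");

        Map<String, Object> properties = new LinkedHashMap<>();
        properties.put("key", "from-property");

        // prints {schema.name=users, key=from-property}
        System.out.println(buildCoordinates(attributes, Pattern.compile("schema\\..*|key"), properties));
    }
}
```

Here filename is left out because it does not match the expression, and the dynamic property wins for the shared key, which is the overwrite behavior the put order is meant to give.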


On Sat, Jun 9, 2018 at 1:06 AM, Mike Thomsen <mi...@gmail.com> wrote:
> On the RestLookupService PR I think Koji mentioned the idea of expanding
> the lookup capability to include flowfile attributes. That sort of thing
> would be immensely useful on two PRs I already have open for lookup service
> changes for ES and Mongo. Koji, add your thoughts, but what I'm thinking
> would be a new PR that adds:
>
> T lookup(Map<String, String> flowfileAttributes, Map<String, Object>
> coordinates);
>
> to the LookupService interface and has the related processors pass in the
> flowfile attribute map. Specifically, it would help make the schema access
> capabilities really usable with lookup services (see MongoDBLookupService
> PR for example; I added a new SchemaRegistryService type for JSON sources).