Posted to users@nifi.apache.org by "Adam J. Shook" <ad...@gmail.com> on 2017/04/03 23:34:28 UTC

Re: AWSCredentialsProviderControllerService with expression language

Spent a little time looking into this.  I've created a new controller
service that leverages expression language to dynamically set the Assume
Role ARN.  From reviewing the various S3 processors, it looks like they all
operate on a single FlowFile at a time with no batching.  There are some
multi-part uploads, but each is still a single FlowFile that is uploaded
in chunks.  The attributes themselves wouldn't change during an upload, which
significantly simplifies the implementation.  Does that sound correct?
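
A minimal sketch of the idea, not the actual controller service: the role ARN
is resolved from a single FlowFile's attributes and the credentials are cached
per ARN.  Everything here is hypothetical: the class name RoleCredentialPool,
the plain attribute map standing in for a FlowFile, the naive ${name}
substitution standing in for expression language, and the assumeRole function
standing in for the STS call.  It also ignores credential expiration, which a
real pool would have to handle.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch: one cached credentials object per resolved role ARN. */
class RoleCredentialPool<C> {
    // Template such as "arn:aws:iam::123456789012:role/${team}" (made-up example value).
    private final String roleArnTemplate;
    // Stand-in for the STS AssumeRole call; maps a role ARN to credentials.
    private final Function<String, C> assumeRole;
    private final Map<String, C> cache = new ConcurrentHashMap<>();

    RoleCredentialPool(String roleArnTemplate, Function<String, C> assumeRole) {
        this.roleArnTemplate = roleArnTemplate;
        this.assumeRole = assumeRole;
    }

    /** Simplified stand-in for expression language: replaces ${name} with the attribute value. */
    static String resolve(String template, Map<String, String> attributes) {
        String result = template;
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            result = result.replace("${" + e.getKey() + "}", e.getValue());
        }
        return result;
    }

    /** Resolves the ARN for one FlowFile's attributes and reuses cached credentials for that ARN. */
    C credentialsFor(Map<String, String> flowFileAttributes) {
        String arn = resolve(roleArnTemplate, flowFileAttributes);
        return cache.computeIfAbsent(arn, assumeRole);
    }
}
```

Because each S3 operation works on one FlowFile, a single lookup per operation
is enough; a processor that batches several FlowFiles into one AWS call would
need one lookup per FlowFile instead.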

On Wed, Mar 29, 2017 at 3:01 PM, Adam J. Shook <ad...@gmail.com> wrote:

> Hi James,
>
> Thank you for the swift reply!
>
> NiFi is acting as a super-user that will interact with AWS to grab some
> files, do some transformations, and put them elsewhere.  While we can
> create a role for NiFi to access all necessary S3 buckets and other AWS
> services, we need to make sure it is a secure solution to ensure that users
> can't tell NiFi to retrieve a file they shouldn't have access to (but they
> can get it because NiFi is a super user).  So we need the credentials
> provider to be a bit more flexible and say: move this file from A to B as
> this role, and the role to use is a FlowFile attribute.  To answer your
> question on how many, it'd be a few dozen roles at most.
>
> I'm digging through source code on the various AWS processors and
> controller services and had thought about your first option there --
> expanding the property list to include the Assume Role properties.  I would
> agree that #2 is a bit more robust and will do some more digging there.  As
> usual, it's a feature that I need yesterday and will likely take the path
> of least resistance.
>
> --Adam
>
> On Wed, Mar 29, 2017 at 2:21 PM, James Wing <jv...@gmail.com> wrote:
>
>> Adam,
>>
>> Would you please share a bit more about why the various roles and how
>> many you would have?  I'm curious how it's working in practice; we don't
>> always get feedback when stuff isn't broken :).
>>
>> You are correct that the current AWSCredentialsProviderControllerService
>> assumes a single, unvaried role.  The interface between the processors and
>> the controller service does not provide any per-flowfile information to the
>> controller service to retrieve credentials.  A new controller service isn't
>> enough.  I can think of two options:
>>
>> 1.) Modify Processors to Assume Role via Expression - The existing AWS
>> processors accept both the controller service as well as specific
>> credentials directly applied to the processor -- like AccessKey/SecretKey
>> pair.  The processors could be modified to expand this list to include the
>> Assume Role properties, such that they could be applied via expression
>> language.  It means that STS:AssumeRole would be called once for each
>> execution of the processor, so the temporary credentials would not be
>> reused and would not present an expiration problem.  Some of the work is
>> already in place via the CredentialPropertyDescriptors.  But they would
>> need to be added to processors, and expression language support enabled.
>>
>> 2.) New Controller Service and Processors - A more robust path would be
>> to create a new "credential pool" controller service that created and
>> maintained credentials for the various roles, and dispensed credentials to
>> the processors similarly to the existing AWSCredentialsProviderControllerService.
>> But the downside is that this would require a new interface between the
>> processors and the controller service, so you would need to not only
>> provide a new controller service, but also modifications to the AWS
>> processors to provide role identifiers to the controller service.
>>
>> * You would also have to be careful of batching operations in the
>> existing processors that would assume the same credentials apply to all
>> incoming flowfiles.  The DynamoDB and Kinesis processors are examples that
>> use AWS batch APIs.
>>
>>
>> Thanks,
>>
>> James
>>
>> On Wed, Mar 29, 2017 at 10:11 AM, Adam J. Shook <ad...@gmail.com>
>> wrote:
>>
>>> I've got a use case where files from S3 will need to be fetched/put by
>>> dynamically assuming a role.  I see that the AWSCredentialsProviderControllerService
>>> supports setting an assumed role, however it is a fixed value.   I'm not
>>> too familiar with the controller service API -- Would it be
>>> possible/difficult to change/extend the controller service to support
>>> expression language so I can assume a role on a per-FlowFile basis?   Or
>>> would this need to be a custom processor?  Happy to do the leg work, just
>>> looking for some direction on where to start.
>>>
>>> Thank you,
>>> --Adam
>>>
>>
>>
>

Re: AWSCredentialsProviderControllerService with expression language

Posted by "Adam J. Shook" <ad...@gmail.com>.
Yeah, onTrigger passes the context and FlowFile to getClient which in turn
uses them to configure a new credentials provider if the pool is being
used.  Otherwise it just returns the client as it did before, ignoring
the context and FlowFile.
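
Roughly, the dispatch looks like the sketch below.  This is a simplified
illustration rather than the code from the change itself: ClientSelector and
perFlowFileFactory are hypothetical names, a plain attribute map stands in for
the FlowFile, and the process context is left out for brevity.

```java
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of choosing between a shared client and a per-FlowFile client. */
class ClientSelector<C> {
    // Client built once up front, as the processors did before the change.
    private final C scheduledClient;
    // Builds a client from one FlowFile's attributes; null when no pool is configured.
    private final Function<Map<String, String>, C> perFlowFileFactory;

    ClientSelector(C scheduledClient, Function<Map<String, String>, C> perFlowFileFactory) {
        this.scheduledClient = scheduledClient;
        this.perFlowFileFactory = perFlowFileFactory;
    }

    /** Called once per FlowFile from the trigger path. */
    C getClient(Map<String, String> flowFileAttributes) {
        if (perFlowFileFactory == null) {
            // Previous behaviour: the attributes are ignored.
            return scheduledClient;
        }
        // Pool in use: configure a client from this FlowFile's attributes.
        return perFlowFileFactory.apply(flowFileAttributes);
    }
}
```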

I'm looking to get permission to put in a PR for this -- it'd be good to
get some trained eyes on it and I think it could be beneficial overall.
Stay tuned!

On Tue, Apr 4, 2017 at 12:26 AM, James Wing <jv...@gmail.com> wrote:

> Adam,
>
> That sounds reasonable to me.  Did you modify the processor(s) to get
> credentials/client in onTrigger() rather than in onScheduled()?
>
>
> Thanks,
>
> James
>

Re: AWSCredentialsProviderControllerService with expression language

Posted by James Wing <jv...@gmail.com>.
Adam,

That sounds reasonable to me.  Did you modify the processor(s) to get
credentials/client in onTrigger() rather than in onScheduled()?


Thanks,

James
