Posted to users@nifi.apache.org by Jeremy Farbota <jf...@payoff.com> on 2016/10/20 18:03:01 UTC

provenance & content repos re infosec

Hello,

I'm using NiFi in a compliance setting. One of my use cases is for
deheading (hashing names, SSNs, etc.) and republishing. It works great for
these tasks, but I need to cover my bases to make sure things are not stored
on disk. E.g., when I extract a name to an attribute for hashing, I do not
want to store it unencrypted at rest in the provenance repo.

It seems I can turn off the content repo with this setting:
nifi.content.repository.archive.enabled=false

Is flowfile content stored on disk anywhere once the flowfile is dropped
with the setting above?

Regarding the provenance repo, the settings offer the ability to truncate
the attribute on retrieval e.g.

nifi.provenance.repository.max.attribute.length=8

Does the above setting change only what can be retrieved or does it limit
what is stored?

If it is still storing all the attributes, then I will likely need to
greatly reduce the provenance repo max.storage.time. Would severely
limiting the provenance or content repo negatively affect NiFi's
performance?

Is there a way that I can have these "secure" settings only for certain
templates? Or are these provenance and content repo settings only
configurable server-wide?

Has there ever been thought to enable encryption at rest of the provenance
repo to deal with situations like mine?

Thanks in advance.

-- 

Jeremy Farbota
Software Engineer, Data
jfarbota@payoff.com • (217) 898-8110


Re: provenance & content repos re infosec

Posted by Jeremy Farbota <jf...@payoff.com>.
Andy,

Immense thanks for the thoroughly helpful response.

I'll join the dev list and look forward to hearing about the new features.
This is great news and all of those features are things we would use.

Kindly,





Re: provenance & content repos re infosec

Posted by Andy LoPresto <al...@apache.org>.
Hi Jeremy,

These are great questions and I appreciate your interest in securing data at all stages for your application.

Setting nifi.content.repository.archive.enabled=false will turn off content repository archiving, but the content will still sit at rest on the file system for some period of time (while the data is in use during the flow). To completely avoid persisting any content data to the file system, set nifi.content.repository.implementation=org.apache.nifi.controller.repository.VolatileContentRepository. This will direct NiFi to store the content in memory during operation (with the understanding that power loss could cause data loss).

You can set a similar value to do the same with the provenance repository, with the same caveat: nifi.provenance.repository.implementation=org.apache.nifi.provenance.VolatileProvenanceRepository.
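
As a rough way to double-check a deployment, the two settings above could be verified with a small script along these lines. This is only a sketch: the helper names (parse_properties, non_volatile_repositories) and the sample file excerpt are hypothetical, and only the two volatile class names are taken from this message.

```python
# Sketch: flag NiFi repositories that are not configured as volatile (in memory).
# The two class names come from the message above; helper names are hypothetical.

VOLATILE_IMPLEMENTATIONS = {
    "nifi.content.repository.implementation":
        "org.apache.nifi.controller.repository.VolatileContentRepository",
    "nifi.provenance.repository.implementation":
        "org.apache.nifi.provenance.VolatileProvenanceRepository",
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def non_volatile_repositories(props):
    """Return the property keys whose value is not the volatile implementation."""
    return [
        key
        for key, expected in VOLATILE_IMPLEMENTATIONS.items()
        if props.get(key) != expected
    ]


# Illustrative excerpt of a nifi.properties file (values are assumptions).
SAMPLE = """\
nifi.content.repository.implementation=org.apache.nifi.controller.repository.VolatileContentRepository
nifi.provenance.repository.implementation=org.apache.nifi.provenance.PersistentProvenanceRepository
nifi.content.repository.archive.enabled=false
"""

if __name__ == "__main__":
    for key in non_volatile_repositories(parse_properties(SAMPLE)):
        print(f"{key} is not set to the volatile implementation")
```

In practice the text would be read from the instance's own nifi.properties file rather than from SAMPLE. A missing key is also reported, since NiFi would then fall back to whatever its default implementation is.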

Unfortunately, at this time these settings are global for all NiFi data, rather than specific to a processor/process group.

I am working on efforts to provide the following features (and need to get them posted in the wiki roadmap to solicit feedback from the community):

* Transparent data encryption for repositories
	* Provenance
	* Content
	* Flowfile (attributes)
* Sensitive attributes
* Cryptographic signatures for provenance event records and lineage chains
* Features to ease data segmentation/isolation (e.g. raw data comes into an input port/source processor and is routed by attribute/signature to different nodes/clusters with varying security levels or underlying security hardening/policies)

I would suggest you stay tuned to the mailing list (off the top of my head, I can’t remember if changes to the wiki are posted to users@, so you might want to subscribe to dev@ as well), and I welcome your input on these feature development efforts. There are some other members of our community who are similarly security-minded, and I think we will get some great collaboration on this moving forward.

Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
