Posted to issues@nifi.apache.org by GitBox <gi...@apache.org> on 2020/01/08 22:48:17 UTC

[GitHub] [nifi] alopresto opened a new pull request #3968: NIFI-3383 Implemented encrypted flowfile repository

alopresto opened a new pull request #3968: NIFI-3383 Implemented encrypted flowfile repository
URL: https://github.com/apache/nifi/pull/3968
 
 
   Thank you for submitting a contribution to Apache NiFi.
   
   Please provide a short description of the PR here:
   
   #### Description of PR
   
   _Enables an encrypted flowfile repository implementation to be used._
   
   In order to streamline the review of the contribution we ask you
   to ensure the following steps have been taken:
   
   ### For all changes:
   - [x] Is there a JIRA ticket associated with this PR? Is it referenced 
        in the commit message?
   
   - [x] Does your PR title start with **NIFI-XXXX** where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
   
   - [x] Has your PR been rebased against the latest commit within the target branch (typically `master`)?
   
   - [x] Is your initial contribution a single, squashed commit? _Additional commits in response to PR reviewer feedback should be made on this branch and pushed to allow change tracking. Do not `squash` or use `--force` when pushing to allow for clean monitoring of changes._
   
   ### For code changes:
   - [x] Have you ensured that the full suite of tests is executed via `mvn -Pcontrib-check clean install` at the root `nifi` folder?
   - [x] Have you written or updated unit tests to verify your changes?
   - [ ] Have you verified that the full build is successful on both JDK 8 and JDK 11?
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? 
   - [ ] If applicable, have you updated the `LICENSE` file, including the main `LICENSE` file under `nifi-assembly`?
   - [ ] If applicable, have you updated the `NOTICE` file, including the main `NOTICE` file found under `nifi-assembly`?
   - [ ] If adding new Properties, have you added `.displayName` in addition to .name (programmatic access) for each of the new properties?
   
   ### For documentation related changes:
   - [x] Have you ensured that format looks appropriate for the output in which it is rendered?
   
   ### Note:
   Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [nifi] alopresto commented on a change in pull request #3968: NIFI-3383 Implemented encrypted flowfile repository

Posted by GitBox <gi...@apache.org>.
alopresto commented on a change in pull request #3968: NIFI-3383 Implemented encrypted flowfile repository
URL: https://github.com/apache/nifi/pull/3968#discussion_r364482716
 
 

 ##########
 File path: nifi-commons/nifi-properties/src/main/java/org/apache/nifi/util/NiFiProperties.java
 ##########
 @@ -44,6 +46,7 @@
  * over time.
  */
 public abstract class NiFiProperties {
+    private static final Logger logger = LoggerFactory.getLogger(NiFiProperties.class);
 
 Review comment:
   I understand that introducing logging into this module may be undesirable. I feel that certain scenarios detected by the logic here warrant logging a warning, but if there are impactful side effects, this can be removed. 

[GitHub] [nifi] alopresto edited a comment on issue #3968: NIFI-3383 Implemented encrypted flowfile repository

alopresto edited a comment on issue #3968: NIFI-3383 Implemented encrypted flowfile repository
URL: https://github.com/apache/nifi/pull/3968#issuecomment-572301627
 
 
   On a locally-running snapshot build, I ran the following smoke tests with a simple `GenerateFlowFile` (0 second scheduling to max out) -> `UpdateAttribute` (uppercase and lowercase static attribute values, writing to a new attribute and overwriting the existing) -> `LogAttribute` test flow. 
   
   1. tested normal configuration (i.e. no changes to `nifi.properties`) - works as expected
   1. tested simple encrypted configuration (`StaticKeyProvider` with single key defined) - works as expected
   1. tested simple encrypted configuration with existing plaintext flowfile repository - works as expected
   1. tested incorrect encrypted configuration (invalid class name for SKP) - clear errors during startup in `nifi-app.log`, initiates shutdown
   1. tested multiple available keys (two keys defined in SKP) - works as expected
   1. tested missing config (encryption enabled but no available keys) - clear errors during startup in `nifi-app.log`, initiates shutdown
   1. tested multiple available keys (following missing config above) - works as expected
   1. tested migration back to original key (2nd key still present) - works as expected
   1. tested key loss (remove original key entirely) - works as expected
       1. if an existing flowfile repository contains records encrypted using the (now missing) key, those records cannot be recovered; clear errors during startup, initiates shutdown
       1. if no existing flowfile repository or all records already encrypted using still available key, works as expected
   1. tested recovery from process loss (i.e. `kill -9 <nifi_pid>`) - works as expected
   
   Sample configuration for `nifi.properties`:
   
   ```
   # Add or remove lines as necessary
   ...
   # FlowFile Repository
   nifi.flowfile.repository.implementation=org.apache.nifi.controller.repository.WriteAheadFlowFileRepository
   nifi.flowfile.repository.wal.implementation=org.apache.nifi.wali.EncryptedSequentialAccessWriteAheadLog
   nifi.flowfile.repository.directory=./flowfile_repository
   nifi.flowfile.repository.partitions=256
   nifi.flowfile.repository.checkpoint.interval=2 mins
   nifi.flowfile.repository.always.sync=false
   nifi.flowfile.repository.encryption.key.provider.implementation=org.apache.nifi.security.kms.StaticKeyProvider
   nifi.flowfile.repository.encryption.key.provider.location=
   nifi.flowfile.repository.encryption.key.id=K1
   nifi.flowfile.repository.encryption.key.id.K1=0123456789ABCDEFFEDCBA98765432100123456789ABCDEFFEDCBA9876543210
   nifi.flowfile.repository.encryption.key.id.K2=0000000000000000000000000000000000000000000000000000000000000000
   nifi.flowfile.repository.encryption.key.id.K3=00FF000000000000000000000000000000000000000000000000000000000000
   ...
   ```
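For illustration, the sample configuration above can be read by a small helper that collects the active key ID and every hex-encoded key. This is a minimal sketch, not NiFi's `StaticKeyProvider`: the class `FlowFileRepoKeyConfig` is hypothetical, and its rules (bare `...encryption.key` belongs to the active ID, keys must be valid hex of 128/192/256 bits, the active key must be defined) are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper (not NiFi's StaticKeyProvider) that reads the
// nifi.flowfile.repository.encryption.* properties shown in the sample above.
final class FlowFileRepoKeyConfig {
    static final String PREFIX = "nifi.flowfile.repository.encryption.";

    final String activeKeyId;
    final Map<String, byte[]> keys;

    private FlowFileRepoKeyConfig(String activeKeyId, Map<String, byte[]> keys) {
        this.activeKeyId = activeKeyId;
        this.keys = keys;
    }

    static FlowFileRepoKeyConfig parse(Properties props) {
        String activeId = props.getProperty(PREFIX + "key.id", "").trim();
        if (activeId.isEmpty()) {
            throw new IllegalStateException("Encryption enabled but no active key ID configured");
        }
        Map<String, byte[]> keys = new HashMap<>();

        // Assumption: a bare "...encryption.key" value is the key for the active ID.
        String activeHex = props.getProperty(PREFIX + "key", "").trim();
        if (!activeHex.isEmpty()) {
            keys.put(activeId, decodeAesKey(activeHex));
        }

        // Every "...encryption.key.id.<ID>" property defines an additional available key.
        String idPrefix = PREFIX + "key.id.";
        for (String name : props.stringPropertyNames()) {
            if (name.startsWith(idPrefix)) {
                keys.put(name.substring(idPrefix.length()), decodeAesKey(props.getProperty(name).trim()));
            }
        }

        // Fail fast (like the startup errors in the smoke tests) if the active key is missing.
        if (!keys.containsKey(activeId)) {
            throw new IllegalStateException("No key defined for active key ID " + activeId);
        }
        return new FlowFileRepoKeyConfig(activeId, keys);
    }

    // Decodes a hex string and accepts only AES-128/192/256 key lengths.
    static byte[] decodeAesKey(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("Hex key has an odd number of characters");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Key is not valid hex");
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        if (out.length != 16 && out.length != 24 && out.length != 32) {
            throw new IllegalArgumentException("Key must be 128, 192, or 256 bits");
        }
        return out;
    }
}
```

With the sample above, `parse` would report `K1` as active and `K1`, `K2`, and `K3` as available, so switching `key.id` to `K2` needs no other change.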

[GitHub] [nifi] andrewmlim commented on a change in pull request #3968: NIFI-3833 Implemented encrypted flowfile repository

andrewmlim commented on a change in pull request #3968: NIFI-3833 Implemented encrypted flowfile repository
URL: https://github.com/apache/nifi/pull/3968#discussion_r364816410
 
 

 ##########
 File path: nifi-docs/src/main/asciidoc/administration-guide.adoc
 ##########
 @@ -2530,6 +2530,32 @@ implementation.
 |`nifi.flowfile.repository.always.sync`|If set to `true`, any change to the repository will be synchronized to the disk, meaning that NiFi will ask the operating system not to cache the information. This is very expensive and can significantly reduce NiFi performance. However, if it is `false`, there could be the potential for data loss if either there is a sudden power loss or the operating system crashes. The default value is `false`.
 |====
 
+[[encrypted-write-ahead-flowfile-repository-properties]]
+=== Encrypted Write Ahead FlowFile Repository Properties
+
+All of the properties defined above (see <<write-ahead-flowfile-repository,Write Ahead FlowFile Repository>>) still apply. Only encryption-specific properties are listed here. See <<user-guide.adoc#encrypted-flowfile,Encrypted FlowFile Repository in the User Guide>> for more information.
+
+NOTE: Unlike the encrypted content and provenance repositories, the repository implementation does not change here, only the _underlying write-ahead log implementation_. This allows for cleaner separation and more flexibility in implementation selection. The property that should be changed to enable encryption is `nifi.flowfile.repository.wal.implementation`.
+
+|====
+|*Property*|*Description*
+|`nifi.flowfile.repository.encryption.key.provider.implementation`|This is the fully-qualified class name of the **key provider**. A key provider is the datastore interface for accessing the encryption key to protect the flowfile records. There are currently two implementations -- `StaticKeyProvider` which reads a key directly from _nifi.properties_, and `FileBasedKeyProvider` which reads *n* many keys from an encrypted file. The interface is extensible, and HSM-backed or other providers are expected in the future.
+|`nifi.flowfile.repository.encryption.key.provider.location`|The path to the key definition resource (empty for `StaticKeyProvider`, `./keys.nkp` or similar path for `FileBasedKeyProvider`). For future providers like an HSM, this may be a connection string or URL.
+|`nifi.flowfile.repository.encryption.key.id`|The active key ID to use for encryption (e.g. `Key1`).
+|`nifi.flowfile.repository.encryption.key`|The key to use for `StaticKeyProvider`. The key format is hex-encoded (`0123456789ABCDEFFEDCBA98765432100123456789ABCDEFFEDCBA9876543210`) but can also be encrypted using the `./encrypt-config.sh` tool in NiFi Toolkit (see the <<toolkit-guide.adoc#encrypt_config_tool,Encrypt-Config Tool>> section in the link:toolkit-guide.html[NiFi Toolkit Guide] for more information).
+|`nifi.flowfile.repository.encryption.key.id.`*|Allows for additional keys to be specified for the `StaticKeyProvider`. For example, the line `nifi.flowfile.repository.encryption.key.id.Key2=012...210` would provide an available key `Key2`.
+|====
+
+The simplest configuration is below:
+
+....
+nifi.flowfile.repository.implementation=org.apache.nifi.controller.repository.WriteAheadFlowFileRepository
+nifi.flowfile.repository.wal.implementation=org.apache.nifi.wali.EncryptedSequentialAccessWriteAheadLog
+nifi.flowfile.repository.encryption.key.provider.implementation=org.apache.nifi.security.kms.StaticKeyProvider
+nifi.flowfile.repository.encryption.key.provider.location=
+nifi.flowfile.repository.encryption.key.id=K1
 
 Review comment:
   The example provided earlier is "Key1", but here it is "K1". Perhaps change this to "Key1" to be consistent.
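The table above says `FileBasedKeyProvider` reads many keys from an encrypted file, and the user-guide changes in this PR describe each line as a key ID followed by the Base64 encoding of a 16-byte IV plus the key wrapped with `AES/GCM` under the master key from `nifi.bootstrap.sensitive.key`. A minimal sketch of unwrapping such lines, assuming a 128-bit GCM tag (consistent with the 88-character sample lines: 16-byte IV + 32-byte key + 16-byte tag = 64 bytes). `KeyFileReader` is hypothetical, not NiFi's implementation.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch, NOT NiFi's FileBasedKeyProvider. Each line is assumed to be
// keyId=Base64(16-byte IV || AES/GCM ciphertext of the key under the master key).
final class KeyFileReader {
    private static final int IV_LENGTH = 16;   // from the user-guide description in this PR
    private static final int TAG_BITS = 128;   // assumption: full-length GCM tag

    static Map<String, byte[]> readKeys(Iterable<String> lines, byte[] masterKey) throws GeneralSecurityException {
        Map<String, byte[]> keys = new HashMap<>();
        for (String rawLine : lines) {
            String line = rawLine.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            // Split on the first '='; Base64 padding '=' only appears after it.
            int eq = line.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Malformed key definition: " + line);
            }
            String keyId = line.substring(0, eq);
            byte[] blob = Base64.getDecoder().decode(line.substring(eq + 1));
            if (blob.length <= IV_LENGTH) {
                throw new IllegalArgumentException("Key definition too short for " + keyId);
            }
            byte[] iv = Arrays.copyOfRange(blob, 0, IV_LENGTH);
            byte[] wrappedKey = Arrays.copyOfRange(blob, IV_LENGTH, blob.length);

            // Unwrap with the master key; a wrong master key fails GCM authentication.
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(masterKey, "AES"), new GCMParameterSpec(TAG_BITS, iv));
            keys.put(keyId, cipher.doFinal(wrappedKey));
        }
        return keys;
    }
}
```

Because GCM authenticates the wrapped key, a wrong or changed master key surfaces as an exception at startup rather than as a silently corrupted data key.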

[GitHub] [nifi] alopresto commented on a change in pull request #3968: NIFI-3833 Implemented encrypted flowfile repository

alopresto commented on a change in pull request #3968: NIFI-3833 Implemented encrypted flowfile repository
URL: https://github.com/apache/nifi/pull/3968#discussion_r364955954
 
 

 ##########
 File path: nifi-docs/src/main/asciidoc/user-guide.adoc
 ##########
 @@ -2773,6 +2773,86 @@ When switching between implementation "families" (i.e. `VolatileContentRepositor
 * Multiple repositories -- No additional effort or testing has been applied to multiple repositories at this time. It is possible/likely issues will occur with repositories on different physical devices. There is no option to provide a heterogenous environment (i.e. one encrypted, one plaintext repository).
 * Corruption -- when a disk is filled or corrupted, there have been reported issues with the repository becoming corrupted and recovery steps are necessary. This is likely to continue to be an issue with the encrypted repository, although still limited in scope to individual claims (i.e. an entire repository file won't be irrecoverable due to the encryption). Some testing has been performed on scenarios where disk space is exhausted. While the flow can no longer write additional content claims to the repository in that case, the NiFi application continues to function properly, and successfully written content claims are still available via the Provenance Query operations. Stopping NiFi and removing the content repository (or moving it to a larger disk) resolves the issue.
 
+[[encrypted-flowfile]]
+== Encrypted FlowFile Repository
+While OS-level access control can offer some security over the flowfile attribute and content claim data written to the disk in a repository, there are scenarios where the data may be sensitive, compliance and regulatory requirements exist, or NiFi is running on hardware not under the direct control of the organization (cloud, etc.). In these cases, the encrypted flowfile repository allows for all flowfile record data to be encrypted before being persisted to the disk. For more information on the internal workings of the flowfile repository, see <<nifi-in-depth.adoc#flowfile-repository,NiFi In-Depth - FlowFile Repository>>.
+
+[WARNING]
+.Experimental
+============
+This implementation is marked <<experimental_warning, *experimental*>> as of Apache NiFi 1.11.0 (January 2020). The API, configuration, and internal behavior may change without warning, and such changes may occur during a minor release. Use at your own risk.
+============
+
+[WARNING]
+.Performance
+============
+The current implementation of the encrypted flowfile repository intercepts the serialization of flowfile record data via the `EncryptedSchemaRepositoryRecordSerde` and uses the `AES/GCM` algorithm, which is fairly performant on commodity hardware. This use of an authenticated encryption algorithm (AEAD) block cipher (because the content length is limited and known a priori) is the same as the <<encrypted-provenance,Encrypted Provenance Repository>>, but differs from the unauthenticated stream cipher used in the <<encrypted-content,Encrypted Content Repository>>. In low volume flowfile scenarios, the added cost will be minimal. However, administrators should perform their own risk assessment and performance analysis and decide how to move forward. Switching back and forth between encrypted/unencrypted implementations is not recommended at this time.
+============
+
+=== What is it?
+
+The `EncryptedSequentialAccessWriteAheadLog` is a new implementation of the flowfile write-ahead log which encrypts all flowfile attribute data before it is written to the repository. This allows for storage on systems where OS-level access controls are not sufficient to protect the data while still allowing querying and access to the data through the NiFi UI/API.
+
+=== How does it work?
+
+The `SequentialAccessWriteAheadLog` was introduced in NiFi 1.6.0 and provided a faster flowfile repository implementation. The encrypted version wraps that implementation with functionality to transparently encrypt and decrypt the serialized `RepositoryRecord` objects during file system interaction. During all writes to disk (swapping, snapshotting, journaling, and checkpointing), the flowfile containers are serialized to bytes based on a schema, and this serialized form is encrypted before writing. This allows the snapshot handler to continue interacting with the flowfile repository interface in the same way as before and continue operating on flowfile data in a random access manner, without requiring any changes to handle the data protection.
+
+The fully qualified class `org.apache.nifi.wali.EncryptedSequentialAccessWriteAheadLog` is specified as the flowfile repository write-ahead log implementation in _nifi.properties_ as the value of `nifi.flowfile.repository.wal.implementation`. In addition, <<administration-guide.adoc#encrypted-write-ahead-flowfile-repository-properties,new properties>> must be populated to allow successful initialization.
+
+==== StaticKeyProvider
+The `StaticKeyProvider` implementation defines keys directly in _nifi.properties_. Individual keys are provided in hexadecimal encoding. The keys can also be encrypted like any other sensitive property in _nifi.properties_ using the <<administration-guide.adoc#encrypt-config_tool,`./encrypt-config.sh`>> tool in the NiFi Toolkit.
+
+The following configuration section would result in a key provider with two available keys, "Key1" (active) and "AnotherKey".
+....
+nifi.flowfile.repository.encryption.key.provider.implementation=org.apache.nifi.security.kms.StaticKeyProvider
+nifi.flowfile.repository.encryption.key.id=Key1
+nifi.flowfile.repository.encryption.key=0123456789ABCDEFFEDCBA98765432100123456789ABCDEFFEDCBA9876543210
+nifi.flowfile.repository.encryption.key.id.AnotherKey=0101010101010101010101010101010101010101010101010101010101010101
+....
+
+==== FileBasedKeyProvider
+The `FileBasedKeyProvider` implementation reads from an encrypted definition file of the format:
+
+....
+key1=NGCpDpxBZNN0DBodz0p1SDbTjC2FG5kp1pCmdUKJlxxtcMSo6GC4fMlTyy1mPeKOxzLut3DRX+51j6PCO5SznA==
+key2=GYxPbMMDbnraXs09eGJudAM5jTvVYp05XtImkAg4JY4rIbmHOiVUUI6OeOf7ZW+hH42jtPgNW9pSkkQ9HWY/vQ==
+key3=SFe11xuz7J89Y/IQ7YbJPOL0/YKZRFL/VUxJgEHxxlXpd/8ELA7wwN59K1KTr3BURCcFP5YGmwrSKfr4OE4Vlg==
+key4=kZprfcTSTH69UuOU3jMkZfrtiVR/eqWmmbdku3bQcUJ/+UToecNB5lzOVEMBChyEXppyXXC35Wa6GEXFK6PMKw==
+key5=c6FzfnKm7UR7xqI2NFpZ+fEKBfSU7+1NvRw+XWQ9U39MONWqk5gvoyOCdFR1kUgeg46jrN5dGXk13sRqE0GETQ==
+....
+
+Each line defines a key ID and then the Base64-encoded cipher text of a 16 byte IV and wrapped AES-128, AES-192, or AES-256 key depending on the JCE policies available. The individual keys are wrapped by AES/GCM encryption using the **master key** defined by `nifi.bootstrap.sensitive.key` in _conf/bootstrap.conf_.
+
+==== Key Rotation
+Simply update _nifi.properties_ to reference a new key ID in `nifi.flowfile.repository.encryption.key.id`. Previously-encrypted flowfile records can still be decrypted as long as that key is still available in the key definition file or as `nifi.flowfile.repository.encryption.key.id.<OldKeyID>`, because the key ID is serialized alongside the encrypted record.
+
+=== Writing and Reading FlowFiles
+Once the repository is initialized, all flowfile record write operations are encrypted using a `RepositoryObjectBlockEncryptor` (the only currently existing implementation is `RepositoryObjectAESGCMEncryptor`) and written to the provided `DataOutputStream`. The original stream is swapped with a temporary wrapped stream: the wrapped serializer/deserializer in `EncryptedSchemaRepositoryRecordSerde` writes the serialized record to it, those bytes are encrypted inline, and the encryption metadata (`keyId`, `algorithm`, `version`, `IV`, `cipherByteLength`) is serialized and prepended. The complete length and encrypted bytes are then written to the original `DataOutputStream` on disk as normal.
+
+image:encrypted-flowfile-hex.png["Encrypted flowfile repository journal file on disk"]
+
+On flowfile record read, the process is reversed. The encryption metadata (`RepositoryObjectEncryptionMetadata`) is parsed and used to decrypt the cipher text, and the resulting plaintext bytes are read through a `DataInputStream` and deserialized into the flowfile record.
+
+During swaps and recoveries, the flowfile records are deserialized and reserialized, so if the active key has been changed, the flowfile records will be re-encrypted with the new active key.
+
+Within the NiFi UI/API, there is no detectable difference between an encrypted and unencrypted flowfile repository. All framework interactions with flowfiles work as expected with no change to the process.
+
+=== Potential Issues
+
+[WARNING]
+.Switching Implementations
+============
+Switching is only recommended between `SequentialAccessWriteAheadLog` and `EncryptedSequentialAccessWriteAheadLog`. To migrate from a different implementation, first migrate to the plaintext sequential log, allow NiFi to automatically recover the flowfiles, then stop NiFi and change the configuration to enable encryption. NiFi will automatically recover the plaintext flowfiles from the repository and begin encrypting them on subsequent writes.
+============
+
+* Switching between unencrypted and encrypted repositories
+** If a user has an existing write-ahead repository (`WriteAheadFlowFileRepository`) that is not encrypted (uses the `SequentialAccessWriteAheadLog`) and switches their configuration to use an encrypted repository, the application handles this and all flowfile records will be recovered on startup. Future writes (including re-serialization of these same flowfiles) will be encrypted. If a user switches from an encrypted repository to an unencrypted repository, the flowfiles cannot be recovered, and it is recommended to delete the existing flowfile repository before switching in this direction. Automatic roll-over is a future effort (link:https://issues.apache.org/jira/browse/NIFI-6994[NIFI-6994^]), but NiFi is not intended for long-term storage of flowfile records, so the impact should be minimal. There are two scenarios for roll-over:
+*** Encrypted -> unencrypted -- if the previous repository implementation was encrypted, these records should be handled seamlessly as long as the key provider available still has the keys used to encrypt the records (see **Key Rotation**)
 
 Review comment:
   Done. Good call. 
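The write and read path described in the user-guide changes above (encrypt the serialized record with `AES/GCM`, prepend the key ID and IV, look the key up by ID on read) can be sketched with the standard JCE APIs. This is a hypothetical illustration, not `RepositoryObjectAESGCMEncryptor`: the `RecordEnvelope` class, its `DataOutputStream` field layout, and the 12-byte IV are assumptions (per the comments in this thread, NiFi itself Java-serializes a `RepositoryObjectEncryptionMetadata` object).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical illustration only; NOT RepositoryObjectAESGCMEncryptor or NiFi's on-disk format.
final class RecordEnvelope {
    private static final String ALGORITHM = "AES/GCM/NoPadding";
    private static final int IV_LENGTH = 12;   // assumption: 96-bit IV, the common GCM choice
    private static final int TAG_BITS = 128;   // full-length GCM authentication tag
    private static final SecureRandom RANDOM = new SecureRandom();

    // Encrypts already-serialized record bytes and prepends keyId, algorithm, IV, and cipher length.
    static byte[] encrypt(byte[] serializedRecord, String keyId, byte[] key)
            throws IOException, GeneralSecurityException {
        byte[] iv = new byte[IV_LENGTH];
        RANDOM.nextBytes(iv); // fresh IV per record; never reuse an IV with the same key
        Cipher cipher = Cipher.getInstance(ALGORITHM);
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new GCMParameterSpec(TAG_BITS, iv));
        byte[] cipherBytes = cipher.doFinal(serializedRecord);

        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(keyId);
            out.writeUTF(ALGORITHM);
            out.writeInt(iv.length);
            out.write(iv);
            out.writeInt(cipherBytes.length);
            out.write(cipherBytes);
        }
        return buffer.toByteArray();
    }

    // Parses the metadata, selects the key by its recorded ID, then authenticates and decrypts.
    static byte[] decrypt(byte[] envelope, Map<String, byte[]> availableKeys)
            throws IOException, GeneralSecurityException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(envelope))) {
            String keyId = in.readUTF();
            String algorithm = in.readUTF();
            if (!ALGORITHM.equals(algorithm)) {
                throw new GeneralSecurityException("Unsupported algorithm: " + algorithm);
            }
            byte[] iv = new byte[in.readInt()];
            in.readFully(iv);
            byte[] cipherBytes = new byte[in.readInt()];
            in.readFully(cipherBytes);

            byte[] key = availableKeys.get(keyId);
            if (key == null) {
                // Mirrors the key-loss case: records under a removed key cannot be recovered.
                throw new GeneralSecurityException("Key " + keyId + " is not available");
            }
            Cipher cipher = Cipher.getInstance(ALGORITHM);
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"), new GCMParameterSpec(TAG_BITS, iv));
            return cipher.doFinal(cipherBytes); // throws AEADBadTagException if tampered
        }
    }
}
```

Because the key ID travels with each record, rotation only requires adding the new key and changing the active ID: records written under an old ID still decrypt while that key stays in the map, and a missing key fails loudly, matching the key-loss behavior in the smoke tests.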

[GitHub] [nifi] markap14 commented on issue #3968: NIFI-3833 Implemented encrypted flowfile repository

markap14 commented on issue #3968: NIFI-3833 Implemented encrypted flowfile repository
URL: https://github.com/apache/nifi/pull/3968#issuecomment-573154939
 
 
   I'm a +1 as well. One of the great things about this implementation is that it requires relatively small changes to the existing codebase, mostly just adding features as new classes. Nicely done!

[GitHub] [nifi] alopresto commented on issue #3968: NIFI-3833 Implemented encrypted flowfile repository

alopresto commented on issue #3968: NIFI-3833 Implemented encrypted flowfile repository
URL: https://github.com/apache/nifi/pull/3968#issuecomment-572307305
 
 
   To verify that the flowfile repository files are encrypted, you can use any hex view tool (Hex Fiend, `xxd`, etc.) to examine `$NIFI_HOME/flowfile_repository/checkpoint` and `$NIFI_HOME/flowfile_repository/journals/*.journal`. The beginning will be the serialization of the schema header, which is not sensitive and therefore not encrypted. After ~7300 bytes, you will find the beginning of the flowfile record serialization. In an unencrypted repository, the attributes would be readable here. In encrypted form, you will see the Java serialization of the `RepositoryObjectEncryptionMetadata` class, containing `cipherByteLength`, `algorithm`, `ivBytes`, `version`, and `keyId`. Following those field names, you should see recognizable sequences like `K1` and `AES/GCM/NoPadding`. See the example below. 
   
   ![Example encrypted journal file](https://user-images.githubusercontent.com/798465/72024771-065d9400-322b-11ea-8412-099c83b6f7f2.png)
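The manual hex inspection described above can also be scripted. A minimal sketch, assuming the whole file fits in memory: `RepoFileInspector` (a hypothetical helper) returns the byte offset of an ASCII marker, so a known attribute value should not be found in an encrypted journal, while `AES/GCM/NoPadding` should be.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical inspection helper; reads the whole file into memory, so intended for small test repositories.
final class RepoFileInspector {
    // Returns the offset of the first occurrence of the ASCII marker, or -1 if it is absent.
    static int findMarker(Path file, String marker) throws IOException {
        byte[] data = Files.readAllBytes(file);
        return indexOf(data, marker.getBytes(StandardCharsets.US_ASCII));
    }

    // Naive byte-array search; adequate for occasional manual checks.
    static int indexOf(byte[] data, byte[] pattern) {
        outer:
        for (int i = 0; i + pattern.length <= data.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }
}
```

For example, pointing `findMarker` at a file under `$NIFI_HOME/flowfile_repository/journals/` with an attribute value written by the `UpdateAttribute` test flow should return `-1` once encryption is enabled.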
   

[GitHub] [nifi] alopresto commented on issue #3968: NIFI-3383 Implemented encrypted flowfile repository

alopresto commented on issue #3968: NIFI-3383 Implemented encrypted flowfile repository
URL: https://github.com/apache/nifi/pull/3968#issuecomment-572299054
 
 
   As mentioned, most of the actual logic is contained in the `EncryptedSchemaRepositoryRecordSerde` class which is a *ser*ializer/*de*serializer for Java objects (the representation of a flowfile aka `RepositoryRecord`) to the repository on the file system (`byte[]` output to a `journal` or `checkpoint` file). By intercepting the read/write process, delegating the normal schema-based serialization from object to bytes to the existing serde, and then encrypting/decrypting those bytes, we maintain all the benefits of the existing implementation, introduce very little change, and allow for future enhancement & flexibility. 
   
   Any methods I added should have clear Javadoc explaining their purpose and usage, and should be covered by unit tests. 
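The interception approach described above is essentially the decorator pattern. A minimal sketch with hypothetical types (`RecordSerde`, `ByteTransform`, and `TransformingSerde` are not NiFi's interfaces): the existing serde keeps converting objects to bytes, and the wrapper only transforms those bytes on the way to and from disk.

```java
import java.io.IOException;

// Hypothetical minimal serde contract (NOT NiFi's SerDe/RepositoryRecordSerde interfaces).
interface RecordSerde<T> {
    byte[] serialize(T record) throws IOException;

    T deserialize(byte[] bytes) throws IOException;
}

// A reversible byte transform, e.g. encrypt on write and decrypt on read.
@FunctionalInterface
interface ByteTransform {
    byte[] apply(byte[] input) throws IOException;
}

// Decorator: object<->bytes conversion stays in the delegate; only the bytes are transformed.
final class TransformingSerde<T> implements RecordSerde<T> {
    private final RecordSerde<T> delegate;
    private final ByteTransform onWrite;
    private final ByteTransform onRead;

    TransformingSerde(RecordSerde<T> delegate, ByteTransform onWrite, ByteTransform onRead) {
        this.delegate = delegate;
        this.onWrite = onWrite;
        this.onRead = onRead;
    }

    @Override
    public byte[] serialize(T record) throws IOException {
        return onWrite.apply(delegate.serialize(record)); // schema-based bytes, then transform
    }

    @Override
    public T deserialize(byte[] bytes) throws IOException {
        return delegate.deserialize(onRead.apply(bytes)); // undo transform, then schema-based parse
    }
}
```

In the PR the transform is encryption/decryption with the active key; keeping it behind a small interface means the schema-based serialization is untouched and any reversible transform can be substituted in tests.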

[GitHub] [nifi] joewitt commented on a change in pull request #3968: NIFI-3833 Implemented encrypted flowfile repository

joewitt commented on a change in pull request #3968: NIFI-3833 Implemented encrypted flowfile repository
URL: https://github.com/apache/nifi/pull/3968#discussion_r365371051
 
 

 ##########
 File path: nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/repository/WriteAheadFlowFileRepository.java
 ##########
 @@ -200,17 +211,19 @@ protected void initialize(final ResourceClaimManager claimManager, final Reposit
         // delete backup. On restore, if no files exist in partition's directory, would have to check backup directory
         this.serdeFactory = serdeFactory;
 
-        if (walImplementation.equals(SEQUENTIAL_ACCESS_WAL)) {
+        // The specified implementation can be plaintext or encrypted; the only difference is the serde factory
+        if (isSequentialAccessWAL(walImplementation)) {
+            // TODO: May need to instantiate ESAWAL for clarity?
 
 Review comment:
   This TODO seems unnecessary. Given the TODOs related to follow-on work in NIFI-6617, though, this seems fine for now to help get more evaluation cycles on this new capability.
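For context on the check under discussion, a minimal sketch of a method that treats both write-ahead log class names as one sequential-access family and leaves the serde choice to decide whether records are encrypted. This is hypothetical: the PR's actual `isSequentialAccessWAL` body is not shown in this hunk, and the plaintext class name `org.apache.nifi.wali.SequentialAccessWriteAheadLog` is assumed (the encrypted one matches the `nifi.properties` sample in this thread).

```java
// Hypothetical sketch; the PR's actual isSequentialAccessWAL(...) body is not shown in this hunk.
final class WalImplementations {
    // Assumed plaintext class name; the encrypted one matches the nifi.properties sample in this thread.
    static final String SEQUENTIAL_ACCESS_WAL = "org.apache.nifi.wali.SequentialAccessWriteAheadLog";
    static final String ENCRYPTED_SEQUENTIAL_ACCESS_WAL = "org.apache.nifi.wali.EncryptedSequentialAccessWriteAheadLog";

    private WalImplementations() {
    }

    // Both classes share the sequential-access log; only the serde (plaintext vs. encrypting) differs.
    static boolean isSequentialAccessWAL(String walImplementation) {
        if (walImplementation == null) {
            return false;
        }
        String name = walImplementation.trim();
        return SEQUENTIAL_ACCESS_WAL.equals(name) || ENCRYPTED_SEQUENTIAL_ACCESS_WAL.equals(name);
    }

    // Decides which serde factory to use once the WAL family is known.
    static boolean requiresEncryptedSerde(String walImplementation) {
        return walImplementation != null && ENCRYPTED_SEQUENTIAL_ACCESS_WAL.equals(walImplementation.trim());
    }
}
```

Splitting the family check from the serde decision is one way to read the diff's comment that "the only difference is the serde factory", which is also why instantiating a separate encrypted log class (the TODO) may be unnecessary.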

[GitHub] [nifi] andrewmlim commented on a change in pull request #3968: NIFI-3833 Implemented encrypted flowfile repository

andrewmlim commented on a change in pull request #3968: NIFI-3833 Implemented encrypted flowfile repository
URL: https://github.com/apache/nifi/pull/3968#discussion_r364839986
 
 

 ##########
 File path: nifi-docs/src/main/asciidoc/user-guide.adoc
 ##########
 @@ -2773,6 +2773,86 @@ When switching between implementation "families" (i.e. `VolatileContentRepositor
 * Multiple repositories -- No additional effort or testing has been applied to multiple repositories at this time. It is possible, even likely, that issues will occur with repositories on different physical devices. There is no option to provide a heterogeneous environment (i.e. one encrypted, one plaintext repository).
 * Corruption -- when a disk is filled or corrupted, there have been reported issues with the repository becoming corrupted and recovery steps are necessary. This is likely to continue to be an issue with the encrypted repository, although still limited in scope to individual claims (i.e. an entire repository file won't be irrecoverable due to the encryption). Some testing has been performed on scenarios where disk space is exhausted. While the flow can no longer write additional content claims to the repository in that case, the NiFi application continues to function properly, and successfully written content claims are still available via the Provenance Query operations. Stopping NiFi and removing the content repository (or moving it to a larger disk) resolves the issue.
 
+[[encrypted-flowfile]]
+== Encrypted FlowFile Repository
+While OS-level access control can offer some security over the flowfile attribute and content claim data written to disk in a repository, there are scenarios where the data may be sensitive, compliance and regulatory requirements exist, or NiFi is running on hardware not under the direct control of the organization (cloud, etc.). In these cases, the encrypted flowfile repository allows all flowfile record data to be encrypted before it is persisted to disk. For more information on the internal workings of the flowfile repository, see <<nifi-in-depth.adoc#flowfile-repository,NiFi In-Depth - FlowFile Repository>>.
+
+[WARNING]
+.Experimental
+============
+This implementation is marked <<experimental_warning, *experimental*>> as of Apache NiFi 1.11.0 (January 2020). The API, configuration, and internal behavior may change without warning, and such changes may occur during a minor release. Use at your own risk.
+============
+
+[WARNING]
+.Performance
+============
+The current implementation of the encrypted flowfile repository intercepts the serialization of flowfile record data via the `EncryptedSchemaRepositoryRecordSerde` and uses the `AES/GCM` algorithm, which is fairly performant on commodity hardware. This use of an authenticated encryption (AEAD) cipher mode (possible because the record length is limited and known a priori) is the same as in the <<encrypted-provenance,Encrypted Provenance Repository>>, but differs from the unauthenticated stream cipher used in the <<encrypted-content,Encrypted Content Repository>>. In low-volume scenarios, the added cost will be minimal. However, administrators should perform their own risk assessment and performance analysis before deciding how to proceed. Switching back and forth between encrypted and unencrypted implementations is not recommended at this time.
+============
+
+=== What is it?
+
+The `EncryptedSequentialAccessWriteAheadLog` is a new implementation of the flowfile write-ahead log which encrypts all flowfile attribute data before it is written to the repository. This allows for storage on systems where OS-level access controls are not sufficient to protect the data while still allowing querying and access to the data through the NiFi UI/API.
+
+=== How does it work?
+
+The `SequentialAccessWriteAheadLog` was introduced in NiFi 1.6.0 and provided a faster flowfile repository implementation. The encrypted version wraps that implementation with functionality to transparently encrypt and decrypt the serialized `RepositoryRecord` objects during file system interaction. During all writes to disk (swapping, snapshotting, journaling, and checkpointing), the flowfile containers are serialized to bytes based on a schema, and this serialized form is encrypted before writing. This allows the snapshot handler to continue interacting with the flowfile repository interface in the same way as before and continue operating on flowfile data in a random access manner, without requiring any changes to handle the data protection.
+
+The fully qualified class `org.apache.nifi.wali.EncryptedSequentialAccessWriteAheadLog` is specified as the flowfile repository write-ahead log implementation in _nifi.properties_ as the value of `nifi.flowfile.repository.wal.implementation`. In addition, <<administration-guide.adoc#encrypted-write-ahead-flowfile-repository-properties,new properties>> must be populated to allow successful initialization.
+
+==== StaticKeyProvider
+The `StaticKeyProvider` implementation defines keys directly in _nifi.properties_. Individual keys are provided in hexadecimal encoding. The keys can also be encrypted like any other sensitive property in _nifi.properties_ using the <<administration-guide.adoc#encrypt-config_tool,`./encrypt-config.sh`>> tool in the NiFi Toolkit.
+
+The following configuration section would result in a key provider with two available keys, "Key1" (active) and "AnotherKey".
+....
+nifi.flowfile.repository.encryption.key.provider.implementation=org.apache.nifi.security.kms.StaticKeyProvider
+nifi.flowfile.repository.encryption.key.id=Key1
+nifi.flowfile.repository.encryption.key=0123456789ABCDEFFEDCBA98765432100123456789ABCDEFFEDCBA9876543210
+nifi.flowfile.repository.encryption.key.id.AnotherKey=0101010101010101010101010101010101010101010101010101010101010101
+....
+
+==== FileBasedKeyProvider
+The `FileBasedKeyProvider` implementation reads from an encrypted definition file of the format:
+
+....
+key1=NGCpDpxBZNN0DBodz0p1SDbTjC2FG5kp1pCmdUKJlxxtcMSo6GC4fMlTyy1mPeKOxzLut3DRX+51j6PCO5SznA==
+key2=GYxPbMMDbnraXs09eGJudAM5jTvVYp05XtImkAg4JY4rIbmHOiVUUI6OeOf7ZW+hH42jtPgNW9pSkkQ9HWY/vQ==
+key3=SFe11xuz7J89Y/IQ7YbJPOL0/YKZRFL/VUxJgEHxxlXpd/8ELA7wwN59K1KTr3BURCcFP5YGmwrSKfr4OE4Vlg==
+key4=kZprfcTSTH69UuOU3jMkZfrtiVR/eqWmmbdku3bQcUJ/+UToecNB5lzOVEMBChyEXppyXXC35Wa6GEXFK6PMKw==
+key5=c6FzfnKm7UR7xqI2NFpZ+fEKBfSU7+1NvRw+XWQ9U39MONWqk5gvoyOCdFR1kUgeg46jrN5dGXk13sRqE0GETQ==
+....
+
+Each line defines a key ID followed by the Base64-encoded cipher text of a 16-byte IV and a wrapped AES-128, AES-192, or AES-256 key (depending on the JCE policies available). The individual keys are wrapped with AES/GCM encryption using the **master key** defined by `nifi.bootstrap.sensitive.key` in _conf/bootstrap.conf_.
+
+==== Key Rotation
+Update _nifi.properties_ to reference a new key ID in `nifi.flowfile.repository.encryption.key.id`. Previously encrypted flowfile records can still be decrypted as long as the key used to encrypt them remains available in the key definition file or as `nifi.flowfile.repository.encryption.key.id.<OldKeyID>`, because the key ID is serialized alongside each encrypted record.
+
+=== Writing and Reading FlowFiles
+Once the repository is initialized, all flowfile record write operations are serialized to the provided `DataOutputStream` using a `RepositoryObjectBlockEncryptor` (the only existing implementation is currently `RepositoryObjectAESGCMEncryptor`). The original stream is swapped with a temporary wrapped stream. Via `EncryptedSchemaRepositoryRecordSerde`, the data written to that stream by the wrapped serializer/deserializer is encrypted inline, and the encryption metadata (`keyId`, `algorithm`, `version`, `IV`, `cipherByteLength`) is serialized and prepended. The complete length and the encrypted bytes are then written to the original `DataOutputStream` on disk as normal.
+
+image:encrypted-flowfile-hex.png["Encrypted flowfile repository journal file on disk"]
+
+On flowfile record read, the process is reversed. The encryption metadata (`RepositoryObjectEncryptionMetadata`) is parsed and used to decrypt the serialized bytes. The decrypted bytes are then provided as a `DataInputStream` to the wrapped deserializer.
+
+During swaps and recoveries, the flowfile records are deserialized and reserialized, so if the active key has been changed, the flowfile records will be re-encrypted with the new active key.
+
+Within the NiFi UI/API, there is no detectable difference between an encrypted and unencrypted flowfile repository. All framework interactions with flowfiles work as expected with no change to the process.
+
+=== Potential Issues
+
+[WARNING]
+.Switching Implementations
+============
+Switching between any implementations other than `SequentialAccessWriteAheadLog` and `EncryptedSequentialAccessWriteAheadLog` is not recommended. To migrate from a different implementation, first migrate to the plaintext sequential log and allow NiFi to automatically recover the flowfiles. Then stop NiFi and change the configuration to enable encryption. NiFi will automatically recover the plaintext flowfiles from the repository and begin encrypting them on subsequent writes.
+============
+
+* Switching between unencrypted and encrypted repositories
+** If a user has an existing write-ahead repository (`WriteAheadFlowFileRepository`) that is not encrypted (uses the `SequentialAccessWriteAheadLog`) and switches their configuration to use an encrypted repository, the application handles this and all flowfile records will be recovered on startup. Future writes (including re-serialization of these same flowfiles) will be encrypted. If a user switches from an encrypted repository to an unencrypted repository, the flowfiles cannot be recovered, and it is recommended to delete the existing flowfile repository before switching in this direction. Automatic roll-over is a future effort (link:https://issues.apache.org/jira/browse/NIFI-6994[NIFI-6994^]) but NiFi is not intended for long-term storage of flowfile records so the impact should be minimal. There are two scenarios for roll-over:
+*** Encrypted -> unencrypted -- if the previous repository implementation was encrypted, these records should be handled seamlessly as long as the key provider available still has the keys used to encrypt the claims (see **Key Rotation**)
 
 Review comment:
   Minor suggestion: provide a link to the referenced "Key Rotation" section.  If done here, add the links to the same references in the encrypted provenance and content repo sections.
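
The record handling described in the quoted guide text works as follows: AES/GCM per record, encryption metadata prepended, and the key ID stored with each record so key rotation keeps old records readable. Below is a minimal, self-contained Java sketch of that idea using only the JDK `javax.crypto` API.

It is not `RepositoryObjectAESGCMEncryptor`. The frame layout (key ID, IV length, IV, cipher length, cipher bytes), the 12-byte IV, and the class name are assumptions for illustration, and the `algorithm`/`version` metadata fields are omitted. The keys are the example hex keys from the `StaticKeyProvider` configuration. 256-bit keys need the unlimited-strength JCE policy on older JDK 8 updates.

```java
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-record AES/GCM framing; the layout is an assumption, not NiFi's on-disk format. */
class RecordEncryptionSketch {
    private static final int IV_LENGTH = 12;        // assumed 96-bit IV, the common GCM choice
    private static final int TAG_LENGTH_BITS = 128; // assumed GCM tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    private final Map<String, SecretKey> keys = new HashMap<>();
    private String activeKeyId;

    void addKey(String keyId, String hexKey, boolean active) {
        keys.put(keyId, new SecretKeySpec(hexToBytes(hexKey), "AES"));
        if (active) activeKeyId = keyId;
    }

    byte[] encrypt(byte[] serializedRecord) throws GeneralSecurityException, IOException {
        if (activeKeyId == null) throw new IllegalStateException("No active key");
        byte[] iv = new byte[IV_LENGTH];
        RANDOM.nextBytes(iv); // fresh IV per record; never reuse an IV with the same key
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, keys.get(activeKeyId), new GCMParameterSpec(TAG_LENGTH_BITS, iv));
        byte[] cipherBytes = cipher.doFinal(serializedRecord);

        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(activeKeyId);        // key ID travels with the record
            out.writeInt(iv.length);
            out.write(iv);
            out.writeInt(cipherBytes.length); // analogous to cipherByteLength
            out.write(cipherBytes);
        }
        return buffer.toByteArray();
    }

    byte[] decrypt(byte[] framed) throws GeneralSecurityException, IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed))) {
            String keyId = in.readUTF();
            byte[] iv = new byte[in.readInt()];
            in.readFully(iv);
            byte[] cipherBytes = new byte[in.readInt()];
            in.readFully(cipherBytes);
            SecretKey key = keys.get(keyId); // look up the key that wrote this record
            if (key == null) throw new GeneralSecurityException("No key available for ID " + keyId);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_LENGTH_BITS, iv));
            return cipher.doFinal(cipherBytes); // throws AEADBadTagException if tampered
        }
    }

    private static byte[] hexToBytes(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return bytes;
    }

    public static void main(String[] args) throws Exception {
        RecordEncryptionSketch sketch = new RecordEncryptionSketch();
        sketch.addKey("Key1", "0123456789ABCDEFFEDCBA98765432100123456789ABCDEFFEDCBA9876543210", true);
        byte[] framed = sketch.encrypt("filename=secret.txt".getBytes(StandardCharsets.UTF_8));

        // Rotate to a new active key; the old record still decrypts because Key1 is retained
        sketch.addKey("AnotherKey", "0101010101010101010101010101010101010101010101010101010101010101", true);
        System.out.println(new String(sketch.decrypt(framed), StandardCharsets.UTF_8)); // prints filename=secret.txt
    }
}
```

Each frame stores its key ID, so changing the active key only affects new writes. Older frames keep decrypting as long as their key is still loaded. GCM's authentication tag also means a modified frame is rejected with `AEADBadTagException` rather than silently yielding garbage.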

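The `FileBasedKeyProvider` format in the same hunk stores, per line, a key ID and then the Base64 encoding of a 16-byte IV followed by the AES/GCM-wrapped key. It can be sketched in Java as below.

The sketch makes these assumptions:
* a 128-bit GCM tag
* one `keyId=value` entry per line
* a master key already loaded from `nifi.bootstrap.sensitive.key`

`KeyFileSketch` and its methods are hypothetical, not NiFi's `FileBasedKeyProvider`. As a sanity check on the layout, each 88-character example line decodes to 64 bytes. That fits a 16-byte IV plus a 32-byte key plus a 16-byte tag.

```java
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical reader for "keyId=Base64(IV || wrappedKey)" lines; not NiFi's FileBasedKeyProvider. */
class KeyFileSketch {
    private static final int IV_LENGTH = 16;        // per the guide: 16-byte IV prefix
    private static final int TAG_LENGTH_BITS = 128; // assumed GCM tag length

    static Map<String, SecretKey> readKeys(String definition, SecretKey masterKey)
            throws IOException, GeneralSecurityException {
        Map<String, SecretKey> keys = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(definition))) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) continue;
                // Split on the first '=' only; Base64 padding also uses '='
                int separator = line.indexOf('=');
                if (separator < 1) throw new IllegalArgumentException("Malformed key definition line");
                String keyId = line.substring(0, separator);
                byte[] decoded = Base64.getDecoder().decode(line.substring(separator + 1));
                byte[] iv = Arrays.copyOfRange(decoded, 0, IV_LENGTH);
                byte[] wrapped = Arrays.copyOfRange(decoded, IV_LENGTH, decoded.length);
                Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
                cipher.init(Cipher.DECRYPT_MODE, masterKey, new GCMParameterSpec(TAG_LENGTH_BITS, iv));
                keys.put(keyId, new SecretKeySpec(cipher.doFinal(wrapped), "AES"));
            }
        }
        return keys;
    }

    /** Produces one definition line (the inverse of readKeys), used by the demo below. */
    static String wrapKey(String keyId, byte[] rawKey, SecretKey masterKey) throws GeneralSecurityException {
        byte[] iv = new byte[IV_LENGTH];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, masterKey, new GCMParameterSpec(TAG_LENGTH_BITS, iv));
        byte[] wrapped = cipher.doFinal(rawKey); // wrapped key bytes followed by the GCM tag
        byte[] combined = new byte[iv.length + wrapped.length];
        System.arraycopy(iv, 0, combined, 0, iv.length);
        System.arraycopy(wrapped, 0, combined, iv.length, wrapped.length);
        return keyId + "=" + Base64.getEncoder().encodeToString(combined);
    }

    public static void main(String[] args) throws Exception {
        SecretKey masterKey = new SecretKeySpec(new byte[32], "AES"); // demo only: all-zero master key
        byte[] rawKey = new byte[32];
        new SecureRandom().nextBytes(rawKey);
        String line = wrapKey("key1", rawKey, masterKey);
        System.out.println(line.length()); // 93: "key1=" plus 88 Base64 characters
        Map<String, SecretKey> keys = readKeys(line, masterKey);
        System.out.println(Arrays.equals(rawKey, keys.get("key1").getEncoded())); // true
    }
}
```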

[GitHub] [nifi] andrewmlim commented on issue #3968: NIFI-3833 Implemented encrypted flowfile repository

andrewmlim commented on issue #3968: NIFI-3833 Implemented encrypted flowfile repository
URL: https://github.com/apache/nifi/pull/3968#issuecomment-572644565
 
 
   Completed my review of the doc changes/additions.  Consistent with the existing encrypted provenance and content repo sections in formatting and scope. Added two minor comments. Looks good!


[GitHub] [nifi] asfgit closed pull request #3968: NIFI-3833 Implemented encrypted flowfile repository

asfgit closed pull request #3968: NIFI-3833 Implemented encrypted flowfile repository
URL: https://github.com/apache/nifi/pull/3968
 
 
   


[GitHub] [nifi] alopresto commented on issue #3968: NIFI-3383 Implemented encrypted flowfile repository

alopresto commented on issue #3968: NIFI-3383 Implemented encrypted flowfile repository
URL: https://github.com/apache/nifi/pull/3968#issuecomment-572297746
 
 
   I updated the Admin and User Guides with extensive discussion about what is happening here. I recommend reading those to get a quick understanding and overview. The bulk of the work is done in `NiFiProperties.java`, `EncryptedSchemaRepositoryRecordSerde.java`, `WriteAheadFlowFileRepository.java`, `EncryptedRepositoryRecordSerdeFactory.java`, and `RepositoryEncryptorUtils.java`. The vast majority of the other files are documentation, comment/formatting fixes, or unit tests. 
   
   I will describe:
   1. where to look for method usage (unit tests & Javadoc)
   1. the smoke tests I did on my system to verify the behavior
   1. example configurations to use to replicate locally
   1. how to verify the encryption is working (i.e. not writing/reading plaintext invisibly)
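
The last item in that list is not expanded in this thread. Here is one simple, hypothetical way to perform such a check (not necessarily the procedure the author went on to describe):
1. Route a flowfile with a distinctive attribute value through a test flow.
2. Scan the repository files on disk for that value.

With the plaintext `SequentialAccessWriteAheadLog`, the value would be expected to appear in a journal or snapshot; with the encrypted implementation, it should not. The default `./flowfile_repository` path, the marker value, and the `PlaintextLeakCheck` name are assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical check: lists repository files whose raw bytes contain a plaintext marker. */
class PlaintextLeakCheck {

    /** Naive byte-array search; adequate for small repository files in a test setup. */
    static boolean contains(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return true;
        }
        return false;
    }

    static List<Path> filesContaining(Path repositoryDir, String marker) throws IOException {
        byte[] needle = marker.getBytes(StandardCharsets.UTF_8);
        try (Stream<Path> paths = Files.walk(repositoryDir)) {
            return paths.filter(Files::isRegularFile)
                    .filter(path -> {
                        try {
                            return contains(Files.readAllBytes(path), needle);
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    })
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed defaults; pass the repository directory and marker value as arguments
        Path dir = Paths.get(args.length > 0 ? args[0] : "./flowfile_repository");
        String marker = args.length > 1 ? args[1] : "plaintext-canary-value";
        List<Path> hits = filesContaining(dir, marker);
        System.out.println(hits.isEmpty()
                ? "Marker not found on disk"
                : "Marker found in plaintext: " + hits);
    }
}
```

A negative result only shows that this one marker is not stored in plaintext. It complements, rather than replaces, the unit tests mentioned above.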


[GitHub] [nifi] joewitt commented on issue #3968: NIFI-3833 Implemented encrypted flowfile repository

joewitt commented on issue #3968: NIFI-3833 Implemented encrypted flowfile repository
URL: https://github.com/apache/nifi/pull/3968#issuecomment-573154475
 
 
   Walked through this with Mark Payne. I'm +1. Minor comments. Love all the good docs here to help folks get going too. Thanks Andy! Look forward to NIFI-6617 to ease config too.


[GitHub] [nifi] alopresto commented on issue #3968: NIFI-3833 Implemented encrypted flowfile repository

alopresto commented on issue #3968: NIFI-3833 Implemented encrypted flowfile repository
URL: https://github.com/apache/nifi/pull/3968#issuecomment-573155914
 
 
   Thanks everyone for the reviews. My favorite commits are the ones that remove more code than they add. Commits that add just a little bit of code (and lots of tests) are my next favorite. Will merge.
