Posted to issues@nifi.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/04/21 00:36:04 UTC

[jira] [Commented] (NIFI-3594) Implement encrypted provenance repository

    [ https://issues.apache.org/jira/browse/NIFI-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15977862#comment-15977862 ] 

ASF GitHub Bot commented on NIFI-3594:
--------------------------------------

GitHub user alopresto opened a pull request:

    https://github.com/apache/nifi/pull/1686

    NIFI-3594 Encrypted provenance repository implementation

    This is a big PR, and there is some helpful information before delving into the code. 
    
    # What is it?
    
    The `EncryptedWriteAheadProvenanceRepository` is a new implementation of the provenance repository which encrypts all event record information before it is written to the repository. This allows for storage on systems where OS-level access controls are not sufficient to protect the data while still allowing querying and access to the data through the NiFi UI/API. 
    
    # How does it work?
    
    The code will provide more details, and I plan to write extensive documentation for the Admin Guide and User Guide (tracked in [NIFI-3721](https://issues.apache.org/jira/browse/NIFI-3721)), but this will suffice for an overview. 
    
    The `WriteAheadProvenanceRepository` was introduced by @markap14 in [NIFI-3356](https://issues.apache.org/jira/browse/NIFI-3356) and provided a refactored and much faster provenance repository implementation than the previous `PersistentProvenanceRepository`. The encrypted version wraps that implementation with a record writer and reader which encrypt and decrypt the serialized bytes respectively. 
    
    The fully qualified class `org.apache.nifi.provenance.EncryptedWriteAheadProvenanceRepository` is specified as the provenance repository implementation in `nifi.properties` as the value of `nifi.provenance.repository.implementation`. In addition, new properties must be populated to allow successful initialization. 
    
    The simplest configuration is below:
    
    ```
    nifi.provenance.repository.debug.frequency=100
    nifi.provenance.repository.encryption.key.provider.implementation=org.apache.nifi.provenance.StaticKeyProvider
    nifi.provenance.repository.encryption.key.provider.location=
    nifi.provenance.repository.encryption.key.id=Key1
    nifi.provenance.repository.encryption.key=0123456789ABCDEFFEDCBA98765432100123456789ABCDEFFEDCBA9876543210
    ```
    
    * `nifi.provenance.repository.debug.frequency` is a new configuration option to control the rate at which debug messages regarding performance statistics are printed to the logs (in *DEBUG* mode)
    * `nifi.provenance.repository.encryption.key.provider.implementation` is the *Key Provider* implementation. A key provider is the datastore interface for accessing the encryption key used to protect the provenance events. There are currently two implementations -- `StaticKeyProvider`, which reads a key directly from `nifi.properties`, and `FileBasedKeyProvider`, which reads *n* keys from an encrypted file. The interface is extensible, and HSM-backed or other providers are expected in the future. 
    * `nifi.provenance.repository.encryption.key.provider.location` is the location of the key provider data. For `StaticKeyProvider`, this is left blank. For `FileBasedKeyProvider`, this is a file path to the key provider definition file (e.g. `./keys.nkp`). For an HSM or other provider, this could be a URL, etc. 
    * `nifi.provenance.repository.encryption.key.id` is the *key ID* which is used to encrypt the events. 
    * `nifi.provenance.repository.encryption.key` is the hexadecimal encoding of the key for the `StaticKeyProvider`. For `FileBasedKeyProvider`, this value is left blank. This value can also be encrypted by using the `encrypt-config.sh` tool in the NiFi Toolkit, and is marked as sensitive by default. 
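    
    As a rough illustration of what "hexadecimal encoding of the key" implies, the sketch below decodes a hex string and checks that it yields a 128-, 192-, or 256-bit AES key. The class and method names (`HexKeyCheck`, `decodeHex`, `isValidAesKey`) are hypothetical and are not taken from the NiFi code base; the validation NiFi itself performs may differ.
    
    ```java
    // Hypothetical helper, not part of NiFi: validates a hex-encoded AES key.
    final class HexKeyCheck {
    
        // Decodes a hex string into bytes; rejects odd lengths and non-hex characters.
        static byte[] decodeHex(String hex) {
            if (hex.length() % 2 != 0) {
                throw new IllegalArgumentException("Hex key must have an even number of characters");
            }
            byte[] out = new byte[hex.length() / 2];
            for (int i = 0; i < out.length; i++) {
                int hi = Character.digit(hex.charAt(2 * i), 16);
                int lo = Character.digit(hex.charAt(2 * i + 1), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("Non-hex character in key");
                }
                out[i] = (byte) ((hi << 4) | lo);
            }
            return out;
        }
    
        // AES accepts only 16-, 24-, or 32-byte keys (128, 192, or 256 bits).
        static boolean isValidAesKey(String hex) {
            try {
                int length = decodeHex(hex).length;
                return length == 16 || length == 24 || length == 32;
            } catch (IllegalArgumentException e) {
                return false;
            }
        }
    }
    ```
    
    The example key in the configuration above is 64 hex characters, i.e. 32 bytes, so `HexKeyCheck.isValidAesKey(...)` accepts it as a 256-bit key. Whether the JVM can actually use a 256-bit key depends on the JCE policies installed.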
    
    The `FileBasedKeyProvider` implementation reads from an encrypted definition file of the format:
    
    ```
    key1=NGCpDpxBZNN0DBodz0p1SDbTjC2FG5kp1pCmdUKJlxxtcMSo6GC4fMlTyy1mPeKOxzLut3DRX+51j6PCO5SznA==
    key2=GYxPbMMDbnraXs09eGJudAM5jTvVYp05XtImkAg4JY4rIbmHOiVUUI6OeOf7ZW+hH42jtPgNW9pSkkQ9HWY/vQ==
    key3=SFe11xuz7J89Y/IQ7YbJPOL0/YKZRFL/VUxJgEHxxlXpd/8ELA7wwN59K1KTr3BURCcFP5YGmwrSKfr4OE4Vlg==
    key4=kZprfcTSTH69UuOU3jMkZfrtiVR/eqWmmbdku3bQcUJ/+UToecNB5lzOVEMBChyEXppyXXC35Wa6GEXFK6PMKw==
    key5=c6FzfnKm7UR7xqI2NFpZ+fEKBfSU7+1NvRw+XWQ9U39MONWqk5gvoyOCdFR1kUgeg46jrN5dGXk13sRqE0GETQ==
    ```
    
    Each line defines a key ID followed by the Base64-encoded cipher text of a 16-byte IV and a wrapped AES-128, AES-192, or AES-256 key, depending on the JCE policies available. The individual keys are wrapped by AES/GCM encryption using the **master key** defined by `nifi.bootstrap.sensitive.key` in `conf/bootstrap.conf`. 
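    
    To make the wrapping scheme concrete, here is a minimal sketch of AES/GCM key wrapping using only the standard JCE API. The byte layout (IV followed by cipher text and tag, then Base64) is an assumption based on the description above, and `KeyWrapSketch`, `wrap`, and `unwrap` are hypothetical names rather than the actual NiFi utilities.
    
    ```java
    import javax.crypto.Cipher;
    import javax.crypto.spec.GCMParameterSpec;
    import javax.crypto.spec.SecretKeySpec;
    import java.security.GeneralSecurityException;
    import java.security.SecureRandom;
    import java.util.Arrays;
    import java.util.Base64;
    
    // Hypothetical sketch: wraps a key with AES/GCM under a master key.
    final class KeyWrapSketch {
        private static final int IV_LENGTH = 16;        // bytes, as stated in the description
        private static final int TAG_LENGTH_BITS = 128; // assumed GCM tag size
    
        // Returns Base64(IV || cipher text || tag); this layout is an assumption.
        static String wrap(byte[] masterKey, byte[] keyToWrap) {
            try {
                byte[] iv = new byte[IV_LENGTH];
                new SecureRandom().nextBytes(iv);
                Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
                cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(masterKey, "AES"),
                        new GCMParameterSpec(TAG_LENGTH_BITS, iv));
                byte[] cipherText = cipher.doFinal(keyToWrap);
                byte[] out = new byte[iv.length + cipherText.length];
                System.arraycopy(iv, 0, out, 0, iv.length);
                System.arraycopy(cipherText, 0, out, iv.length, cipherText.length);
                return Base64.getEncoder().encodeToString(out);
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException("Could not wrap key", e);
            }
        }
    
        // Reverses wrap(); fails if the master key is wrong or the data was modified.
        static byte[] unwrap(byte[] masterKey, String encoded) {
            byte[] raw = Base64.getDecoder().decode(encoded);
            if (raw.length < IV_LENGTH + TAG_LENGTH_BITS / 8) {
                throw new IllegalArgumentException("Wrapped key is too short");
            }
            try {
                byte[] iv = Arrays.copyOfRange(raw, 0, IV_LENGTH);
                Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
                cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(masterKey, "AES"),
                        new GCMParameterSpec(TAG_LENGTH_BITS, iv));
                return cipher.doFinal(raw, IV_LENGTH, raw.length - IV_LENGTH);
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException("Could not unwrap key", e);
            }
        }
    }
    ```
    
    Under these assumptions, wrapping a 32-byte key produces 16 + 32 + 16 = 64 bytes, or 88 Base64 characters, which is the same length as each value in the example file above.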
    
    Once the repository is initialized, all provenance event record write operations are serialized according to the configured schema writer (`EventIdFirstSchemaRecordWriter` by default for `WriteAheadProvenanceRepository`) to a `byte[]`. Those bytes are then encrypted using an implementation of `ProvenanceEventEncryptor` (the only current implementation is `AES/GCM/NoPadding`) and the encryption metadata (`keyId`, `algorithm`, `version`, `IV`) is serialized and prepended. The complete `byte[]` is then written to the repository on disk as normal. 
    
    ![Encrypted provenance repository file on disk](https://i.imgur.com/BCpQoBl.png)
    
    On record read, the process is reversed. The encryption metadata is parsed and used to decrypt the serialized bytes, which are then deserialized into a `ProvenanceEventRecord` object. The delegation to the normal schema record writer/reader allows for "random-access" (i.e. immediate seek without decryption of unnecessary records). 
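    
    The write and read paths described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: the metadata layout written with `DataOutputStream`, the `"v1"` version string, and the names `EncryptedRecordSketch`, `encrypt`, and `decrypt` are all hypothetical, and a plain `Map` stands in for the key provider.
    
    ```java
    import javax.crypto.Cipher;
    import javax.crypto.spec.GCMParameterSpec;
    import javax.crypto.spec.SecretKeySpec;
    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.DataInputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.security.GeneralSecurityException;
    import java.security.SecureRandom;
    import java.util.Map;
    
    // Hypothetical sketch of "encrypt serialized bytes, prepend metadata" and its reverse.
    final class EncryptedRecordSketch {
        static final String ALGORITHM = "AES/GCM/NoPadding";
        static final String VERSION = "v1"; // assumed version label
    
        // Write path: encrypt the serialized event, then prepend keyId, algorithm, version, IV.
        static byte[] encrypt(byte[] serializedEvent, String keyId, Map<String, byte[]> keys) {
            byte[] key = keys.get(keyId);
            if (key == null) {
                throw new IllegalStateException("No key available for ID " + keyId);
            }
            try {
                byte[] iv = new byte[16];
                new SecureRandom().nextBytes(iv); // fresh random IV for every record
                Cipher cipher = Cipher.getInstance(ALGORITHM);
                cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new GCMParameterSpec(128, iv));
                byte[] cipherText = cipher.doFinal(serializedEvent);
    
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                DataOutputStream out = new DataOutputStream(bytes);
                out.writeUTF(keyId);
                out.writeUTF(ALGORITHM);
                out.writeUTF(VERSION);
                out.writeInt(iv.length);
                out.write(iv);
                out.writeInt(cipherText.length);
                out.write(cipherText);
                out.flush();
                return bytes.toByteArray();
            } catch (GeneralSecurityException | IOException e) {
                throw new IllegalStateException("Encryption failed", e);
            }
        }
    
        // Read path: parse the metadata, look up the key by ID, decrypt the remaining bytes.
        static byte[] decrypt(byte[] record, Map<String, byte[]> keys) {
            try {
                DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
                String keyId = in.readUTF();
                String algorithm = in.readUTF();
                String version = in.readUTF();
                if (!ALGORITHM.equals(algorithm) || !VERSION.equals(version)) {
                    throw new IllegalStateException("Unsupported algorithm or version");
                }
                byte[] iv = new byte[in.readInt()];
                in.readFully(iv);
                byte[] cipherText = new byte[in.readInt()];
                in.readFully(cipherText);
                byte[] key = keys.get(keyId);
                if (key == null) {
                    throw new IllegalStateException("No key available for ID " + keyId);
                }
                Cipher cipher = Cipher.getInstance(ALGORITHM);
                cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"), new GCMParameterSpec(128, iv));
                return cipher.doFinal(cipherText);
            } catch (GeneralSecurityException | IOException e) {
                throw new IllegalStateException("Decryption failed", e);
            }
        }
    }
    ```
    
    Because each record carries its own key ID and IV in this sketch, a single record can be decrypted without touching its neighbours, which is the property the "random-access" remark relies on.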
    
    Within the NiFi UI/API, there is no detectable difference between an encrypted and unencrypted provenance repository. The Provenance Query operations work as expected with no change to the process. 
    
    # Performance
    
    While there is an obvious performance cost to cryptographic operations, I tried to minimize the impact and to provide an estimate of the metrics of this implementation in comparison to existing behavior. 
    
    In general, with low flow event volume, the performance impact is not noticeable -- it is in line with `WriteAheadProvenanceRepository` and more than twice as fast as the existing `PersistentProvenanceRepository`. 
    
    ![Small event size, low volume](http://i.imgur.com/kc35VJL.png)
    
    With a much higher volume of events, the impact is felt in two ways. First, the throughput of the flow is slower, as more resources are dedicated to encrypting and serializing the events (note the total events processed/events per second). In addition, the provenance queries are slightly slower than the original implementation (1% - 17%), and significantly slower than the new `WriteAheadProvenanceRepository` operating in plaintext (~110%). This is a known trade-off that will need to be evaluated by the deployment administrator given their threat model and risk assessment. 
    
    ![Small event size, high volume](https://i.imgur.com/M8jg75V.png)
    
    # Remaining Efforts
    * Documentation -- as noted above, this effort is captured in [NIFI-3721](https://issues.apache.org/jira/browse/NIFI-3721)
    * Logging data leakage -- in various places, I noted that with logs set to *DEBUG*, the `LuceneEventIndex` printed substantial information from the event record to the log. If the repository is encrypted, an administrator would reasonably expect this potentially-sensitive information not to be printed to the logs. In this specific instance, I changed the log statements to elide this information, but an audit needs to occur for the complete project to detect other instances where this may occur. Ideally, this could be variable depending on the encryption status of the repository, but this would require changing the method signature, and I didn't want to tackle that now 
    * Other implementations -- While AES/GCM is (in my opinion) the best option for event encryption (it is AEAD which provides confidentiality and integrity, very fast, and does not need to be compatible with any external system), users may have requirements/requests for other algorithms
    * Other key providers -- as noted above, HSM is probably the biggest, but other software-based secure data stores like [Vault](https://www.vaultproject.io/) or [KeyWhiz](https://square.github.io/keywhiz/), or a JCEKS-backed provider for compatibility with Hadoop systems, may be necessary
    * Refactoring shared code -- as part of the effort to provide encrypted repositories for content and flowfiles, some of this code will likely be moved to other modules
    
    # Potential Issues
    * Key rotation -- If a user wants to rotate the keys used, `StaticKeyProvider` does not provide a mechanism to support this. With `FileBasedKeyProvider`, they can add a new key to the key provider file and set `nifi.provenance.repository.encryption.key.id` in `nifi.properties` to that key's ID; future events will then be encrypted with the new key. Previously-encrypted events can still be decrypted as long as the key they were encrypted with is still available in the key definition file
    * Switching between unencrypted and encrypted repositories
        - If a user has an existing repository that is not encrypted and switches their configuration to use an encrypted repository, the application writes an error to the log but starts up. However, previous events are not accessible through the provenance query interface and new events will overwrite the existing events. The same behavior occurs if a user switches from an encrypted repository to an unencrypted repository
        - We should provide logic to handle encrypted -> unencrypted seamlessly as long as the key provider available still has the keys used to encrypt the events (see **Key Rotation**)
        - We should provide logic to handle unencrypted -> encrypted seamlessly as the previously recorded events simply need to be read with a plaintext schema record reader and then written back with the encrypted record writer
        - We should also provide a standalone tool in NiFi Toolkit to encrypt/decrypt an existing provenance repository to make the transition easier. The translation process could take a long time depending on the size of the existing repository, and being able to perform this task outside of application startup would be valuable
    * Multiple repositories -- No additional effort or testing has been applied to multiple repositories at this time. It is possible, even likely, that issues will occur with repositories on different physical devices. There is no option to provide a heterogeneous environment (i.e. one encrypted, one plaintext repository). 
    * Corruption -- when a disk is filled or corrupted, there have been reported issues with the repository becoming corrupted and recovery steps are necessary. This is likely to continue to be an issue with the encrypted repository, although still limited in scope to individual records (i.e. an entire repository file won't be irrecoverable due to the encryption)
    * Shutdown -- I noticed that switching from `PersistentProvenanceRepository` to `EncryptedWriteAheadProvenanceRepository` led to slower NiFi app shutdowns [NIFI-3712](https://issues.apache.org/jira/browse/NIFI-3712). This was repeatable with `WriteAheadProvenanceRepository`, so I don't believe it is dependent on the encryption changes
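    
    For the key rotation item above, the behaviour can be pictured with a small in-memory sketch: new events use whichever key ID is currently active, while older keys stay available for decryption. `RotatingKeyProviderSketch` and its methods are hypothetical names chosen for illustration and do not reflect the actual `KeyProvider` interface in this PR.
    
    ```java
    import java.util.LinkedHashMap;
    import java.util.Map;
    
    // Hypothetical in-memory stand-in for a multi-key provider with an "active" key ID.
    final class RotatingKeyProviderSketch {
        private final Map<String, byte[]> keys = new LinkedHashMap<>();
        private String activeKeyId;
    
        // Adds a key; corresponds to adding a line to the key provider file.
        void addKey(String keyId, byte[] key) {
            keys.put(keyId, key.clone());
        }
    
        // Corresponds to changing nifi.provenance.repository.encryption.key.id.
        void setActiveKeyId(String keyId) {
            if (!keys.containsKey(keyId)) {
                throw new IllegalArgumentException("Unknown key ID " + keyId);
            }
            activeKeyId = keyId;
        }
    
        // The key ID that new events would be encrypted with.
        String getActiveKeyId() {
            return activeKeyId;
        }
    
        boolean keyExists(String keyId) {
            return keys.containsKey(keyId);
        }
    
        // Lookup by the key ID stored in a record's encryption metadata.
        byte[] getKey(String keyId) {
            byte[] key = keys.get(keyId);
            if (key == null) {
                throw new IllegalArgumentException("Unknown key ID " + keyId);
            }
            return key.clone();
        }
    }
    ```
    
    Rotating from `key1` to `key2` in this model is `addKey("key2", ...)` followed by `setActiveKeyId("key2")`; records that name `key1` in their metadata remain readable until `key1` is removed.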
    
    
    ----
    Thank you for submitting a contribution to Apache NiFi.
    
    In order to streamline the review of the contribution we ask you
    to ensure the following steps have been taken:
    
    ### For all changes:
    - [x] Is there a JIRA ticket associated with this PR? Is it referenced 
         in the commit message?
    
    - [x] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    
    - [x] Has your PR been rebased against the latest commit within the target branch (typically master)?
    
    - [ ] Is your initial contribution a single, squashed commit?
    
    ### For code changes:
    - [ ] Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
    - [x] Have you written or updated unit tests to verify your changes?
    - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? 
    - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
    - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
    - [ ] If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?
    
    ### For documentation related changes:
    - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
    
    ### Note:
    Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/alopresto/nifi NIFI-3594

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/1686.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1686
    
----
commit d4de39b2505615e83a7c44262c95837a4bcdff48
Author: Andy LoPresto <al...@apache.org>
Date:   2017-03-14T04:53:00Z

    NIFI-3594 Added first unit test for PersistentProvenanceRepository operation.
    Added BC dependency to nifi-persistent-provenance-repository module.

commit 8006d12fa297229438accd617bbb59a9c86ea5f4
Author: Andy LoPresto <al...@apache.org>
Date:   2017-03-14T23:35:18Z

    NIFI-3594 Removed Thread.sleep() from WAPR unit test as per Mark Payne.

commit f6200032207f18fa83d2c4b1ac330f2649d1e7e6
Author: Andy LoPresto <al...@apache.org>
Date:   2017-03-17T03:49:28Z

    NIFI-3594 Fixed WAPR unit test (temporary fix dependent on NIFI-3605).

commit c497644e70e3e9bd766668b2df2d0f454e3d76a5
Author: Andy LoPresto <al...@apache.org>
Date:   2017-03-18T02:08:54Z

    NIFI-3594 Added skeleton of encrypted provenance repository (KeyProvider w/ 2 impls, Encryptor skeleton, and exceptions/utilities).
    Reorganized tests to proper path.

commit 2fdbb233e3332cc5c1205cbbb62b877914a8fc6c
Author: Andy LoPresto <al...@apache.org>
Date:   2017-03-22T03:36:27Z

    NIFI-3594 Added encryption methods and reflective property accessors. Pausing to re-evaluate because work may need to be done at lower level (EventWriter/EventReader -- byte/Object serialization).

commit 5951b78cc92356fa3f20798ba2bb1276b518cb91
Author: Andy LoPresto <al...@apache.org>
Date:   2017-03-24T21:52:13Z

    NIFI-3594 Intermediate changes before discussion with Mark Payne about intercepting SchemaRecordReader/Writer serialization (no updates to schema necessary).

commit 054cdefc0c8bee366670c9ea7d60bcc302a8808c
Author: Andy LoPresto <al...@apache.org>
Date:   2017-03-24T23:02:44Z

    NIFI-3594 Moved (Keyed)CipherProvider classes & tests into nifi-security-utils to include in nifi-data-provenance-utils.

commit 1dfbb5b83178b7e9d190d4c87bb277bb9f5eb6ba
Author: Andy LoPresto <al...@apache.org>
Date:   2017-03-27T23:44:38Z

    NIFI-3594 Working JUnit test with encrypted write and read of PER.

commit 2637a3049b5c6d10e6c42e6f83b7d049159cfca3
Author: Andy LoPresto <al...@apache.org>
Date:   2017-03-28T23:45:18Z

    NIFI-3594 Implemented encrypted read, write, and seek operations.
    Resolved RAT and checkstyle issues.
    All tests pass.

commit 33db1eeae8d48d854c467c91d8c520edce96ad3a
Author: Andy LoPresto <al...@apache.org>
Date:   2017-03-29T00:53:24Z

    NIFI-3594 Changed constant IV (for testing only) to actual random IV.

commit d844c0065062ad5bd181e9e077693d04cde6704d
Author: Andy LoPresto <al...@apache.org>
Date:   2017-03-29T00:55:23Z

    NIFI-3594 Delegated reader and writer to use AESKeyedCipherProvider (enhanced error checking and guard controls).

commit 7ae1a1c1d778d6c8b0ac25fc34b85b5c55c8ca7d
Author: Andy LoPresto <al...@apache.org>
Date:   2017-03-29T02:13:30Z

    NIFI-3594 Refactored to use concatByteArrays() for performance and heap optimization.

commit b912a140eb646f8969b5adb9c7d28b6894316e5d
Author: Andy LoPresto <al...@apache.org>
Date:   2017-04-06T00:25:37Z

    NIFI-3594 Working event encryptor lifecycle unit test with full encryption metadata serialization.

commit 83523ad4e66851da9facaee60ec456ff11ba2615
Author: Andy LoPresto <al...@apache.org>
Date:   2017-04-07T02:17:15Z

    NIFI-3594 Refactored AESProvenanceEventEncryptor implementation (removed cached ciphers to allow non-repeating IVs).
    Added unit tests.

commit 60f4f03b42737028f361621f3eea4c9d54a6971d
Author: Andy LoPresto <al...@apache.org>
Date:   2017-04-07T18:41:17Z

    NIFI-3594 Added forAlgorithm static constructor for EncryptionMethod.
    Added validity checks for algorithm and version in AESProvenanceEventEncryptor.
    Added unit tests.

commit b8314e8ce4e4033bf827f997336a73787fbeffc4
Author: Andy LoPresto <al...@apache.org>
Date:   2017-04-07T18:49:00Z

    NIFI-3594 Included bad keyId scenario in test.

commit 09115b2ad0a63fad8c3f6cdd62ba4d2aa02bc9b3
Author: Andy LoPresto <al...@apache.org>
Date:   2017-04-08T04:15:06Z

    NIFI-3594 Refactored key availability interface contract.
    Refactored encryptor composition.
    Added unit tests.

commit 10b203e187374462ba52f30b8d2c0213ca803f8f
Author: Andy LoPresto <al...@apache.org>
Date:   2017-04-11T23:37:47Z

    NIFI-3594 Began adding configuration properties for encrypted provenance repository.
    Added utility methods for validation.
    Added unit tests.

commit 711bfa4e91b394b2478a5bf5440e24cb0ac25b8d
Author: Andy LoPresto <al...@apache.org>
Date:   2017-04-19T00:14:39Z

    NIFI-3594 Added new NiFi properties keys for provenance repository encryption.
    Added nifi.provenance.repository.encryption.key to default sensitive keys and updated unit tests and test resources.
    Added method to correctly calculate protected percentage of sensitive keys (unpopulated keys are no longer counted against protection %).

commit 092cb2d13ad351761cf2bf3833ae9a01d602b6b7
Author: Andy LoPresto <al...@apache.org>
Date:   2017-04-19T01:21:09Z

    NIFI-3594 Implemented StaticKeyProvider and FileBasedKeyProvider.
    Moved getBestEventIdentifier() from StandardProvenanceEventRecord to ProvenanceEventRecord interface and added delegate in all implementations to avoid ClassCastException from multiple classloaders.
    Initialized IV before cipher to suppress unnecessary warnings.
    Added utility method to read encrypted provenance keys from key provider file.
    Suppressed logging of event record details in LuceneEventIndex.
    Added logic to create EncryptedSchemaRecordReader (if supported) in RecordReaders.
    Cleaned up EncryptedSchemaRecordReader and EncryptedSchemaRecordWriter.
    Added keyProvider, recordReaderFactory, and recordWriterFactory initialization to EncryptedWriteAheadProvenanceRepository to provide complete interceptor implementation.
    Added logic to RepositoryConfiguration to load encryption-related properties if necessary.
    Refactored WriteAheadProvenanceRepository to allow subclass implementation.
    Registered EncryptedWAPR in ProvenanceRepository implementations.
    Added unit tests for EWAPR.
    Added new nifi.properties keys for encrypted provenance repository.

commit 578d0d16097a60576bbb1682880b83040e551e45
Author: Andy LoPresto <al...@apache.org>
Date:   2017-04-20T22:31:12Z

    NIFI-3594 Cleanup of initial efforts (OBE).

commit b44b603b21a3a6b8e110f5d7673e7771aecb70e5
Author: Andy LoPresto <al...@apache.org>
Date:   2017-04-20T22:47:02Z

    NIFI-3594 Continued cleanup of initial efforts (OBE).

----


> Implement encrypted provenance repository
> -----------------------------------------
>
>                 Key: NIFI-3594
>                 URL: https://issues.apache.org/jira/browse/NIFI-3594
>             Project: Apache NiFi
>          Issue Type: Sub-task
>          Components: Core Framework
>    Affects Versions: 1.1.1
>            Reporter: Andy LoPresto
>            Assignee: Andy LoPresto
>              Labels: encryption, provenance, repository
>
> I am going to start with the provenance repository, as the new implementation of {{WriteAheadProvenanceRepository}} has the most recent design decisions and has not been available in a released version yet, so there should be minimal backward compatibility concerns. 


