Posted to issues@nifi.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/03/09 01:37:38 UTC
[jira] [Commented] (NIFI-3313) First deployment of NiFi can hang on VMs without sufficient entropy if using /dev/random

    [ https://issues.apache.org/jira/browse/NIFI-3313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15902290#comment-15902290 ] 

ASF GitHub Bot commented on NIFI-3313:
--------------------------------------

GitHub user alopresto opened a pull request:

    https://github.com/apache/nifi/pull/1579

    NIFI-3313 Added explicit Java runtime argument to default bootstrap.conf to avoid blocking on VM deployment.
    
    This PR needs review in a specific environment. The reported issue is that NiFi running in a container or virtual machine environment that does not have access to sufficient entropy will block indefinitely on startup, right after the "Loaded *n* properties" message:
    
    ```
    2017-03-08 16:38:07,479 INFO [main] org.apache.nifi.NiFi Launching NiFi...
    2017-03-08 16:38:07,656 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader Determined default nifi.properties path to be '/Users/alopresto/Workspace/nifi/nifi-assembly/target/nifi-1.2.0-SNAPSHOT-bin/nifi-1.2.0-SNAPSHOT/./conf/nifi.properties'
    2017-03-08 16:38:07,659 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader Loaded 124 properties from /Users/alopresto/Workspace/nifi/nifi-assembly/target/nifi-1.2.0-SNAPSHOT-bin/nifi-1.2.0-SNAPSHOT/./conf/nifi.properties
    2017-03-08 16:38:07,665 INFO [main] org.apache.nifi.NiFi Loaded 124 properties
    ```
    
    I have added a Java runtime argument to `conf/bootstrap.conf` which sets Java's entropy gathering device (`java.security.egd`) to `/dev/urandom`. This is *not* a security concern because NiFi is *not* generating long-lived secrets at startup (many additional explanatory resources in NIFI-3313).
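    
    As a rough way to confirm that a given `bootstrap.conf` carries the new argument, the following Java sketch loads properties-style text and looks for a `java.arg.*` entry that sets `java.security.egd`. The class and method names, the sample text, and the index in `java.arg.15` are illustrative assumptions, not taken from the patch.
    
    ```java
    import java.io.IOException;
    import java.io.StringReader;
    import java.io.UncheckedIOException;
    import java.util.Optional;
    import java.util.Properties;

    class BootstrapEgdCheck {
        private static final String EGD_FLAG = "-Djava.security.egd=";

        // Returns the value given to -Djava.security.egd by any java.arg.* entry, if present.
        static Optional<String> findEgdSource(String bootstrapConfText) {
            Properties props = new Properties();
            try {
                props.load(new StringReader(bootstrapConfText));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return props.stringPropertyNames().stream()
                    .filter(name -> name.startsWith("java.arg."))
                    .map(props::getProperty)
                    .map(String::trim)
                    .filter(value -> value.startsWith(EGD_FLAG))
                    .map(value -> value.substring(EGD_FLAG.length()))
                    .findFirst();
        }

        public static void main(String[] args) {
            // Hypothetical excerpt; the argument index (15) is an assumption.
            String sample = String.join("\n",
                    "java.arg.2=-Xms512m",
                    "java.arg.15=-Djava.security.egd=file:/dev/urandom");
            System.out.println(findEgdSource(sample).orElse("(not set)"));
        }
    }
    ```
    
    The check only inspects the configuration text; it does not prove that the running JVM picked the argument up.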
    
    However, I cannot reproduce the original issue locally. I have tried running the application on my native OS (Mac OS X 10.11.6), in a Docker container (`aldrin/apache-nifi`) on the Boot2Docker ISO, and in a Docker container (`aldrin/apache-nifi`) on a new Ubuntu Xenial 16.04.2 LTS installation inside VirtualBox. In none of these environments could I get NiFi to block on startup.
    
    I request that whoever reviews this is someone who has encountered the blocking issue and can consistently reproduce it in order to ensure this change solves the problem. I have run the patched version on native OS (i.e. direct access to PRNG) and there were no ill effects. 
    
    <hr>
    
    Thank you for submitting a contribution to Apache NiFi.
    
    In order to streamline the review of the contribution we ask you
    to ensure the following steps have been taken:
    
    ### For all changes:
    - [ ] Is there a JIRA ticket associated with this PR? Is it referenced 
         in the commit message?
    
    - [ ] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    
    - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
    
    - [ ] Is your initial contribution a single, squashed commit?
    
    ### For code changes:
    - [ ] Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
    - [ ] Have you written or updated unit tests to verify your changes?
    - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? 
    - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
    - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
    - [ ] If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?
    
    ### For documentation related changes:
    - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
    
    ### Note:
    Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/alopresto/nifi NIFI-3313

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/1579.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1579
    
----
commit 654d616407cc7271d818b8902a17e9dafcafbb2f
Author: Andy LoPresto <al...@apache.org>
Date:   2017-03-09T00:44:49Z

    NIFI-3313 Added explicit Java runtime argument to default bootstrap.conf to avoid blocking on VM deployment.

----


> First deployment of NiFi can hang on VMs without sufficient entropy if using /dev/random
> ----------------------------------------------------------------------------------------
>
>                 Key: NIFI-3313
>                 URL: https://issues.apache.org/jira/browse/NIFI-3313
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.1.1
>            Reporter: Andy LoPresto
>            Assignee: Andy LoPresto
>            Priority: Critical
>              Labels: entropy, security, virtual-machine
>
> h1. Analysis of Issue
> h2. Statement of Problem:
> NiFi deployed on a headless VM (little user interaction by way of keyboard and mouse I/O) can take 5-10 minutes (reported) to start up. The user reports that this occurs on a "secure" cluster. Further examination is required to determine which specific process requires the large amount of random input (no steps to reproduce, configuration files, logs, or VM environment information were provided).
> h2. Context
> The likely cause of this issue is that a process is attempting to read from _/dev/random_, a \*nix "device" providing a pseudo-random number generator (PRNG). Also available is _/dev/urandom_, a related PRNG. Despite a common misconception, _/dev/urandom_ is not "less secure" than _/dev/random_ for general use cases. _/dev/random_ blocks if the entropy *estimate* (a "guess" of the existing entropy introduced into the pool) is lower than the amount of random data requested by the caller. In contrast, _/dev/urandom_ does not block, but provides the output of the same cryptographically-secure PRNG (CSPRNG) that _/dev/random_ reads from \[myths\]. After as little as 256 bytes of initial seeding, reading from _/dev/random_ and reading from _/dev/urandom_ are functionally equivalent, as the long period of random data generated will not require re-seeding before sufficient entropy can be provided again.
> As mentioned earlier, further examination is required to determine if the process requiring random input occurs at application boot or only at "machine" (hardware or VM) boot. On the first deployment of the system with certificates, the certificate generation process will require substantial random input. However, on application launch and connection to a cluster, even the TLS/SSL protocol requires some amount of random input. 
> h2. Proposed Solutions
> h3. rngd
> A software toolset for accessing a dedicated hardware RNG (*true* RNG, or TRNG) called _rng-tools_ \[rngtools\] exists for Linux. Specialized hardware, as well as Intel chips from Ivy Bridge (2012) onward, can provide hardware-generated random input to the kernel. Using the daemon _rngd_ to seed the _/dev/random_ and _/dev/urandom_ entropy pool is the simplest solution.
> *Note: Do not use _/dev/urandom_ to seed _/dev/random_ using _rngd_. This is like running a garden hose from a car's exhaust back into its gas tank and trying to drive.*
> h3. Instruct Java to use /dev/urandom
> The Java Runtime Environment (JRE) can be instructed to use _/dev/urandom_ for all invocations of {{SecureRandom}}, either on a per-Java process basis \[jdk-urandom\] or in the JVM configuration \[oracle-urandom\], which means it will not block on server startup. The NiFi {{bootstrap.conf}} file can be modified to contain an additional Java argument directing the JVM to use _/dev/urandom_. 
> h2. Other Solutions
> h3. Entropy Gathering Tools
> Tools that gather entropy from non-standard sources (audio card noise, video capture from webcams, etc.), such as audio-entropyd \[wagner\], have been developed, but these tools are not verified or well-examined. When they are tested, it is usually only for the strength of their PRNG, not for the ability of the tool to capture entropy and generate sufficiently random data unavailable to an attacker who may be able to determine the internal state.
> h3. haveged
> A solution has been proposed to use {{haveged}} \[haveged\], a user-space daemon relying on the HAVEGE (HArdware Volatile Entropy Gathering and Expansion) construct to continually increase the entropy on the system, allowing _/dev/random_ to run without blocking.
> However, on further investigation, multiple sources indicate this solution may be insecure \[dice\]\[leek-havege\]. 
> Michael Kerrisk: 
> bq. Having read a number of papers about HAVEGE, Peter \[Anvin\] said he had been unable to work out whether this was a "real thing". Most of the papers that he has read run along the lines, "we took the output from HAVEGE, and ran some tests on it and all of the tests passed". The problem with this sort of reasoning is the point that Peter made earlier: there are no tests for randomness, only for non-randomness.
> bq. One of Peter's colleagues replaced the random input source employed by HAVEGE with a constant stream of ones. All of the same tests passed. In other words, all that the test results are guaranteeing is that the HAVEGE developers have built a very good PRNG. It is possible that HAVEGE does generate some amount of randomness, Peter said. But the problem is that the proposed source of randomness is simply too complex to analyze; thus it is not possible to make a definitive statement about whether it is truly producing randomness. (By contrast, the HWRNGs that Peter described earlier have been analyzed to produce a quantum theoretical justification that they are producing true randomness.) "So, while I can't really recommend it, I can't not recommend it either." If you are going to run HAVEGE, Peter strongly recommended running it together with rngd, rather than as a replacement for it.
> Tom Leek:
> bq. Of course, the whole premise of HAVEGE is questionable. For any practical security, you need a few "real random" bits, no more than 200, which you use as seed in a cryptographically secure PRNG. The PRNG will produce gigabytes of pseudo-\[data\] indistinguishable from true randomness, and that's good enough for all practical purposes.
> bq. Insisting on going back to the hardware for every bit looks like yet another outbreak of that flawed idea which sees entropy as a kind of gasoline, which you burn up when you look at it.
> h2. Next Steps
> As described above, further investigation is necessary, but moving forward, barring new information, I would propose directing the JVM to use _/dev/urandom_ and making _rngd_ available to systems that support a TRNG. 
> [myths] http://www.2uo.de/myths-about-urandom/
> [rngtools] https://git.kernel.org/cgit/utils/kernel/rng-tools/rng-tools.git/about/
> [jdk-urandom] http://stackoverflow.com/a/2325109/70465
> [oracle-urandom] https://docs.oracle.com/cd/E13209_01/wlcp/wlss30/configwlss/jvmrand.html
> [wagner] https://people.eecs.berkeley.edu/~daw/rnd/
> [haveged] http://www.issihosts.com/haveged/
> [dice] https://lwn.net/Articles/525459/
> [leek-havege] http://security.stackexchange.com/a/34552/16485
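
One way a reviewer might confirm which entropy source a JVM ends up with is the Java sketch below. It assumes the JDK consults the `java.security.egd` system property before falling back to the `securerandom.source` security property; the class and method names are hypothetical. The timed `generateSeed` call may block on a system that is short of entropy and still reading from `/dev/random`, which is the symptom described in the issue.

```java
import java.security.SecureRandom;
import java.security.Security;

class EntropySourceCheck {
    // Assumed precedence: a non-empty java.security.egd system property wins;
    // otherwise the securerandom.source security property applies.
    static String effectiveSource(String egdProperty, String securityProperty) {
        if (egdProperty != null && !egdProperty.isEmpty()) {
            return egdProperty;
        }
        return securityProperty == null ? "" : securityProperty;
    }

    public static void main(String[] args) {
        String source = effectiveSource(
                System.getProperty("java.security.egd"),
                Security.getProperty("securerandom.source"));
        System.out.println("Effective entropy source: " + source);

        // Time a seed request; this is the call that can stall on a blocking source.
        long start = System.nanoTime();
        byte[] seed = new SecureRandom().generateSeed(32);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Generated " + seed.length + " seed bytes in " + elapsedMs + " ms");
    }
}
```

Running it with and without `-Djava.security.egd=file:/dev/urandom` on an affected VM should show whether the argument changes the reported source and the seeding time.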


