Posted to dev@nifi.apache.org by Shawn Weeks <sw...@weeksconsulting.us> on 2020/06/03 12:57:43 UTC

Docker Image Improvements for Kubernetes

I’m working on deploying NiFi to Kubernetes and I’ve run across several things that could be improved.


  1.  Currently flow.xml.gz is stored in ./conf by default, which has been designated a Docker volume. In Kubernetes, volumes are not pre-populated from the image, so I’m left with some init container magic to copy the contents of ./conf to another volume and then back again; otherwise ./conf is empty. Since we’re configuring everything via environment variables anyway, setting nifi.flow.configuration.file and designating a volume just for flow.xml.gz would solve that. You could even reuse your existing conf volume if you haven’t changed anything.
  2.  Expose more variables - NIFI-6232 already exists for this but hasn’t had any work.
  3.  Support OpenID Login Provider
  4.  Expose logs besides nifi-app.log
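
As a rough illustration of item 1 (a sketch only, not what the image's start.sh does today): a startup step could rewrite nifi.flow.configuration.file in nifi.properties so that flow.xml.gz lives on its own volume. The helper name set_property and the target path /opt/nifi/flow/flow.xml.gz are hypothetical.

```python
import re


def set_property(properties_text: str, key: str, value: str) -> str:
    """Return properties_text with key set to value; append the key if absent."""
    pattern = re.compile(rf"^{re.escape(key)}=.*$", flags=re.MULTILINE)
    new_line = f"{key}={value}"
    if pattern.search(properties_text):
        # A function replacement keeps backslashes in the value literal.
        return pattern.sub(lambda _match: new_line, properties_text)
    # Key not present: append it, adding a newline separator only if needed.
    needs_newline = bool(properties_text) and not properties_text.endswith("\n")
    return f"{properties_text}{chr(10) if needs_newline else ''}{new_line}\n"


if __name__ == "__main__":
    sample = "nifi.flow.configuration.file=./conf/flow.xml.gz\nnifi.web.http.port=8080\n"
    # Hypothetical mount point for a volume dedicated to the flow file.
    print(set_property(sample, "nifi.flow.configuration.file", "/opt/nifi/flow/flow.xml.gz"))
```

With something like this in place, ./conf would no longer need to be a volume just to persist the flow.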

Re: Docker Image Improvements for Kubernetes

Posted by Pierre Villard <pi...@gmail.com>.

Hi,

That's really good feedback and I'm also doing similar things with my own
k8s/container-based deployments. Let's see if there is any low-hanging
fruit that could be easily added in NiFi. Happy to look at / review pull
requests if there are things you'd like to see upstream.

Thanks,
Pierre

On Thu, 4 Jun 2020 at 13:53, Chris Sampson
<ch...@naimuri.com.invalid> wrote:


Re: Docker Image Improvements for Kubernetes

Posted by Chris Sampson <ch...@naimuri.com.INVALID>.

I've been using NiFi's Docker image for a while now and thought a few notes
from the things we've done might be useful for your work:

   - Using Docker Swarm (NiFi 1.9.2)
      - Had to add some property file updates as part of a custom
      Dockerfile build because the start.sh didn't cover them (some of these
      might have already been addressed):
         - nifi.cluster.protocol.is.secure needs to be set to true for
         secure clusters
         - allow for multiple NODE_IDENTITY entries to be specified in
         authorizers.xml via environment variables (e.g. NODE_IDENTITY_1,
         NODE_IDENTITY_2, etc.) - add as "Node Identity" and "Initial User
         Identity" elements
         - allow configuration of ldap in authorizers.xml
            - uncommenting sections of the file
            - replacing element values/attributes with environment variables
            - add User Group Providers (we had a composite of LDAP and File
            based)
         - update nifi.properties to set `nifi.security.identity.mapping`
         related properties for LDAP <-> PKI mappings
         - update nifi.properties to set appropriate
         `nifi.web.http.network.interface`/`nifi.web.https.network.interface`
         related entries that were found to be required to enable clustering,
         site-to-site and external connections in our Swarm setup (hosted
         across multiple AWS EC2s with two Swarm "networks" in play)
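
To make the NODE_IDENTITY_n idea above concrete, here is a minimal sketch of how numbered environment variables could be turned into property elements for authorizers.xml. This is not our actual script; the function names are made up for illustration, and the output would still need to be spliced into the right provider sections of the file.

```python
import os
import re
from xml.sax.saxutils import escape


def node_identities(environ=None):
    """Collect NODE_IDENTITY_<n> values, ordered by their numeric suffix."""
    environ = os.environ if environ is None else environ
    found = []
    for name, value in environ.items():
        match = re.fullmatch(r"NODE_IDENTITY_(\d+)", name)
        if match and value:
            found.append((int(match.group(1)), value))
    # Sort numerically so NODE_IDENTITY_10 comes after NODE_IDENTITY_2.
    return [value for _, value in sorted(found)]


def identity_properties(identities, prefix="Node Identity"):
    """Render one <property> element per identity, numbered from 1."""
    return [
        f'<property name="{prefix} {index}">{escape(identity)}</property>'
        for index, identity in enumerate(identities, start=1)
    ]


if __name__ == "__main__":
    identities = node_identities()
    for line in identity_properties(identities):
        print(line)
    for line in identity_properties(identities, prefix="Initial User Identity"):
        print(line)
```

The same list is rendered twice because each node has to appear both as a "Node Identity" and as an "Initial User Identity".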

Having been through some of the pain above, we later moved to a Kubernetes
stack and re-implemented some of our approach. One decision we made was to
inject properties/configuration files instead of using the environment
variable replacements via start.sh (because so many things we wanted
weren't covered and we didn't want to continue trying to update the
provided start.sh via sed/awk commands in our Dockerfile to add more
commands as part of the container startup routine).

   - Using Kubernetes (NiFi 1.11.4)
      - custom Dockerfile that overrides the start.sh scripts to provide:
         - overwrite of "static" config files injected into the k8s
         StatefulSet (i.e. everything under conf/ that isn't generated at
         startup)
            - we set non-dynamic & non-secure values in these files within
            our git repo then inject them into the pod
         - set dynamic properties, e.g. hostnames (for
         `nifi.web.https.host`), similar to the provided start.sh script but
         with a different set of properties, as what we need is different
         from what it provides
         - create nifi-toolkit properties files (e.g. setting `baseUrl` and
         `proxiedEntity`, etc. based on hostname & env vars)
         - set secure properties (e.g. encryption.keys) that have been
         provided as files/env vars by k8s/STS
         - add "Node Identity"/"Initial User Identity" entries based on the
         k8s/STS setup (i.e. number of nodes in the cluster)
         - set up "Initial Admin Identity" (based on env var)
         - request node & initial admin certificates from a nifi-toolkit
         instance (running in server mode) then configure them in
         nifi.properties & nifi-toolkit properties
         - create "common" keystore & truststore files in a known location
         with a common password on each cluster node - this is required so
         we can configure S2S reporting tasks with an SSL Controller Service
         (that can only take a single file and password combination so has
         to be common across all nodes)
         - use nifi-toolkit to encrypt conf files (after they've been
         updated)
         - delete unwanted NARs from lib/
         - download required extra (apache-nifi) NARs
      - we have persisted volumes for
         - some logs (that we don't output to STDOUT)
         - persisted configuration, e.g. flow.xml.gz, users.xml,
         authorizations.xml
         - each of the repositories
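
For the "entries based on the k8s/STS setup" item above, the node list can be derived from the replica count, because StatefulSet pods get stable DNS names of the form <statefulset>-<ordinal>.<service>.<namespace>.svc.<cluster domain>. A minimal sketch, assuming a headless service and certificates whose CN is the pod's DNS name; the names nifi, nifi-headless and the OU=NIFI suffix are placeholders, not values from our deployment.

```python
def statefulset_hostnames(name, service, namespace, replicas, cluster_domain="cluster.local"):
    """Stable pod DNS names for a StatefulSet behind a headless service."""
    return [
        f"{name}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


def node_dns(hostnames, suffix="OU=NIFI"):
    """Build certificate-style identities, assuming CN equals the pod hostname."""
    return [f"CN={host}, {suffix}" for host in hostnames]


if __name__ == "__main__":
    # Placeholder values; a real script would read these from env vars.
    for identity in node_dns(statefulset_hostnames("nifi", "nifi-headless", "default", 3)):
        print(identity)
```

Each resulting identity would then be written as a "Node Identity" / "Initial User Identity" entry, and the pod's own hostname used for `nifi.web.https.host`.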

Retrospectively (things always look wrong when you look back, right? 😊),
some of the stuff we've done with our custom startup scripts would have
probably been better as init-containers (e.g. requesting certificates,
dynamic config changes), but things that might be worth considering from a
NiFi Docker point of view:

   - cut-down image in terms of NARs with a way to inject/download extra
   NARs as required at startup/as part of a custom build; but that said, the
   current base is probably fine and anyone wanting to delete NARs should do
   so with their own custom build, as we have
   - providing a "base" set of config files but allowing for overrides
   using files in a known directory; here I'm thinking mainly of things like
   bootstrap.conf, where you could have a conf/conf.d/01-bootstrap.conf file
   to provide extra JVM args, similar to Elasticsearch jvm.options.d
   <https://www.elastic.co/guide/en/elasticsearch/reference/current/jvm-options.html>
   setup
   - as you already mentioned, more property/config settings via
   environment variables
   - ability to change logging config (again could this be done with
   additional files in a separate directory maybe?)
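
To sketch what the conf.d-style override idea could look like (NiFi does not support this as far as I know, so the layout and function names here are purely hypothetical): load the base file, then apply every *.conf file from an override directory in lexical order, with later files winning.

```python
from pathlib import Path


def parse_properties(text):
    """Parse key=value lines, skipping blank lines and '#' comments."""
    result = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        result[key.strip()] = value.strip()
    return result


def merged_config(base_file, override_dir):
    """Load base_file, then apply override_dir/*.conf in lexical order."""
    merged = parse_properties(Path(base_file).read_text(encoding="utf-8"))
    directory = Path(override_dir)
    if directory.is_dir():
        # Sorting makes 01-*.conf apply before 02-*.conf, so later files win.
        for path in sorted(directory.glob("*.conf")):
            merged.update(parse_properties(path.read_text(encoding="utf-8")))
    return merged


if __name__ == "__main__":
    # Hypothetical paths following the conf/conf.d/01-bootstrap.conf example.
    for key, value in sorted(merged_config("conf/bootstrap.conf", "conf/conf.d").items()):
        print(f"{key}={value}")
```

A startup script could write the merged result back to bootstrap.conf before launching NiFi, and the same pattern might work for the logging config.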


*Chris Sampson*
IT Consultant
chris.sampson@naimuri.com


