Posted to hdfs-dev@hadoop.apache.org by Eric Yang <ey...@apache.org> on 2019/05/04 18:13:39 UTC

Re: [DISCUSS] Docker build process

See comments inline.

On Fri, Mar 22, 2019 at 2:06 AM Elek, Marton <el...@apache.org> wrote:

>
>
> Thanks for the answer,
>
> I agree, sha256-based tags seem safer, and versions should only be
> bumped after some tests.
>
>
> Let's say we have multiple hadoop docker images:
>
> apache/hadoop:3.2.0
> apache/hadoop:3.1.2
> apache/hadoop:2.9.2
> apache/hadoop:2.8.5
> apache/hadoop:2.7.7
>
>
> If I understood correctly, your proposal is the following:
>
> In case of any security issue in centos/jdk, or in case of any bug in
> the apache/hadoop-runner base image (we have a few shell/python scripts
> there):
>
> 1) We need to wait until the next release (3.2.1) to fix them, which
> means all the previous images would be insecure / bad forever (but still
> available?)


Yes.  This prevents Apache from having to maintain a recursive fork of
versions like 3.2.1.1.1.  A company's own internal policy might require them
to build FROM apache/hadoop:3.2.0 and apply their own internal patches.
Apache can phase out deprecated versions, and old versions can be found on
archives.apache.org.
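
For illustration, such a downstream image might look like this (the jar
name is hypothetical, and the /opt/hadoop layout is an assumption about the
official image, not a published spec):

    # build an internal image on top of the official Apache release image
    FROM apache/hadoop:3.2.0
    # overlay an internally patched jar on the official release bits
    # (jar name and install path are assumptions for illustration only)
    COPY hadoop-common-3.2.0-internal.jar /opt/hadoop/share/hadoop/common/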


>
> OR
>
> 2) In case of a serious problem, a new release can be created from all
> the lines (3.2.1, 3.1.3, 2.9.3, 2.8.6) with the help of all the release
> managers (old images remain the same).
>

Release managers come and go, and branches will eventually die off.  There
is no need to address very old images whose release manager is unreachable
(maybe retired).  A release only happens when there is demand for it.


> But on the other hand, the image creation would be as easy as activating
> a new profile during the release. (As a contrast: using a separate repo, a
> new branch would be created and the version in the Dockerfile would be
> adjusted.)
>
> Marton
>
> ps: for development (non-published images) I am convinced that the
> optional docker profile can be an easier way to create images. I will
> create a similar plugin execution for this Dockerfile:
>
> https://github.com/apache/hadoop/tree/trunk/hadoop-ozone/dist
>
> On 3/21/19 11:33 PM, Eric Yang wrote:
> > The flexibility of a date-appended release number is equivalent to the
> > maven snapshot or Docker "latest" image convention; a machine can apply a
> > timestamp better than a human.  By using the Jenkins release process, this
> > can be done with little effort.  For an official release, it is best to
> > use the Docker image digest ID to ensure uniqueness.  E.g.:
> >
> > FROM centos@sha256:67dad89757a55bfdfabec8abd0e22f8c7c12a1856514726470228063ed86593b
>
> >
> > A developer who downloads the released source would build with the same
> > docker image, without side effects.
> >
> > A couple of years ago, RedHat decided to fix an SSL vulnerability in
> > RedHat 6/7 by adding an extra parameter to disable certificate validation
> > in the urllib2 python library and forcing certificate signer validation
> > on by default.  It completely broke the Ambari agent and its self-signed
> > certificates.  Customers had to backtrack to a specific version of the
> > python SSL library to keep their production clusters operational.
> > Without the due diligence of certifying the Hadoop code together with the
> > OS image, there is wiggle room for errors.  That OS update is a perfect
> > example of why we want the container OS image certified with the Hadoop
> > binary release, to avoid such wiggle room.  A snapshot release can afford
> > wiggle room for developers, but I don't think that flexibility is
> > necessary for an official release.
> >
> > Regards,
> > Eric
> >
> > On 3/21/19, 2:44 PM, "Elek, Marton" <el...@apache.org> wrote:
> >
> >
> >
> >     > If versioning is done correctly, older branches can have the same
> >     > docker subproject, and Hadoop 2.7.8 can be released for older
> >     > Hadoop branches.  We don't create a timeline paradox by allowing
> >     > the history of Hadoop 2.7.1 to change.  That release has passed;
> >     > let it stay that way.
> >
> >     I understand your point but I am afraid that my concerns were not
> >     expressed clearly enough (sorry for that).
> >
> >     Let's say that we use centos as the base image. In case of a security
> >     problem on the centos side (eg. in libssl) or on the jdk side, I
> >     would rebuild all the hadoop:2.x / hadoop:3.x images and republish
> >     them. Exactly the same hadoop bytes, but updated centos/jdk libraries.
> >
> >     I understand your concern that in this case an image with the same
> >     tag (eg. hadoop:3.2.1) will change over time. But this can be solved
> >     by adding date-specific suffixes (eg. the hadoop:3.2.1-20190321 tag
> >     would never change, but hadoop:3.2.1 could).
> >
> >     I know that it's not perfect, but this is widely used. For example,
> >     the centos:7 tag is not fixed, but centos:7.6.1810 is (hopefully).
> >
> >     Without this flexibility, any centos/jdk security issue can
> >     invalidate all of our images (and would require new releases from
> >     all the active lines).
> >
> >     Marton
> >
> >
> >
> >
>
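
P.S. To make the digest suggestion above concrete: a tag can be resolved to
its immutable digest with standard docker commands, for example:

    docker pull centos:7.6.1810
    docker inspect --format='{{index .RepoDigests 0}}' centos:7.6.1810
    # prints something like centos@sha256:67dad8...

Pinning FROM to the printed digest keeps the base image reproducible even if
the tag is later moved.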

Re: [DISCUSS] Docker build process

Posted by "Elek, Marton" <el...@apache.org>.
Thanks for the answers, Eric Yang. I think we have a similar view of how
the releases work, and what you wrote is exactly the reason why I prefer
the current method (docker image creation from a separate branch) over
the proposed one (creating images from maven).

1. Not all of the branches can be deprecated. Usually we have two or three
branches with a large user base; we can't deprecate all but the last one.

2. Yes, release managers of the old releases may or may not be
available.

3. This is one reason to use 100% voted and approved packages inside
container images:

 * It makes it clear what's inside (the hadoop version command shows that
it is exactly the same bits which were voted on and approved by the PMC).

 * It makes it possible to upgrade the convenience docker packaging (and
not hadoop itself!) of older but actively used releases (eg. 3.1 today),
for example in case of a serious ssl problem; see the sketch after this
list.

 * I prefer to keep container images for a few older versions. In Ozone
there are tests that check compatibility between different hadoop
versions; docker containers (with the older images) help a lot with that.
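
As a sketch of such a docker-packaging re-spin (the image names, build
context and date suffix are only illustrative, following the
hadoop:3.2.1-20190321 style of tags discussed earlier in the thread):

    # rebuild the image from the unchanged, voted 3.1.2 release artifacts
    docker build -t apache/hadoop:3.1.2-20190504 .
    # check that the image still contains exactly the voted bits
    # (assumes hadoop is on the PATH inside the image)
    docker run --rm apache/hadoop:3.1.2-20190504 hadoop version
    # publish the immutable date tag, then move the floating version tag
    docker push apache/hadoop:3.1.2-20190504
    docker tag apache/hadoop:3.1.2-20190504 apache/hadoop:3.1.2
    docker push apache/hadoop:3.1.2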

Marton





>> 1) We need to wait until the next release (3.2.1) to fix them, which
>> means all the previous images would be insecure / bad forever (but still
>> available?)
> 
> 
> Yes.  This prevents Apache from having to maintain a recursive fork of
> versions like 3.2.1.1.1.  A company's own internal policy might require them
> to build FROM apache/hadoop:3.2.0 and apply their own internal patches.
> Apache can phase out deprecated versions, and old versions can be found on
> archives.apache.org.


>> 2) In case of a serious problem, a new release can be created from all
>> the lines (3.2.1, 3.1.3, 2.9.3, 2.8.6) with the help of all the release
>> managers (old images remain the same).
>>
> 
> Release managers come and go, and branches will eventually die off.  There
> is no need to address very old images whose release manager is unreachable
> (maybe retired).  A release only happens when there is demand for it.



---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org