Posted to dev@bigtop.apache.org by Olaf Flebbe <of...@oflebbe.de> on 2016/12/05 17:07:01 UTC

Re: spark compile: solution ?

Hi Jonathan,

Now I am absolutely sure that the zinc compiler is started by the compile job.

It took a while to figure this out:

The ./dev/make-distribution.sh script invoked by do-component-build invokes build/mvn, which downloads and unconditionally starts a zinc server. It would have been a nice addition to my "Attacking a Big Data Developer" talk at ApacheCon to demonstrate how to trigger exploits while someone compiles spark ;-)

Anyway, in order to increase security, I will open a JIRA to remove the zinc server startup from the build/mvn script. And while I am at it, I will reintroduce the incremental compile to the build process, which I switched off unnecessarily.
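For what it's worth, making the startup opt-in could look roughly like the following. This is only a sketch, not the actual build/mvn code; USE_ZINC, ZINC_BIN, and the default port are my own placeholder names:

```shell
#!/bin/sh
# Sketch of an opt-in zinc startup for build/mvn (placeholder names, not the
# real script): only start the server when the caller explicitly asks for it.
ZINC_PORT="${ZINC_PORT:-3030}"

start_zinc_if_requested() {
  if [ "${USE_ZINC:-0}" = "1" ]; then
    echo "starting zinc on port ${ZINC_PORT}"
    # "${ZINC_BIN}" -start -port "${ZINC_PORT}"  # real startup would go here
  else
    echo "zinc disabled; compiling without an incremental server"
  fi
}

start_zinc_if_requested
```

The default then stays safe (no surprise server on a shared network namespace), and anyone who wants the faster build can opt in explicitly.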

Olaf




> Am 28.11.2016 um 18:38 schrieb Jonathan Kelly <jo...@gmail.com>:
> 
> Olaf, are you absolutely sure this is what is happening though? According
> to http://davidb.github.io/scala-maven-plugin/example_incremental.html, the
> useZincServer property doesn't actually seem to *start* Zinc; it just
> causes the build to use it if it is available. This is how the Spark build
> has always worked previously.
> 
> If the Spark 2.x build is in fact starting Zinc by itself now, I didn't
> realize that was the case. I've still been starting Zinc prior to the
> Bigtop build manually in my own builds.
> 
> BTW, even though we are not doing incremental builds for Spark, this isn't
> the only reason to use Zinc; with Zinc running, the Spark build is cut in
> *half* (from ~40 mins to ~20 mins last I checked). However, if build time
> isn't an issue, of course it's fine to disable it, especially if it's
> causing problems in the Bigtop CI.
> 
> ~ Jonathan
> 
> On Thu, Nov 24, 2016 at 12:29 PM Konstantin Boudnik <co...@apache.org> wrote:
> 
>> +1 - let's get rid of this crap.
>> 
>> On Thu, Nov 24, 2016 at 06:03PM, Olaf Flebbe wrote:
>>> Hi,
>>> 
>>> I think I got a clue why spark compile fails in our CI, but this may not
>> explain why compile fails for Ganesh.
>>> 
>>> The compile job in CI is started with option
>>> 
>>> docker run --rm -v `pwd`:/ws --workdir /ws -e COMPONENTS=$COMPONENTS
>> --net=container:nexus  bigtop/slaves:trunk-$BUILD_ENVIRONMENTS \
>>> bash -c '. /etc/profile.d/bigtop.sh; ./gradlew allclean configure-nexus
>> $COMPONENTS-pkg'
>>> 
>>> The "--net=container:nexus" is necessary to get access to the private
>> subnet of the nexus container to be used as a maven repository securely.
>>> 
>>> However, if a process within the compile job happens to open a server
>> port, this port may be accessible from any other container running on the
>> same machine as well, since they are sharing their network!
>>> 
>>> I just read about nailgun and I saw that the zinc server is started
>> unconditionally by the spark compile job in the pom.xml. So the problem may be a
>> "cross docker compile" race condition: when the docker container serving
>> the zinc server stops, all other compile hosts will fail at once, without
>> any error message. Or it will confuse libraries in the same namespace.
>> There are signs of these errors in the CI logs, too.
>>> 
>>> Given that our build process is not an incremental compile, I vote for
>> disabling the zinc madness.
>>> (i.e. remove useZincServer in pom.xml of spark)
>>> 
>>> Another possible approach is to give up caching of artifacts.
>>> 
>>> 
>>> Thoughts?
>>>   Olaf
>>> 
>> 


Re: spark compile: solution ?

Posted by Jonathan Kelly <jo...@gmail.com>.
Olaf,

Haha, yes, it's not very "nice" of the script to start zinc for you without
even giving you the option to disable it. :-) Thanks for investigating that
and for filing the JIRA to make it optional. As you said on
https://issues.apache.org/jira/browse/BIGTOP-1752 though, it would be good
for us to figure out how to fix the docker networking so that this can work
again, as it would speed up the Spark build considerably.
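One possibility (my own suggestion, not something settled in this thread or in BIGTOP-1752) would be a user-defined bridge network instead of --net=container:nexus. Each container then keeps its own network namespace, so a zinc server's port stays private to the container that started it, while the nexus container remains reachable by name. The network and image names below are illustrative:

```shell
# Hypothetical alternative to --net=container:nexus: a user-defined bridge
# network provides DNS resolution between containers without sharing a
# single network namespace, so one job's zinc port is not visible to the
# other build containers on the same host.
docker network create bigtop-ci
docker run -d --name nexus --net=bigtop-ci sonatype/nexus3
docker run --rm --net=bigtop-ci -v "$(pwd)":/ws --workdir /ws \
  -e COMPONENTS="$COMPONENTS" bigtop/slaves:trunk-"$BUILD_ENVIRONMENTS" \
  bash -c '. /etc/profile.d/bigtop.sh; ./gradlew allclean configure-nexus $COMPONENTS-pkg'
```

The build configuration would then point at the repository by container name (e.g. http://nexus:8081) rather than at localhost.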

~ Jonathan

On Mon, Dec 5, 2016 at 9:07 AM Olaf Flebbe <of...@oflebbe.de> wrote:

> Hi Jonathan,
>
> Now I am absolutely sure that the zinc compiler is started by the compile
> job.
>
> It took a while to figure this out:
>
> The ./dev/make-distribution.sh script invoked by do-component-build invokes
> build/mvn, which downloads and unconditionally starts a zinc server.
> It would have been a nice addition to my "Attacking a Big Data Developer"
> talk at ApacheCon to demonstrate how to trigger exploits while someone
> compiles spark ;-)
>
> Anyway, in order to increase security, I will open a JIRA to remove the
> zinc server startup from the build/mvn script. And while I am at it, I will
> reintroduce the incremental compile to the build process, which I switched
> off unnecessarily.
>
> Olaf
>
>
>
>
> > Am 28.11.2016 um 18:38 schrieb Jonathan Kelly <jo...@gmail.com>:
> >
> > Olaf, are you absolutely sure this is what is happening though? According
> > to http://davidb.github.io/scala-maven-plugin/example_incremental.html, the
> > useZincServer property doesn't actually seem to *start* Zinc; it just
> > causes the build to use it if it is available. This is how the Spark build
> > has always worked previously.
> >
> > If the Spark 2.x build is in fact starting Zinc by itself now, I didn't
> > realize that was the case. I've still been starting Zinc prior to the
> > Bigtop build manually in my own builds.
> >
> > BTW, even though we are not doing incremental builds for Spark, this isn't
> > the only reason to use Zinc; with Zinc running, the Spark build is cut in
> > *half* (from ~40 mins to ~20 mins last I checked). However, if build time
> > isn't an issue, of course it's fine to disable it, especially if it's
> > causing problems in the Bigtop CI.
> >
> > ~ Jonathan
> >
> > On Thu, Nov 24, 2016 at 12:29 PM Konstantin Boudnik <co...@apache.org> wrote:
> >
> >> +1 - let's get rid of this crap.
> >>
> >> On Thu, Nov 24, 2016 at 06:03PM, Olaf Flebbe wrote:
> >>> Hi,
> >>>
> >>> I think I got a clue why spark compile fails in our CI, but this may not
> >> explain why compile fails for Ganesh.
> >>>
> >>> The compile job in CI is started with option
> >>>
> >>> docker run --rm -v `pwd`:/ws --workdir /ws -e COMPONENTS=$COMPONENTS
> >> --net=container:nexus  bigtop/slaves:trunk-$BUILD_ENVIRONMENTS \
> >>> bash -c '. /etc/profile.d/bigtop.sh; ./gradlew allclean configure-nexus
> >> $COMPONENTS-pkg'
> >>>
> >>> The "--net=container:nexus" is necessary to get access to the private
> >> subnet of the nexus container to be used as a maven repository securely.
> >>>
> >>> However, if a process within the compile job happens to open a server
> >> port, this port may be accessible from any other container running on the
> >> same machine as well, since they are sharing their network!
> >>>
> >>> I just read about nailgun and I saw that the zinc server is started
> >> unconditionally by the spark compile job in the pom.xml. So the problem may
> >> be a "cross docker compile" race condition: when the docker container
> >> serving the zinc server stops, all other compile hosts will fail at once,
> >> without any error message. Or it will confuse libraries in the same
> >> namespace. There are signs of these errors in the CI logs, too.
> >>>
> >>> Given that our build process is not an incremental compile, I vote for
> >> disabling the zinc madness.
> >>> (i.e. remove useZincServer in pom.xml of spark)
> >>>
> >>> Another possible approach is to give up caching of artifacts.
> >>>
> >>>
> >>> Thoughts?
> >>>   Olaf
> >>>
> >>
>
>