Posted to dev@spark.apache.org by an...@thomsonreuters.com on 2015/08/25 11:17:57 UTC

Spark builds: allow user override of project version at buildtime

I've got an interesting challenge in building Spark. For various reasons we
do a few different builds of Spark, typically with a few different profile
options (e.g. against different versions of Hadoop, some with/without Hive,
etc.). We mirror the Spark repo internally and have a build server that
builds and publishes different Spark versions to an Artifactory server. The
problem is that the output of each build is published with the version that
is in the pom.xml file - a build of Spark @tags/v1.4.1 always comes out with
an artifact version of '1.4.1'. However, because we may have three different
Spark builds for 1.4.1, it'd be useful to be able to override this version
at build time, so that we can publish 1.4.1, 1.4.1-cdh5.3.3 and maybe
1.4.1-cdh5.3.3-hive as separate artifacts. 

My understanding of Maven is that the /project/version value in the pom.xml
isn't overridable. At the moment, I've hacked around this by having a
pre-build task that rewrites the various pom files and adjusts the version to
a string that's correct for that particular build. 
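
(For illustration, one way to do that kind of rewrite - not necessarily what
our build server actually runs - is the versions-maven-plugin, e.g.:

    mvn versions:set -DnewVersion=1.4.1-cdh5.3.3-hive
    mvn versions:commit

but it's still a workaround rather than a supported knob.)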

Would it be useful to instead populate the version from a Maven property,
which could then be overridden on the command line? Something like:

<project>
    <version>${spark.version}</version>
    <properties>
        <spark.version>1.4.1</spark.version>
    </properties>
</project>

Then, if I wanted to do a build against a specific profile, I could also
pass in a -Dspark.version=1.4.1-custom-string and have the output artifacts
correctly named. The default behaviour should be the same. Child pom files
would need to reference ${spark.version} in their parent section I think.
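
Roughly, each child pom's parent section would then need to look something
like this (an illustrative sketch, not copied from the actual Spark poms):

    <parent>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-parent_2.10</artifactId>
        <version>${spark.version}</version>
    </parent>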

Any objections to this?

Andrew

RE: Spark builds: allow user override of project version at buildtime

Posted by an...@thomsonreuters.com.
So, I actually tried this, and it built without problems, but publishing the artifacts to Artifactory left some strangeness in the child poms, where the property wasn't resolved. This leads to errors when pulling them into other projects: "Could not find org.apache.spark:spark-parent_2.10:${spark.version}."
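
In other words, the child poms as published still contain the literal
placeholder in their parent section, roughly (illustrative):

    <parent>
        ...
        <version>${spark.version}</version>
    </parent>

so any downstream project that tries to resolve the parent fails.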

There's conflicting information out on the web about whether this should or shouldn't work, and whether it is or isn't a good idea. The broad consensus is that this is actually a bit of a hack around Maven, so it's probably not something we should do.

I'll explore whether sbt is more flexible and does what's needed. 

Andrew

From: Michael Armbrust [mailto:michael@databricks.com] 
Sent: 26 August 2015 03:12
To: Marcelo Vanzin <va...@cloudera.com>
Cc: Rowson, Andrew G. (Financial&Risk) <an...@thomsonreuters.com>; dev@spark.apache.org
Subject: Re: Spark builds: allow user override of project version at buildtime

This isn't really answering the question, but for what it is worth, I manage several different branches of Spark and publish custom named versions regularly to an internal repository, and this is *much* easier with SBT than with maven.  You can actually link the Spark SBT build into an external SBT build and write commands that cross publish as needed.

For your case something as simple as build/sbt "set version in Global := '1.4.1-custom-string'" publish might do the trick.

On Tue, Aug 25, 2015 at 10:09 AM, Marcelo Vanzin <va...@cloudera.com> wrote:
On Tue, Aug 25, 2015 at 2:17 AM,  <an...@thomsonreuters.com> wrote:
> Then, if I wanted to do a build against a specific profile, I could also
> pass in a -Dspark.version=1.4.1-custom-string and have the output artifacts
> correctly named. The default behaviour should be the same. Child pom files
> would need to reference ${spark.version} in their parent section I think.
>
> Any objections to this?

Have you tried it? My understanding is that no project does that
because it doesn't work. To resolve properties you need to read the
parent pom(s), and if there's a variable reference there, well, you
can't do it. Chicken & egg.

--
Marcelo



Re: Spark builds: allow user override of project version at buildtime

Posted by Michael Armbrust <mi...@databricks.com>.
This isn't really answering the question, but for what it is worth, I
manage several different branches of Spark and publish custom named
versions regularly to an internal repository, and this is *much* easier
with SBT than with maven.  You can actually link the Spark SBT build into
an external SBT build and write commands that cross publish as needed.

For your case something as simple as build/sbt "set version in Global :=
'1.4.1-custom-string'" publish might do the trick.
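
Note that the setting needs a valid Scala string literal, so the quoting on
the command line would probably look more like this (the version string is
just an example):

    build/sbt 'set version in Global := "1.4.1-custom-string"' publish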

On Tue, Aug 25, 2015 at 10:09 AM, Marcelo Vanzin <va...@cloudera.com>
wrote:

> On Tue, Aug 25, 2015 at 2:17 AM,  <an...@thomsonreuters.com> wrote:
> > Then, if I wanted to do a build against a specific profile, I could also
> > pass in a -Dspark.version=1.4.1-custom-string and have the output artifacts
> > correctly named. The default behaviour should be the same. Child pom files
> > would need to reference ${spark.version} in their parent section I think.
> >
> > Any objections to this?
>
> Have you tried it? My understanding is that no project does that
> because it doesn't work. To resolve properties you need to read the
> parent pom(s), and if there's a variable reference there, well, you
> can't do it. Chicken & egg.
>
> --
> Marcelo

Re: Spark builds: allow user override of project version at buildtime

Posted by Marcelo Vanzin <va...@cloudera.com>.
On Tue, Aug 25, 2015 at 2:17 AM,  <an...@thomsonreuters.com> wrote:
> Then, if I wanted to do a build against a specific profile, I could also
> pass in a -Dspark.version=1.4.1-custom-string and have the output artifacts
> correctly named. The default behaviour should be the same. Child pom files
> would need to reference ${spark.version} in their parent section I think.
>
> Any objections to this?

Have you tried it? My understanding is that no project does that
because it doesn't work. To resolve properties you need to read the
parent pom(s), and if there's a variable reference there, well, you
can't do it. Chicken & egg.

-- 
Marcelo
