Posted to dev@predictionio.apache.org by Ziemin <gi...@git.apache.org> on 2016/09/16 10:17:16 UTC

[GitHub] incubator-predictionio pull request #295: [PIO-30] Set up a cross build for ...

GitHub user Ziemin opened a pull request:

    https://github.com/apache/incubator-predictionio/pull/295

    [PIO-30] Set up a cross build for Scala 2.10 (Spark 1.6.2) and Scala …

    This PR introduces a simple profile-based build of PredictionIO, which comes with several upgrades, including a new version of Spark.
    
    ### The key changes include:
    
    * build.sbt - here I created two profiles with different sets of artifacts to be included. Their names are _scala-2.11_ and _scala-2.10_, where the former is chosen by default. To select a profile, the `-Dbuild.profile=<profile_name>` property has to be passed to the sbt command.
    You can print the description of a given profile, e.g.:
    ```sbt -Dbuild.profile=scala-2.10 printProfile```
    The _scala-2.11_ settings include Spark version 2.0.0, while _scala-2.10_ sets it to 1.6.2. These defaults can be overridden by adding dedicated properties:
    ``` sbt -Dbuild.profile=scala-2.11 -Dspark.version=1.6.0 -Dhadoop.version=2.6.4 <sbt task> ```
    This command sets the build profile to _scala-2.11_ but uses different versions of Spark and Hadoop. It makes the configuration more flexible, especially for someone who wants to build the project according to their own needs.
    An important thing to note is that Spark versions before 1.6.x are no longer supported and Scala 2.10.x is deprecated.
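    For illustration, the profile selection in build.sbt could work roughly along these lines (a minimal sketch of the mechanism, not the PR's actual code; the `-Dbuild.profile`/`-Dspark.version`/`-Dhadoop.version` properties, the `printProfile` task, and the Spark versions 1.6.2/2.0.0 come from this PR, while the remaining names and version defaults are assumptions):
    ```scala
    // Sketch of profile-driven settings in build.sbt; the Scala patch versions
    // and Hadoop defaults below are assumptions, not the PR's actual values.
    val profileName = sys.props.getOrElse("build.profile", "scala-2.11")

    def profileDefaults(name: String): Map[String, String] = name match {
      case "scala-2.10" => Map("scala" -> "2.10.6", "spark" -> "1.6.2", "hadoop" -> "2.6.4")
      case "scala-2.11" => Map("scala" -> "2.11.8", "spark" -> "2.0.0", "hadoop" -> "2.7.2")
      case other        => sys.error(s"Unknown build profile: $other")
    }

    val defaults = profileDefaults(profileName)

    // Explicit -Dspark.version / -Dhadoop.version properties override the defaults.
    val sparkVersion  = sys.props.getOrElse("spark.version", defaults("spark"))
    val hadoopVersion = sys.props.getOrElse("hadoop.version", defaults("hadoop"))

    scalaVersion in ThisBuild := defaults("scala")

    lazy val printProfile = taskKey[Unit]("Prints the active build profile")

    printProfile := println(
      s"$profileName: scala=${defaults("scala")} spark=$sparkVersion hadoop=$hadoopVersion")
    ```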
    
    * `data/src/main/spark-1/org/apache/predictionio/data/SparkVersionDependent.scala` and `data/src/main/spark-2/org/apache/predictionio/data/SparkVersionDependent.scala` - these are the only examples of version-dependent code. Their sole purpose is to provide the proper object type for Spark SQL related actions. sbt is configured to include version-specific source paths like these, as sketched below.
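    As a rough illustration, the spark-2 variant presumably amounts to something like the following (a sketch under that assumption, not the PR's exact code; the spark-1 twin would return a `SQLContext`, so call sites compile unchanged under both profiles):
    ```scala
    // Assumed sketch of the spark-2 flavor: callers ask for the SQL entry point
    // and receive whatever type the active Spark major version provides.
    package org.apache.predictionio.data

    import org.apache.spark.SparkContext
    import org.apache.spark.sql.SparkSession

    object SparkVersionDependent {
      // The spark-1 twin would return org.apache.spark.sql.SQLContext instead.
      def sqlSession(sc: SparkContext): SparkSession =
        SparkSession.builder().config(sc.getConf).getOrCreate()
    }
    ```
    The version-specific directory can then be appended to `unmanagedSourceDirectories in Compile` depending on the selected profile (the exact wiring in the PR's build is an assumption here).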
    
    * make_distribution.sh - to create an archive for Scala 2.10, it has to be given an argument in the same way as sbt (`./make_distribution.sh -Dbuild.profile=scala-2.10`). By default it builds for _scala-2.11_.
    
    * integration tests - the Docker image is updated; I pushed it with the tag spark_2.0.0 so as not to interfere with the current build. It contains both versions of Spark and on startup sets up the environment according to the dependencies PredictionIO was built with. It uses a simple Java program, `tests/docker-files/BuildInfoPrinter.java`, linked with the PredictionIO assembly, to acquire the necessary information (see the sketch below). Travis CI makes use of this setup and runs 8 parallel builds; the number doubled with the introduction of the two build profiles.
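    Conceptually the printer is tiny; in Scala terms it would amount to something like this (a hypothetical sketch: the PR ships a Java file, and the `BuildInfo` package and field names below are assumptions, not taken from the PR):
    ```scala
    // Runs against the PredictionIO assembly and echoes the versions the jar
    // was built with, so the container entrypoint can pick the matching Spark.
    // The BuildInfo package and field names are assumptions.
    object BuildInfoPrinter {
      def main(args: Array[String]): Unit = {
        val info = org.apache.predictionio.core.BuildInfo
        println(s"PIO_SCALA_VERSION=${info.scalaVersion}")
        println(s"PIO_SPARK_VERSION=${info.sparkVersion}")
      }
    }
    ```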
    
    I also noticed that many places hardcode version numbers and links to packages to be downloaded. This makes maintaining the project and keeping everything consistent increasingly difficult, so I came up with a few small scripts that read the proper dependency versions from the build config and set some variables accordingly. An example is `conf/vendors.sh`, which, provided that some variables are set (e.g. by `dev/set-build-profile.sh`), initializes other useful variables such as `SPARK_DOWNLOAD`, `SPARK_ARCHIVE`, and `SPARK_DIRNAME`. They are used in the Travis configuration as well as in the Dockerfile, which should now be built with the dedicated script `tests/docker-build.sh`. This approach makes it easier to keep everything coherent when bumping version numbers.
    
    ### Some problems encountered during the upgrade
    Updating Spark caused some trouble:
    
    * The classpath has to be extended to run the unit tests successfully for some sub-packages. (see `build.sbt`)
    * Column names have to be handled differently for Postgres in `JDBCPEvents`, as Spark now surrounds them with double quotes ("..."), which breaks the current schema in this case
    * `tests/pio_tests/utils.py` - has a special Spark pass-through argument to set `spark.sql.warehouse.dir`, because the defaults cause runtime exceptions (the sketch below shows the workaround). See [here](https://mail-archives.apache.org/mod_mbox/spark-user/201608.mbox/%3CCAMAsSd+efZ+UscmnZVkfp00qbr9ynV8LRfHvz9LRMnwh2VK0Yw@mail.gmail.com%3E)
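    For reference, the workaround boils down to pinning the warehouse directory when the Spark session is created; in Scala (e.g. in spark-shell) it looks roughly like this (the `spark.sql.warehouse.dir` key is Spark's own, the path value is only an example, and the tests pass the equivalent through a `--conf` pass-through argument):
    ```scala
    // Pin spark.sql.warehouse.dir explicitly to avoid the Spark 2.0.0
    // default-path exceptions linked above; the local path is just an example.
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("pio-tests")
      .config("spark.sql.warehouse.dir", "file:///tmp/spark-warehouse")
      .getOrCreate()
    ```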


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Ziemin/incubator-predictionio upgrade

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-predictionio/pull/295.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #295
    
----
commit 39879f04639b123510f0668e0de7b33fd7418784
Author: Marcin Ziemiński <zi...@gmail.com>
Date:   2016-08-09T21:20:54Z

    [PIO-30] Set up a cross build for Scala 2.10 (Spark 1.6.2) and Scala 2.11
    (Spark 2.0.0).
    
    Changes also include updating the Travis integration tests, which now run
    eight parallel builds.

----


---

[GitHub] incubator-predictionio issue #295: [PIO-30] Set up a cross build for Scala 2...

Posted by Ziemin <gi...@git.apache.org>.
Github user Ziemin commented on the issue:

    https://github.com/apache/incubator-predictionio/pull/295
  
    I should also mention that `install.sh` is not updated in this PR. I could use some advice on how to integrate it with the current build settings, and which versions to choose so as not to make the whole process too convoluted for the users.


---

[GitHub] incubator-predictionio pull request #295: [PIO-30] Set up a cross build for ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-predictionio/pull/295


---