You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by srowen <gi...@git.apache.org> on 2015/02/07 22:08:31 UTC

[GitHub] spark pull request: SPARK-5669 [BUILD] Spark assembly includes inc...

GitHub user srowen opened a pull request:

    https://github.com/apache/spark/pull/4453

    SPARK-5669 [BUILD] Spark assembly includes incompatibly licensed libgfortran, libgcc code via JBLAS

    Exclude libgfortran, libgcc bundled by JBLAS for Windows. This much is simple, and solves the essential license issue. But the more important question is whether MLlib works on Windows then.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/srowen/spark SPARK-5669

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4453.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4453
    
----
commit 35af84a67df76faebd837bce5670affe08ad0e19
Author: Sean Owen <so...@cloudera.com>
Date:   2015-02-07T21:07:38Z

    Exclude libgfortran, libgcc bundled by JBLAS for Windows

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-5669 [BUILD] Spark assembly includes inc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4453#issuecomment-73386327
  
      [Test build #27013 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27013/consoleFull) for   PR 4453 at commit [`35af84a`](https://github.com/apache/spark/commit/35af84a67df76faebd837bce5670affe08ad0e19).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-5669 [BUILD] Spark assembly includes inc...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/4453#issuecomment-74276104
  
    So, I think we have to commit this before another release, since I think we can't distribute these artifacts.
    
    I tried running Spark on Windows 8.1 + cygwin without these libs built in, and the good news is that it works. This is to be expected, given Mikio's notes here: https://github.com/mikiobraun/jblas/wiki/Missing-Libraries
    
    Without Cygwin, it may also work, but in the worst case, requires installing and putting on the library path a few more libraries, just as is required on Linux. That seems... like something the project could arguably accept in the short term.
    
    Now, the bad news is that libgfortran may be baked into the Linux 32-bit and OS X native code too. (Not Linux 64-bit). That's a significantly bigger problem, not least of which because the LGPL code can't be neatly excised.
    
    Mikio mentioned he may have some time to adjust the packaging to allow exclusion of the LGPL code. That may take some time. That would make end users install the LGPL libs themselves, which they already have to for Linux 64 bit.
    
    We had talked about standardizing on netlib, which also requires special deployment, but for that same reason, doesn't have this distribution issue.
    
    Maybe it's possible to put JBLAS behind a profile too so it's not built with native code support by default, but I am not yet sure it's possible.
    
    Any thoughts at this stage?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-5669 [BUILD] Spark assembly includes inc...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/4453#issuecomment-74426293
  
    It won't work for branch-1.2 and branch-1.1 because ALS there uses the Cholesky solver from JBLAS, which is a native call. We need to replace that part by netlib-java's.
    
    I'm going to merge this into master and branch-1.3. So more people can help test it in RC1.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-5669 [BUILD] Spark assembly includes inc...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/4453


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-5669 [BUILD] Spark assembly includes inc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4453#issuecomment-74401700
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27498/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-5669 [BUILD] Spark assembly includes inc...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/4453#issuecomment-73647620
  
    /cc @pwendell @mengxr


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-5669 [BUILD] Spark assembly includes inc...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/4453#issuecomment-74345731
  
    @srowen JBLAS includes statically linked libraries for OS X and Linux, which contains `gfortran` routines.
    
    ~~~
    $nm libjblas.jnilib
    00000000005471c0 t __gfortrani_open_internal
    ~~~
    
    Removing JBLAS completely from the runtime scope may require a day of work. Maybe we can go with this direction. But one issue is that `DoubleMatrix` gets exposed in the `SVDPlusPlus` public API. I don't know whether we should make API changes at this time. @ankurdave @rxin 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-5669 [BUILD] Spark assembly includes inc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4453#issuecomment-74400098
  
      [Test build #27498 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27498/consoleFull) for   PR 4453 at commit [`734dd86`](https://github.com/apache/spark/commit/734dd86e5d818574a918fc7bd7015afe45c60da9).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-5669 [BUILD] Spark assembly includes inc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4453#issuecomment-73386329
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27013/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-5669 [BUILD] Spark assembly includes inc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4453#issuecomment-73383398
  
      [Test build #27013 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27013/consoleFull) for   PR 4453 at commit [`35af84a`](https://github.com/apache/spark/commit/35af84a67df76faebd837bce5670affe08ad0e19).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-5669 [BUILD] Spark assembly includes inc...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/4453#issuecomment-74412266
  
    @mengxr I tried removing all but the Linux 64-bit native code (it does not package libgfortran actually, as we know from experience and https://github.com/mikiobraun/jblas/wiki/Missing-Libraries ). Good news is that unit tests still pass on OS X, and on Jenkins here. The tests do exercise the main usage of JBLAS which is in `NNLS` and the nonnegative case for ALS.
    
    I worried that it was picking up the libs from outside the assembly during tests, so I tried running some MLlib code that uses JBLAS (ALS.train + nonnegative = true) in the shell, after removing everything but the assembly jar and confirming it has none of these libraries inside. That succeeded.
    
    I see that lots of the calls we make don't actually hit the `NativeBlas` implementation anyway. But surely something still needs it? or else why did we ever have to install `libgfortran`? It would be great news if the native libs are actually unused. Off the top of your head, am I missing something?
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-5669 [BUILD] Spark assembly includes inc...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/4453#issuecomment-74348269
  
    @srowen
    
    1. Could you try removing all libraries that pack `gfortran` from the jblas jar in assembly and test it on Win/Linux/Mac? If this works, then at least we don't have license issues with the Spark distribution.
    2. If it doesn't work, let's remove JBLAS completely from the runtime scope.
    3. Deprecate `SVDPlusPlus` in this release. If we go with 2), we can mark jblas as provided and do not include it in the Spark distribution. In the user guide, ask users to include them at runtime if they want to use `SVDPlusPlus`.
    
    Does it sound good to you?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-5669 [BUILD] Spark assembly includes inc...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/4453#issuecomment-74425567
  
    @mengxr Great, so should be fine to remove the native libs. That's good, since we have to! This is true back to branch 1.2?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-5669 [BUILD] Spark assembly includes inc...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/4453#issuecomment-74425432
  
    From JBLAS (http://mikiobraun.github.io/jblas/javadoc/): 
    
    > jblas uses double and float arrays to store the matrix. Whenever you call a native function, the array is first copied. This means that it doesn’t make much sense to call a native routine if its computation is linear in the size of the data, but this includes most of BLAS Level 1 and Level 2. jblas therefore uses Java implementation for things like vector addition, or even matrix-vector multiplication and is therefore not as fast as native BLAS. Currently, I’m contemplating some caching schemes to improve performance here.
    
    So using Level 1 and Level 2 BLAS in JBLAS won't trigger native calls, we used to call JBLAS' Cholesky solver in ALS, which is a Level 3 LAPACK routine. But with the new implementation, we switched to netlib-java. So we don't actually use any JBLAS native calls in Spark.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-5669 [BUILD] Spark assembly includes inc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4453#issuecomment-74401698
  
      [Test build #27498 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27498/consoleFull) for   PR 4453 at commit [`734dd86`](https://github.com/apache/spark/commit/734dd86e5d818574a918fc7bd7015afe45c60da9).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org