Posted to reviews@spark.apache.org by mengxr <gi...@git.apache.org> on 2014/04/16 05:35:50 UTC

[GitHub] spark pull request: [WIP][SPARK-1506][MLLIB] Documentation improve...

GitHub user mengxr opened a pull request:

    https://github.com/apache/spark/pull/422

    [WIP][SPARK-1506][MLLIB] Documentation improvements for MLlib 1.0

    Still a work in progress.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mengxr/spark mllib-doc

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/422.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #422
    
----
commit ed1d493ecae06f1a2fd067a279163a9e1d1257b9
Author: Xiangrui Meng <me...@databricks.com>
Date:   2014-04-13T00:54:25Z

    update mllib's toc

commit d06511dbde8d85d8caa8294aff3bd86094f14923
Author: Manish Amde <ma...@gmail.com>
Date:   2014-04-13T03:26:33Z

    basic skeleton

commit 1537dd372a3e64251923792d9cc911dff12ed85f
Author: Manish Amde <ma...@gmail.com>
Date:   2014-04-13T05:32:24Z

    added placeholders and some doc

commit 3ecb2ad8a0a004a89debe5a5ce5b0d72181b9305
Author: Manish Amde <ma...@gmail.com>
Date:   2014-04-13T05:44:42Z

    minor text addition

commit b93125cce2d8a6a685aac0e7cda2357df0ddb09b
Author: Manish Amde <ma...@gmail.com>
Date:   2014-04-13T18:09:34Z

    more subsection reorg

commit 94fd2f9cf6aaf8c99f5ebc76d207a70abc82f832
Author: Manish Amde <ma...@gmail.com>
Date:   2014-04-13T18:38:42Z

    more reorg

commit 69252752172ec173bc9e705cf2e9194a83c46f9a
Author: Manish Amde <ma...@gmail.com>
Date:   2014-04-13T19:59:35Z

    impurity and information gain

commit 9c0c4be00ecf04ee42926ca11f99b07c1c1ab0c6
Author: Manish Amde <ma...@gmail.com>
Date:   2014-04-13T21:11:37Z

    split candidate

commit f427e84dabf9e1bdf95b65d258bedc5c85bb7577
Author: Manish Amde <ma...@gmail.com>
Date:   2014-04-13T23:42:13Z

    renaming sections

commit 6e297d7cc0b04c5586bf80c0be7b3de69c1a606a
Author: Manish Amde <ma...@gmail.com>
Date:   2014-04-14T00:26:43Z

    added subsections

commit b9ef6c4088429efeed8ddb63e70d6ee6516443df
Author: Manish Amde <ma...@gmail.com>
Date:   2014-04-14T01:14:38Z

    basic decision tree code examples

commit dbb0e5e4c3e5bfc14da1ad76e8a8f7ec2a2c0ac8
Author: Manish Amde <ma...@gmail.com>
Date:   2014-04-14T01:26:13Z

    minor improvements to text

commit 865826ee04da10260aeb1db72f1da13f730e678d
Author: Manish Amde <ma...@gmail.com>
Date:   2014-04-14T01:37:37Z

    minor: grammar

commit 022485ad965a75bd48aadd3852850fc2a0c9d5c6
Author: Manish Amde <ma...@gmail.com>
Date:   2014-04-14T05:40:22Z

    more documentation

commit a3523f972f091fdf6893f10d63394c317e0ee399
Author: Xiangrui Meng <me...@databricks.com>
Date:   2014-04-15T19:02:15Z

    Merge branch 'master' into mllib-doc

commit 8db54485d477bc645aeeaf36e8dd326fe4c7735b
Author: Xiangrui Meng <me...@databricks.com>
Date:   2014-04-15T21:51:57Z

    re-organize toc

commit 49811ad62c4d6354eb105e06b39e242dffeee7bf
Author: Xiangrui Meng <me...@databricks.com>
Date:   2014-04-15T21:52:19Z

    add guide for naive Bayes

commit b18ca56b6293e3b631d0052978d7f21843bbfc2b
Author: Xiangrui Meng <me...@databricks.com>
Date:   2014-04-15T21:52:49Z

    move linear-algebra to dimensionality-reduction

commit f59e35507882b84882a32dab832237caf7442ab8
Author: Xiangrui Meng <me...@databricks.com>
Date:   2014-04-15T21:55:15Z

    merge tree guide

commit 8e77fbce31f33439f2f0bab248a716e01ea52ff1
Author: Xiangrui Meng <me...@databricks.com>
Date:   2014-04-15T22:42:33Z

    add migration guide

commit c5cba7f423a3be0ca2980235d7004ccef8902e13
Author: Xiangrui Meng <me...@databricks.com>
Date:   2014-04-15T23:26:17Z

    update dependencies

commit bcbf5d233aa36463365989e085851a30acf4c4e8
Author: Xiangrui Meng <me...@databricks.com>
Date:   2014-04-16T02:50:24Z

    add first version of linear algebra guide

commit 34e9bb4503053b03361d914b289f6f05f45ba50e
Author: Xiangrui Meng <me...@databricks.com>
Date:   2014-04-16T02:53:45Z

    merge decision-tree changes

commit 7922fac943cb722d3df0fbc37e96ec1a433d9454
Author: Xiangrui Meng <me...@databricks.com>
Date:   2014-04-16T03:01:07Z

    move decision tree guide to a separate file

commit b4545c30f9505827e3c158aef2edd190ed6ea5b5
Author: Xiangrui Meng <me...@databricks.com>
Date:   2014-04-16T03:12:09Z

    one pass over tree guide

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][SPARK-1506][MLLIB] Documentation improve...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40641417
  
     Merged build triggered. 


[GitHub] spark pull request: [WIP][SPARK-1506][MLLIB] Documentation improve...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40665191
  
    Merged build started. 


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40773276
  
    Merged build started. 


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-41083455
  
    Build finished. 


[GitHub] spark pull request: [WIP][SPARK-1506][MLLIB] Documentation improve...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40669891
  
    Merged build started. 


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-41083457
  
    Merged build finished. All automated tests passed.


[GitHub] spark pull request: [WIP][SPARK-1506][MLLIB] Documentation improve...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40563034
  
     Merged build triggered. 


[GitHub] spark pull request: [WIP][SPARK-1506][MLLIB] Documentation improve...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40665359
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14189/


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40779607
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14221/


[GitHub] spark pull request: [WIP][SPARK-1506][MLLIB] Documentation improve...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40565828
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14170/


[GitHub] spark pull request: [WIP][SPARK-1506][MLLIB] Documentation improve...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40565829
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14171/


[GitHub] spark pull request: [WIP][SPARK-1506][MLLIB] Documentation improve...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40559918
  
     Merged build triggered. 


[GitHub] spark pull request: [WIP][SPARK-1506][MLLIB] Documentation improve...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40645174
  
    Merged build finished. All automated tests passed.


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/422#discussion_r11842077
  
    --- Diff: docs/mllib-collaborative-filtering.md ---
    @@ -14,44 +14,43 @@ missing entries of a user-item association matrix.  MLlib currently supports
     model-based collaborative filtering, in which users and products are described
     by a small set of latent factors that can be used to predict missing entries.
     In particular, we implement the [alternating least squares
    -(ALS)](http://www2.research.att.com/~volinsky/papers/ieeecomputer.pdf)
    +(ALS)](http://dl.acm.org/citation.cfm?id=1608614)
     algorithm to learn these latent factors. The implementation in MLlib has the
     following parameters:
     
    -* *numBlocks* is the number of blacks used to parallelize computation (set to -1 to auto-configure). 
    +* *numBlocks* is the number of blacks used to parallelize computation (set to -1 to auto-configure).
     * *rank* is the number of latent factors in our model.
     * *iterations* is the number of iterations to run.
     * *lambda* specifies the regularization parameter in ALS.
    -* *implicitPrefs* specifies whether to use the *explicit feedback* ALS variant or one adapted for *implicit feedback* data
    -* *alpha* is a parameter applicable to the implicit feedback variant of ALS that governs the *baseline* confidence in preference observations
    +* *implicitPrefs* specifies whether to use the *explicit feedback* ALS variant or one adapted for
    --- End diff --
    
    These last two points lack periods, whereas every other point has periods.
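The ALS parameters in the diff above (rank, iterations, lambda, implicitPrefs, alpha) can be made concrete with a small single-machine sketch of alternating least squares. This is not MLlib's implementation: it has no blocking or RDDs, the function names are hypothetical, it applies a plain unscaled L2 penalty `lam` to every factor (MLlib's exact regularization scaling may differ), and for the implicit variant it assumes the common confidence-weighted scheme where an entry r gets confidence c = 1 + alpha * |r| and binary preference p = 1 if r > 0, else 0.

```python
import random


def solve(a, b):
    """Solve a x = b for a small symmetric positive-definite matrix a
    using Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def update(fixed, ratings_by_row, n_rows, rank, lam, implicit_prefs, alpha):
    """Recompute every row factor while the other side's factors stay fixed."""
    out = []
    for i in range(n_rows):
        # Normal equations: (sum c * y y^T + lam * I) x = sum c * target * y
        a = [[lam if p == q else 0.0 for q in range(rank)] for p in range(rank)]
        b = [0.0] * rank
        if implicit_prefs:
            # Implicit variant: every column takes part; unobserved ones have r = 0.
            entries = [(j, ratings_by_row[i].get(j, 0.0)) for j in range(len(fixed))]
        else:
            # Explicit variant: only observed ratings take part.
            entries = list(ratings_by_row[i].items())
        for j, r in entries:
            y = fixed[j]
            if implicit_prefs:
                c, target = 1.0 + alpha * abs(r), (1.0 if r > 0 else 0.0)
            else:
                c, target = 1.0, r
            for p in range(rank):
                b[p] += c * target * y[p]
                for q in range(rank):
                    a[p][q] += c * y[p] * y[q]
        out.append(solve(a, b))
    return out


def als(ratings, n_users, n_items, rank=2, iterations=10, lam=0.01,
        implicit_prefs=False, alpha=1.0, seed=0):
    """ratings: list of (user, item, rating) triples with 0-based ids."""
    rng = random.Random(seed)
    by_user = [dict() for _ in range(n_users)]
    by_item = [dict() for _ in range(n_items)]
    for u, i, r in ratings:
        by_user[u][i] = r
        by_item[i][u] = r
    items = [[rng.random() for _ in range(rank)] for _ in range(n_items)]
    users = [[0.0] * rank for _ in range(n_users)]
    for _ in range(iterations):
        users = update(items, by_user, n_users, rank, lam, implicit_prefs, alpha)
        items = update(users, by_item, n_items, rank, lam, implicit_prefs, alpha)
    return users, items


def predict(users, items, u, i):
    """Predicted rating (or preference) as the dot product of latent factors."""
    return sum(x * y for x, y in zip(users[u], items[i]))
```

Each half-step is a set of independent small least-squares solves, one per user or per item, which is what makes the method easy to parallelize; `rank` sets the size of those systems, `iterations` the number of alternations, and `lam` keeps them well conditioned.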


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40686905
  
    Merged build finished. All automated tests passed.


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40779604
  
    Merged build finished. All automated tests passed.


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-41013214
  
    @yinxusen Thanks for reviewing the doc and testing the examples! All your comments were addressed except
    
    > // Load training data in LIBSVM format.
    > val data = MLUtils.loadLibSVMData(sc, "mllib/data/sample_libsvm_data.txt")
    > It should be
    > "mllib/data/sample_svm_data.txt"
    
    I added `sample_libsvm_data.txt` in this PR.
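The sample file discussed here is in LIBSVM format: each line is `label index1:value1 index2:value2 ...`, with one-based feature indices in ascending order. The sketch below shows one way to read such lines into zero-based sparse features. It is not MLlib's loader; `parse_libsvm_line` and `load_libsvm` are hypothetical names, and inferring the feature count from the largest index and skipping `#` comment lines are choices made for this sketch.

```python
def parse_libsvm_line(line):
    """Parse 'label idx:val idx:val ...' into (label, zero_based_indices, values)."""
    parts = line.split()
    label = float(parts[0])
    indices, values = [], []
    for item in parts[1:]:
        idx, val = item.split(":")
        indices.append(int(idx) - 1)  # LIBSVM indices are 1-based; convert to 0-based
        values.append(float(val))
    return label, indices, values


def load_libsvm(lines):
    """Parse non-empty, non-comment lines; infer the feature count from the max index."""
    points = []
    num_features = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        label, idx, val = parse_libsvm_line(line)
        points.append((label, idx, val))
        if idx:
            num_features = max(num_features, max(idx) + 1)
    return points, num_features
```

For example, `load_libsvm(["1 1:0.5 3:2.0", "0 2:1.0"])` yields two points and a feature count of 3, since the largest one-based index is 3.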
    



[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/422#discussion_r11842061
  
    --- Diff: docs/mllib-collaborative-filtering.md ---
    @@ -14,44 +14,43 @@ missing entries of a user-item association matrix.  MLlib currently supports
     model-based collaborative filtering, in which users and products are described
     by a small set of latent factors that can be used to predict missing entries.
     In particular, we implement the [alternating least squares
    -(ALS)](http://www2.research.att.com/~volinsky/papers/ieeecomputer.pdf)
    +(ALS)](http://dl.acm.org/citation.cfm?id=1608614)
     algorithm to learn these latent factors. The implementation in MLlib has the
     following parameters:
     
    -* *numBlocks* is the number of blacks used to parallelize computation (set to -1 to auto-configure). 
    +* *numBlocks* is the number of blacks used to parallelize computation (set to -1 to auto-configure).
    --- End diff --
    
    number of blacks -> number of blocks
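To illustrate what a "number of blocks" means for parallelizing ALS, here is a hedged sketch: users are grouped into `num_blocks` buckets by id modulo, and for each user block we compute which item factors it must receive before it can solve its own least-squares problems. The helper names are hypothetical, and treating `-1` as "fall back to a default parallelism of 4" is only a stand-in for MLlib's auto-configuration, whose actual rule is not stated in this thread.

```python
def assign_blocks(ids, num_blocks, default_parallelism=4):
    """Group ids into num_blocks buckets by id modulo num_blocks.

    num_blocks == -1 is treated here as "use default_parallelism",
    a hypothetical stand-in for auto-configuration."""
    if num_blocks == -1:
        num_blocks = default_parallelism
    if num_blocks <= 0:
        raise ValueError("num_blocks must be positive, or -1 to auto-configure")
    blocks = [[] for _ in range(num_blocks)]
    for i in ids:
        blocks[i % num_blocks].append(i)
    return blocks


def items_needed(ratings, user_blocks):
    """For each user block, the set of item ids whose factors that block needs."""
    block_of = {u: b for b, members in enumerate(user_blocks) for u in members}
    needed = [set() for _ in user_blocks]
    for u, i, _ in ratings:
        needed[block_of[u]].add(i)
    return needed
```

With ratings `[(0, 5, 1.0), (1, 7, 2.0), (3, 5, 4.0)]` and two user blocks `[[0, 2], [1, 3]]`, the first block needs only item 5 and the second needs items 5 and 7, so each block can be updated independently once those factors arrive.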


[GitHub] spark pull request: [WIP][SPARK-1506][MLLIB] Documentation improve...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40562833
  
    Build started. 


[GitHub] spark pull request: [WIP][SPARK-1506][MLLIB] Documentation improve...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40681856
  
    Merged build started. 


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-41083461
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14329/


[GitHub] spark pull request: [WIP][SPARK-1506][MLLIB] Documentation improve...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40643838
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14182/


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/422#discussion_r11866538
  
    --- Diff: docs/mllib-guide.md ---
    @@ -3,63 +3,120 @@ layout: global
     title: Machine Learning Library (MLlib)
     ---
     
    +MLlib is a Spark implementation of some common machine learning algorithms and utilities,
    +including classification, regression, clustering, collaborative
    +filtering, dimensionality reduction, as well as underlying optimization primitives:
     
    -MLlib is a Spark implementation of some common machine learning (ML)
    -functionality, as well associated tests and data generators.  MLlib
    -currently supports four common types of machine learning problem settings,
    -namely classification, regression, clustering and collaborative filtering,
    -as well as an underlying gradient descent optimization primitive and several
    -linear algebra methods.
    -
    -# Available Methods
    -The following links provide a detailed explanation of the methods and usage examples for each of them:
    -
    -* <a href="mllib-classification-regression.html">Classification and Regression</a>
    -  * Binary Classification
    -    * SVM (L1 and L2 regularized)
    -    * Logistic Regression (L1 and L2 regularized)
    -  * Linear Regression
    -    * Least Squares
    -    * Lasso
    -    * Ridge Regression
    -  * Decision Tree (for classification and regression)
    -* <a href="mllib-clustering.html">Clustering</a>
    -  * k-Means
    -* <a href="mllib-collaborative-filtering.html">Collaborative Filtering</a>
    -  * Matrix Factorization using Alternating Least Squares
    -* <a href="mllib-optimization.html">Optimization</a>
    -  * Gradient Descent and Stochastic Gradient Descent
    -* <a href="mllib-linear-algebra.html">Linear Algebra</a>
    -  * Singular Value Decomposition
    -  * Principal Component Analysis
    -
    -# Data Types
    -
    -Most MLlib algorithms operate on RDDs containing vectors. In Java and Scala, the
    -[Vector](api/mllib/index.html#org.apache.spark.mllib.linalg.Vector) class is used to
    -represent vectors. You can create either dense or sparse vectors using the
    -[Vectors](api/mllib/index.html#org.apache.spark.mllib.linalg.Vectors$) factory.
    -
    -In Python, MLlib can take the following vector types:
    -
    -* [NumPy](http://www.numpy.org) arrays
    -* Standard Python lists (e.g. `[1, 2, 3]`)
    -* The MLlib [SparseVector](api/pyspark/pyspark.mllib.linalg.SparseVector-class.html) class
    -* [SciPy sparse matrices](http://docs.scipy.org/doc/scipy/reference/sparse.html)
    -
    -For efficiency, we recommend using NumPy arrays over lists, and using the
    -[CSC format](http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csc_matrix.html#scipy.sparse.csc_matrix)
    -for SciPy matrices, or MLlib's own SparseVector class.
    -
    -Several other simple data types are used throughout the library, e.g. the LabeledPoint
    -class ([Java/Scala](api/mllib/index.html#org.apache.spark.mllib.regression.LabeledPoint),
    -[Python](api/pyspark/pyspark.mllib.regression.LabeledPoint-class.html)) for labeled data.
    -
    -# Dependencies
    -MLlib uses the [jblas](https://github.com/mikiobraun/jblas) linear algebra library, which itself
    -depends on native Fortran routines. You may need to install the
    -[gfortran runtime library](https://github.com/mikiobraun/jblas/wiki/Missing-Libraries)
    -if it is not already present on your nodes. MLlib will throw a linking error if it cannot
    -detect these libraries automatically.
    +* [Basics](mllib-basics.html)
    +  * data types 
    +  * summary statistics
    +* Classification and regression
    +  * [linear support vector machine (SVM)](mllib-linear-methods.html#linear-support-vector-machine-svm)
    +  * [logistic regression](mllib-linear-methods.html#logistic-regression)
    +  * [linear least squares, Lasso, and ridge regression](mllib-linear-methods.html#linear-least-squares-lasso-and-ridge-regression)
    +  * [decision tree](mllib-decision-tree.html)
    +  * [naive Bayes](mllib-naive-bayes.html)
    +* [Collaborative filtering](mllib-collaborative-filtering.html)
    +  * alternating least squares (ALS)
    +* [Clustering](mllib-clustering.html)
    +  * k-means
    +* [Dimensionality reduction](mllib-dimensionality-reduction.html)
    +  * singular value decomposition (SVD)
    +  * principal component analysis (PCA)
    +* [Optimization](mllib-optimization.html)
    +  * stochastic gradient descent
    +  * limited-memory BFGS (L-BFGS)
    +
    +MLlib is currently a beta component under active development.
    +The APIs may be changed in the future releases, and we will provide migration guide between releases.
    +
    +## Dependencies
    +
    +MLlib uses linear algebra packages [Breeze](http://www.scalanlp.org/), which depends on
    +[netlib-java](https://github.com/fommil/netlib-java), and
    +[jblas](https://github.com/mikiobraun/jblas).  `jblas` depend on native Fortran routines. You need
    +to install the
    +[gfortran runtime library](https://github.com/mikiobraun/jblas/wiki/Missing-Libraries) if it is not
    +already present on your nodes. MLlib will throw a linking error if it cannot detect these libraries
    +automatically.  Due to license issues, we do not include `netlib-java`'s native libraries in MLlib's
    +dependency set. If no native library is available at runtime, you will see a warning message.  To
    +use native libraries from `netlib-java`, please include artifact
    +`com.github.fommil.netlib:all:1.1.2` as a dependency of your project or build your own (see
    --- End diff --
    
    Mentioned that both `netlib-java` and `jblas` need gfortran.
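The guide text in this diff notes that most MLlib algorithms operate on dense or sparse vectors and on labeled points. The sketch below shows the idea behind a sparse representation (size plus parallel index and value arrays) and a labeled point. These are hypothetical stand-in classes written with the standard library only, not MLlib's `Vector`, `SparseVector`, or `LabeledPoint` types.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SimpleSparseVector:
    """Hypothetical sparse vector: only non-zero entries are stored."""
    size: int
    indices: List[int]  # 0-based, strictly increasing
    values: List[float]

    def to_dense(self) -> List[float]:
        """Expand to a dense list, filling unstored positions with 0.0."""
        dense = [0.0] * self.size
        for i, v in zip(self.indices, self.values):
            dense[i] = v
        return dense

    def dot(self, dense: List[float]) -> float:
        """Dot product with a dense vector, touching only the stored entries."""
        if len(dense) != self.size:
            raise ValueError("dimension mismatch")
        return sum(v * dense[i] for i, v in zip(self.indices, self.values))


@dataclass
class SimpleLabeledPoint:
    """Hypothetical labeled example: a label plus a feature vector."""
    label: float
    features: SimpleSparseVector
```

The sparse form stores two entries for `SimpleSparseVector(3, [0, 2], [1.0, 3.0])` instead of three, and the saving grows with dimension when most features are zero, which is why the guide recommends sparse formats for such data.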


[GitHub] spark pull request: [WIP][SPARK-1506][MLLIB] Documentation improve...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40673435
  
    Merged build finished. All automated tests passed.


[GitHub] spark pull request: [WIP][SPARK-1506][MLLIB] Documentation improve...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40563041
  
    Merged build started. 


[GitHub] spark pull request: [WIP][SPARK-1506][MLLIB] Documentation improve...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40669872
  
     Merged build triggered. 


[GitHub] spark pull request: [WIP][SPARK-1506][MLLIB] Documentation improve...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40561515
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14166/


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40773267
  
     Merged build triggered. 


[GitHub] spark pull request: [WIP][SPARK-1506][MLLIB] Documentation improve...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40681854
  
     Merged build triggered. 


[GitHub] spark pull request: [WIP][SPARK-1506][MLLIB] Documentation improve...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40640356
  
     Merged build triggered. 


[GitHub] spark pull request: [WIP][SPARK-1506][MLLIB] Documentation improve...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40561514
  
    Merged build finished. All automated tests passed.


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40686906
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14202/


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40777873
  
    Merged build finished. 


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/422#discussion_r11841916
  
    --- Diff: docs/mllib-guide.md ---
    @@ -3,63 +3,120 @@ layout: global
     title: Machine Learning Library (MLlib)
     ---
     
    +MLlib is a Spark implementation of some common machine learning algorithms and utilities,
    +including classification, regression, clustering, collaborative
    +filtering, dimensionality reduction, as well as underlying optimization primitives:
     
    -MLlib is a Spark implementation of some common machine learning (ML)
    -functionality, as well associated tests and data generators.  MLlib
    -currently supports four common types of machine learning problem settings,
    -namely classification, regression, clustering and collaborative filtering,
    -as well as an underlying gradient descent optimization primitive and several
    -linear algebra methods.
    -
    -# Available Methods
    -The following links provide a detailed explanation of the methods and usage examples for each of them:
    -
    -* <a href="mllib-classification-regression.html">Classification and Regression</a>
    -  * Binary Classification
    -    * SVM (L1 and L2 regularized)
    -    * Logistic Regression (L1 and L2 regularized)
    -  * Linear Regression
    -    * Least Squares
    -    * Lasso
    -    * Ridge Regression
    -  * Decision Tree (for classification and regression)
    -* <a href="mllib-clustering.html">Clustering</a>
    -  * k-Means
    -* <a href="mllib-collaborative-filtering.html">Collaborative Filtering</a>
    -  * Matrix Factorization using Alternating Least Squares
    -* <a href="mllib-optimization.html">Optimization</a>
    -  * Gradient Descent and Stochastic Gradient Descent
    -* <a href="mllib-linear-algebra.html">Linear Algebra</a>
    -  * Singular Value Decomposition
    -  * Principal Component Analysis
    -
    -# Data Types
    -
    -Most MLlib algorithms operate on RDDs containing vectors. In Java and Scala, the
    -[Vector](api/mllib/index.html#org.apache.spark.mllib.linalg.Vector) class is used to
    -represent vectors. You can create either dense or sparse vectors using the
    -[Vectors](api/mllib/index.html#org.apache.spark.mllib.linalg.Vectors$) factory.
    -
    -In Python, MLlib can take the following vector types:
    -
    -* [NumPy](http://www.numpy.org) arrays
    -* Standard Python lists (e.g. `[1, 2, 3]`)
    -* The MLlib [SparseVector](api/pyspark/pyspark.mllib.linalg.SparseVector-class.html) class
    -* [SciPy sparse matrices](http://docs.scipy.org/doc/scipy/reference/sparse.html)
    -
    -For efficiency, we recommend using NumPy arrays over lists, and using the
    -[CSC format](http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csc_matrix.html#scipy.sparse.csc_matrix)
    -for SciPy matrices, or MLlib's own SparseVector class.
    -
    -Several other simple data types are used throughout the library, e.g. the LabeledPoint
    -class ([Java/Scala](api/mllib/index.html#org.apache.spark.mllib.regression.LabeledPoint),
    -[Python](api/pyspark/pyspark.mllib.regression.LabeledPoint-class.html)) for labeled data.
    -
    -# Dependencies
    -MLlib uses the [jblas](https://github.com/mikiobraun/jblas) linear algebra library, which itself
    -depends on native Fortran routines. You may need to install the
    -[gfortran runtime library](https://github.com/mikiobraun/jblas/wiki/Missing-Libraries)
    -if it is not already present on your nodes. MLlib will throw a linking error if it cannot
    -detect these libraries automatically.
    +* [Basics](mllib-basics.html)
    +  * data types 
    +  * summary statistics
    +* Classification and regression
    +  * [linear support vector machine (SVM)](mllib-linear-methods.html#linear-support-vector-machine-svm)
    +  * [logistic regression](mllib-linear-methods.html#logistic-regression)
    +  * [linear least squares, Lasso, and ridge regression](mllib-linear-methods.html#linear-least-squares-lasso-and-ridge-regression)
    +  * [decision tree](mllib-decision-tree.html)
    +  * [naive Bayes](mllib-naive-bayes.html)
    +* [Collaborative filtering](mllib-collaborative-filtering.html)
    +  * alternating least squares (ALS)
    +* [Clustering](mllib-clustering.html)
    +  * k-means
    +* [Dimensionality reduction](mllib-dimensionality-reduction.html)
    +  * singular value decomposition (SVD)
    +  * principal component analysis (PCA)
    +* [Optimization](mllib-optimization.html)
    +  * stochastic gradient descent
    +  * limited-memory BFGS (L-BFGS)
    +
    +MLlib is currently a beta component under active development.
    +The APIs may be changed in the future releases, and we will provide migration guide between releases.
    +
    +## Dependencies
    +
    +MLlib uses linear algebra packages [Breeze](http://www.scalanlp.org/), which depends on
    +[netlib-java](https://github.com/fommil/netlib-java), and
    +[jblas](https://github.com/mikiobraun/jblas).  `jblas` depend on native Fortran routines. You need
    +to install the
    +[gfortran runtime library](https://github.com/mikiobraun/jblas/wiki/Missing-Libraries) if it is not
    +already present on your nodes. MLlib will throw a linking error if it cannot detect these libraries
    +automatically.  Due to license issues, we do not include `netlib-java`'s native libraries in MLlib's
    +dependency set. If no native library is available at runtime, you will see a warning message.  To
    +use native libraries from `netlib-java`, please include artifact
    +`com.github.fommil.netlib:all:1.1.2` as a dependency of your project or build your own (see
    --- End diff --
    
    If users include `com.github.fommil.netlib:all:1.1.2` as a dependency, it will use the Fortran reference implementation, which also requires the gfortran runtime library. It may be worth mentioning that as well.


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40777909
  
    I had to kill this build because it was hung and blocking other builds. The hang was caused by a separate test issue unrelated to this patch. Sorry, we can re-run it.


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by yinxusen <gi...@git.apache.org>.
Github user yinxusen commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40984061
  
    Several comments:
    
    ==========
    
    Scala code here: http://54.82.240.23:4000/mllib-linear-methods.html#linear-support-vector-machine-svm
    
    
    `// Load training data in LIBSVM format.`
    `val data = MLUtils.loadLibSVMData(sc, "mllib/data/sample_libsvm_data.txt")`
    
    It should be
    
    `"mllib/data/sample_svm_data.txt"`
    
    Also, that file cannot be parsed as LIBSVM format; you need to write a parsePoint function, just like the Python code does.
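The fix above replaces `loadLibSVMData` with a per-line parse function. Below is a minimal, Spark-free sketch of that parsing step. It assumes each line of the dense file holds a label followed by space-separated feature values. `parse_point` and `looks_like_libsvm` are hypothetical helpers, and the sample line is made up rather than copied from the data file. The sketch returns a plain `(label, features)` tuple instead of an MLlib `LabeledPoint` so that it runs without PySpark.

```python
def parse_point(line):
    """Parse 'label v1 v2 ...' into (label, [v1, v2, ...]).

    Hypothetical helper: returns a plain tuple instead of an MLlib
    LabeledPoint so the sketch runs without PySpark.
    """
    values = [float(x) for x in line.split()]
    return values[0], values[1:]


def looks_like_libsvm(line):
    """Heuristic: LIBSVM lines carry 'index:value' pairs after the label."""
    tokens = line.split()
    return len(tokens) > 1 and all(":" in t for t in tokens[1:])


line = "1 0 2.5 0 0 0 2.0"  # illustrative dense line, not from the real file
label, features = parse_point(line)
print(label, len(features))     # 1.0 6
print(looks_like_libsvm(line))  # False: plain values, no index:value pairs
```

The heuristic shows why the LIBSVM loader rejects the dense file: the feature tokens have no `index:value` structure.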
    
    =========
    
    Python code here: http://54.82.240.23:4000/mllib-naive-bayes.html
    
    `model = NaiveBayes.train(training, 1.0)`
    
    should be
    
    `model = NaiveBayes.train(data, 1.0)`
    
    It is also affected by this bug: https://github.com/apache/spark/pull/463
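In `NaiveBayes.train(data, 1.0)`, the second argument is the additive smoothing parameter (lambda), and MLlib's naive Bayes is the multinomial variant. The following plain-Python sketch shows what that smoothing does during training. It is an illustration only, not MLlib's implementation. It assumes nonnegative, count-like features. `train_naive_bayes` and `predict` are hypothetical names, and the parameter is called `lam` because `lambda` is a Python keyword.

```python
import math
from collections import defaultdict


def train_naive_bayes(data, lam=1.0):
    """Multinomial naive Bayes with additive (Laplace) smoothing.

    data: list of (label, features) pairs with nonnegative count features.
    Returns (log_priors, log_likelihoods), both keyed by label.
    """
    num_features = len(data[0][1])
    label_counts = defaultdict(int)
    feature_sums = defaultdict(lambda: [0.0] * num_features)
    for label, features in data:
        label_counts[label] += 1
        sums = feature_sums[label]
        for j, x in enumerate(features):
            sums[j] += x

    n, num_labels = len(data), len(label_counts)
    log_priors, log_likelihoods = {}, {}
    for label, count in label_counts.items():
        # Smoothed class prior.
        log_priors[label] = math.log((count + lam) / (n + num_labels * lam))
        sums = feature_sums[label]
        total = sum(sums)
        # Adding lam to every feature count keeps unseen features from
        # getting zero probability (log(0) would be undefined).
        log_likelihoods[label] = [
            math.log((s + lam) / (total + num_features * lam)) for s in sums
        ]
    return log_priors, log_likelihoods


def predict(model, features):
    """Return the label with the highest log posterior (up to a constant)."""
    log_priors, log_likelihoods = model

    def score(label):
        return log_priors[label] + sum(
            x * w for x, w in zip(features, log_likelihoods[label]))

    return max(log_priors, key=score)


data = [(0.0, [2.0, 0.0]), (0.0, [3.0, 0.0]), (1.0, [0.0, 2.0]), (1.0, [0.0, 1.0])]
model = train_naive_bayes(data, lam=1.0)
print(predict(model, [1.0, 0.0]))  # 0.0
print(predict(model, [0.0, 1.0]))  # 1.0
```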
    
    ============
    
    Code here: http://54.82.240.23:4000/mllib-basics.html
    
    `import org.apache.spark.mllib.util.MLUtils`
    
    `val training: RDD[LabeledPoint] = MLUtils.loadLibSVMData(sc, "mllib/data/sample_libsvm_data.txt")`
    
    For consistency, it should be
    
    `import org.apache.spark.mllib.util.MLUtils`
    `import org.apache.spark.rdd.RDD`
    `import org.apache.spark.mllib.regression.LabeledPoint`
    
    `val training: RDD[LabeledPoint] = MLUtils.loadLibSVMData(sc, "mllib/data/sample_libsvm_data.txt")`
    
    Also, the path given there is not valid. See issue 1 for reference.
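`loadLibSVMData` expects lines of the form `label index1:value1 index2:value2 ...`. In the LIBSVM format these indices are one-based and ascending, and MLlib shifts them to zero-based when loading. Below is a hypothetical plain-Python parser for a single such line, not MLlib's code. It returns the label along with zero-based indices and values, which are roughly the pieces a sparse vector is built from. `to_dense` is likewise only an illustrative helper.

```python
def parse_libsvm_line(line):
    """Parse 'label i1:v1 i2:v2 ...' (one-based, ascending indices).

    Returns (label, indices, values) with indices shifted to zero-based.
    """
    tokens = line.split()
    label = float(tokens[0])
    indices, values = [], []
    for token in tokens[1:]:
        index, value = token.split(":")
        indices.append(int(index) - 1)  # one-based -> zero-based
        values.append(float(value))
    if any(b <= a for a, b in zip(indices, indices[1:])):
        raise ValueError("indices must be strictly ascending: %r" % line)
    return label, indices, values


def to_dense(indices, values, size):
    """Expand (indices, values) into a dense list of length `size`."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


label, indices, values = parse_libsvm_line("1 3:0.5 7:2.0")
print(label, indices, values)  # 1.0 [2, 6] [0.5, 2.0]
print(to_dense(indices, values, 8))
```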
    
    ===============
    
    The other code examples work in my tests.
    
    Besides, the hyperlinks for
    * logistic regression
    * linear least squares, Lasso, and ridge regression
    
    are incorrect; they point to a localhost address. I think you need to adjust the anchors.
    
    Anyway, great document!


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40835078
  
    Thanks a lot! 


[GitHub] spark pull request: [WIP][SPARK-1506][MLLIB] Documentation improve...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40562829
  
     Build triggered. 


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40777876
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14215/


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-41083462
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14331/


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-41020142
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14326/


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40686086
  
    Merged build finished. 


[GitHub] spark pull request: [WIP][SPARK-1506][MLLIB] Documentation improve...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40640373
  
    Merged build started. 


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40682753
  
     Merged build triggered. 


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-41072539
  
     Build triggered. 


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-41072782
  
    @pwendell Maybe we should merge this PR first for RC1 and then fix minor issues.


[GitHub] spark pull request: [WIP][SPARK-1506][MLLIB] Documentation improve...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40565826
  
    Merged build finished. All automated tests passed.


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-41074978
  
     Merged build triggered. 


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40682760
  
    Merged build started. 


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40777424
  
     Merged build triggered. 


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-41072551
  
    Build started. 


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/422


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-41020139
  
    Build finished. 


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-41013188
  
    Build started. 


[GitHub] spark pull request: [WIP][SPARK-1506][MLLIB] Documentation improve...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40643837
  
    Merged build finished. All automated tests passed.


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-41075365
  
    Thanks, merged


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40777878
  
    Merged build started. 


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40800628
  
    I had a glance over all of the material in the preview and it looks strong. Go to press!


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40788273
  
    @pwendell I really wanted to disable Jenkins for this PR, but I couldn't find how to do it.


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-41013181
  
     Build triggered. 


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40794929
  
    @srowen If you have time, could you please help review the guide? Thanks!


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-41074991
  
    Merged build started. 


[GitHub] spark pull request: [WIP][SPARK-1506][MLLIB] Documentation improve...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40665358
  
    Merged build finished. 


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by yinxusen <gi...@git.apache.org>.
Github user yinxusen commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40985126
  
    Two more unresolved problems:
    
    Python code here: http://54.82.240.23:4000/mllib-clustering.html
    
    `clusters = KMeans.train(parsedData, 2, maxIterations=10,
            runs=10, initialization_mode="random")`
    
    which should be
    
    `initializationMode` rather than `initialization_mode`.
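To illustrate what `maxIterations`, `runs`, and `initializationMode="random"` control, here is a small plain-Python k-means (Lloyd's algorithm) sketch. Each run picks k distinct input points at random as the starting centers and iterates up to `max_iterations` times. The run with the lowest within-cluster sum of squared distances is kept. This is an illustrative stand-in, not MLlib's implementation, whose default initialization is k-means||. The function name, its snake_case parameters, and the fixed `seed` are hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest(point, centers):
    """Index of the center nearest to `point` (ties go to the lower index)."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


def kmeans_random_init(points, k, max_iterations=10, runs=10, seed=42):
    """Lloyd's k-means with random initialization; keeps the best of `runs`."""
    rng = random.Random(seed)
    best_centers, best_cost = None, float("inf")
    for _ in range(runs):
        # "random" initialization: k distinct input points as starting centers.
        centers = [list(p) for p in rng.sample(points, k)]
        for _ in range(max_iterations):
            clusters = [[] for _ in range(k)]
            for p in points:
                clusters[closest(p, centers)].append(p)
            new_centers = []
            for i, members in enumerate(clusters):
                if members:
                    dim = len(members[0])
                    new_centers.append(
                        [sum(m[d] for m in members) / len(members) for d in range(dim)])
                else:
                    new_centers.append(centers[i])  # keep an empty cluster's center
            if new_centers == centers:
                break  # assignments stopped changing
            centers = new_centers
        cost = sum(squared_distance(p, centers[closest(p, centers)]) for p in points)
        if cost < best_cost:
            best_centers, best_cost = centers, cost
    return best_centers, best_cost


points = [[0.0, 0.0], [1.0, 1.0], [9.0, 8.0], [8.0, 9.0]]
centers, cost = kmeans_random_init(points, 2)
print(sorted(centers), cost)  # [[0.5, 0.5], [8.5, 8.5]] 2.0
```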
    
    =============
    
    Scala code here: http://54.82.240.23:4000/mllib-collaborative-filtering.html
    
    `val model = ALS.trainImplicit(ratings, rank, numIterations, 0.01)`
    
    It needs to either drop `0.01` or add an `alpha` argument.
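`alpha` only matters for the implicit-feedback variant of ALS that `trainImplicit` is based on. In that formulation, an observation r is turned into a binary preference, p = 1 if r > 0 and p = 0 otherwise. It also gets a confidence weight, c = 1 + alpha * r, so `alpha` sets how strongly observed interactions are trusted. The minimal sketch below covers only that transformation, not the ALS solver itself. `implicit_weights` is a hypothetical helper.

```python
def implicit_weights(ratings, alpha):
    """Map implicit observations r to (preference, confidence) pairs.

    preference = 1 if r > 0 else 0; confidence = 1 + alpha * r.
    `ratings` maps (user, item) -> nonnegative observation count.
    """
    return {
        key: (1.0 if r > 0 else 0.0, 1.0 + alpha * r)
        for key, r in ratings.items()
    }


ratings = {("u1", "i1"): 3.0, ("u1", "i2"): 0.0, ("u2", "i1"): 10.0}
for key, (p, c) in sorted(implicit_weights(ratings, alpha=1.0).items()):
    print(key, p, c)
```

A larger `alpha` widens the gap between the confidence of items with many interactions and the baseline confidence of 1.0 for unobserved ones.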
    
    
    



[GitHub] spark pull request: [WIP][SPARK-1506][MLLIB] Documentation improve...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40641428
  
    Merged build started. 


[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40686088
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14200/


[GitHub] spark pull request: [WIP][SPARK-1506][MLLIB] Documentation improve...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40665177
  
     Merged build triggered. 


[GitHub] spark pull request: [WIP][SPARK-1506][MLLIB] Documentation improve...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40565825
  
    Build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [WIP][SPARK-1506][MLLIB] Documentation improve...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40645175
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14183/


---

[GitHub] spark pull request: [SPARK-1506][MLLIB] Documentation improvements...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/422#discussion_r11866542
  
    --- Diff: docs/mllib-collaborative-filtering.md ---
    @@ -14,44 +14,43 @@ missing entries of a user-item association matrix.  MLlib currently supports
     model-based collaborative filtering, in which users and products are described
     by a small set of latent factors that can be used to predict missing entries.
     In particular, we implement the [alternating least squares
    -(ALS)](http://www2.research.att.com/~volinsky/papers/ieeecomputer.pdf)
    +(ALS)](http://dl.acm.org/citation.cfm?id=1608614)
     algorithm to learn these latent factors. The implementation in MLlib has the
     following parameters:
     
    -* *numBlocks* is the number of blacks used to parallelize computation (set to -1 to auto-configure). 
    +* *numBlocks* is the number of blacks used to parallelize computation (set to -1 to auto-configure).
     * *rank* is the number of latent factors in our model.
     * *iterations* is the number of iterations to run.
     * *lambda* specifies the regularization parameter in ALS.
    -* *implicitPrefs* specifies whether to use the *explicit feedback* ALS variant or one adapted for *implicit feedback* data
    -* *alpha* is a parameter applicable to the implicit feedback variant of ALS that governs the *baseline* confidence in preference observations
    +* *implicitPrefs* specifies whether to use the *explicit feedback* ALS variant or one adapted for
    --- End diff --
    
    Done.
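The ALS parameter list in the diff above (*rank*, *iterations*, *lambda*) can be illustrated with a minimal single-machine sketch of explicit-feedback alternating least squares in plain Python. This is not MLlib's implementation: it ignores *numBlocks*, *implicitPrefs* and *alpha*, and it applies a plain L2 penalty `lam` to each factor vector, which may differ from how MLlib scales *lambda*. The helper names `solve`, `update_factors`, `als` and `predict` are hypothetical.

```python
import random


def solve(a, b):
    """Solve the small square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def update_factors(fixed, ratings_by_entity, rank, lam):
    """Hold the `fixed` factors constant and, for each entity, solve the regularized
    normal equations (sum_j y_j y_j^T + lam * I) x = sum_j r_j y_j."""
    new = {}
    for entity, observed in ratings_by_entity.items():
        a = [[lam if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for other, r in observed:
            y = fixed[other]
            for i in range(rank):
                b[i] += r * y[i]
                for j in range(rank):
                    a[i][j] += y[i] * y[j]
        new[entity] = solve(a, b)
    return new


def als(ratings, rank, iterations, lam, seed=0):
    """ratings: iterable of (user, item, rating) triples. Returns (user_factors, item_factors)."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    # Random initialization of the item factors; user factors are solved first.
    items = {i: [rng.random() for _ in range(rank)] for i in by_item}
    users = {}
    for _ in range(iterations):
        users = update_factors(items, by_user, rank, lam)  # fix items, solve users
        items = update_factors(users, by_item, rank, lam)  # fix users, solve items
    return users, items


def predict(users, items, u, i):
    """A predicted rating is the dot product of the user and item latent factors."""
    return sum(x * y for x, y in zip(users[u], items[i]))
```

For example, with ratings taken from a rank-1 matrix and one entry held out, `als(ratings, rank=1, iterations=50, lam=1e-6)` should recover the missing entry approximately; larger values of `lam` trade fit for smaller factors.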


---

[GitHub] spark pull request: [WIP][SPARK-1506][MLLIB] Documentation improve...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40673440
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14195/


---

[GitHub] spark pull request: [WIP][SPARK-1506][MLLIB] Documentation improve...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/422#issuecomment-40559925
  
    Merged build started. 


---