You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by sethah <gi...@git.apache.org> on 2017/01/20 19:24:00 UTC

[GitHub] spark pull request #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should l...

GitHub user sethah opened a pull request:

    https://github.com/apache/spark/pull/16661

    [SPARK-19313][ML][MLLIB] GaussianMixture should limit the number of features

    ## What changes were proposed in this pull request?
    
    The following test will fail on current master
    
    ````scala
    test("gmm fails on high dimensional data") {
        val ctx = spark.sqlContext
        import ctx.implicits._
        val df = Seq(
          Vectors.sparse(GaussianMixture.MAX_NUM_FEATURES + 1, Array(0, 4), Array(3.0, 8.0)),
          Vectors.sparse(GaussianMixture.MAX_NUM_FEATURES + 1, Array(1, 5), Array(4.0, 9.0)))
          .map(Tuple1.apply).toDF("features")
        val gm = new GaussianMixture()
        intercept[IllegalArgumentException] {
          gm.fit(df)
        }
      }
    ````
    
    Instead, you'll get an `ArrayIndexOutOfBoundsException` or something similar for MLlib. That's because the covariance matrix allocates an array of `numFeatures * numFeatures`, and in this case we get integer overflow. While there is currently a warning that the algorithm does not perform well for high number of features, we should perform an appropriate check to communicate this limitation to users.
    
    This patch adds a `require(numFeatures < GaussianMixture.MAX_NUM_FEATURES)` check to ML and MLlib algorithms. For the feature limitation, we can limit it such that we do not get numerical overflow to something like `math.sqrt(Integer.MaxValue).toInt` (about 46k) which eliminates the cryptic error. However in, for example WLS, we need to collect an array on the order of `numFeatures * numFeatures` to the driver and we therefore limit to 4096 features. We may want to keep that convention here for consistency.
    
    ## How was this patch tested?
    Unit tests in ML and MLlib.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sethah/spark gmm_high_dim

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16661.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16661
    
----
commit b5ae5bde519c80a9584a8e6429a54b2474b9c3ac
Author: sethah <se...@gmail.com>
Date:   2017-01-20T17:13:39Z

    numFeatures check

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should limit th...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16661
  
    **[Test build #71880 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71880/testReport)** for PR 16661 at commit [`51a237b`](https://github.com/apache/spark/commit/51a237b001a2e9a9257346b061fbf25bd63dc820).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should l...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16661#discussion_r97352845
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala ---
    @@ -272,6 +277,10 @@ class GaussianMixture private (
     }
     
     private[clustering] object GaussianMixture {
    +
    +  /** Limit number of features such that numFeatures^2^ < Integer.MaxValue */
    +  private[clustering] val MAX_NUM_FEATURES = 46000
    --- End diff --
    
    it looks like the constant can be shared between the two clustering algorithms - I would recommend using the mllib one for now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should limit th...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16661
  
    **[Test build #71937 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71937/testReport)** for PR 16661 at commit [`5672d13`](https://github.com/apache/spark/commit/5672d1345f661665f521fd1dd4410313ef3ab554).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should l...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16661#discussion_r97765299
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala ---
    @@ -272,6 +277,10 @@ class GaussianMixture private (
     }
     
     private[clustering] object GaussianMixture {
    +
    +  /** Limit number of features such that numFeatures^2^ < Int.MaxValue */
    +  private[clustering] val MAX_NUM_FEATURES = math.sqrt(Int.MaxValue).toInt
    --- End diff --
    
    I believe the limiting factor here is that we can't have an array of elements somewhere that has more than 2^31 - 1 elements. For a dense representation of a normal n x n matrix, that limits n to 46340. Here, however, the matrix is a symmetric Gramian matrix that needs n(n+1)/2 elements of storage, so 65535 works.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should l...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16661#discussion_r97479326
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala ---
    @@ -486,6 +491,9 @@ class GaussianMixture @Since("2.0.0") (
     @Since("2.0.0")
     object GaussianMixture extends DefaultParamsReadable[GaussianMixture] {
     
    +  /** Limit number of features such that numFeatures^2^ < Integer.MaxValue */
    +  private[clustering] val MAX_NUM_FEATURES = 46000
    --- End diff --
    
    We have to unpack the covariance matrix to a full covariance matrix before returning the model. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should l...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16661#discussion_r97428335
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/GaussianMixtureSuite.scala ---
    @@ -53,6 +53,19 @@ class GaussianMixtureSuite extends SparkFunSuite with MLlibTestSparkContext
         rDataset = rData.map(FeatureData).toDF()
       }
     
    +  test("gmm fails on high dimensional data") {
    +    val ctx = spark.sqlContext
    +    import ctx.implicits._
    --- End diff --
    
    Removed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should l...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16661#discussion_r97570174
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala ---
    @@ -486,6 +491,9 @@ class GaussianMixture @Since("2.0.0") (
     @Since("2.0.0")
     object GaussianMixture extends DefaultParamsReadable[GaussianMixture] {
     
    +  /** Limit number of features such that numFeatures^2^ < Integer.MaxValue */
    +  private[clustering] val MAX_NUM_FEATURES = 46000
    --- End diff --
    
    +1 @srowen It's better to use the real max. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should l...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16661#discussion_r97360929
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala ---
    @@ -486,6 +491,9 @@ class GaussianMixture @Since("2.0.0") (
     @Since("2.0.0")
     object GaussianMixture extends DefaultParamsReadable[GaussianMixture] {
     
    +  /** Limit number of features such that numFeatures^2^ < Integer.MaxValue */
    +  private[clustering] val MAX_NUM_FEATURES = 46000
    --- End diff --
    
    This is like a `private static final` field in Java, and when used for constants, CONSTANT_CASE is normal.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should l...

Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16661#discussion_r97700702
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala ---
    @@ -272,6 +277,10 @@ class GaussianMixture private (
     }
     
     private[clustering] object GaussianMixture {
    +
    +  /** Limit number of features such that numFeatures^2^ < Int.MaxValue */
    +  private[clustering] val MAX_NUM_FEATURES = math.sqrt(Int.MaxValue).toInt
    --- End diff --
    
    The number is not equal to that used in `computeCovariance()` in `mllib.linalg.distributed.RowMatrix`.
    https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala#L327
    Do the limits in `mllib.linalg.distributed.RowMatrix` need to be updated to this one?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should l...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16661#discussion_r97351095
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala ---
    @@ -486,6 +491,9 @@ class GaussianMixture @Since("2.0.0") (
     @Since("2.0.0")
     object GaussianMixture extends DefaultParamsReadable[GaussianMixture] {
     
    +  /** Limit number of features such that numFeatures^2^ < Integer.MaxValue */
    +  private[clustering] val MAX_NUM_FEATURES = 46000
    --- End diff --
    
    shouldn't this be in upper camel case according to scala style?
    
    http://docs.scala-lang.org/style/naming-conventions.html



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should limit th...

Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng commented on the issue:

    https://github.com/apache/spark/pull/16661
  
    BTW, it maybe nice to add a `SymmetricMatrix` class, for symmetric matrice are widely used in computation of covariance/concurrence/etc


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should l...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16661#discussion_r97353937
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/GaussianMixtureSuite.scala ---
    @@ -53,6 +53,19 @@ class GaussianMixtureSuite extends SparkFunSuite with MLlibTestSparkContext
         rDataset = rData.map(FeatureData).toDF()
       }
     
    +  test("gmm fails on high dimensional data") {
    +    val ctx = spark.sqlContext
    +    import ctx.implicits._
    --- End diff --
    
    is there a way to remove this import?  I'm not sure why you need it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should l...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16661#discussion_r97530588
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala ---
    @@ -486,6 +491,9 @@ class GaussianMixture @Since("2.0.0") (
     @Since("2.0.0")
     object GaussianMixture extends DefaultParamsReadable[GaussianMixture] {
     
    +  /** Limit number of features such that numFeatures^2^ < Integer.MaxValue */
    +  private[clustering] val MAX_NUM_FEATURES = 46000
    --- End diff --
    
    Is floor(sqrt(2^31-1)) = 46340 more accurate? or is there overhead that prevents this from being achievable? I know it's a corner case, but if 46000 is a number that's just "about" the real max, let's just use the real max.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should l...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16661#discussion_r97499446
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala ---
    @@ -272,6 +277,10 @@ class GaussianMixture private (
     }
     
     private[clustering] object GaussianMixture {
    +
    +  /** Limit number of features such that numFeatures^2^ < Integer.MaxValue */
    +  private[clustering] val MAX_NUM_FEATURES = 46000
    --- End diff --
    
    ultimately *long-term* the plan is to deprecate `mllib` so keeping it separate is preferable


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should limit th...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on the issue:

    https://github.com/apache/spark/pull/16661
  
    ping @yanboliang 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should limit th...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on the issue:

    https://github.com/apache/spark/pull/16661
  
    Thanks for the review @srowen and @imatiach-msft!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should l...

Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16661#discussion_r97478414
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala ---
    @@ -486,6 +491,9 @@ class GaussianMixture @Since("2.0.0") (
     @Since("2.0.0")
     object GaussianMixture extends DefaultParamsReadable[GaussianMixture] {
     
    +  /** Limit number of features such that numFeatures^2^ < Integer.MaxValue */
    +  private[clustering] val MAX_NUM_FEATURES = 46000
    --- End diff --
    
    In https://github.com/apache/spark/pull/15413, the symmetry of covariance matrix is taken into account and only the upper triangular part is store. So this number seems to be 65535? (`math.sqrt(Int.MaxValue.toDouble * 2)`)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should limit th...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16661
  
    **[Test build #71735 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71735/testReport)** for PR 16661 at commit [`2ed94de`](https://github.com/apache/spark/commit/2ed94de769d2b06e2a81e6cd320637118874047d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should l...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16661#discussion_r97428409
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala ---
    @@ -272,6 +277,10 @@ class GaussianMixture private (
     }
     
     private[clustering] object GaussianMixture {
    +
    +  /** Limit number of features such that numFeatures^2^ < Integer.MaxValue */
    +  private[clustering] val MAX_NUM_FEATURES = 46000
    --- End diff --
    
    I can see benefits either way, but I think leaving ML GMM to be completely independent of MLlib is slightly preferable. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should limit th...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on the issue:

    https://github.com/apache/spark/pull/16661
  
    @imatiach-msft Spark committers must push the changes. As long as at least one committer is aware of the changes there is probably nothing left to do.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should limit th...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16661
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71937/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should limit th...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16661
  
    **[Test build #71937 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71937/testReport)** for PR 16661 at commit [`5672d13`](https://github.com/apache/spark/commit/5672d1345f661665f521fd1dd4410313ef3ab554).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should limit th...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16661
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should limit th...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16661
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should limit th...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16661
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71735/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should l...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16661#discussion_r97434038
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala ---
    @@ -272,6 +277,10 @@ class GaussianMixture private (
     }
     
     private[clustering] object GaussianMixture {
    +
    +  /** Limit number of features such that numFeatures^2^ < Integer.MaxValue */
    +  private[clustering] val MAX_NUM_FEATURES = 46000
    --- End diff --
    
    ok


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should limit th...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16661
  
    **[Test build #71880 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71880/testReport)** for PR 16661 at commit [`51a237b`](https://github.com/apache/spark/commit/51a237b001a2e9a9257346b061fbf25bd63dc820).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should limit th...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16661
  
    **[Test build #71735 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71735/testReport)** for PR 16661 at commit [`2ed94de`](https://github.com/apache/spark/commit/2ed94de769d2b06e2a81e6cd320637118874047d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should limit th...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16661
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should l...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16661#discussion_r97571098
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala ---
    @@ -486,6 +491,9 @@ class GaussianMixture @Since("2.0.0") (
     @Since("2.0.0")
     object GaussianMixture extends DefaultParamsReadable[GaussianMixture] {
     
    +  /** Limit number of features such that numFeatures^2^ < Integer.MaxValue */
    --- End diff --
    
    Nit: ```Integer.MaxValue``` is not a standard convention, it should be ```Int.MaxValue``` in Scala or ```Integer.MAX_VALUE``` in Java.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should limit th...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16661
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71880/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should limit th...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the issue:

    https://github.com/apache/spark/pull/16661
  
    Will review it tomorrow. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should l...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16661


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should limit th...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on the issue:

    https://github.com/apache/spark/pull/16661
  
    I left few minor comments, looks good to me!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should limit th...

Posted by imatiach-msft <gi...@git.apache.org>.
Github user imatiach-msft commented on the issue:

    https://github.com/apache/spark/pull/16661
  
    LGTM, nice work!  Who has the permissions to push the changes?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should limit th...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the issue:

    https://github.com/apache/spark/pull/16661
  
    Merged into master. Thanks for all.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org