Posted to reviews@spark.apache.org by MrBago <gi...@git.apache.org> on 2017/03/25 00:10:58 UTC

[GitHub] spark pull request #17421: [SPARK-20040][ML][python] pyspark wrapper for Chi...

GitHub user MrBago opened a pull request:

    https://github.com/apache/spark/pull/17421

    [SPARK-20040][ML][python] pyspark wrapper for ChiSquareTest

    ## What changes were proposed in this pull request?
    
    A pyspark wrapper for spark.ml.stat.ChiSquareTest
    
    ## How was this patch tested?
    
    unit tests
    doctests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MrBago/spark chiSquareTestWrapper

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17421.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17421
    
----
commit a6bc10c9aa9166e7274d9c9ca3959a15b70e87ec
Author: Bago Amirbekian <ba...@databricks.com>
Date:   2017-03-24T23:58:21Z

    Added pyspark wrapper for ChiSquareTest and associated tests.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75192/
    Test FAILed.




[GitHub] spark pull request #17421: [SPARK-20040][ML][python] pyspark wrapper for Chi...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17421#discussion_r108535695
  
    --- Diff: dev/sparktestsupport/modules.py ---
    @@ -431,6 +431,7 @@ def __hash__(self):
             "pyspark.ml.linalg.__init__",
             "pyspark.ml.recommendation",
             "pyspark.ml.regression",
    +        "pyspark.ml.stat",
    --- End diff --
    
    OK, no problem, I just wanted to check.




[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    **[Test build #3617 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3617/testReport)** for PR 17421 at commit [`3e7163c`](https://github.com/apache/spark/commit/3e7163c0e4a375e392dbbe4f3f4e5b76700195c2).




[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75330/
    Test PASSed.




[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75280/
    Test PASSed.




[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75198/
    Test FAILed.




[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    Just remembered: you'll also need to update python/docs/pyspark.ml.rst for doc gen




[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    **[Test build #3617 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3617/testReport)** for PR 17421 at commit [`3e7163c`](https://github.com/apache/spark/commit/3e7163c0e4a375e392dbbe4f3f4e5b76700195c2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    LGTM pending tests




[GitHub] spark pull request #17421: [SPARK-20040][ML][python] pyspark wrapper for Chi...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17421#discussion_r108022929
  
    --- Diff: python/pyspark/ml/stat.py ---
    @@ -0,0 +1,87 @@
    +from pyspark import since, SparkContext
    +from pyspark.ml.common import _java2py, _py2java
    +from pyspark.ml.wrapper import _jvm
    +
    +
    +class ChiSquareTest(object):
    --- End diff --
    
    Mark as Experimental  (Search for other examples to see how this is marked)
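
A later revision quoted in this thread marks the class by placing a `.. note:: Experimental` directive at the top of its docstring. A minimal sketch of that convention follows; the class name `ChiSquareTestSketch` and the variable `has_marker` are hypothetical stand-ins, not part of the patch.

```python
class ChiSquareTestSketch(object):
    """
    .. note:: Experimental

    Hypothetical stand-in class; only the docstring marker matters here.
    """


# The marker is ordinary docstring text, so it can be checked like any string.
has_marker = ".. note:: Experimental" in ChiSquareTestSketch.__doc__
```

Because the marker lives in the docstring, it should show up in the generated API documentation without any change to runtime behaviour.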




[GitHub] spark pull request #17421: [SPARK-20040][ML][python] pyspark wrapper for Chi...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17421#discussion_r108286819
  
    --- Diff: python/pyspark/ml/stat.py ---
    @@ -0,0 +1,104 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from pyspark import since, SparkContext
    +from pyspark.ml.common import _java2py, _py2java
    +from pyspark.ml.wrapper import _jvm
    +
    +
    +class ChiSquareTest(object):
    +    """
    +    .. note:: Experimental
    +
    +    Conduct Pearson's independence test for every feature against the label. For each feature,
    +    the (feature, label) pairs are converted into a contingency matrix for which the Chi-squared
    +    statistic is computed. All label and feature values must be categorical.
    +
    +    The null hypothesis is that the occurrence of the outcomes is statistically independent.
    +
    +    :param dataset:
    +      DataFrame of categorical labels and categorical features.
    +      Real-valued features will be treated as categorical for each distinct value.
    +    :param featuresCol:
    +      Name of features column in dataset, of type `Vector` (`VectorUDT`).
    +    :param labelCol:
    +      Name of label column in dataset, of any numerical type.
    +    :return:
    +      DataFrame containing the test result for every feature against the label.
    +      This DataFrame will contain a single Row with the following fields:
    +      - `pValues: Vector`
    +      - `degreesOfFreedom: Array[Int]`
    +      - `statistics: Vector`
    +      Each of these fields has one value per feature.
    +
    +    >>> from pyspark.ml.linalg import Vectors
    +    >>> from pyspark.ml.stat import ChiSquareTest
    +    >>> dataset = [[0, Vectors.dense([0, 0, 1])],
    +    ...            [0, Vectors.dense([1, 0, 1])],
    +    ...            [1, Vectors.dense([2, 1, 1])],
    +    ...            [1, Vectors.dense([3, 1, 1])]]
    +    >>> dataset = spark.createDataFrame(dataset, ["label", "features"])
    +    >>> chiSqResult = ChiSquareTest.test(dataset, 'features', 'label')
    +    >>> chiSqResult.select("degreesOfFreedom").collect()[0]
    +    Row(degreesOfFreedom=[3, 1, 0])
    +
    +    .. versionadded:: 2.2.0
    +
    +    """
    +    @staticmethod
    +    @since("2.2.0")
    +    def test(dataset, featuresCol, labelCol):
    +        """
    +        Perform a Pearson's independence test using dataset.
    +        """
    +        sc = SparkContext._active_spark_context
    +        javaTestObj = _jvm().org.apache.spark.ml.stat.ChiSquareTest
    +        args = [_py2java(sc, arg) for arg in (dataset, featuresCol, labelCol)]
    +        return _java2py(sc, javaTestObj.test(*args))
    +
    +
    +if __name__ == "__main__":
    +    import doctest
    +    import pyspark.ml.stat
    +    from pyspark.sql import SparkSession
    +
    +    globs = pyspark.ml.stat.__dict__.copy()
    +    # The small batch size here ensures that we see multiple batches,
    +    # even in these small test examples:
    +    spark = SparkSession.builder \
    +        .master("local[2]") \
    +        .appName("ml.stat tests") \
    +        .getOrCreate()
    +    sc = spark.sparkContext
    +    globs['sc'] = sc
    +    globs['spark'] = spark
    +    import tempfile
    +
    +    temp_path = tempfile.mkdtemp()
    --- End diff --
    
    I don't think this test is using the temp path?
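
The concern can be illustrated with the standard library alone: `tempfile.mkdtemp()` creates a real directory as soon as it is called and never removes it on its own, so a runner that does not use the path leaves a stray directory behind. The sketch below is independent of Spark, and the variable names `created` and `removed` are illustrative only.

```python
import os
import shutil
import tempfile

# mkdtemp() creates the directory immediately; nothing deletes it automatically.
temp_path = tempfile.mkdtemp()
created = os.path.isdir(temp_path)

# A runner that really needs the directory should remove it once the tests finish.
shutil.rmtree(temp_path)
removed = not os.path.exists(temp_path)
```

If no doctest reads or writes the path, the simpler option is probably to drop the `tempfile` import and the `mkdtemp()` call altogether.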




[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    **[Test build #3612 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3612/testReport)** for PR 17421 at commit [`32a0b0c`](https://github.com/apache/spark/commit/32a0b0c93338f08effb72059759a8baea514fa7c).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    **[Test build #75195 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75195/testReport)** for PR 17421 at commit [`37e187b`](https://github.com/apache/spark/commit/37e187b26bcb32a5d341ec96a9da8ac7196741ad).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17421: [SPARK-20040][ML][python] pyspark wrapper for Chi...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17421#discussion_r108511847
  
    --- Diff: dev/sparktestsupport/modules.py ---
    @@ -431,6 +431,7 @@ def __hash__(self):
             "pyspark.ml.linalg.__init__",
             "pyspark.ml.recommendation",
             "pyspark.ml.regression",
    +        "pyspark.ml.stat",
    --- End diff --
    
    Oh yeah, sorry: it's anything that is a new sub-directory. When I was reading this PR yesterday I thought this was a new directory, but looking at it today, that isn't the case. Sorry.




[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    **[Test build #75270 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75270/testReport)** for PR 17421 at commit [`e00fc49`](https://github.com/apache/spark/commit/e00fc494a9503e5b6fbdca4214322c6d9c34907f).
     * This patch **fails Spark unit tests**.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.




[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    Build finished. Test FAILed.




[GitHub] spark pull request #17421: [SPARK-20040][ML][python] pyspark wrapper for Chi...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17421#discussion_r108299231
  
    --- Diff: dev/sparktestsupport/modules.py ---
    @@ -431,6 +431,7 @@ def __hash__(self):
             "pyspark.ml.linalg.__init__",
             "pyspark.ml.recommendation",
             "pyspark.ml.regression",
    +        "pyspark.ml.stat",
    --- End diff --
    
    Sub-modules aren't automatically packaged so we do need to explicitly add it.




[GitHub] spark pull request #17421: [SPARK-20040][ML][python] pyspark wrapper for Chi...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17421#discussion_r108026677
  
    --- Diff: python/pyspark/ml/stat.py ---
    @@ -0,0 +1,102 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from pyspark import since, SparkContext
    +from pyspark.ml.common import _java2py, _py2java
    +from pyspark.ml.wrapper import _jvm
    +
    +
    +class ChiSquareTest(object):
    +    """ Conduct Pearson's independence test for every feature against the label. For each feature,
    +    the (feature, label) pairs are converted into a contingency matrix for which the Chi-squared
    +    statistic is computed. All label and feature values must be categorical.
    +
    +    The null hypothesis is that the occurrence of the outcomes is statistically independent.
    +
    +    :param dataset:
    --- End diff --
    
    Same for the return value text
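
The docstring quoted above describes building one contingency matrix per feature from the (feature, label) pairs. As a rough illustration of the `degreesOfFreedom` field in the return value, the pure-Python sketch below applies the usual contingency-table formula, (distinct feature values - 1) * (distinct labels - 1), to the four rows used in the doctest quoted elsewhere in this thread. It is not Spark's implementation; the helper name `degrees_of_freedom` is hypothetical, and plain lists stand in for `Vectors.dense`.

```python
def degrees_of_freedom(labels, feature_values):
    """Return (distinct feature values - 1) * (distinct labels - 1)."""
    return (len(set(feature_values)) - 1) * (len(set(labels)) - 1)


# (label, features) rows mirroring the doctest data, with lists instead of vectors.
rows = [
    (0, [0, 0, 1]),
    (0, [1, 0, 1]),
    (1, [2, 1, 1]),
    (1, [3, 1, 1]),
]

labels = [label for label, _ in rows]
num_features = len(rows[0][1])

# One value per feature column, matching "one value per feature" in the docstring.
dof = [
    degrees_of_freedom(labels, [features[j] for _, features in rows])
    for j in range(num_features)
]
print(dof)
```

Under these assumptions the sketch gives `[3, 1, 0]`, which agrees with the `Row(degreesOfFreedom=[3, 1, 0])` shown in the doctest.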


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    **[Test build #75198 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75198/testReport)** for PR 17421 at commit [`b71caef`](https://github.com/apache/spark/commit/b71caef69a21f9a9a515e42ed9cc045d058ef80c).




[GitHub] spark pull request #17421: [SPARK-20040][ML][python] pyspark wrapper for Chi...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17421#discussion_r108023069
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -1692,6 +1692,23 @@ def test_new_java_array(self):
             self.assertEqual(_java2py(self.sc, java_array), [])
     
     
    +class ChiSquareTestTests(SparkSessionTestCase):
    +
    +    def test_ChiSquareTest(self):
    +        labels = [1, 2, 0]
    +        vectors = [_convert_to_vector([0, 1, 2]),
    +                   _convert_to_vector([1, 1, 1]),
    +                   _convert_to_vector([2, 1, 0])]
    +        data = zip(labels, vectors)
    --- End diff --
    
    It can also be nicer to write this in a per-row format, rather than zipping labels and vectors which are defined separately.  See other examples of createDataFrame in this file.
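
The two styles can be compared side by side. In this minimal sketch plain Python lists stand in for the vectors built by `_convert_to_vector` in the diff, and the names `zipped_rows`, `per_row` and `same_rows` are illustrative only.

```python
# Style used in the diff: labels and vectors defined separately, then zipped.
labels = [1, 2, 0]
vectors = [[0, 1, 2], [1, 1, 1], [2, 1, 0]]
zipped_rows = list(zip(labels, vectors))

# Per-row style: each (label, features) pair is written on a single line.
per_row = [
    (1, [0, 1, 2]),
    (2, [1, 1, 1]),
    (0, [2, 1, 0]),
]

# Both forms describe the same rows; only readability differs.
same_rows = zipped_rows == per_row
```

Either list could then be passed to `createDataFrame` with column names such as `["label", "features"]`; the per-row form keeps each label next to its features, which makes a mismatch easier to spot.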


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    **[Test build #75270 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75270/testReport)** for PR 17421 at commit [`e00fc49`](https://github.com/apache/spark/commit/e00fc494a9503e5b6fbdca4214322c6d9c34907f).




[GitHub] spark pull request #17421: [SPARK-20040][ML][python] pyspark wrapper for Chi...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17421




[GitHub] spark pull request #17421: [SPARK-20040][ML][python] pyspark wrapper for Chi...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17421#discussion_r108026186
  
    --- Diff: python/pyspark/ml/stat.py ---
    @@ -0,0 +1,104 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from pyspark import since, SparkContext
    +from pyspark.ml.common import _java2py, _py2java
    +from pyspark.ml.wrapper import _jvm
    +
    +
    +class ChiSquareTest(object):
    +    """ Conduct Pearson's independence test for every feature against the label. For each feature,
    --- End diff --
    
    I just saw you changed this from the Scala doc b/c I left "RDD" there.  Would you mind correcting the Scala doc too?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    **[Test build #75269 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75269/testReport)** for PR 17421 at commit [`32a0b0c`](https://github.com/apache/spark/commit/32a0b0c93338f08effb72059759a8baea514fa7c).


[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request #17421: [SPARK-20040][ML][python] pyspark wrapper for Chi...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17421#discussion_r108026690
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -1692,6 +1692,23 @@ def test_new_java_array(self):
             self.assertEqual(_java2py(self.sc, java_array), [])
     
     
    +class ChiSquareTestTests(SparkSessionTestCase):
    +
    +    def test_ChiSquareTest(self):
    +        labels = [1, 2, 0]
    +        vectors = [_convert_to_vector([0, 1, 2]),
    +                   _convert_to_vector([1, 1, 1]),
    +                   _convert_to_vector([2, 1, 0])]
    +        data = zip(labels, vectors)
    --- End diff --
    
    Same for the doc test


[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    **[Test build #75278 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75278/testReport)** for PR 17421 at commit [`3e7163c`](https://github.com/apache/spark/commit/3e7163c0e4a375e392dbbe4f3f4e5b76700195c2).


[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    add to whitelist


[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75195/
    Test FAILed.


[GitHub] spark pull request #17421: [SPARK-20040][ML][python] pyspark wrapper for Chi...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17421#discussion_r108026673
  
    --- Diff: python/pyspark/ml/stat.py ---
    @@ -0,0 +1,102 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from pyspark import since, SparkContext
    +from pyspark.ml.common import _java2py, _py2java
    +from pyspark.ml.wrapper import _jvm
    +
    +
    +class ChiSquareTest(object):
    +    """ Conduct Pearson's independence test for every feature against the label. For each feature,
    +    the (feature, label) pairs are converted into a contingency matrix for which the Chi-squared
    +    statistic is computed. All label and feature values must be categorical.
    +
    +    The null hypothesis is that the occurrence of the outcomes is statistically independent.
    +
    +    :param dataset:
    --- End diff --
    
    Copy param text from the Scala doc, unless there's a need to customize it for Python
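
    For readers following the docstring above, here is a minimal pure-Python sketch of the computation it describes: build the contingency matrix of (feature, label) pairs and compute Pearson's chi-squared statistic for one feature. The function name `chi_square_statistic` is hypothetical and for illustration only; this is not the Spark implementation, which runs in the JVM and also reports p-values.

```python
from collections import Counter


def chi_square_statistic(features, labels):
    """Pearson's chi-squared statistic for one categorical feature vs. the label.

    Illustrative sketch only; returns (statistic, degrees_of_freedom).
    """
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    n = len(labels)
    observed = Counter(zip(features, labels))  # contingency matrix cells
    feature_totals = Counter(features)         # row sums
    label_totals = Counter(labels)             # column sums

    statistic = 0.0
    for f, f_count in feature_totals.items():
        for l, l_count in label_totals.items():
            # Expected count under the null hypothesis of independence.
            expected = f_count * l_count / n
            statistic += (observed[(f, l)] - expected) ** 2 / expected
    dof = (len(feature_totals) - 1) * (len(label_totals) - 1)
    return statistic, dof
```

    With the first feature column of the unit-test data in this PR (feature values 0, 1, 2 against labels 1, 2, 0), every expected cell count is 1/3, so the statistic works out to 6 with 4 degrees of freedom.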


[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    **[Test build #75280 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75280/testReport)** for PR 17421 at commit [`114baf0`](https://github.com/apache/spark/commit/114baf0b7e201fa4cc11d6b8410972c7d8e109a1).


[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    **[Test build #75269 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75269/testReport)** for PR 17421 at commit [`32a0b0c`](https://github.com/apache/spark/commit/32a0b0c93338f08effb72059759a8baea514fa7c).
     * This patch passes all tests.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.


[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    **[Test build #75278 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75278/testReport)** for PR 17421 at commit [`3e7163c`](https://github.com/apache/spark/commit/3e7163c0e4a375e392dbbe4f3f4e5b76700195c2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request #17421: [SPARK-20040][ML][python] pyspark wrapper for Chi...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17421#discussion_r108283757
  
    --- Diff: dev/sparktestsupport/modules.py ---
    @@ -431,6 +431,7 @@ def __hash__(self):
             "pyspark.ml.linalg.__init__",
             "pyspark.ml.recommendation",
             "pyspark.ml.regression",
    +        "pyspark.ml.stat",
    --- End diff --
    
    We just took it out in https://github.com/apache/spark/commit/314cf51ded52834cfbaacf58d3d05a220965ca2a, but since this adds ml.stat back in, we also need to update setup.py (you might need to update your branch from the latest master to see this).


[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    Can one of the admins verify this patch?


[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    **[Test build #75199 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75199/testReport)** for PR 17421 at commit [`32a0b0c`](https://github.com/apache/spark/commit/32a0b0c93338f08effb72059759a8baea514fa7c).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    **[Test build #75192 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75192/testReport)** for PR 17421 at commit [`a6bc10c`](https://github.com/apache/spark/commit/a6bc10c9aa9166e7274d9c9ca3959a15b70e87ec).


[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75269/
    Test PASSed.


[GitHub] spark pull request #17421: [SPARK-20040][ML][python] pyspark wrapper for Chi...

Posted by MrBago <gi...@git.apache.org>.
Github user MrBago commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17421#discussion_r108299791
  
    --- Diff: dev/sparktestsupport/modules.py ---
    @@ -431,6 +431,7 @@ def __hash__(self):
             "pyspark.ml.linalg.__init__",
             "pyspark.ml.recommendation",
             "pyspark.ml.regression",
    +        "pyspark.ml.stat",
    --- End diff --
    
    Thanks @jkbradley, I reverted setup.py.


[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75270/
    Test FAILed.


[GitHub] spark pull request #17421: [SPARK-20040][ML][python] pyspark wrapper for Chi...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17421#discussion_r108023008
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -1692,6 +1692,23 @@ def test_new_java_array(self):
             self.assertEqual(_java2py(self.sc, java_array), [])
     
     
    +class ChiSquareTestTests(SparkSessionTestCase):
    +
    +    def test_ChiSquareTest(self):
    +        labels = [1, 2, 0]
    +        vectors = [_convert_to_vector([0, 1, 2]),
    --- End diff --
    
    Use DenseVector, not _convert_to_vector.  (use public APIs wherever possible)


[GitHub] spark pull request #17421: [SPARK-20040][ML][python] pyspark wrapper for Chi...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17421#discussion_r108287406
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -41,9 +41,7 @@
     import tempfile
     import array as pyarray
     import numpy as np
    -from numpy import (
    -    abs, all, arange, array, array_equal, dot, exp, inf, mean, ones, random, tile, zeros)
    -from numpy import sum as array_sum
    +from numpy import abs, all, arange, array, array_equal, inf, ones, tile, zeros
    --- End diff --
    
    Thanks for cleaning up the numpy imports :) +1


[GitHub] spark pull request #17421: [SPARK-20040][ML][python] pyspark wrapper for Chi...

Posted by MrBago <gi...@git.apache.org>.
Github user MrBago commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17421#discussion_r108286662
  
    --- Diff: dev/sparktestsupport/modules.py ---
    @@ -431,6 +431,7 @@ def __hash__(self):
             "pyspark.ml.linalg.__init__",
             "pyspark.ml.recommendation",
             "pyspark.ml.regression",
    +        "pyspark.ml.stat",
    --- End diff --
    
    @holdenk thanks for catching that, should be fixed now.


[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    Build finished. Test PASSed.


[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    **[Test build #75281 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75281/testReport)** for PR 17421 at commit [`3e7163c`](https://github.com/apache/spark/commit/3e7163c0e4a375e392dbbe4f3f4e5b76700195c2).


[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75329/
    Test PASSed.


[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    **[Test build #75192 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75192/testReport)** for PR 17421 at commit [`a6bc10c`](https://github.com/apache/spark/commit/a6bc10c9aa9166e7274d9c9ca3959a15b70e87ec).
     * This patch **fails RAT tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class ChiSquareTest(object):`


[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    **[Test build #75329 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75329/testReport)** for PR 17421 at commit [`1ce5966`](https://github.com/apache/spark/commit/1ce59662c6170e142eac5e075b5497e135741039).


[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    **[Test build #75329 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75329/testReport)** for PR 17421 at commit [`1ce5966`](https://github.com/apache/spark/commit/1ce59662c6170e142eac5e075b5497e135741039).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request #17421: [SPARK-20040][ML][python] pyspark wrapper for Chi...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17421#discussion_r108296529
  
    --- Diff: dev/sparktestsupport/modules.py ---
    @@ -431,6 +431,7 @@ def __hash__(self):
             "pyspark.ml.linalg.__init__",
             "pyspark.ml.recommendation",
             "pyspark.ml.regression",
    +        "pyspark.ml.stat",
    --- End diff --
    
    Wait, do we need to update setup.py?  This is creating a module, not a package, right?
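
    To illustrate the module-versus-package distinction raised here: the `packages` argument in a setuptools setup.py names packages (directories containing `__init__.py`), and a module file inside a listed package comes along with it. The sketch below uses only the standard library; `find_packages` here is a simplified stand-in written for this note, not the setuptools function, and the directory layout is a hypothetical miniature rather than Spark's real tree.

```python
import os
import tempfile


def find_packages(root):
    """Return dotted names of directories under root that contain __init__.py."""
    packages = []
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath == root:
            continue  # the root itself is only a container
        if "__init__.py" not in filenames:
            dirnames[:] = []  # do not descend into non-package directories
            continue
        packages.append(os.path.relpath(dirpath, root).replace(os.sep, "."))
    return sorted(packages)


def build_layout(root):
    """Create a hypothetical layout in which ml/stat.py is a module, not a package."""
    for d in ("pyspark",
              os.path.join("pyspark", "ml"),
              os.path.join("pyspark", "ml", "linalg")):
        os.makedirs(os.path.join(root, d))
        open(os.path.join(root, d, "__init__.py"), "w").close()
    # A plain module file inside the existing pyspark.ml package.
    open(os.path.join(root, "pyspark", "ml", "stat.py"), "w").close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_layout(root)
        print(find_packages(root))
```

    In this layout `pyspark.ml.stat` never appears in the package list because it is a single file, which is the case where a package list would not need a new entry.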


[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75281/
    Test FAILed.


[GitHub] spark pull request #17421: [SPARK-20040][ML][python] pyspark wrapper for Chi...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17421#discussion_r108022935
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -1692,6 +1692,23 @@ def test_new_java_array(self):
             self.assertEqual(_java2py(self.sc, java_array), [])
     
     
    +class ChiSquareTestTests(SparkSessionTestCase):
    +
    +    def test_ChiSquareTest(self):
    --- End diff --
    
    This is a little arbitrary, but to follow other examples, write this as: ```test_chisquaretest```


[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    **[Test build #75280 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75280/testReport)** for PR 17421 at commit [`114baf0`](https://github.com/apache/spark/commit/114baf0b7e201fa4cc11d6b8410972c7d8e109a1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    **[Test build #75330 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75330/testReport)** for PR 17421 at commit [`e79f968`](https://github.com/apache/spark/commit/e79f96866bd333e046a758f8615a364fb99b0e24).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17421: [SPARK-20040][ML][python] pyspark wrapper for Chi...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17421#discussion_r108026117
  
    --- Diff: python/pyspark/ml/stat.py ---
    @@ -0,0 +1,87 @@
    +from pyspark import since, SparkContext
    +from pyspark.ml.common import _java2py, _py2java
    +from pyspark.ml.wrapper import _jvm
    +
    +
    +class ChiSquareTest(object):
    --- End diff --
    
    Also, we put the triple-quotes on their own line elsewhere in pyspark
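
For illustration, the docstring layout this comment refers to might look like the sketch below. It is not the code from the PR: the class name `DocstringStyleSketch`, the method body, and the docstring text are hypothetical placeholders.

```python
import inspect


class DocstringStyleSketch(object):
    """
    One-line summary, starting on the line after the opening triple-quotes.

    The closing triple-quotes also sit on their own line.
    """

    @staticmethod
    def test(dataset, featuresCol, labelCol):
        """
        Hypothetical method that uses the same docstring layout.
        """
        raise NotImplementedError("placeholder for illustration only")


# inspect.getdoc strips the indentation and the surrounding blank lines.
summary = inspect.getdoc(DocstringStyleSketch).splitlines()[0]
print(summary)
```

Tools that read docstrings normalize the leading newline and indentation, so the layout is a style choice and does not change the rendered text.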




[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75199/
    Test FAILed.




[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    **[Test build #75281 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75281/testReport)** for PR 17421 at commit [`3e7163c`](https://github.com/apache/spark/commit/3e7163c0e4a375e392dbbe4f3f4e5b76700195c2).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17421: [SPARK-20040][ML][python] pyspark wrapper for Chi...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17421#discussion_r108022984
  
    --- Diff: python/pyspark/ml/stat.py ---
    @@ -0,0 +1,87 @@
    +from pyspark import since, SparkContext
    +from pyspark.ml.common import _java2py, _py2java
    +from pyspark.ml.wrapper import _jvm
    +
    +
    +class ChiSquareTest(object):
    --- End diff --
    
    Mark as Experimental (search for other examples of this)
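
A sketch of how such a marker could look, assuming the reStructuredText `.. note:: Experimental` docstring convention that appears elsewhere in pyspark. The class name and description text are placeholders, not the PR's actual code.

```python
import inspect


class ExperimentalMarkerSketch(object):
    """
    .. note:: Experimental

    Placeholder description of what the class does.
    """


# The marker is the first line of the cleaned-up docstring.
first_line = inspect.getdoc(ExperimentalMarkerSketch).splitlines()[0]
print(first_line)
```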




[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    **[Test build #75195 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75195/testReport)** for PR 17421 at commit [`37e187b`](https://github.com/apache/spark/commit/37e187b26bcb32a5d341ec96a9da8ac7196741ad).




[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    RAT tests are for checking that the Apache license appears at the top of each file
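
To illustrate the kind of check involved, here is a simplified stand-in written in Python. It is not Apache RAT itself; the marker string, the `max_lines` limit, and the function name `has_license_header` are assumptions made for the example.

```python
# Opening words of the Apache license header (assumed marker for this sketch).
LICENSE_MARKER = "Licensed to the Apache Software Foundation (ASF)"


def has_license_header(source_text, max_lines=20):
    """
    Return True if the license marker appears within the first `max_lines` lines.
    """
    head = source_text.splitlines()[:max_lines]
    return any(LICENSE_MARKER in line for line in head)


# A file that starts with the header, and one that starts directly with code.
with_header = (
    "#\n"
    "# Licensed to the Apache Software Foundation (ASF) under one or more\n"
    "# contributor license agreements.\n"
    "#\n"
    "import os\n"
)
without_header = "from pyspark import since, SparkContext\n"

print(has_license_header(with_header), has_license_header(without_header))
```

Under this sketch, a newly added file that begins directly with its imports would be flagged until the header is added.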




[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    **[Test build #75199 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75199/testReport)** for PR 17421 at commit [`32a0b0c`](https://github.com/apache/spark/commit/32a0b0c93338f08effb72059759a8baea514fa7c).




[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75278/
    Test PASSed.




[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    add to whitelist




[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    **[Test build #3612 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3612/testReport)** for PR 17421 at commit [`32a0b0c`](https://github.com/apache/spark/commit/32a0b0c93338f08effb72059759a8baea514fa7c).




[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    Merging with master
    Thanks!




[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    **[Test build #75330 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75330/testReport)** for PR 17421 at commit [`e79f968`](https://github.com/apache/spark/commit/e79f96866bd333e046a758f8615a364fb99b0e24).




[GitHub] spark pull request #17421: [SPARK-20040][ML][python] pyspark wrapper for Chi...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17421#discussion_r108485003
  
    --- Diff: dev/sparktestsupport/modules.py ---
    @@ -431,6 +431,7 @@ def __hash__(self):
             "pyspark.ml.linalg.__init__",
             "pyspark.ml.recommendation",
             "pyspark.ml.regression",
    +        "pyspark.ml.stat",
    --- End diff --
    
    @holdenk  If we need to add pyspark.ml.stat to setup.py, then why are we not adding the other analogous modules: pyspark.ml.{classification, clustering, regression,...}?




[GitHub] spark pull request #17421: [SPARK-20040][ML][python] pyspark wrapper for Chi...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17421#discussion_r108023140
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -1692,6 +1692,23 @@ def test_new_java_array(self):
             self.assertEqual(_java2py(self.sc, java_array), [])
     
     
    +class ChiSquareTestTests(SparkSessionTestCase):
    +
    +    def test_ChiSquareTest(self):
    +        labels = [1, 2, 0]
    +        vectors = [_convert_to_vector([0, 1, 2]),
    +                   _convert_to_vector([1, 1, 1]),
    +                   _convert_to_vector([2, 1, 0])]
    +        data = zip(labels, vectors)
    +        df = self.spark.createDataFrame(data, ['label', 'feat'])
    +        res = ChiSquareTest.test(df, 'feat', 'label')
    +        # pValues = res.select("pValues").collect())
    --- End diff --
    
    (Noting that this can be updated once the Spark SQL bug is fixed)
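
As a rough guide to what the commented-out assertion could eventually compare against, the following pure-Python sketch computes Pearson's chi-square statistic and degrees of freedom for the data in the diff above. It is not Spark's implementation and does not use pyspark; the function name `pearson_chi_square` is hypothetical, and p-values are left out because they need a chi-square distribution function.

```python
from collections import Counter


def pearson_chi_square(feature_values, labels):
    """
    Return (statistic, degrees_of_freedom) for one categorical feature vs. the label.
    """
    n = len(labels)
    observed = Counter(zip(feature_values, labels))
    feature_totals = Counter(feature_values)
    label_totals = Counter(labels)

    statistic = 0.0
    for f, f_total in feature_totals.items():
        for l, l_total in label_totals.items():
            # Expected count under independence of feature and label.
            expected = f_total * l_total / float(n)
            statistic += (observed[(f, l)] - expected) ** 2 / expected

    dof = (len(feature_totals) - 1) * (len(label_totals) - 1)
    return statistic, dof


# Same data as the test in the diff: one label per row, three features per row.
labels = [1, 2, 0]
rows = [[0, 1, 2], [1, 1, 1], [2, 1, 0]]

# Transpose the rows so that each entry is the column of values for one feature.
results = [pearson_chi_square(list(col), labels) for col in zip(*rows)]
for statistic, dof in results:
    print(round(statistic, 6), dof)
```

With this data, the first and third features each take three distinct values, giving 4 degrees of freedom, while the constant second feature gives 0.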




[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    **[Test build #3615 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3615/testReport)** for PR 17421 at commit [`32a0b0c`](https://github.com/apache/spark/commit/32a0b0c93338f08effb72059759a8baea514fa7c).




[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    LGTM pending tests




[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    **[Test build #3615 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3615/testReport)** for PR 17421 at commit [`32a0b0c`](https://github.com/apache/spark/commit/32a0b0c93338f08effb72059759a8baea514fa7c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17421: [SPARK-20040][ML][python] pyspark wrapper for ChiSquareT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17421
  
    **[Test build #75198 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75198/testReport)** for PR 17421 at commit [`b71caef`](https://github.com/apache/spark/commit/b71caef69a21f9a9a515e42ed9cc045d058ef80c).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.

