Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/11/19 00:17:51 UTC

[GitHub] [spark] zero323 opened a new pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

zero323 opened a new pull request #30413:
URL: https://github.com/apache/spark/pull/30413


   
   ### What changes were proposed in this pull request?
   
   This PR proposes migration of `pyspark.mllib` to NumPy documentation style.
   
   ### Why are the changes needed?
   
   To improve documentation style.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, this changes both rendered HTML docs and console representation (SPARK-33243).
   
   ### How was this patch tested?
   
   `dev/lint-python` and manual inspection.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zero323 commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
zero323 commented on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-733328865


   __Side note__
   
   @HyukjinKwon 
   
   > Looking at [`numpy.ndarray.item`](https://numpy.org/devdocs/reference/generated/numpy.ndarray.item.html) we should make varargs explicit
   > 
   > ```
   > *cols : ...
   > ```
   
   This approach, especially with enumeration of possible types, looks rather useful. Shall we consider using it, especially in cases where just a list of types is a bit ambiguous? For example, `struct`:
   
   https://github.com/apache/spark/blob/95b6dabc33515f1975eb889480ccca12bf5ac3c8/python/pyspark/sql/functions.py#L1294-L1298
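
To make the suggestion concrete, here is a minimal sketch of a varargs parameter documented explicitly in NumPy style, with the accepted types enumerated. The function `make_struct` and its behavior are hypothetical and only illustrate the docstring layout; this is not the actual `pyspark.sql.functions.struct`.

```python
def make_struct(*cols):
    """
    Collect column names into a single tuple (hypothetical helper).

    Parameters
    ----------
    *cols : str or list of str
        Column names, passed either as separate arguments or as a
        single list.

    Returns
    -------
    tuple of str
        The flattened column names.
    """
    # Accept both make_struct("a", "b") and make_struct(["a", "b"]).
    if len(cols) == 1 and isinstance(cols[0], (list, tuple)):
        cols = tuple(cols[0])
    return tuple(cols)
```

Enumerating the types in the `*cols` entry tells the reader that both call styles are accepted, which a bare `tuple` annotation would not convey.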




[GitHub] [spark] zero323 commented on a change in pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #30413:
URL: https://github.com/apache/spark/pull/30413#discussion_r529824599



##########
File path: python/pyspark/mllib/stat/_statistics.py
##########
@@ -65,11 +65,19 @@ def colStats(rdd):
         """
         Computes column-wise summary statistics for the input RDD[Vector].
 
-        :param rdd: an RDD[Vector] for which column-wise summary statistics
-                    are to be computed.
-        :return: :class:`MultivariateStatisticalSummary` object containing
-                 column-wise summary statistics.
-
+        Parameters
+        ----------
+        rdd : :py:class:`pyspark.RDD`
+            an RDD[Vector] for which column-wise summary statistics

Review comment:
       Conveniently, that's the syntax we use for both Scala and Python, with the corresponding type hints looking like this:
   
   https://github.com/apache/spark/blob/048a9821c788b6796d52d1e2a0cd174377ebd0f0/python/pyspark/mllib/stat/_statistics.pyi#L44
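
As a rough illustration of how the `RDD[Vector]` notation in a docstring can line up with a type hint, here is a self-contained sketch. The classes `RDD` and `Vector` below are hypothetical stand-ins (not the real PySpark classes), and `col_means` is an invented function used only to show the pairing.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")


class RDD(Generic[T]):
    """Hypothetical stand-in for pyspark.RDD, used only for illustration."""

    def __init__(self, items: List[T]) -> None:
        self.items = items


class Vector:
    """Hypothetical stand-in for pyspark.mllib.linalg.Vector."""

    def __init__(self, values: List[float]) -> None:
        self.values = values


def col_means(rdd: RDD[Vector]) -> List[float]:
    """
    Compute column-wise means.

    Parameters
    ----------
    rdd : RDD
        an RDD[Vector] for which column-wise means are computed.
    """
    # Transpose the rows so that each tuple holds one column.
    rows = [v.values for v in rdd.items]
    return [sum(col) / len(col) for col in zip(*rows)]
```

The annotation `RDD[Vector]` and the docstring text use the same spelling, so the two stay easy to cross-check.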






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-732991015








[GitHub] [spark] AmplabJenkins commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-733238511








[GitHub] [spark] SparkQA commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-730789894


   **[Test build #131384 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131384/testReport)** for PR 30413 at commit [`e86af30`](https://github.com/apache/spark/commit/e86af30641891e75a3fe68dd41c502d3c45d23d0).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-733260141








[GitHub] [spark] zero323 commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
zero323 commented on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-733564631


   > @zero323, the only reason I have chosen `cols` over `*cols` is that I felt it was odd to just document this as `*cols : tuple`, and thought `` cols: str, :class:`Column` ... `` is clearer.
   
   That's true. I am also concerned about how far we can go without effectively duplicating annotations in natural language.




[GitHub] [spark] zero323 commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
zero323 commented on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-733108177


   > BTW, @zero323 could you please provide some screenshots after this change
   
   Done, @zhengruifeng.




[GitHub] [spark] SparkQA commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-733367808


   **[Test build #131706 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131706/testReport)** for PR 30413 at commit [`eca86aa`](https://github.com/apache/spark/commit/eca86aa9c490cc359643fb95a93d9fb61999e17f).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA removed a comment on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-733327260


   **[Test build #131706 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131706/testReport)** for PR 30413 at commit [`eca86aa`](https://github.com/apache/spark/commit/eca86aa9c490cc359643fb95a93d9fb61999e17f).




[GitHub] [spark] SparkQA removed a comment on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-730041859


   **[Test build #131307 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131307/testReport)** for PR 30413 at commit [`c777b79`](https://github.com/apache/spark/commit/c777b79cbe7ddf7fab9714b762604f42bc2cf043).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-730816235


   Merged build finished. Test FAILed.




[GitHub] [spark] viirya commented on a change in pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30413:
URL: https://github.com/apache/spark/pull/30413#discussion_r529113536



##########
File path: python/pyspark/mllib/classification.py
##########
@@ -501,53 +525,59 @@ def load(cls, sc, path):
 
 class SVMWithSGD(object):
     """
+    Train a Support Vector Machine (SVM) using Stochastic Gradient Descent.
+
     .. versionadded:: 0.9.0
     """
 
     @classmethod
-    @since('0.9.0')
     def train(cls, data, iterations=100, step=1.0, regParam=0.01,
               miniBatchFraction=1.0, initialWeights=None, regType="l2",
               intercept=False, validateData=True, convergenceTol=0.001):
         """
         Train a support vector machine on the given data.
 
-        :param data:
-          The training data, an RDD of LabeledPoint.
-        :param iterations:
-          The number of iterations.
-          (default: 100)
-        :param step:
-          The step parameter used in SGD.
-          (default: 1.0)
-        :param regParam:
-          The regularizer parameter.
-          (default: 0.01)
-        :param miniBatchFraction:
-          Fraction of data to be used for each SGD iteration.
-          (default: 1.0)
-        :param initialWeights:
-          The initial weights.
-          (default: None)
-        :param regType:
-          The type of regularizer used for training our model.
-          Allowed values:
+        .. versionadded:: 0.9.0
+
+        Parameters
+        ----------
+        data : :py:class:`pyspark.RDD`
+            The training data, an RDD of :py:class:`pyspark.mllib.regression.LabeledPoint`.
+        iterations : int, optional
+            The number of iterations.
+            (default: 100)
+        step : float, optional
+            The step parameter used in SGD.
+            (default: 1.0)
+        regParam : float, optional
+            The regularizer parameter.
+            (default: 0.01)
+        miniBatchFraction: float, optional

Review comment:
       miniBatchFraction: -> miniBatchFraction :
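
The space before the colon matters because NumPy-style parameter headers are conventionally written as `name : type`. The following is a simplified sketch of a header splitter, assuming the separator is exactly space, colon, space; the function `split_param_header` is hypothetical and is not the parser used by the documentation build.

```python
def split_param_header(header):
    """Split a NumPy-style parameter header into (name, type).

    Simplified assumption: the name and type are separated by " : "
    (space, colon, space). Without it, the whole header is treated
    as the name and the type is left empty.
    """
    if " : " in header:
        name, type_ = header.split(" : ", 1)
        return name.strip(), type_.strip()
    return header.strip(), ""
```

Under that assumption, `miniBatchFraction: float, optional` would not be recognized as a name plus a type, while `miniBatchFraction : float, optional` would.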






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-733396499








[GitHub] [spark] viirya commented on a change in pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30413:
URL: https://github.com/apache/spark/pull/30413#discussion_r529776085



##########
File path: python/pyspark/mllib/util.py
##########
@@ -273,24 +317,30 @@ def convertVectorColumnsFromML(dataset, *cols):
         return callMLlibFunc("convertVectorColumnsFromML", dataset, list(cols))
 
     @staticmethod
-    @since("2.0.0")
     def convertMatrixColumnsToML(dataset, *cols):
         """
         Converts matrix columns in an input DataFrame from the
         :py:class:`pyspark.mllib.linalg.Matrix` type to the new
         :py:class:`pyspark.ml.linalg.Matrix` type under the `spark.ml`
         package.
 
-        :param dataset:
-          input dataset
-        :param cols:
-          a list of matrix columns to be converted.
-          New matrix columns will be ignored. If unspecified, all old
-          matrix columns will be converted excepted nested ones.
-        :return:
-          the input dataset with old matrix columns converted to the
-          new matrix type
+        .. versionadded:: 2.0.0
 
+        dataset : :py:class:`pyspark.sql.DataFrame`
+            input dataset
+        cols : str

Review comment:
       Is it a str or list of str?
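
One way to look at the question: with a `(dataset, *cols)` signature, each extra positional argument is a single value, and the function receives them together as a tuple that can be turned into a list, as the `list(cols)` call in the hunk above does. A minimal sketch with a hypothetical function `convert_columns` (the string `"df"` stands in for a real DataFrame):

```python
def convert_columns(dataset, *cols):
    """Hypothetical stand-in mirroring the `(dataset, *cols)` signature."""
    # Inside the function, `cols` is always a tuple of the extra
    # positional arguments, even when none are given.
    return list(cols)


no_cols = convert_columns("df")                  # nothing passed
two_cols = convert_columns("df", "a", "b")       # separate strings
unpacked = convert_columns("df", *["a", "b"])    # list unpacked by caller
```

So, from the caller's point of view, each argument is a `str`, while the function body sees a collection of them.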






[GitHub] [spark] zero323 commented on a change in pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #30413:
URL: https://github.com/apache/spark/pull/30413#discussion_r529730875



##########
File path: python/pyspark/mllib/classification.py
##########
@@ -501,53 +525,59 @@ def load(cls, sc, path):
 
 class SVMWithSGD(object):
     """
+    Train a Support Vector Machine (SVM) using Stochastic Gradient Descent.
+
     .. versionadded:: 0.9.0
     """
 
     @classmethod
-    @since('0.9.0')
     def train(cls, data, iterations=100, step=1.0, regParam=0.01,
               miniBatchFraction=1.0, initialWeights=None, regType="l2",
               intercept=False, validateData=True, convergenceTol=0.001):
         """
         Train a support vector machine on the given data.
 
-        :param data:
-          The training data, an RDD of LabeledPoint.
-        :param iterations:
-          The number of iterations.
-          (default: 100)
-        :param step:
-          The step parameter used in SGD.
-          (default: 1.0)
-        :param regParam:
-          The regularizer parameter.
-          (default: 0.01)
-        :param miniBatchFraction:
-          Fraction of data to be used for each SGD iteration.
-          (default: 1.0)
-        :param initialWeights:
-          The initial weights.
-          (default: None)
-        :param regType:
-          The type of regularizer used for training our model.
-          Allowed values:
+        .. versionadded:: 0.9.0
+
+        Parameters
+        ----------
+        data : :py:class:`pyspark.RDD`
+            The training data, an RDD of :py:class:`pyspark.mllib.regression.LabeledPoint`.
+        iterations : int, optional
+            The number of iterations.
+            (default: 100)
+        step : float, optional
+            The step parameter used in SGD.
+            (default: 1.0)
+        regParam : float, optional
+            The regularizer parameter.
+            (default: 0.01)
+        miniBatchFraction: float, optional

Review comment:
       Thanks @viirya, fixed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-730065212








[GitHub] [spark] HyukjinKwon closed pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #30413:
URL: https://github.com/apache/spark/pull/30413


   




[GitHub] [spark] zero323 commented on a change in pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #30413:
URL: https://github.com/apache/spark/pull/30413#discussion_r527328664



##########
File path: python/docs/source/reference/pyspark.mllib.rst
##########
@@ -216,6 +216,8 @@ Statistics
     ChiSqTestResult
     MultivariateGaussian
     KernelDensity
+    ChiSqTestResult
+    KolmogorovSmirnovTestResult

Review comment:
       These two are returned by public methods so I believe that it makes sense to have documentation entries.






[GitHub] [spark] SparkQA commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-730754436


   **[Test build #131384 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131384/testReport)** for PR 30413 at commit [`e86af30`](https://github.com/apache/spark/commit/e86af30641891e75a3fe68dd41c502d3c45d23d0).




[GitHub] [spark] zero323 commented on a change in pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #30413:
URL: https://github.com/apache/spark/pull/30413#discussion_r527329687



##########
File path: python/pyspark/mllib/stat/__init__.py
##########
@@ -21,8 +21,9 @@
 
 from pyspark.mllib.stat._statistics import Statistics, MultivariateStatisticalSummary
 from pyspark.mllib.stat.distribution import MultivariateGaussian
-from pyspark.mllib.stat.test import ChiSqTestResult
+from pyspark.mllib.stat.test import ChiSqTestResult, KolmogorovSmirnovTestResult
 from pyspark.mllib.stat.KernelDensity import KernelDensity
 
-__all__ = ["Statistics", "MultivariateStatisticalSummary", "ChiSqTestResult",
+__all__ = ["Statistics", "MultivariateStatisticalSummary",
+           "ChiSqTestResult", "KolmogorovSmirnovTestResult",

Review comment:
       These modifications were made to add test documentation in the stats section. Additionally, it is rather inconsistent to export one test result class and not the other.
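
For context, `__all__` controls which names a star import picks up. The sketch below builds a throwaway module to show the effect of leaving one class out; the module name `demo_stat` and its empty classes are hypothetical and only borrow the class names from this diff.

```python
import sys
import types

# Build a throwaway module; "demo_stat" and its contents are hypothetical.
source = '''
class ChiSqTestResult: pass
class KolmogorovSmirnovTestResult: pass
__all__ = ["ChiSqTestResult"]
'''
module = types.ModuleType("demo_stat")
exec(source, module.__dict__)
sys.modules["demo_stat"] = module

# A star import only copies the names listed in __all__.
namespace = {}
exec("from demo_stat import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
```

Here `exported` contains only `ChiSqTestResult`, even though `KolmogorovSmirnovTestResult` is defined in the module and can still be imported by name.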






[GitHub] [spark] zero323 edited a comment on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
zero323 edited a comment on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-732099229


   
   > LGTM if it results in better docs.
   > BTW, @zero323 could you please provide some screenshots after this change (like #30149)?
   > It may help reviewers to better understand this change.
   
   Some examples have been provided by @HyukjinKwon in the initial PR (https://github.com/apache/spark/pull/30149), but I'll try to provide some `mllib`-specific ones later, when I have access to a machine where I can build the docs.
   




[GitHub] [spark] zero323 commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
zero323 commented on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-733565168


   Thanks everyone!




[GitHub] [spark] AmplabJenkins commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-733396507








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-730790137








[GitHub] [spark] SparkQA commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-730048664


   **[Test build #131307 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131307/testReport)** for PR 30413 at commit [`c777b79`](https://github.com/apache/spark/commit/c777b79cbe7ddf7fab9714b762604f42bc2cf043).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] HyukjinKwon commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-733348463


   @zero323, the only reason I chose `cols` over `*cols` is that it felt odd to document this as `*cols : tuple`, and I thought `` cols: str, :class:`Column` ... `` was clearer.
   
   I took a look at https://numpydoc.readthedocs.io/en/latest/example.html:
   
   > ```
   >     *args : iterable
   >         Other arguments.
   >     long_var_name : {'hi', 'ho'}, optional
   >         Choices in brackets, default first when optional.
   >     **kwargs : dict
   >         Keyword arguments.
   > ```
   
   If there is another way to do this that is still standard numpydoc (or at least used by NumPy itself), I am fine either way.
   




[GitHub] [spark] viirya commented on a change in pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30413:
URL: https://github.com/apache/spark/pull/30413#discussion_r529762661



##########
File path: python/pyspark/mllib/clustering.py
##########
@@ -499,32 +554,36 @@ class GaussianMixture(object):
 
     .. versionadded:: 1.3.0
     """
+
     @classmethod
-    @since('1.3.0')
     def train(cls, rdd, k, convergenceTol=1e-3, maxIterations=100, seed=None, initialModel=None):
         """
         Train a Gaussian Mixture clustering model.
 
-        :param rdd:
-          Training points as an `RDD` of `Vector` or convertible
-          sequence types.
-        :param k:
-          Number of independent Gaussians in the mixture model.
-        :param convergenceTol:
-          Maximum change in log-likelihood at which convergence is
-          considered to have occurred.
-          (default: 1e-3)
-        :param maxIterations:
-          Maximum number of iterations allowed.
-          (default: 100)
-        :param seed:
-          Random seed for initial Gaussian distribution. Set as None to
-          generate seed based on system time.
-          (default: None)
-        :param initialModel:
-          Initial GMM starting point, bypassing the random
-          initialization.
-          (default: None)
+        .. versionadded:: 1.3.0
+
+        Parameters
+        ----------
+        rdd : ::py:class:`pyspark.RDD`
+            Training points as an `RDD` of :py:class:`pyspark.mllib.linalg.Vector`
+            or convertible sequence types.
+        param k : int

Review comment:
       `param` is redundant here.






[GitHub] [spark] viirya commented on a change in pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30413:
URL: https://github.com/apache/spark/pull/30413#discussion_r529108779



##########
File path: python/pyspark/mllib/classification.py
##########
@@ -88,20 +88,26 @@ class LogisticRegressionModel(LinearClassificationModel):
     Classification model trained using Multinomial/Binary Logistic
     Regression.
 
-    :param weights:
-      Weights computed for every feature.
-    :param intercept:
-      Intercept computed for this model. (Only used in Binary Logistic
-      Regression. In Multinomial Logistic Regression, the intercepts will
-      not bea single value, so the intercepts will be part of the
-      weights.)
-    :param numFeatures:
-      The dimension of the features.
-    :param numClasses:
-      The number of possible outcomes for k classes classification problem
-      in Multinomial Logistic Regression. By default, it is binary
-      logistic regression so numClasses will be set to 2.
+    .. versionadded:: 0.9.0
 
+    Parameters
+    ----------
+    weights : :py:class:`pyspark.mllib.linalg.Vector`
+        Weights computed for every feature.
+    intercept : float
+        Intercept computed for this model. (Only used in Binary Logistic
+        Regression. In Multinomial Logistic Regression, the intercepts will
+        not bea single value, so the intercepts will be part of the

Review comment:
       not bea -> not be a






[GitHub] [spark] viirya commented on a change in pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30413:
URL: https://github.com/apache/spark/pull/30413#discussion_r529771614



##########
File path: python/pyspark/mllib/stat/_statistics.py
##########
@@ -65,11 +65,19 @@ def colStats(rdd):
         """
         Computes column-wise summary statistics for the input RDD[Vector].
 
-        :param rdd: an RDD[Vector] for which column-wise summary statistics
-                    are to be computed.
-        :return: :class:`MultivariateStatisticalSummary` object containing
-                 column-wise summary statistics.
-
+        Parameters
+        ----------
+        rdd : :py:class:`pyspark.RDD`
+            an RDD[Vector] for which column-wise summary statistics

Review comment:
       `RDD[Vector]` looks like Scala syntax. How about "RDD of Vector"?






[GitHub] [spark] zhengruifeng commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-731894260


   LGTM if it results in better docs.
   BTW, @zero323 could you please provide some screenshots after this change (like https://github.com/apache/spark/pull/30149)?
   It may help reviewers to better understand this change.




[GitHub] [spark] SparkQA commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-730806545


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35987/
   




[GitHub] [spark] AmplabJenkins commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-730816235








[GitHub] [spark] SparkQA removed a comment on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-730754436


   **[Test build #131384 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131384/testReport)** for PR 30413 at commit [`e86af30`](https://github.com/apache/spark/commit/e86af30641891e75a3fe68dd41c502d3c45d23d0).




[GitHub] [spark] SparkQA commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-730057262


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35910/
   




[GitHub] [spark] zero323 commented on a change in pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #30413:
URL: https://github.com/apache/spark/pull/30413#discussion_r527329138



##########
File path: python/pyspark/mllib/classification.py
##########
@@ -326,55 +336,65 @@ def train(rdd, i):
 
 class LogisticRegressionWithLBFGS(object):
     """
+    Train a classification model for Multinomial/Binary Logistic Regression

Review comment:
       Here and in similar contexts I added minimal docs, so that a summary is actually visible in the top-level docs. Otherwise we just get the version-added note, which is not very useful.
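
A rough illustration of the point above, assuming the doc toolchain builds its summary tables from the first line of a docstring. The two classes and the version number are hypothetical placeholders, not code from this PR. `inspect.getdoc` is used only to show what that first line would be.

```python
import inspect


class WithoutSummary(object):
    """
    .. versionadded:: 1.2.0
    """


class WithSummary(object):
    """
    Train a classification model for Multinomial/Binary Logistic Regression.

    .. versionadded:: 1.2.0
    """


def first_line(obj):
    # getdoc strips indentation and leading blank lines from the docstring.
    return inspect.getdoc(obj).splitlines()[0]


print(first_line(WithoutSummary))
print(first_line(WithSummary))
```

With no summary sentence, the first line is just the `versionadded` directive, which is what would end up in an overview table under that assumption.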






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-730048874








[GitHub] [spark] AmplabJenkins commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-733414775








[GitHub] [spark] AmplabJenkins commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-733433237








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-730816240


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/35987/
   Test FAILed.




[GitHub] [spark] HyukjinKwon commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-730766150


   cc @viirya, @huaxingao, @WeichenXu123, @zhengruifeng , this is the last PR for the initial migration to NumPy documentation style. Would you guys mind taking a quick look when you guys find some time? 




[GitHub] [spark] zero323 commented on a change in pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #30413:
URL: https://github.com/apache/spark/pull/30413#discussion_r529829185



##########
File path: python/pyspark/mllib/util.py
##########
@@ -273,24 +317,30 @@ def convertVectorColumnsFromML(dataset, *cols):
         return callMLlibFunc("convertVectorColumnsFromML", dataset, list(cols))
 
     @staticmethod
-    @since("2.0.0")
     def convertMatrixColumnsToML(dataset, *cols):
         """
         Converts matrix columns in an input DataFrame from the
         :py:class:`pyspark.mllib.linalg.Matrix` type to the new
         :py:class:`pyspark.ml.linalg.Matrix` type under the `spark.ml`
         package.
 
-        :param dataset:
-          input dataset
-        :param cols:
-          a list of matrix columns to be converted.
-          New matrix columns will be ignored. If unspecified, all old
-          matrix columns will be converted excepted nested ones.
-        :return:
-          the input dataset with old matrix columns converted to the
-          new matrix type
+        .. versionadded:: 2.0.0
 
+        dataset : :py:class:`pyspark.sql.DataFrame`
+            input dataset
+        cols : str

Review comment:
       That's a good question. Looking at [`numpy.ndarray.item`](https://numpy.org/devdocs/reference/generated/numpy.ndarray.item.html) we should make varargs explicit
   
       *cols : ...
   
   As for the type, I was thinking `str`, since we support a variable number of `str` values, but looking at the code, a single `List[str]` would do as well.
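
To make the suggestion concrete, here is a sketch of a numpydoc-style docstring that documents varargs explicitly as `*cols`. The function `convert_columns` is a hypothetical stand-in that only returns its arguments. It is not the real `convertMatrixColumnsToML`, and the wording of the descriptions is illustrative.

```python
def convert_columns(dataset, *cols):
    """
    Return the column names selected for conversion.

    Parameters
    ----------
    dataset : object
        Input dataset (a stand-in; unused in this sketch).
    *cols : str
        Names of the columns to convert, passed as separate arguments.

    Returns
    -------
    list of str
        The selected column names.
    """
    return list(cols)


print(convert_columns(None, "a", "b"))
```

Writing the entry as `*cols : str` describes the type of each individual argument, which matches how the function is called.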






[GitHub] [spark] SparkQA commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-732981002


   **[Test build #131662 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131662/testReport)** for PR 30413 at commit [`ba6bd70`](https://github.com/apache/spark/commit/ba6bd707867f18ba1708dc30e4ce7dc2f1425055).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] HyukjinKwon edited a comment on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
HyukjinKwon edited a comment on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-730766150


   cc @viirya, @huaxingao, @WeichenXu123, @zhengruifeng , this is the last PR for the initial migration to NumPy documentation style. Would you guys mind taking a quick look when you guys find some time? 




[GitHub] [spark] SparkQA removed a comment on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-732964747


   **[Test build #131662 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131662/testReport)** for PR 30413 at commit [`ba6bd70`](https://github.com/apache/spark/commit/ba6bd707867f18ba1708dc30e4ce7dc2f1425055).




[GitHub] [spark] viirya commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-730863277


   I might only be able to look at this tomorrow or over the weekend.




[GitHub] [spark] HyukjinKwon commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-733403353


   Merged to master.




[GitHub] [spark] SparkQA commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-733327260


   **[Test build #131706 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131706/testReport)** for PR 30413 at commit [`eca86aa`](https://github.com/apache/spark/commit/eca86aa9c490cc359643fb95a93d9fb61999e17f).




[GitHub] [spark] zero323 commented on a change in pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
zero323 commented on a change in pull request #30413:
URL: https://github.com/apache/spark/pull/30413#discussion_r530021426



##########
File path: python/pyspark/mllib/util.py
##########
@@ -273,24 +317,30 @@ def convertVectorColumnsFromML(dataset, *cols):
         return callMLlibFunc("convertVectorColumnsFromML", dataset, list(cols))
 
     @staticmethod
-    @since("2.0.0")
     def convertMatrixColumnsToML(dataset, *cols):
         """
         Converts matrix columns in an input DataFrame from the
         :py:class:`pyspark.mllib.linalg.Matrix` type to the new
         :py:class:`pyspark.ml.linalg.Matrix` type under the `spark.ml`
         package.
 
-        :param dataset:
-          input dataset
-        :param cols:
-          a list of matrix columns to be converted.
-          New matrix columns will be ignored. If unspecified, all old
-          matrix columns will be converted excepted nested ones.
-        :return:
-          the input dataset with old matrix columns converted to the
-          new matrix type
+        .. versionadded:: 2.0.0
 
+        dataset : :py:class:`pyspark.sql.DataFrame`
+            input dataset
+        cols : str

Review comment:
       >  a single List[str] would do as well.
   
   Sorry, I took another look at the implementation and it turns out I misread the code: passing a `List[str]` won't work here. I'll fix that in a second.
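
A minimal sketch of why a single list behaves differently with a `*cols` signature. `convert_columns` is a hypothetical stand-in that only mirrors the `list(cols)` call visible in the diff above; it does not touch Spark, and what the JVM side does with the nested value is not shown here.

```python
def convert_columns(dataset, *cols):
    """Hypothetical stand-in mirroring the `*cols` signature from the diff."""
    # The real method forwards list(cols) to callMLlibFunc; here we just return it.
    return list(cols)


# Column names passed as separate positional arguments give a flat list of str.
separate = convert_columns("df", "m1", "m2")

# A single list becomes one element of the varargs tuple, so the result is nested.
nested = convert_columns("df", ["m1", "m2"])

# Unpacking the list at the call site restores the flat form.
unpacked = convert_columns("df", *["m1", "m2"])

print(separate)
print(nested)
print(unpacked)
```

This is why documenting the parameter as `str` (variadic) rather than as a list matches the signature: callers holding a list would need to unpack it with `*`.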






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-733414775








[GitHub] [spark] zero323 commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
zero323 commented on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-732099229


   Some examples have been provided by @HyukjinKwon in the initial PR (https://github.com/apache/spark/pull/30149), but I'll try to provide some `mllib`-specific ones later, when I have access to a machine where I can build the docs.
   
   > LGTM if it results in better docs.
   > BTW, @zero323 could you please provide some screenshots after this change (like #30149)?
   > It may help reviewers to better understand this change.
   
   




[GitHub] [spark] SparkQA commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-730041859


   **[Test build #131307 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131307/testReport)** for PR 30413 at commit [`c777b79`](https://github.com/apache/spark/commit/c777b79cbe7ddf7fab9714b762604f42bc2cf043).




[GitHub] [spark] SparkQA commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-730816223


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35987/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-733433237








[GitHub] [spark] viirya commented on a change in pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30413:
URL: https://github.com/apache/spark/pull/30413#discussion_r529769785



##########
File path: python/pyspark/mllib/regression.py
##########
@@ -224,11 +234,13 @@ def _regression_train_wrapper(train_func, modelClass, data, initial_weights):
 
 class LinearRegressionWithSGD(object):
     """
+    Train a linear regression model with no regularization using Stochastic Gradient Descent.
+
     .. versionadded:: 0.9.0
-    .. note:: Deprecated in 2.0.0. Use ml.regression.LinearRegression.
+    .. deprecated:: 2.0.0.

Review comment:
       2.0.0. -> 2.0.0
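
A sketch of how the directive could read once the trailing period is dropped. The class below is a docstring-only stand-in, and the indented replacement hint is an assumption carried over from the removed `.. note::` line, not necessarily the wording the PR settles on.

```python
class LinearRegressionWithSGD(object):
    """
    Train a linear regression model with no regularization using Stochastic Gradient Descent.

    .. versionadded:: 0.9.0
    .. deprecated:: 2.0.0
        Use ml.regression.LinearRegression.
    """
    # Stand-in only: the real class body is omitted.


# Collect the stripped docstring lines so the directive can be checked exactly.
doc_lines = [line.strip() for line in LinearRegressionWithSGD.__doc__.splitlines()]
print(".. deprecated:: 2.0.0" in doc_lines)
```

Sphinx's `deprecated` directive takes the version as its argument, so a trailing period would become part of the rendered version string.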






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-733238511








[GitHub] [spark] AmplabJenkins commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-732991037








[GitHub] [spark] AmplabJenkins commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-730048874








[GitHub] [spark] AmplabJenkins commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-730790137








[GitHub] [spark] AmplabJenkins commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-730065212








[GitHub] [spark] SparkQA commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-732964747


   **[Test build #131662 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131662/testReport)** for PR 30413 at commit [`ba6bd70`](https://github.com/apache/spark/commit/ba6bd707867f18ba1708dc30e4ce7dc2f1425055).




[GitHub] [spark] SparkQA commented on pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30413:
URL: https://github.com/apache/spark/pull/30413#issuecomment-730065197


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35910/
   




[GitHub] [spark] viirya commented on a change in pull request #30413: [SPARK-33252][PYTHON][DOCS] Migration to NumPy documentation style in MLlib (pyspark.mllib.*)

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30413:
URL: https://github.com/apache/spark/pull/30413#discussion_r529776310



##########
File path: python/pyspark/mllib/util.py
##########
@@ -313,24 +363,30 @@ def convertMatrixColumnsToML(dataset, *cols):
         return callMLlibFunc("convertMatrixColumnsToML", dataset, list(cols))
 
     @staticmethod
-    @since("2.0.0")
     def convertMatrixColumnsFromML(dataset, *cols):
         """
         Converts matrix columns in an input DataFrame to the
         :py:class:`pyspark.mllib.linalg.Matrix` type from the new
         :py:class:`pyspark.ml.linalg.Matrix` type under the `spark.ml`
         package.
 
-        :param dataset:
-          input dataset
-        :param cols:
-          a list of matrix columns to be converted.
-          Old matrix columns will be ignored. If unspecified, all new
-          matrix columns will be converted except nested ones.
-        :return:
-          the input dataset with new matrix columns converted to the
-          old matrix type
+        .. versionadded:: 2.0.0
+
+        dataset : :py:class:`pyspark.sql.DataFrame`
+            input dataset
+        cols : str

Review comment:
       ditto.



