Posted to reviews@spark.apache.org by yanboliang <gi...@git.apache.org> on 2015/12/24 10:20:38 UTC

[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

GitHub user yanboliang opened a pull request:

    https://github.com/apache/spark/pull/10469

    [SPARK-11939] [ML] [PySpark] PySpark support model export/import and take LinearRegression as example

    * Add PySpark support for model export/import.
    * Take ```LinearRegression``` as an example. After this is merged, we can split the whole task into subtasks and distribute them to the community.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yanboliang/spark spark-11939

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10469.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10469
    
----
commit 19b07b5aaeeea76d757731ab3e0f998fe5899809
Author: Yanbo Liang <yb...@gmail.com>
Date:   2015-12-24T09:14:29Z

    PySpark support model export/import and take LinearRegression as example

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10469#issuecomment-174476558
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10469#discussion_r50789061
  
    --- Diff: python/pyspark/ml/util.py ---
    @@ -52,3 +71,141 @@ def _randomUID(cls):
             concatenates the class name, "_", and 12 random hex chars.
             """
             return cls.__name__ + "_" + uuid.uuid4().hex[12:]
    +
    +
    +@inherit_doc
    +class MLWriter(object):
    +    """
    +    Abstract class for utility classes that can save ML instances.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    def __init__(self, instance):
    +        self._jwrite = instance._java_obj.write()
    +
    +    @since("2.0.0")
    +    def save(self, path):
    --- End diff --
    
    This is the problematic workflow I had in mind:
    ```
    lr = LinearRegression()  # JVM object instantiated
    lr.setMaxIter(10)  # Python value set, but not transferred to the JVM object
    lr.save(...)  # the new maxIter value is not saved
    ```
    
    I think it will be easy to fix; just call ```_transfer_params_to_java``` in the init method of MLWriter.
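    A minimal pure-Python sketch (no JVM; `FakeJavaObj` and `FakeEstimator` are hypothetical stand-ins for the real wrappers) of how syncing params in ```MLWriter.__init__``` would fix the workflow above:

    ```python
    # MLWriter syncs Python-side params to the peer object before obtaining
    # the writer, so a later save() sees the updated values.

    class FakeJavaObj:
        def __init__(self):
            self.max_iter = None

        def write(self):
            return self  # stand-in: the "writer" is the object itself

        def save(self, path):
            return (path, self.max_iter)

    class FakeEstimator:
        def __init__(self):
            self._java_obj = FakeJavaObj()
            self.maxIter = 1  # default

        def setMaxIter(self, value):
            self.maxIter = value  # Python-side only; peer not updated yet
            return self

        def _transfer_params_to_java(self):
            self._java_obj.max_iter = self.maxIter

    class MLWriter:
        def __init__(self, instance):
            instance._transfer_params_to_java()  # the suggested fix
            self._jwrite = instance._java_obj.write()

        def save(self, path):
            return self._jwrite.save(path)

    lr = FakeEstimator().setMaxIter(10)
    print(MLWriter(lr).save("/tmp/lr"))  # ('/tmp/lr', 10)
    ```

    Without the `_transfer_params_to_java()` call in `__init__`, the tuple would carry the stale peer value instead of 10.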



[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10469#issuecomment-174956146
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10469#discussion_r50948067
  
    --- Diff: python/pyspark/ml/util.py ---
    @@ -52,3 +71,141 @@ def _randomUID(cls):
             concatenates the class name, "_", and 12 random hex chars.
             """
             return cls.__name__ + "_" + uuid.uuid4().hex[12:]
    +
    +
    +@inherit_doc
    +class MLWriter(object):
    +    """
    +    Abstract class for utility classes that can save ML instances.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    def __init__(self, instance):
    +        self._jwrite = instance._java_obj.write()
    +
    +    @since("2.0.0")
    +    def save(self, path):
    +        """Saves the ML instances to the input path."""
    +        self._jwrite.save(path)
    +
    +    @since("2.0.0")
    +    def overwrite(self):
    +        """Overwrites if the output path already exists."""
    +        self._jwrite.overwrite()
    +        return self
    +
    +    @since("2.0.0")
    +    def context(self, sqlContext):
    +        """Sets the SQL context to use for saving."""
    +        self._jwrite.context(sqlContext._ssql_ctx)
    +        return self
    +
    +
    +@inherit_doc
    +class MLWritable(object):
    +    """
    +    Mixin for ML instances that provide MLWriter through their Scala
    +    implementation.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @since("2.0.0")
    --- End diff --
    
    Yes, I removed the ```Since``` annotations.



[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10469#discussion_r50903253
  
    --- Diff: python/pyspark/ml/wrapper.py ---
    @@ -159,15 +151,16 @@ class JavaModel(Model, JavaTransformer):
     
         __metaclass__ = ABCMeta
     
    -    def __init__(self, java_model):
    +    def __init__(self, java_model=None):
    --- End diff --
    
    Can you add a note in the doc to explain this?
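    A plain-Python sketch of why `java_model` may need a `None` default (stub classes are hypothetical; the pattern mirrors the `MLReader.load` diff elsewhere in this thread, which attaches the loaded Java peer after construction):

    ```python
    # The load path first constructs an "empty" Python wrapper, then the
    # reader attaches the peer object it loaded from disk.

    class JavaModel:
        def __init__(self, java_model=None):
            self._java_obj = java_model  # None until a reader attaches one

    class Reader:
        def __init__(self, instance):
            self._instance = instance

        def load(self, path):
            self._instance._java_obj = "peer loaded from %s" % path
            return self._instance

    model = Reader(JavaModel()).load("/tmp/model")
    print(model._java_obj)  # peer loaded from /tmp/model
    ```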



[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10469#discussion_r49408898
  
    --- Diff: python/pyspark/ml/util.py ---
    @@ -52,3 +71,141 @@ def _randomUID(cls):
             concatenates the class name, "_", and 12 random hex chars.
             """
             return cls.__name__ + "_" + uuid.uuid4().hex[12:]
    +
    +
    +@inherit_doc
    +class MLWriter(object):
    +    """
    +    Abstract class for utility classes that can save ML instances.
    --- End diff --
    
    Update doc (not abstract in Python)



[GitHub] spark pull request: [SPARK-13032] [ML] [PySpark] PySpark support m...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10469#issuecomment-176042084
  
    **[Test build #50264 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50264/consoleFull)** for PR 10469 at commit [`f07ffcb`](https://github.com/apache/spark/commit/f07ffcb5f47928870b6c9ae14904efcba8c539de).
     * This patch **fails Python style tests**.
     * This patch **does not merge cleanly**.
     * This patch adds the following public classes _(experimental)_:
       * `class LinearRegressionModel(JavaModel, MLWritable, MLReadable):`
       * `class JavaMLWriter(object):`
       * `class MLWritable(object):`
       * `class JavaMLReader(object):`
       * `        java_class = cls._java_loader_class(clazz)`
       * `class MLReadable(object):`



[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10469#issuecomment-167078986
  
    **[Test build #48300 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48300/consoleFull)** for PR 10469 at commit [`19b07b5`](https://github.com/apache/spark/commit/19b07b5aaeeea76d757731ab3e0f998fe5899809).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
       * `class LinearRegressionModel(JavaModel, MLWritable, TransformerMLReadable):`
       * `class MLWriter(object):`
       * `class MLWritable(object):`
       * `class MLReader(object):`
       * `class MLReadable(object):`
       * `        java_class = cls._java_loader_class()`
       * `class TransformerMLReadable(MLReadable):`
       * `class EstimatorMLReadable(MLReadable):`



[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10469#discussion_r50903243
  
    --- Diff: python/pyspark/ml/util.py ---
    @@ -52,3 +71,133 @@ def _randomUID(cls):
             concatenates the class name, "_", and 12 random hex chars.
             """
             return cls.__name__ + "_" + uuid.uuid4().hex[12:]
    +
    +
    +@inherit_doc
    +class MLWriter(object):
    +    """
    +    .. note:: Experimental
    +
    +    Utility class that can save ML instances.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    def __init__(self, instance):
    +        instance._transfer_params_to_java()
    +        self._jwrite = instance._java_obj.write()
    +
    +    @since("2.0.0")
    +    def save(self, path):
    +        """Saves the ML instances to the input path."""
    +        self._jwrite.save(path)
    +
    +    @since("2.0.0")
    +    def overwrite(self):
    +        """Overwrites if the output path already exists."""
    +        self._jwrite.overwrite()
    +        return self
    +
    +    @since("2.0.0")
    +    def context(self, sqlContext):
    +        """Sets the SQL context to use for saving."""
    +        self._jwrite.context(sqlContext._ssql_ctx)
    +        return self
    +
    +
    +@inherit_doc
    +class MLWritable(object):
    +    """
    +    .. note:: Experimental
    +
    +    Mixin for ML instances that provide MLWriter through their Scala
    +    implementation.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @since("2.0.0")
    +    def write(self):
    +        """Returns an MLWriter instance for this ML instance."""
    +        return MLWriter(self)
    +
    +    @since("2.0.0")
    +    def save(self, path):
    +        """Save this ML instance to the given path, a shortcut of `write().save(path)`."""
    +        if not isinstance(path, basestring):
    +            raise TypeError("path should be a basestring, got type %s" % type(path))
    +        self.write().save(path)
    +
    +
    +@inherit_doc
    +class MLReader(object):
    +    """
    +    .. note:: Experimental
    +
    +    Utility class that can load ML instances.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    def __init__(self, instance):
    +        self._instance = instance
    +        self._jread = instance._java_obj.read()
    +
    +    @since("2.0.0")
    +    def load(self, path):
    +        """Loads the ML component from the input path."""
    +        java_obj = self._jread.load(path)
    +        self._instance._java_obj = java_obj
    +        self._instance.uid = java_obj.uid()
    +        self._instance._transfer_params_from_java(True)
    +        return self._instance
    +
    +    @since("2.0.0")
    +    def context(self, sqlContext):
    +        """Sets the SQL context to use for loading."""
    +        self._jread.context(sqlContext._ssql_ctx)
    +        return self
    +
    +
    +@inherit_doc
    +class MLReadable(object):
    +    """
    +    .. note:: Experimental
    +
    +    Mixin for objects that provide MLReader using its Scala implementation.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @classmethod
    +    def _java_loader_class(cls):
    +        """
    +        Returns the full class name of the Java loader. The default
    +        implementation replaces "pyspark" by "org.apache.spark" in
    +        the Python full class name.
    +        """
    +        java_package = cls.__module__.replace("pyspark", "org.apache.spark")
    +        return ".".join([java_package, cls.__name__])
    +
    +    @classmethod
    +    def _load_java_obj(cls):
    +        """Load the peer Java object."""
    +        java_class = cls._java_loader_class()
    +        java_obj = _jvm()
    +        for name in java_class.split("."):
    +            java_obj = getattr(java_obj, name)
    +        return java_obj
    +
    +    @classmethod
    +    @since("2.0.0")
    +    def read(cls):
    +        """Returns an MLReader instance for this class."""
    +        instance = cls()
    --- End diff --
    
    This could be done within the MLReader init method (if we want to make MLReadable more general and not JVM-specific).
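    The ```_java_loader_class``` name mapping in the diff above is plain string manipulation and can be checked without a JVM (the standalone helper below is a hypothetical restatement, not the actual method):

    ```python
    def java_loader_class(py_module, class_name):
        # Replace the "pyspark" prefix with "org.apache.spark", mirroring
        # MLReadable._java_loader_class in the diff above.
        java_package = py_module.replace("pyspark", "org.apache.spark")
        return ".".join([java_package, class_name])

    print(java_loader_class("pyspark.ml.regression", "LinearRegression"))
    # org.apache.spark.ml.regression.LinearRegression
    ```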



[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10469#issuecomment-167077459
  
    **[Test build #48300 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48300/consoleFull)** for PR 10469 at commit [`19b07b5`](https://github.com/apache/spark/commit/19b07b5aaeeea76d757731ab3e0f998fe5899809).



[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10469#discussion_r50848748
  
    --- Diff: python/pyspark/ml/regression.py ---
    @@ -68,6 +68,30 @@ class LinearRegression(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPrediction
         Traceback (most recent call last):
             ...
         TypeError: Method setParams forces keyword arguments.
    +    >>> import os, tempfile
    +    >>> path = tempfile.mkdtemp()
    +    >>> lr_path = path + "/lr"
    +    >>> lr.save(lr_path)
    +    >>> lr2 = LinearRegression.load(lr_path)
    +    >>> lr2.getOrDefault(lr2.getParam("maxIter"))
    +    5
    --- End diff --
    
    Test that the param survives ```save``` and ```load```.



[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10469#discussion_r49408931
  
    --- Diff: python/pyspark/ml/util.py ---
    @@ -52,3 +71,141 @@ def _randomUID(cls):
             concatenates the class name, "_", and 12 random hex chars.
             """
             return cls.__name__ + "_" + uuid.uuid4().hex[12:]
    +
    +
    +@inherit_doc
    +class MLWriter(object):
    +    """
    +    Abstract class for utility classes that can save ML instances.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    def __init__(self, instance):
    +        self._jwrite = instance._java_obj.write()
    +
    +    @since("2.0.0")
    +    def save(self, path):
    +        """Saves the ML instances to the input path."""
    +        self._jwrite.save(path)
    +
    +    @since("2.0.0")
    +    def overwrite(self):
    +        """Overwrites if the output path already exists."""
    +        self._jwrite.overwrite()
    +        return self
    +
    +    @since("2.0.0")
    +    def context(self, sqlContext):
    +        """Sets the SQL context to use for saving."""
    +        self._jwrite.context(sqlContext._ssql_ctx)
    +        return self
    +
    +
    +@inherit_doc
    +class MLWritable(object):
    +    """
    +    Mixin for ML instances that provide MLWriter through their Scala
    +    implementation.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @since("2.0.0")
    +    def write(self):
    +        """Returns an MLWriter instance for this ML instance."""
    +        return MLWriter(self)
    +
    +    @since("2.0.0")
    +    def save(self, path):
    +        """Save this ML instance to the given path, a shortcut of `write().save(path)`."""
    +        if not isinstance(path, basestring):
    +            raise TypeError("path should be a basestring, got type %s" % type(path))
    +        self._java_obj.save(path)
    +
    +
    +@inherit_doc
    +class MLReader(object):
    +    """
    +    Abstract class for utility classes that can load ML instances.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    def __init__(self, instance):
    +        self._instance = instance
    +        self._jread = instance._java_obj.read()
    +
    +    @since("2.0.0")
    +    def load(self, path):
    +        """Loads the ML component from the input path."""
    +        self._instance.load(path)
    +
    +    @since("2.0.0")
    +    def context(self, sqlContext):
    +        """Sets the SQL context to use for loading."""
    +        self._jread.context(sqlContext._ssql_ctx)
    +        return self
    +
    +
    +@inherit_doc
    +class MLReadable(object):
    +    """
    +    Mixin for objects that provide MLReader using its Scala implementation.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @classmethod
    +    def _java_loader_class(cls):
    +        """
    +        Returns the full class name of the Java loader. The default
    +        implementation replaces "pyspark" by "org.apache.spark" in
    +        the Python full class name.
    +        """
    +        java_package = cls.__module__.replace("pyspark", "org.apache.spark")
    +        return ".".join([java_package, cls.__name__])
    +
    +    @classmethod
    +    def _load_java(cls, path):
    +        """
    +        Load a Java model from the given path.
    +        """
    +        java_class = cls._java_loader_class()
    +        java_obj = _jvm()
    +        for name in java_class.split("."):
    +            java_obj = getattr(java_obj, name)
    +        return java_obj.load(path)
    +
    +    @classmethod
    +    @since("2.0.0")
    +    def read(self):
    +        """Returns an MLReader instance for this class."""
    +        return MLReader(self)
    +
    +
    +@inherit_doc
    +class TransformerMLReadable(MLReadable):
    --- End diff --
    
    I'm hoping this and EstimatorMLReadable will not be needed if we can make a generic MLReadable which works for any JavaWrapper type.



[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10469#discussion_r50902970
  
    --- Diff: python/pyspark/ml/regression.py ---
    @@ -68,6 +68,30 @@ class LinearRegression(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPrediction
         Traceback (most recent call last):
             ...
         TypeError: Method setParams forces keyword arguments.
    +    >>> import os, tempfile
    +    >>> path = tempfile.mkdtemp()
    +    >>> lr_path = path + "/lr"
    +    >>> lr.save(lr_path)
    +    >>> lr2 = LinearRegression.load(lr_path)
    +    >>> lr2.getOrDefault(lr2.getParam("maxIter"))
    +    5
    +    >>> model2 = lr2.fit(df)
    --- End diff --
    
    Actually, I don't think you need to fit a second model.  If the goal is to test that all params are the same, that should be done explicitly in a unit test (but it's OK with me not to).



[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10469#discussion_r49408899
  
    --- Diff: python/pyspark/ml/util.py ---
    @@ -52,3 +71,141 @@ def _randomUID(cls):
             concatenates the class name, "_", and 12 random hex chars.
             """
             return cls.__name__ + "_" + uuid.uuid4().hex[12:]
    +
    +
    +@inherit_doc
    +class MLWriter(object):
    +    """
    +    Abstract class for utility classes that can save ML instances.
    +
    --- End diff --
    
    Note that this and other classes for read/write are Experimental



[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the pull request:

    https://github.com/apache/spark/pull/10469#issuecomment-175472122
  
    @jkbradley Thanks for your comments! I have made ```MLReadable``` and ```MLWritable``` more general and not specific to Java wrappers, and addressed all comments except for setting the ```Param.parent```. I left inline comments in the threads above.



[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10469#discussion_r50680243
  
    --- Diff: python/pyspark/ml/util.py ---
    @@ -52,3 +71,141 @@ def _randomUID(cls):
             concatenates the class name, "_", and 12 random hex chars.
             """
             return cls.__name__ + "_" + uuid.uuid4().hex[12:]
    +
    +
    +@inherit_doc
    +class MLWriter(object):
    +    """
    +    Abstract class for utility classes that can save ML instances.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    def __init__(self, instance):
    +        self._jwrite = instance._java_obj.write()
    +
    +    @since("2.0.0")
    +    def save(self, path):
    +        """Saves the ML instances to the input path."""
    +        self._jwrite.save(path)
    +
    +    @since("2.0.0")
    +    def overwrite(self):
    +        """Overwrites if the output path already exists."""
    +        self._jwrite.overwrite()
    +        return self
    +
    +    @since("2.0.0")
    +    def context(self, sqlContext):
    +        """Sets the SQL context to use for saving."""
    +        self._jwrite.context(sqlContext._ssql_ctx)
    +        return self
    +
    +
    +@inherit_doc
    +class MLWritable(object):
    +    """
    +    Mixin for ML instances that provide MLWriter through their Scala
    +    implementation.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @since("2.0.0")
    +    def write(self):
    +        """Returns an MLWriter instance for this ML instance."""
    +        return MLWriter(self)
    +
    +    @since("2.0.0")
    +    def save(self, path):
    +        """Save this ML instance to the given path, a shortcut of `write().save(path)`."""
    +        if not isinstance(path, basestring):
    +            raise TypeError("path should be a basestring, got type %s" % type(path))
    +        self._java_obj.save(path)
    +
    +
    +@inherit_doc
    +class MLReader(object):
    +    """
    +    Abstract class for utility classes that can load ML instances.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    def __init__(self, instance):
    +        self._instance = instance
    +        self._jread = instance._java_obj.read()
    +
    +    @since("2.0.0")
    +    def load(self, path):
    +        """Loads the ML component from the input path."""
    +        self._instance.load(path)
    +
    +    @since("2.0.0")
    +    def context(self, sqlContext):
    +        """Sets the SQL context to use for loading."""
    +        self._jread.context(sqlContext._ssql_ctx)
    +        return self
    +
    +
    +@inherit_doc
    +class MLReadable(object):
    +    """
    +    Mixin for objects that provide MLReader using its Scala implementation.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @classmethod
    +    def _java_loader_class(cls):
    --- End diff --
    
    ```MLReadable``` is a mixin, so it cannot be a ```JavaWrapper``` abstraction. For example, to make ```LinearRegression``` support save/load, it should extend both ```JavaEstimator``` and ```MLReadable```/```MLWritable```. ```JavaEstimator``` already extends ```JavaWrapper```, and we cannot make ```MLReadable```/```MLWritable``` also extend ```JavaWrapper```.
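    The inheritance constraint described above can be sketched with stub classes (hypothetical names mirroring the real hierarchy):

    ```python
    # LinearRegression picks up JavaWrapper through JavaEstimator, while
    # the read/write mixins stay independent of the JVM-backed hierarchy.

    class JavaWrapper:
        pass

    class JavaEstimator(JavaWrapper):
        pass

    class MLWritable:
        def save(self, path):
            return "saved to %s" % path

    class MLReadable:
        @classmethod
        def read(cls):
            return "reader for %s" % cls.__name__

    class LinearRegression(JavaEstimator, MLReadable, MLWritable):
        pass

    lr = LinearRegression()
    print(isinstance(lr, JavaWrapper), lr.save("/tmp/lr"), LinearRegression.read())
    # True saved to /tmp/lr reader for LinearRegression
    ```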


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13032] [ML] [PySpark] PySpark support m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10469#issuecomment-176042102
  
    Build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10469#issuecomment-174955912
  
    **[Test build #50099 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50099/consoleFull)** for PR 10469 at commit [`3a0d6be`](https://github.com/apache/spark/commit/3a0d6bee0d51cf071a0a7e5e6aaedc0fe5a8d3a0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-13032] [ML] [PySpark] PySpark support m...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/10469#issuecomment-176860037
  
    LGTM
    Thanks for this PR!  This makes it really simple to add persistence.
    I'll merge it with master.
    Would you mind creating JIRAs for the follow-up tasks for each Python component?




[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10469#issuecomment-174956149
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50099/
    Test PASSed.




[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10469#discussion_r50903137
  
    --- Diff: python/pyspark/ml/util.py ---
    @@ -52,3 +71,133 @@ def _randomUID(cls):
             concatenates the class name, "_", and 12 random hex chars.
             """
             return cls.__name__ + "_" + uuid.uuid4().hex[12:]
    +
    +
    +@inherit_doc
    +class MLWriter(object):
    --- End diff --
    
    How about we rename this to MLJavaWriter or JavaMLWriter?




[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10469#issuecomment-174476559
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49987/
    Test PASSed.




[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10469#discussion_r49408928
  
    --- Diff: python/pyspark/ml/util.py ---
    @@ -52,3 +71,141 @@ def _randomUID(cls):
             concatenates the class name, "_", and 12 random hex chars.
             """
             return cls.__name__ + "_" + uuid.uuid4().hex[12:]
    +
    +
    +@inherit_doc
    +class MLWriter(object):
    +    """
    +    Abstract class for utility classes that can save ML instances.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    def __init__(self, instance):
    +        self._jwrite = instance._java_obj.write()
    +
    +    @since("2.0.0")
    +    def save(self, path):
    +        """Saves the ML instances to the input path."""
    +        self._jwrite.save(path)
    +
    +    @since("2.0.0")
    +    def overwrite(self):
    +        """Overwrites if the output path already exists."""
    +        self._jwrite.overwrite()
    +        return self
    +
    +    @since("2.0.0")
    +    def context(self, sqlContext):
    +        """Sets the SQL context to use for saving."""
    +        self._jwrite.context(sqlContext._ssql_ctx)
    +        return self
    +
    +
    +@inherit_doc
    +class MLWritable(object):
    +    """
    +    Mixin for ML instances that provide MLWriter through their Scala
    +    implementation.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @since("2.0.0")
    +    def write(self):
    +        """Returns an MLWriter instance for this ML instance."""
    +        return MLWriter(self)
    +
    +    @since("2.0.0")
    +    def save(self, path):
    +        """Save this ML instance to the given path, a shortcut of `write().save(path)`."""
    +        if not isinstance(path, basestring):
    +            raise TypeError("path should be a basestring, got type %s" % type(path))
    +        self._java_obj.save(path)
    +
    +
    +@inherit_doc
    +class MLReader(object):
    +    """
    +    Abstract class for utility classes that can load ML instances.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    def __init__(self, instance):
    +        self._instance = instance
    +        self._jread = instance._java_obj.read()
    +
    +    @since("2.0.0")
    +    def load(self, path):
    +        """Loads the ML component from the input path."""
    +        self._instance.load(path)
    +
    +    @since("2.0.0")
    +    def context(self, sqlContext):
    +        """Sets the SQL context to use for loading."""
    +        self._jread.context(sqlContext._ssql_ctx)
    +        return self
    +
    +
    +@inherit_doc
    +class MLReadable(object):
    +    """
    +    Mixin for objects that provide MLReader using its Scala implementation.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @classmethod
    +    def _java_loader_class(cls):
    --- End diff --
    
    Rather than having this here, should we make this abstraction extend JavaWrapper?




[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/10469#issuecomment-170737086
  
    @yanboliang I'll take a look at this now.  Sorry for the delay!




[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10469#issuecomment-174952327
  
    **[Test build #50099 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50099/consoleFull)** for PR 10469 at commit [`3a0d6be`](https://github.com/apache/spark/commit/3a0d6bee0d51cf071a0a7e5e6aaedc0fe5a8d3a0).




[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10469#discussion_r50681326
  
    --- Diff: python/pyspark/ml/util.py ---
    @@ -52,3 +71,141 @@ def _randomUID(cls):
             concatenates the class name, "_", and 12 random hex chars.
             """
             return cls.__name__ + "_" + uuid.uuid4().hex[12:]
    +
    +
    +@inherit_doc
    +class MLWriter(object):
    +    """
    +    Abstract class for utility classes that can save ML instances.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    def __init__(self, instance):
    +        self._jwrite = instance._java_obj.write()
    +
    +    @since("2.0.0")
    +    def save(self, path):
    +        """Saves the ML instances to the input path."""
    +        self._jwrite.save(path)
    +
    +    @since("2.0.0")
    +    def overwrite(self):
    +        """Overwrites if the output path already exists."""
    +        self._jwrite.overwrite()
    +        return self
    +
    +    @since("2.0.0")
    +    def context(self, sqlContext):
    +        """Sets the SQL context to use for saving."""
    +        self._jwrite.context(sqlContext._ssql_ctx)
    +        return self
    +
    +
    +@inherit_doc
    +class MLWritable(object):
    +    """
    +    Mixin for ML instances that provide MLWriter through their Scala
    +    implementation.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @since("2.0.0")
    --- End diff --
    
    Checked, and it will not exist in classes which are added in a later Spark version.




[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10469#discussion_r50903149
  
    --- Diff: python/pyspark/ml/util.py ---
    @@ -52,3 +71,141 @@ def _randomUID(cls):
             concatenates the class name, "_", and 12 random hex chars.
             """
             return cls.__name__ + "_" + uuid.uuid4().hex[12:]
    +
    +
    +@inherit_doc
    +class MLWriter(object):
    +    """
    +    Abstract class for utility classes that can save ML instances.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    def __init__(self, instance):
    +        self._jwrite = instance._java_obj.write()
    +
    +    @since("2.0.0")
    +    def save(self, path):
    +        """Saves the ML instances to the input path."""
    +        self._jwrite.save(path)
    +
    +    @since("2.0.0")
    +    def overwrite(self):
    +        """Overwrites if the output path already exists."""
    +        self._jwrite.overwrite()
    +        return self
    +
    +    @since("2.0.0")
    +    def context(self, sqlContext):
    +        """Sets the SQL context to use for saving."""
    +        self._jwrite.context(sqlContext._ssql_ctx)
    +        return self
    +
    +
    +@inherit_doc
    +class MLWritable(object):
    +    """
    +    Mixin for ML instances that provide MLWriter through their Scala
    +    implementation.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @since("2.0.0")
    --- End diff --
    
    Does that mean the annotation is not inherited at all?  If that's the case, then maybe we should remove the Since annotations (since these mix-ins are private anyways).
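
For reference, the inheritance behavior in question can be illustrated with a hypothetical docstring-appending decorator (a stand-in for pyspark's `since` helper, which is assumed to work similarly):

```python
def since(version):
    """Append a versionadded note to the decorated function's docstring."""
    def deco(f):
        f.__doc__ = (f.__doc__ or "") + "\n.. versionadded:: " + version
        return f
    return deco

class Mixin(object):
    @since("2.0.0")
    def write(self):
        """Returns a writer."""
        return "writer"

class Model(Mixin):
    pass                     # inherits write() and its decorated docstring

class Custom(Mixin):
    def write(self):         # overriding without @since drops the note
        """Returns a custom writer."""
        return "custom"

print("2.0.0" in Model.write.__doc__)    # inherited method shares __doc__
print("2.0.0" in Custom.write.__doc__)   # the note is not re-applied
```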




[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10469#discussion_r50903192
  
    --- Diff: python/pyspark/ml/util.py ---
    @@ -52,3 +71,133 @@ def _randomUID(cls):
             concatenates the class name, "_", and 12 random hex chars.
             """
             return cls.__name__ + "_" + uuid.uuid4().hex[12:]
    +
    +
    +@inherit_doc
    +class MLWriter(object):
    +    """
    +    .. note:: Experimental
    +
    +    Utility class that can save ML instances.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    def __init__(self, instance):
    +        instance._transfer_params_to_java()
    +        self._jwrite = instance._java_obj.write()
    +
    +    @since("2.0.0")
    +    def save(self, path):
    +        """Saves the ML instances to the input path."""
    +        self._jwrite.save(path)
    +
    +    @since("2.0.0")
    +    def overwrite(self):
    +        """Overwrites if the output path already exists."""
    +        self._jwrite.overwrite()
    +        return self
    +
    +    @since("2.0.0")
    +    def context(self, sqlContext):
    +        """Sets the SQL context to use for saving."""
    +        self._jwrite.context(sqlContext._ssql_ctx)
    +        return self
    +
    +
    +@inherit_doc
    +class MLWritable(object):
    +    """
    +    .. note:: Experimental
    +
    +    Mixin for ML instances that provide MLWriter through their Scala
    +    implementation.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @since("2.0.0")
    +    def write(self):
    +        """Returns an MLWriter instance for this ML instance."""
    +        return MLWriter(self)
    +
    +    @since("2.0.0")
    +    def save(self, path):
    +        """Save this ML instance to the given path, a shortcut of `write().save(path)`."""
    +        if not isinstance(path, basestring):
    +            raise TypeError("path should be a basestring, got type %s" % type(path))
    +        self.write().save(path)
    +
    +
    +@inherit_doc
    +class MLReader(object):
    --- End diff --
    
    How about we rename this to MLJavaReader or JavaMLReader?




[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10469#discussion_r49408892
  
    --- Diff: python/pyspark/ml/regression.py ---
    @@ -68,6 +69,28 @@ class LinearRegression(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPrediction
         Traceback (most recent call last):
             ...
         TypeError: Method setParams forces keyword arguments.
    +    >>> import os, tempfile
    +    >>> path = tempfile.mkdtemp()
    +    >>> lr_path = path + "/lr"
    +    >>> lr.save(lr_path)
    +    >>> lr2 = LinearRegression.load(lr_path)
    +    >>> model2 = lr2.fit(df)
    +    >>> abs(model.coefficients[0] - model2.coefficients[0]) < 0.001
    +    True
    +    >>> abs(model.intercept - model2.intercept) < 0.001
    +    True
    +    >>> model_path = path + "/model"
    +    >>> model.save(model_path)
    +    >>> model3 = LinearRegressionModel.load(model_path)
    +    >>> abs(model.coefficients[0] - model3.coefficients[0]) < 0.001
    --- End diff --
    
    Test for exact equality here & in the next line
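
A minimal, Spark-free illustration of why exact equality is a reasonable doctest assertion here: persistence should round-trip the exact floating-point values, just as a lossless text round-trip does for plain floats:

```python
coefficients = [0.1 + 0.2, 1e-17, -3.75]

# A lossless text round-trip, standing in for model save/load:
restored = [float(repr(c)) for c in coefficients]

# Bit-for-bit equal, so a doctest can assert
# `model.coefficients[0] == model3.coefficients[0]` directly.
print(restored == coefficients)
```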




[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10469#discussion_r50903200
  
    --- Diff: python/pyspark/ml/util.py ---
    @@ -52,3 +71,133 @@ def _randomUID(cls):
             concatenates the class name, "_", and 12 random hex chars.
             """
             return cls.__name__ + "_" + uuid.uuid4().hex[12:]
    +
    +
    +@inherit_doc
    +class MLWriter(object):
    +    """
    +    .. note:: Experimental
    +
    +    Utility class that can save ML instances.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    def __init__(self, instance):
    +        instance._transfer_params_to_java()
    +        self._jwrite = instance._java_obj.write()
    +
    +    @since("2.0.0")
    +    def save(self, path):
    +        """Saves the ML instances to the input path."""
    +        self._jwrite.save(path)
    +
    +    @since("2.0.0")
    +    def overwrite(self):
    +        """Overwrites if the output path already exists."""
    +        self._jwrite.overwrite()
    +        return self
    +
    +    @since("2.0.0")
    +    def context(self, sqlContext):
    +        """Sets the SQL context to use for saving."""
    +        self._jwrite.context(sqlContext._ssql_ctx)
    +        return self
    +
    +
    +@inherit_doc
    +class MLWritable(object):
    +    """
    +    .. note:: Experimental
    +
    +    Mixin for ML instances that provide MLWriter through their Scala
    +    implementation.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @since("2.0.0")
    +    def write(self):
    +        """Returns an MLWriter instance for this ML instance."""
    +        return MLWriter(self)
    +
    +    @since("2.0.0")
    +    def save(self, path):
    +        """Save this ML instance to the given path, a shortcut of `write().save(path)`."""
    +        if not isinstance(path, basestring):
    +            raise TypeError("path should be a basestring, got type %s" % type(path))
    +        self.write().save(path)
    +
    +
    +@inherit_doc
    +class MLReader(object):
    +    """
    +    .. note:: Experimental
    +
    +    Utility class that can load ML instances.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    def __init__(self, instance):
    +        self._instance = instance
    +        self._jread = instance._java_obj.read()
    +
    +    @since("2.0.0")
    +    def load(self, path):
    +        """Loads the ML component from the input path."""
    +        java_obj = self._jread.load(path)
    +        self._instance._java_obj = java_obj
    +        self._instance.uid = java_obj.uid()
    +        self._instance._transfer_params_from_java(True)
    +        return self._instance
    +
    +    @since("2.0.0")
    +    def context(self, sqlContext):
    +        """Sets the SQL context to use for loading."""
    +        self._jread.context(sqlContext._ssql_ctx)
    +        return self
    +
    +
    +@inherit_doc
    +class MLReadable(object):
    +    """
    +    .. note:: Experimental
    +
    +    Mixin for objects that provide MLReader using its Scala implementation.
    --- End diff --
    
    Could this be kept more general by putting JVM-specific things in MLReader?
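
One possible shape for that split, sketched with a faked JVM handle (hypothetical names following the discussion, not the implementation merged in the PR):

```python
class FakeJavaObject(object):
    """Stand-in for a Py4J handle to the Scala companion object."""
    def read(self):
        return self
    def load(self, path):
        return "loaded:" + path

class MLReadable(object):                      # generic: no JVM assumptions
    @classmethod
    def read(cls):
        raise NotImplementedError()

    @classmethod
    def load(cls, path):
        return cls.read().load(path)

class JavaMLReader(object):                    # all JVM-specific code lives here
    def __init__(self, clazz):
        self._clazz = clazz

    def load(self, path):
        instance = self._clazz()
        return instance._java_obj.read().load(path)

class LinearRegressionModel(MLReadable):
    def __init__(self):
        self._java_obj = FakeJavaObject()

    @classmethod
    def read(cls):
        return JavaMLReader(cls)

print(LinearRegressionModel.load("/tmp/model"))
```

With this split, a future pure-Python component could implement `read()` with a non-JVM reader while reusing the same `MLReadable.load` shortcut.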




[GitHub] spark pull request: [SPARK-13032] [ML] [PySpark] PySpark support m...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/10469#issuecomment-175804796
  
    @Wenpei This isn't the right forum for posting comments like that (since hardly anyone will see your comment).  I'd recommend identifying missing unit tests and making JIRAs for them.  We do have tests.py files under pyspark/ml and pyspark/mllib, so please do check those first before making JIRAs.




[GitHub] spark pull request: [SPARK-13032] [ML] [PySpark] PySpark support m...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10469#issuecomment-175474685
  
    **[Test build #50182 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50182/consoleFull)** for PR 10469 at commit [`7329124`](https://github.com/apache/spark/commit/732912411b246809eb56569ac378b936f496d4cc).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class JavaMLWriter(object):`
      * `class JavaMLReader(object):`




[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10469#discussion_r50678918
  
    --- Diff: python/pyspark/ml/util.py ---
    @@ -52,3 +71,141 @@ def _randomUID(cls):
             concatenates the class name, "_", and 12 random hex chars.
             """
             return cls.__name__ + "_" + uuid.uuid4().hex[12:]
    +
    +
    +@inherit_doc
    +class MLWriter(object):
    +    """
    +    Abstract class for utility classes that can save ML instances.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    def __init__(self, instance):
    +        self._jwrite = instance._java_obj.write()
    +
    +    @since("2.0.0")
    +    def save(self, path):
    --- End diff --
    
    Here we call the peer Java object to do the save operation; the Params have already been copied at the construction of the peer Java object.
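
The flow being described can be mocked without Spark (all classes below are hypothetical stand-ins): the Python params are mirrored onto the JVM peer before `save()` is delegated:

```python
class FakeJavaPeer(object):
    def __init__(self):
        self.params = {}
    def write(self):
        return self
    def save(self, path):
        return "saved %s with %s" % (path, sorted(self.params.items()))

class Estimator(object):
    def __init__(self):
        self._paramMap = {"maxIter": 5}
        self._java_obj = FakeJavaPeer()
    def _transfer_params_to_java(self):
        self._java_obj.params.update(self._paramMap)

class MLWriter(object):
    def __init__(self, instance):
        instance._transfer_params_to_java()    # sync params before writing
        self._jwrite = instance._java_obj.write()
    def save(self, path):
        return self._jwrite.save(path)

est = Estimator()
est._paramMap["maxIter"] = 10                  # changed after construction
print(MLWriter(est).save("/tmp/est"))          # the updated value is persisted
```

Syncing in the writer's constructor (rather than only at peer construction) guards against params changed after the peer was created.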




[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/10469#issuecomment-170761773
  
    I just added some comments quickly, but let me know if my suggestions are workable.  I did not test the suggestions myself.




[GitHub] spark pull request: [SPARK-13032] [ML] [PySpark] PySpark support m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10469#issuecomment-176066797
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/10469#issuecomment-175241081
  
    Thanks for the updates!  Done with a pass.  They are mostly minor comments, except for making MLReadable, MLWritable more general and not specific to Java wrappers.




[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10469#discussion_r50902964
  
    --- Diff: python/pyspark/ml/regression.py ---
    @@ -68,6 +68,30 @@ class LinearRegression(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPrediction
         Traceback (most recent call last):
             ...
         TypeError: Method setParams forces keyword arguments.
    +    >>> import os, tempfile
    +    >>> path = tempfile.mkdtemp()
    +    >>> lr_path = path + "/lr"
    +    >>> lr.save(lr_path)
    +    >>> lr2 = LinearRegression.load(lr_path)
    +    >>> lr2.getOrDefault(lr2.getParam("maxIter"))
    --- End diff --
    
    Simpler to call ```lr2.getMaxIter()```
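
The two spellings are equivalent; `getMaxIter()` is just generated sugar over the generic param lookup. A toy sketch of why (hypothetical `ToyParams` machinery, not PySpark's actual implementation):

```python
class Param:
    def __init__(self, name):
        self.name = name

class ToyParams:
    """Minimal stand-in for PySpark's Params machinery."""

    def __init__(self):
        self._params = {}   # name -> Param
        self._values = {}   # Param -> current value

    def _declare(self, name, value):
        p = Param(name)
        self._params[name] = p
        self._values[p] = value
        return p

    def getParam(self, name):
        return self._params[name]

    def getOrDefault(self, param):
        return self._values[param]

class ToyLinearRegression(ToyParams):
    def __init__(self, maxIter=100):
        super().__init__()
        self.maxIter = self._declare("maxIter", maxIter)

    def getMaxIter(self):
        # The generated getter is just sugar over getOrDefault(getParam(...)).
        return self.getOrDefault(self.maxIter)

lr2 = ToyLinearRegression(maxIter=5)
```

So in the doctest, `lr2.getMaxIter()` reads better than spelling out `lr2.getOrDefault(lr2.getParam("maxIter"))`.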




[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/10469#issuecomment-174783281
  
    @yanboliang Thanks for the updates; I'll try to make final comments soon.  I left one response in one of the threads above.




[GitHub] spark pull request: [SPARK-13032] [ML] [PySpark] PySpark support m...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/10469#issuecomment-175998765
  
    @yanboliang I hope you don't mind, but I took the liberty of experimenting a bit myself and sending this PR: [https://github.com/yanboliang/spark/pull/4]  Please let me know what you think!
    
    Btw, thanks for the generalization-related updates.  I guess we'd have to go further (providing MLWriter, MLReader abstract classes) if we wanted to allow Python developers to implement persistence from Python, but we can address that in the future (if anyone requests it).  Your changes should help us work towards that though.
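
A Python-side abstraction along the lines described above might look roughly like this. This is only a sketch: the class and method names mirror the Scala API, but `JSONParamsWriter`, `JSONParamsReader`, and `DummyEstimator` are hypothetical and involve no Spark code.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod

class MLWriter(ABC):
    """Abstract writer; a concrete subclass decides the on-disk format."""

    def __init__(self):
        self._overwrite = False

    def overwrite(self):
        """Allow replacing an existing path; returns self for chaining."""
        self._overwrite = True
        return self

    @abstractmethod
    def save(self, path):
        """Persist an ML instance to the given path."""

class MLReader(ABC):
    @abstractmethod
    def load(self, path):
        """Reconstruct an ML instance from the given path."""

class JSONParamsWriter(MLWriter):
    """Toy backend: persists an instance's param dict as JSON."""

    def __init__(self, instance):
        super().__init__()
        self._instance = instance

    def save(self, path):
        if os.path.exists(path) and not self._overwrite:
            raise IOError("Path %s already exists." % path)
        with open(path, "w") as f:
            json.dump(self._instance.params, f)

class JSONParamsReader(MLReader):
    """Toy backend: rebuilds an instance from the saved param dict."""

    def __init__(self, cls):
        self._cls = cls

    def load(self, path):
        with open(path) as f:
            return self._cls(json.load(f))

class DummyEstimator:
    """Stand-in for an ML instance: just carries a params dict."""

    def __init__(self, params):
        self.params = params

path = os.path.join(tempfile.mkdtemp(), "dummy")
JSONParamsWriter(DummyEstimator({"maxIter": 5})).save(path)
restored = JSONParamsReader(DummyEstimator).load(path)
```

With abstract classes like these, a pure-Python implementation could subclass them directly, while the JVM-backed wrappers would just be one more pair of concrete subclasses.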




[GitHub] spark pull request: [SPARK-13032] [ML] [PySpark] PySpark support m...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10469#issuecomment-176066701
  
    **[Test build #50267 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50267/consoleFull)** for PR 10469 at commit [`7334be9`](https://github.com/apache/spark/commit/7334be978468cee2bb4b9e5b13e296392080246b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-13032] [ML] [PySpark] PySpark support m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10469#issuecomment-176066798
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50267/
    Test PASSed.




[GitHub] spark pull request: [SPARK-13032] [ML] [PySpark] PySpark support m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10469#issuecomment-175474917
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-13032] [ML] [PySpark] PySpark support m...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/10469




[GitHub] spark pull request: [SPARK-13032] [ML] [PySpark] PySpark support m...

Posted by Wenpei <gi...@git.apache.org>.
Github user Wenpei commented on the pull request:

    https://github.com/apache/spark/pull/10469#issuecomment-175631491
  
    Hi,
    I'd like to raise a common issue I noticed when I started looking at PySpark. I found there is only one test.py covering the basic RDD and spark-submit related APIs. There are currently no ml or mllib related unit tests. I know we just implement wrappers here, but some places may still need unit tests, for example parameter processing.
    Just raising what I thought.
    
    Regards,
    Wenpei





[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10469#issuecomment-167079056
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48300/
    Test PASSed.




[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10469#issuecomment-175471302
  
    **[Test build #50182 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50182/consoleFull)** for PR 10469 at commit [`7329124`](https://github.com/apache/spark/commit/732912411b246809eb56569ac378b936f496d4cc).




[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10469#discussion_r50903126
  
    --- Diff: python/pyspark/ml/util.py ---
    @@ -52,3 +71,133 @@ def _randomUID(cls):
             concatenates the class name, "_", and 12 random hex chars.
             """
             return cls.__name__ + "_" + uuid.uuid4().hex[12:]
    +
    +
    +@inherit_doc
    +class MLWriter(object):
    +    """
    +    .. note:: Experimental
    +
    +    Utility class that can save ML instances.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    def __init__(self, instance):
    +        instance._transfer_params_to_java()
    +        self._jwrite = instance._java_obj.write()
    +
    +    @since("2.0.0")
    +    def save(self, path):
    +        """Saves the ML instances to the input path."""
    +        self._jwrite.save(path)
    +
    +    @since("2.0.0")
    +    def overwrite(self):
    +        """Overwrites if the output path already exists."""
    +        self._jwrite.overwrite()
    +        return self
    +
    +    @since("2.0.0")
    +    def context(self, sqlContext):
    +        """Sets the SQL context to use for saving."""
    +        self._jwrite.context(sqlContext._ssql_ctx)
    +        return self
    +
    +
    +@inherit_doc
    +class MLWritable(object):
    +    """
    +    .. note:: Experimental
    +
    +    Mixin for ML instances that provide MLWriter through their Scala
    --- End diff --
    
    Actually, this is a general mix-in; it does not rely on a Scala implementation.  It is MLWriter which uses the Scala implementation.
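
The point can be illustrated with a small sketch: the mixin only assumes `write()` returns something with a `save()` method, so a backend that never touches the JVM works just as well. All names below (`PickleWriter`, `ToyEstimator`) are hypothetical, not PySpark's.

```python
import os
import pickle
import tempfile

class MLWritable:
    """General mixin: only assumes write() returns an object with save()."""

    def write(self):
        raise NotImplementedError

    def save(self, path):
        # Shortcut for write().save(path); nothing here is JVM-specific.
        self.write().save(path)

class PickleWriter:
    """One possible writer backend that never touches Scala."""

    def __init__(self, instance):
        self._instance = instance

    def save(self, path):
        with open(path, "wb") as f:
            pickle.dump(self._instance.__dict__, f)

class ToyEstimator(MLWritable):
    def __init__(self, maxIter=10):
        self.maxIter = maxIter

    def write(self):
        return PickleWriter(self)

path = os.path.join(tempfile.mkdtemp(), "toy")
ToyEstimator(maxIter=7).save(path)
with open(path, "rb") as f:
    state = pickle.load(f)
```

Only the writer returned by `write()` needs to know about the Scala implementation; the mixin itself stays backend-agnostic.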




[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10469#discussion_r50903158
  
    --- Diff: python/pyspark/ml/util.py ---
    @@ -52,3 +71,133 @@ def _randomUID(cls):
             concatenates the class name, "_", and 12 random hex chars.
             """
             return cls.__name__ + "_" + uuid.uuid4().hex[12:]
    +
    +
    +@inherit_doc
    +class MLWriter(object):
    +    """
    +    .. note:: Experimental
    +
    +    Utility class that can save ML instances.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    def __init__(self, instance):
    +        instance._transfer_params_to_java()
    +        self._jwrite = instance._java_obj.write()
    +
    +    @since("2.0.0")
    +    def save(self, path):
    +        """Saves the ML instances to the input path."""
    +        self._jwrite.save(path)
    +
    +    @since("2.0.0")
    +    def overwrite(self):
    +        """Overwrites if the output path already exists."""
    +        self._jwrite.overwrite()
    +        return self
    +
    +    @since("2.0.0")
    +    def context(self, sqlContext):
    +        """Sets the SQL context to use for saving."""
    +        self._jwrite.context(sqlContext._ssql_ctx)
    +        return self
    +
    +
    +@inherit_doc
    +class MLWritable(object):
    +    """
    +    .. note:: Experimental
    +
    +    Mixin for ML instances that provide MLWriter through their Scala
    +    implementation.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @since("2.0.0")
    +    def write(self):
    +        """Returns an MLWriter instance for this ML instance."""
    +        return MLWriter(self)
    +
    +    @since("2.0.0")
    +    def save(self, path):
    +        """Save this ML instance to the given path, a shortcut of `write().save(path)`."""
    +        if not isinstance(path, basestring):
    --- End diff --
    
    Add this check to MLWriter.save too
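
The suggested guard is just an early type check before delegating to the backend. A minimal sketch (the diff targets Python 2, where the check is against `basestring`; this sketch uses `str` as on Python 3, and the `jwrite` delegation is stubbed out):

```python
def _check_path(path):
    # Python 2 code (as in the diff) would check isinstance(path, basestring);
    # on Python 3, str is the only string type.
    if not isinstance(path, str):
        raise TypeError("path should be a string, got type %s" % type(path))

class MLWriter:
    """Sketch: the guard runs in save() itself, not only in the shortcut."""

    def __init__(self, jwrite=None):
        self._jwrite = jwrite  # stand-in for the wrapped Java writer

    def save(self, path):
        _check_path(path)          # fail fast with a clear Python-side message
        # self._jwrite.save(path)  # then delegate to the backend

try:
    MLWriter().save(42)
    raised = False
except TypeError:
    raised = True
```

Checking in both places means callers going through `write().save(path)` get the same clear error as callers using the `save(path)` shortcut.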




[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10469#discussion_r49408891
  
    --- Diff: python/pyspark/ml/regression.py ---
    @@ -68,6 +69,28 @@ class LinearRegression(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPrediction
         Traceback (most recent call last):
             ...
         TypeError: Method setParams forces keyword arguments.
    +    >>> import os, tempfile
    +    >>> path = tempfile.mkdtemp()
    +    >>> lr_path = path + "/lr"
    +    >>> lr.save(lr_path)
    +    >>> lr2 = LinearRegression.load(lr_path)
    +    >>> model2 = lr2.fit(df)
    +    >>> abs(model.coefficients[0] - model2.coefficients[0]) < 0.001
    +    True
    +    >>> abs(model.intercept - model2.intercept) < 0.001
    +    True
    +    >>> model_path = path + "/model"
    --- End diff --
    
    Use directory "/lr_model"?




[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10469#discussion_r50848128
  
    --- Diff: python/pyspark/ml/wrapper.py ---
    @@ -82,13 +71,16 @@ def _transfer_params_to_java(self):
                     pair = self._make_java_param_pair(param, paramMap[param])
                     self._java_obj.set(pair)
     
    -    def _transfer_params_from_java(self):
    +    def _transfer_params_from_java(self, withParent=False):
             """
             Transforms the embedded params from the companion Java object.
             """
             sc = SparkContext._active_spark_context
    +        parent = self._java_obj.uid()
             for param in self.params:
                 if self._java_obj.hasParam(param.name):
    +                if withParent:
    +                    param.parent = parent
    --- End diff --
    
    We should also copy ```parent``` from the peer Java object if the Python object is instantiated by ```load```.




[GitHub] spark pull request: [SPARK-13032] [ML] [PySpark] PySpark support m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10469#issuecomment-176042108
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50264/
    Test FAILed.




[GitHub] spark pull request: [SPARK-13032] [ML] [PySpark] PySpark support m...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10469#issuecomment-176057088
  
    **[Test build #50267 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50267/consoleFull)** for PR 10469 at commit [`7334be9`](https://github.com/apache/spark/commit/7334be978468cee2bb4b9e5b13e296392080246b).




[GitHub] spark pull request: [SPARK-13032] [ML] [PySpark] PySpark support m...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10469#issuecomment-176041281
  
    **[Test build #50264 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50264/consoleFull)** for PR 10469 at commit [`f07ffcb`](https://github.com/apache/spark/commit/f07ffcb5f47928870b6c9ae14904efcba8c539de).




[GitHub] spark pull request: [SPARK-13032] [ML] [PySpark] PySpark support m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10469#issuecomment-175474923
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50182/
    Test PASSed.




[GitHub] spark pull request: [SPARK-13032] [ML] [PySpark] PySpark support m...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the pull request:

    https://github.com/apache/spark/pull/10469#issuecomment-176070210
  
    @jkbradley Your PR looks good and got merged, thanks!




[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10469#discussion_r50789186
  
    --- Diff: python/pyspark/ml/util.py ---
    @@ -52,3 +71,141 @@ def _randomUID(cls):
             concatenates the class name, "_", and 12 random hex chars.
             """
             return cls.__name__ + "_" + uuid.uuid4().hex[12:]
    +
    +
    +@inherit_doc
    +class MLWriter(object):
    +    """
    +    Abstract class for utility classes that can save ML instances.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    def __init__(self, instance):
    +        self._jwrite = instance._java_obj.write()
    +
    +    @since("2.0.0")
    +    def save(self, path):
    +        """Saves the ML instances to the input path."""
    +        self._jwrite.save(path)
    +
    +    @since("2.0.0")
    +    def overwrite(self):
    +        """Overwrites if the output path already exists."""
    +        self._jwrite.overwrite()
    +        return self
    +
    +    @since("2.0.0")
    +    def context(self, sqlContext):
    +        """Sets the SQL context to use for saving."""
    +        self._jwrite.context(sqlContext._ssql_ctx)
    +        return self
    +
    +
    +@inherit_doc
    +class MLWritable(object):
    +    """
    +    Mixin for ML instances that provide MLWriter through their Scala
    +    implementation.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @since("2.0.0")
    +    def write(self):
    +        """Returns an MLWriter instance for this ML instance."""
    +        return MLWriter(self)
    +
    +    @since("2.0.0")
    +    def save(self, path):
    +        """Save this ML instance to the given path, a shortcut of `write().save(path)`."""
    +        if not isinstance(path, basestring):
    +            raise TypeError("path should be a basestring, got type %s" % type(path))
    +        self._java_obj.save(path)
    +
    +
    +@inherit_doc
    +class MLReader(object):
    +    """
    +    Abstract class for utility classes that can load ML instances.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    def __init__(self, instance):
    +        self._instance = instance
    +        self._jread = instance._java_obj.read()
    +
    +    @since("2.0.0")
    +    def load(self, path):
    +        """Loads the ML component from the input path."""
    +        self._instance.load(path)
    +
    +    @since("2.0.0")
    +    def context(self, sqlContext):
    +        """Sets the SQL context to use for loading."""
    +        self._jread.context(sqlContext._ssql_ctx)
    +        return self
    +
    +
    +@inherit_doc
    +class MLReadable(object):
    +    """
    +    Mixin for objects that provide MLReader using its Scala implementation.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @classmethod
    +    def _java_loader_class(cls):
    --- End diff --
    
    Right, good point.




[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10469#issuecomment-174470901
  
    **[Test build #49987 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49987/consoleFull)** for PR 10469 at commit [`106a2b8`](https://github.com/apache/spark/commit/106a2b8c35c463fcf9f472dbd7ebd4718b02e983).




[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10469#discussion_r50903247
  
    --- Diff: python/pyspark/ml/wrapper.py ---
    @@ -82,13 +71,16 @@ def _transfer_params_to_java(self):
                     pair = self._make_java_param_pair(param, paramMap[param])
                     self._java_obj.set(pair)
     
    -    def _transfer_params_from_java(self):
    +    def _transfer_params_from_java(self, withParent=False):
    --- End diff --
    
    I think it'd be better to set the Param.parent field for each Param in load().  I don't think we'd reuse this functionality elsewhere, so it would be good to isolate to load().




[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10469#discussion_r50950853
  
    --- Diff: python/pyspark/ml/wrapper.py ---
    @@ -82,13 +71,16 @@ def _transfer_params_to_java(self):
                     pair = self._make_java_param_pair(param, paramMap[param])
                     self._java_obj.set(pair)
     
    -    def _transfer_params_from_java(self):
    +    def _transfer_params_from_java(self, withParent=False):
    --- End diff --
    
    I tried putting the setting of the Param.parent field into load(), which would add a code snippet like this to ```JavaMLReader.load()```:
    ```python
    for param in self._instance.params:
        value = self._instance._paramMap[param]
        self._instance._paramMap.pop(param)
        param.parent = self._instance.uid
        self._instance._paramMap[param] = value
    ```
    It first pops elements from the dict and then pushes new ones, which I think is inefficient.
    This code would also operate on ```_paramMap```, which is the ML instance's internal variable.
    So I vote to do this work in ```_transfer_params_from_java```, or we can add a new function there to do the same work. I'm open to hearing your thoughts. @jkbradley 
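
The pop-and-reinsert pattern in the snippet above is needed whenever a dict key's hash depends on the field being mutated. A minimal illustration with a toy `Param` class (hypothetical, not PySpark's actual class, though the hash-depends-on-parent behavior is the point being modeled):

```python
class Param:
    """Toy Param whose hash depends on its parent uid."""

    def __init__(self, parent, name):
        self.parent = parent
        self.name = name

    def __hash__(self):
        return hash((self.parent, self.name))

    def __eq__(self, other):
        return (self.parent, self.name) == (other.parent, other.name)

p = Param("old_uid", "maxIter")
paramMap = {p: 10}

# Mutating p.parent while p is still a key would strand the entry in a
# bucket that no longer matches hash(p). Popping first, then reinserting,
# keeps the dict consistent.
value = paramMap.pop(p)
p.parent = "new_uid"
paramMap[p] = value
```

So the pop/push is not incidental overhead; skipping it would silently corrupt lookups in any map keyed by the mutated Param.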




[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10469#discussion_r49408919
  
    --- Diff: python/pyspark/ml/util.py ---
    @@ -52,3 +71,141 @@ def _randomUID(cls):
             concatenates the class name, "_", and 12 random hex chars.
             """
             return cls.__name__ + "_" + uuid.uuid4().hex[12:]
    +
    +
    +@inherit_doc
    +class MLWriter(object):
    +    """
    +    Abstract class for utility classes that can save ML instances.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    def __init__(self, instance):
    +        self._jwrite = instance._java_obj.write()
    +
    +    @since("2.0.0")
    +    def save(self, path):
    +        """Saves the ML instances to the input path."""
    +        self._jwrite.save(path)
    +
    +    @since("2.0.0")
    +    def overwrite(self):
    +        """Overwrites if the output path already exists."""
    +        self._jwrite.overwrite()
    +        return self
    +
    +    @since("2.0.0")
    +    def context(self, sqlContext):
    +        """Sets the SQL context to use for saving."""
    +        self._jwrite.context(sqlContext._ssql_ctx)
    +        return self
    +
    +
    +@inherit_doc
    +class MLWritable(object):
    +    """
    +    Mixin for ML instances that provide MLWriter through their Scala
    +    implementation.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @since("2.0.0")
    --- End diff --
    
    Will this annotation get inherited by classes which are added in later Spark versions?  (Can you please try generating the docs to see if it appears for LinearRegression?)  If so, let's omit it.




[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10469#discussion_r49408926
  
    --- Diff: python/pyspark/ml/util.py ---
    @@ -52,3 +71,141 @@ def _randomUID(cls):
             concatenates the class name, "_", and 12 random hex chars.
             """
             return cls.__name__ + "_" + uuid.uuid4().hex[12:]
    +
    +
    +@inherit_doc
    +class MLWriter(object):
    +    """
    +    Abstract class for utility classes that can save ML instances.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    def __init__(self, instance):
    +        self._jwrite = instance._java_obj.write()
    +
    +    @since("2.0.0")
    +    def save(self, path):
    +        """Saves the ML instances to the input path."""
    +        self._jwrite.save(path)
    +
    +    @since("2.0.0")
    +    def overwrite(self):
    +        """Overwrites if the output path already exists."""
    +        self._jwrite.overwrite()
    +        return self
    +
    +    @since("2.0.0")
    +    def context(self, sqlContext):
    +        """Sets the SQL context to use for saving."""
    +        self._jwrite.context(sqlContext._ssql_ctx)
    +        return self
    +
    +
    +@inherit_doc
    +class MLWritable(object):
    +    """
    +    Mixin for ML instances that provide MLWriter through their Scala
    +    implementation.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @since("2.0.0")
    +    def write(self):
    +        """Returns an MLWriter instance for this ML instance."""
    +        return MLWriter(self)
    +
    +    @since("2.0.0")
    +    def save(self, path):
    +        """Save this ML instance to the given path, a shortcut of `write().save(path)`."""
    +        if not isinstance(path, basestring):
    +            raise TypeError("path should be a basestring, got type %s" % type(path))
    +        self._java_obj.save(path)
    +
    +
    +@inherit_doc
    +class MLReader(object):
    +    """
    +    Abstract class for utility classes that can load ML instances.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    def __init__(self, instance):
    +        self._instance = instance
    +        self._jread = instance._java_obj.read()
    +
    +    @since("2.0.0")
    +    def load(self, path):
    +        """Loads the ML component from the input path."""
    +        self._instance.load(path)
    --- End diff --
    
    The load implementation should go here.  To be analogous to the Scala implementation, we should have Model/Estimator.load methods call ```[instance].read().load(...)```.  That will allow us to add options to the reader later on.
    
    It will simplify things if we make MLReader applicable only to JavaWrapper subclasses.
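
    A minimal pure-Python sketch of the suggested pattern (the ``Toy*`` class names are invented for illustration; a real MLReader would delegate the load to the JVM):

    ```python
    class ToyReader(object):
        """Illustrative reader; routing load() through a reader object
        leaves room to add options to it later."""
        def __init__(self, cls):
            self._cls = cls
            self._options = {}

        def option(self, key, value):
            # A future knob: configure the reader before loading.
            self._options[key] = value
            return self

        def load(self, path):
            instance = self._cls()
            instance.path = path
            return instance


    class ToyModel(object):
        path = None

        @classmethod
        def read(cls):
            return ToyReader(cls)

        @classmethod
        def load(cls, path):
            # The shortcut simply forwards to read().load(path).
            return cls.read().load(path)


    model = ToyModel.load("/tmp/model")
    ```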




[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10469#discussion_r50847681
  
    --- Diff: python/pyspark/ml/wrapper.py ---
    @@ -159,15 +151,16 @@ class JavaModel(Model, JavaTransformer):
     
         __metaclass__ = ABCMeta
     
    -    def __init__(self, java_model):
    +    def __init__(self, java_model=None):
    --- End diff --
    
    Unify the construction of ```Model``` and ```Estimator```: that is, ```Model``` can be instantiated without an argument, which is needed by ```model.load```.
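
    A toy sketch of why the ``None`` default helps (``ToyJavaModel`` and the ``"peer:"`` string are invented stand-ins; the real code would attach the Py4J-wrapped Java model):

    ```python
    class ToyJavaModel(object):
        """Illustrative only: mirrors the idea behind JavaModel(java_model=None)."""
        def __init__(self, java_model=None):
            # With a default of None the class can be instantiated empty,
            # and a loader can attach the Java peer afterwards.
            self._java_obj = java_model

        @classmethod
        def load(cls, path):
            instance = cls()                      # no argument needed here
            instance._java_obj = "peer:" + path   # stand-in for the JVM-side load
            return instance


    m = ToyJavaModel.load("/tmp/lr")
    ```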




[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10469#issuecomment-167079053
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10469#discussion_r50848454
  
    --- Diff: python/pyspark/ml/util.py ---
    @@ -52,3 +71,133 @@ def _randomUID(cls):
             concatenates the class name, "_", and 12 random hex chars.
             """
             return cls.__name__ + "_" + uuid.uuid4().hex[12:]
    +
    +
    +@inherit_doc
    +class MLWriter(object):
    +    """
    +    .. note:: Experimental
    +
    +    Utility class that can save ML instances.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    def __init__(self, instance):
    +        instance._transfer_params_to_java()
    +        self._jwrite = instance._java_obj.write()
    +
    +    @since("2.0.0")
    +    def save(self, path):
    +        """Saves the ML instances to the input path."""
    +        self._jwrite.save(path)
    +
    +    @since("2.0.0")
    +    def overwrite(self):
    +        """Overwrites if the output path already exists."""
    +        self._jwrite.overwrite()
    +        return self
    +
    +    @since("2.0.0")
    +    def context(self, sqlContext):
    +        """Sets the SQL context to use for saving."""
    +        self._jwrite.context(sqlContext._ssql_ctx)
    +        return self
    +
    +
    +@inherit_doc
    +class MLWritable(object):
    +    """
    +    .. note:: Experimental
    +
    +    Mixin for ML instances that provide MLWriter through their Scala
    +    implementation.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @since("2.0.0")
    +    def write(self):
    +        """Returns an MLWriter instance for this ML instance."""
    +        return MLWriter(self)
    +
    +    @since("2.0.0")
    +    def save(self, path):
    +        """Save this ML instance to the given path, a shortcut of `write().save(path)`."""
    +        if not isinstance(path, basestring):
    +            raise TypeError("path should be a basestring, got type %s" % type(path))
    +        self.write().save(path)
    +
    +
    +@inherit_doc
    +class MLReader(object):
    +    """
    +    .. note:: Experimental
    +
    +    Utility class that can load ML instances.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    def __init__(self, instance):
    +        self._instance = instance
    +        self._jread = instance._java_obj.read()
    +
    +    @since("2.0.0")
    +    def load(self, path):
    +        """Loads the ML component from the input path."""
    +        java_obj = self._jread.load(path)
    +        self._instance._java_obj = java_obj
    +        self._instance.uid = java_obj.uid()
    +        self._instance._transfer_params_from_java(True)
    --- End diff --
    
    Transfer all params from the peer Java object.




[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10469#discussion_r49408907
  
    --- Diff: python/pyspark/ml/util.py ---
    @@ -52,3 +71,141 @@ def _randomUID(cls):
             concatenates the class name, "_", and 12 random hex chars.
             """
             return cls.__name__ + "_" + uuid.uuid4().hex[12:]
    +
    +
    +@inherit_doc
    +class MLWriter(object):
    +    """
    +    Abstract class for utility classes that can save ML instances.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    def __init__(self, instance):
    +        self._jwrite = instance._java_obj.write()
    +
    +    @since("2.0.0")
    +    def save(self, path):
    --- End diff --
    
    This method will need to copy Params over to the JVM model.  Please also update the doc test to test for some parameter value to make sure Params get copied.
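
    A pure-Python sketch of the fix (all ``Toy*`` names are invented for illustration; in the revised patch the transfer happens via ``instance._transfer_params_to_java()`` in the writer's constructor):

    ```python
    class ToyJavaPeer(object):
        """Stand-in for the JVM-side object (illustrative only)."""
        def __init__(self):
            self.params = {}
            self.saved_to = None

        def set(self, name, value):
            self.params[name] = value

        def save(self, path):
            self.saved_to = path


    class ToyWriter(object):
        def __init__(self, instance):
            # Copy Python-side params over before any save call, so the
            # persisted JVM object reflects what the user set in Python.
            for name, value in instance.py_params.items():
                instance.java_peer.set(name, value)
            self._peer = instance.java_peer

        def save(self, path):
            self._peer.save(path)


    class ToyEstimator(object):
        def __init__(self):
            self.py_params = {"maxIter": 5}
            self.java_peer = ToyJavaPeer()

        def write(self):
            return ToyWriter(self)


    est = ToyEstimator()
    est.write().save("/tmp/out")
    ```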




[GitHub] spark pull request: [SPARK-11939] [ML] [PySpark] PySpark support m...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10469#issuecomment-174476418
  
    **[Test build #49987 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49987/consoleFull)** for PR 10469 at commit [`106a2b8`](https://github.com/apache/spark/commit/106a2b8c35c463fcf9f472dbd7ebd4718b02e983).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.

