You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by sethah <gi...@git.apache.org> on 2016/03/12 00:15:31 UTC

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

GitHub user sethah opened a pull request:

    https://github.com/apache/spark/pull/11663

    [SPARK-13068][PYSPARK][ML] Type conversion for Pyspark params

    ## What changes were proposed in this pull request?
    
    This patch adds type conversion functionality for parameters in Pyspark. A `typeConverter` field is added to the constructor of `Param` class. This argument is a function which converts values passed to this param to the appropriate type if possible. This is beneficial so that the params can fail at set time if they are given inappropriate values, but even more so because coherent error messages are now provided when Py4J cannot cast the python type to the appropriate Java type. 
    
    This patch also adds a `TypeConverters` class with factory methods for common type conversions. Most of the changes involve adding these factory type converters to existing params. The previous solution to this issue, `expectedType`, is deprecated and can be removed in 2.1.0 as discussed on the Jira.
    
    
    ## How was this patch tested?
    
    Unit tests were added in python/pyspark/ml/tests.py to test parameter type conversion. These tests check that values that should be convertible are converted correctly, and that the appropriate errors are thrown when invalid values are provided.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sethah/spark SPARK-13068-tc

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11663.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11663
    
----
commit bca206d0f9c3aeef6ce768e542f0eb4cbf9274ee
Author: sethah <se...@gmail.com>
Date:   2016-03-10T23:25:39Z

    type conversion for params

commit 798e4f89549e3fc74719359e769164d20dedefb1
Author: sethah <se...@gmail.com>
Date:   2016-03-11T00:34:38Z

    removing _convert method and using typeConverter directly in _set

commit 6e2399f1c69563a864ec1da4b111c3c6ce65200d
Author: sethah <se...@gmail.com>
Date:   2016-03-11T16:50:39Z

    docstring and typo

commit 257f46591d9198729bf2fb3e28186b9c96097e2d
Author: sethah <se...@gmail.com>
Date:   2016-03-11T22:31:53Z

    refactoring type conversions

commit 40b00dd2e3209fcf757ec3e3759ce4270096087e
Author: sethah <se...@gmail.com>
Date:   2016-03-11T23:03:19Z

    fixing deprecation message

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-200027312
  
    **[Test build #53816 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53816/consoleFull)** for PR 11663 at commit [`ff7f94d`](https://github.com/apache/spark/commit/ff7f94d249fb54c0b7dc7b48dca5e268ace090ce).
     * This patch **fails R style tests**.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-198012696
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-200041505
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53821/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r57024786
  
    --- Diff: python/pyspark/ml/feature.py ---
    @@ -441,7 +445,8 @@ class ElementwiseProduct(JavaTransformer, HasInputCol, HasOutputCol, MLReadable,
         """
     
         scalingVec = Param(Params._dummy(), "scalingVec", "vector for hadamard product, " +
    -                       "it must be MLlib Vector type.")
    +                       "it must be MLlib Vector type.",
    --- End diff --
    
    We can remove this restriction in the doc now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r56406647
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -65,6 +72,106 @@ def __eq__(self, other):
                 return False
     
     
    +class TypeConverters(object):
    +    """
    +    .. note:: DeveloperApi
    +
    +    Factory methods for common type conversion functions for `Param.typeConverter`.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @staticmethod
    +    def _is_numeric(value):
    +        vtype = type(value)
    +        return vtype == int or vtype == float or vtype == np.float64 \
    +            or vtype == np.int64 or vtype.__name__ == 'long'
    +
    +    @staticmethod
    +    def _can_convert_to_list(value):
    +        vtype = type(value)
    +        return vtype == list or vtype == np.ndarray or isinstance(value, Vector)
    +
    +    @staticmethod
    +    def identity(value):
    +        """
    +        Dummy converter that just returns value.
    +        """
    +        return value
    +
    +    @staticmethod
    +    def convertToList(value):
    +        """
    +        Convert a value to a list, if possible.
    +        """
    +        if type(value) == list:
    +            return value
    +        elif type(value) == np.ndarray:
    +            return list(value)
    +        elif isinstance(value, Vector):
    +            return value.toArray()
    +        else:
    +            raise TypeError("Could not convert %s to list" % value)
    +
    +    @staticmethod
    +    def convertToListFloat(value):
    +        """
    +        Convert a value to list of floats, if possible.
    +        """
    +        if TypeConverters._can_convert_to_list(value) and \
    +                all(map(lambda v: TypeConverters._is_numeric(v), value)):
    +            value = TypeConverters.convertToList(value)
    +            return list(map(lambda v: float(v), value))
    +        else:
    +            raise TypeError("Could not convert %s to list of floats" % value)
    +
    +    @staticmethod
    +    def convertToListInt(value):
    +        """
    +        Convert a value to list of ints, if possible.
    +        """
    +        if TypeConverters._can_convert_to_list(value) and \
    +                all(map(lambda v: TypeConverters._is_numeric(v), value)):
    --- End diff --
    
    This too might not work with SparseVectors.  It would also be nice to check that the value is an integer.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r57083934
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -65,6 +75,146 @@ def __eq__(self, other):
                 return False
     
     
    +class TypeConverters(object):
    +    """
    +    .. note:: DeveloperApi
    +
    +    Factory methods for common type conversion functions for `Param.typeConverter`.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @staticmethod
    +    def _is_numeric(value):
    +        vtype = type(value)
    +        return vtype in [int, float, np.float64, np.int64] or vtype.__name__ == 'long'
    +
    +    @staticmethod
    +    def _is_integer(value):
    +        if TypeConverters._is_numeric(value):
    +            value = float(value)
    +            return value.is_integer()
    +        else:
    +            return False
    +
    +    @staticmethod
    +    def _can_convert_to_list(value):
    +        vtype = type(value)
    +        return vtype == list or vtype == np.ndarray or isinstance(value, Vector)
    +
    +    @staticmethod
    +    def _is_string(value):
    +        vtype = type(value)
    +        return vtype in [str, np.unicode_, np.string_, np.str_] or type(value).__name__ == 'unicode'
    +
    +    @staticmethod
    +    def identity(value):
    +        """
    +        Dummy converter that just returns value.
    +        """
    +        return value
    +
    +    @staticmethod
    +    def toList(value):
    +        """
    +        Convert a value to a list, if possible.
    +        """
    +        if type(value) == list:
    +            return value
    +        elif type(value) == np.ndarray:
    +            return list(value)
    +        elif isinstance(value, Vector):
    +            return list(value.toArray())
    +        else:
    +            raise TypeError("Could not convert %s to list" % value)
    +
    +    @staticmethod
    +    def toListFloat(value):
    +        """
    +        Convert a value to list of floats, if possible.
    +        """
    +        if TypeConverters._can_convert_to_list(value):
    +            value = TypeConverters.toList(value)
    +            if all(map(lambda v: TypeConverters._is_numeric(v), value)):
    +                return list(map(lambda v: float(v), value))
    --- End diff --
    
    Oh, did not realize.  Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-198012705
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53446/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-200041499
  
    Build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r57024828
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -65,6 +75,146 @@ def __eq__(self, other):
                 return False
     
     
    +class TypeConverters(object):
    +    """
    +    .. note:: DeveloperApi
    +
    +    Factory methods for common type conversion functions for `Param.typeConverter`.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @staticmethod
    +    def _is_numeric(value):
    +        vtype = type(value)
    +        return vtype in [int, float, np.float64, np.int64] or vtype.__name__ == 'long'
    +
    +    @staticmethod
    +    def _is_integer(value):
    +        if TypeConverters._is_numeric(value):
    +            value = float(value)
    +            return value.is_integer()
    +        else:
    +            return False
    +
    +    @staticmethod
    +    def _can_convert_to_list(value):
    --- End diff --
    
    Could also support ```tuple, array.array, xrange``` here + in ```toList```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by sethah <gi...@git.apache.org>.

Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r56547627
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -65,6 +72,106 @@ def __eq__(self, other):
                 return False
     
     
    +class TypeConverters(object):
    +    """
    +    .. note:: DeveloperApi
    +
    +    Factory methods for common type conversion functions for `Param.typeConverter`.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @staticmethod
    +    def _is_numeric(value):
    +        vtype = type(value)
    +        return vtype == int or vtype == float or vtype == np.float64 \
    +            or vtype == np.int64 or vtype.__name__ == 'long'
    +
    +    @staticmethod
    +    def _can_convert_to_list(value):
    +        vtype = type(value)
    +        return vtype == list or vtype == np.ndarray or isinstance(value, Vector)
    +
    +    @staticmethod
    +    def identity(value):
    +        """
    +        Dummy converter that just returns value.
    +        """
    +        return value
    +
    +    @staticmethod
    +    def convertToList(value):
    +        """
    +        Convert a value to a list, if possible.
    +        """
    +        if type(value) == list:
    +            return value
    +        elif type(value) == np.ndarray:
    +            return list(value)
    +        elif isinstance(value, Vector):
    +            return value.toArray()
    +        else:
    +            raise TypeError("Could not convert %s to list" % value)
    +
    +    @staticmethod
    +    def convertToListFloat(value):
    +        """
    +        Convert a value to list of floats, if possible.
    +        """
    +        if TypeConverters._can_convert_to_list(value) and \
    +                all(map(lambda v: TypeConverters._is_numeric(v), value)):
    +            value = TypeConverters.convertToList(value)
    +            return list(map(lambda v: float(v), value))
    +        else:
    +            raise TypeError("Could not convert %s to list of floats" % value)
    +
    +    @staticmethod
    +    def convertToListInt(value):
    +        """
    +        Convert a value to list of ints, if possible.
    +        """
    +        if TypeConverters._can_convert_to_list(value) and \
    +                all(map(lambda v: TypeConverters._is_numeric(v), value)):
    --- End diff --
    
    Done, added an `_is_integer()` check to make sure values are whole numbers.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-200041490
  
    **[Test build #53821 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53821/consoleFull)** for PR 11663 at commit [`e1e4643`](https://github.com/apache/spark/commit/e1e46433582b0740be3ed2977e2da86e8414f6af).
     * This patch **fails R style tests**.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-196383071
  
    **[Test build #53076 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53076/consoleFull)** for PR 11663 at commit [`34e9925`](https://github.com/apache/spark/commit/34e99258fc405817e12cab73b29bbbb900ba8850).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-200072224
  
    **[Test build #53835 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53835/consoleFull)** for PR 11663 at commit [`6a8d5f6`](https://github.com/apache/spark/commit/6a8d5f65c708f1d6418e16b0ef59945b2eabe8ba).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-200072336
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r57024739
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -65,6 +72,106 @@ def __eq__(self, other):
                 return False
     
     
    +class TypeConverters(object):
    +    """
    +    .. note:: DeveloperApi
    +
    +    Factory methods for common type conversion functions for `Param.typeConverter`.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @staticmethod
    +    def _is_numeric(value):
    +        vtype = type(value)
    +        return vtype == int or vtype == float or vtype == np.float64 \
    +            or vtype == np.int64 or vtype.__name__ == 'long'
    +
    +    @staticmethod
    +    def _can_convert_to_list(value):
    +        vtype = type(value)
    +        return vtype == list or vtype == np.ndarray or isinstance(value, Vector)
    +
    +    @staticmethod
    +    def identity(value):
    +        """
    +        Dummy converter that just returns value.
    +        """
    +        return value
    +
    +    @staticmethod
    +    def convertToList(value):
    +        """
    +        Convert a value to a list, if possible.
    +        """
    +        if type(value) == list:
    +            return value
    +        elif type(value) == np.ndarray:
    +            return list(value)
    +        elif isinstance(value, Vector):
    +            return value.toArray()
    +        else:
    +            raise TypeError("Could not convert %s to list" % value)
    +
    +    @staticmethod
    +    def convertToListFloat(value):
    +        """
    +        Convert a value to list of floats, if possible.
    +        """
    +        if TypeConverters._can_convert_to_list(value) and \
    +                all(map(lambda v: TypeConverters._is_numeric(v), value)):
    +            value = TypeConverters.convertToList(value)
    +            return list(map(lambda v: float(v), value))
    +        else:
    +            raise TypeError("Could not convert %s to list of floats" % value)
    +
    +    @staticmethod
    +    def convertToListInt(value):
    +        """
    +        Convert a value to list of ints, if possible.
    +        """
    +        if TypeConverters._can_convert_to_list(value) and \
    +                all(map(lambda v: TypeConverters._is_numeric(v), value)):
    +            value = TypeConverters.convertToList(value)
    +            return list(map(lambda v: int(v), value))
    +        else:
    +            raise TypeError("Could not convert %s to list of ints" % value)
    +
    +    @staticmethod
    +    def convertToVector(value):
    --- End diff --
    
    Oh good point.  Let's stick with what you did.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r56406633
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -65,6 +72,106 @@ def __eq__(self, other):
                 return False
     
     
    +class TypeConverters(object):
    +    """
    +    .. note:: DeveloperApi
    +
    +    Factory methods for common type conversion functions for `Param.typeConverter`.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @staticmethod
    +    def _is_numeric(value):
    +        vtype = type(value)
    +        return vtype == int or vtype == float or vtype == np.float64 \
    +            or vtype == np.int64 or vtype.__name__ == 'long'
    +
    +    @staticmethod
    +    def _can_convert_to_list(value):
    +        vtype = type(value)
    +        return vtype == list or vtype == np.ndarray or isinstance(value, Vector)
    +
    +    @staticmethod
    +    def identity(value):
    +        """
    +        Dummy converter that just returns value.
    +        """
    +        return value
    +
    +    @staticmethod
    +    def convertToList(value):
    +        """
    +        Convert a value to a list, if possible.
    +        """
    +        if type(value) == list:
    +            return value
    +        elif type(value) == np.ndarray:
    +            return list(value)
    +        elif isinstance(value, Vector):
    +            return value.toArray()
    --- End diff --
    
    ```list(value.toArray())```  --> Add unit test which catches this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-200052416
  
    Build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r57084301
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -65,6 +80,144 @@ def __eq__(self, other):
                 return False
     
     
    +class TypeConverters(object):
    +    """
    +    .. note:: DeveloperApi
    +
    +    Factory methods for common type conversion functions for `Param.typeConverter`.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @staticmethod
    +    def _is_numeric(value):
    +        vtype = type(value)
    +        return vtype in [int, float, np.float64, np.int64] or vtype.__name__ == 'long'
    +
    +    @staticmethod
    +    def _is_integer(value):
    +        return TypeConverters._is_numeric(value) and float(value).is_integer()
    +
    +    @staticmethod
    +    def _can_convert_to_list(value):
    +        vtype = type(value)
    +        return vtype in [list, np.ndarray, tuple, xrange, array.array] or isinstance(value, Vector)
    +
    +    @staticmethod
    +    def _can_convert_to_string(value):
    --- End diff --
    
    I hadn't thought about this before, but we actually should support unicode.  The main use case is StringIndexer, which might be used to index unicode.  For that, we'd want to pass an array of unicode and probably avoid converting it to str types.
    
    Java/Scala should already handle this since java.lang.String handles unicode.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r56406664
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -275,23 +382,9 @@ def _set(self, **kwargs):
             """
             for param, value in kwargs.items():
                 p = getattr(self, param)
    -            if p.expectedType is None or type(value) == p.expectedType or value is None:
    -                self._paramMap[getattr(self, param)] = value
    -            else:
    -                try:
    -                    # Try and do "safe" conversions that don't lose information
    -                    if p.expectedType == float:
    -                        self._paramMap[getattr(self, param)] = float(value)
    -                    # Python 3 unified long & int
    -                    elif p.expectedType == int and type(value).__name__ == 'long':
    -                        self._paramMap[getattr(self, param)] = value
    -                    else:
    -                        raise Exception(
    -                            "Provided type {0} incompatible with type {1} for param {2}"
    -                            .format(type(value), p.expectedType, p))
    -                except ValueError:
    -                    raise Exception(("Failed to convert {0} to type {1} for param {2}"
    -                                     .format(type(value), p.expectedType, p)))
    +            if value is not None:
    +                value = p.typeConverter(value)
    +            self._paramMap[getattr(self, param)] = value
    --- End diff --
    
    reuse value ```p```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-200476479
  
    LGTM
    Thanks for a great PR!
    Merging with master


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-200428492
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53943/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r56406616
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -65,6 +72,106 @@ def __eq__(self, other):
                 return False
     
     
    +class TypeConverters(object):
    +    """
    +    .. note:: DeveloperApi
    +
    +    Factory methods for common type conversion functions for `Param.typeConverter`.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @staticmethod
    +    def _is_numeric(value):
    +        vtype = type(value)
    +        return vtype == int or vtype == float or vtype == np.float64 \
    --- End diff --
    
    rewrite as:
    ```
    type(value) in [int, float, ...]
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-195603328
  
    **[Test build #52960 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52960/consoleFull)** for PR 11663 at commit [`40b00dd`](https://github.com/apache/spark/commit/40b00dd2e3209fcf757ec3e3759ce4270096087e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-197527954
  
    Made an initial pass.  I like this update--thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-200066688
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53832/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-200428484
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-198009920
  
    **[Test build #53446 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53446/consoleFull)** for PR 11663 at commit [`c483da8`](https://github.com/apache/spark/commit/c483da8a6b1c305a96ce2a4a25ac8a0a1f98e653).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-200072337
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53835/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by sethah <gi...@git.apache.org>.

Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r57077422
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -65,6 +80,144 @@ def __eq__(self, other):
                 return False
     
     
    +class TypeConverters(object):
    +    """
    +    .. note:: DeveloperApi
    +
    +    Factory methods for common type conversion functions for `Param.typeConverter`.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @staticmethod
    +    def _is_numeric(value):
    +        vtype = type(value)
    +        return vtype in [int, float, np.float64, np.int64] or vtype.__name__ == 'long'
    +
    +    @staticmethod
    +    def _is_integer(value):
    +        return TypeConverters._is_numeric(value) and float(value).is_integer()
    +
    +    @staticmethod
    +    def _can_convert_to_list(value):
    +        vtype = type(value)
    +        return vtype in [list, np.ndarray, tuple, xrange, array.array] or isinstance(value, Vector)
    +
    +    @staticmethod
    +    def _can_convert_to_string(value):
    --- End diff --
    
    I changed this to do "safe" unicode to str conversions. The way it was previously, a user could provide non-ascii characters in a string param and get a somewhat mysterious `UnicodeEncodeError`. This way, they should at least get an error message consistent with other TypeConverters. I appreciate feedback on this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-197997430
  
    **[Test build #53438 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53438/consoleFull)** for PR 11663 at commit [`c483da8`](https://github.com/apache/spark/commit/c483da8a6b1c305a96ce2a4a25ac8a0a1f98e653).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `    probabilityCol = Param(Params._dummy(), \"probabilityCol\", \"Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities.\", typeConverter=TypeConverters.toString)`
      * `    thresholds = Param(Params._dummy(), \"thresholds\", \"Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values >= 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class' threshold.\", typeConverter=TypeConverters.toListFloat)`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r57024858
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -65,6 +75,146 @@ def __eq__(self, other):
                 return False
     
     
    +class TypeConverters(object):
    +    """
    +    .. note:: DeveloperApi
    +
    +    Factory methods for common type conversion functions for `Param.typeConverter`.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @staticmethod
    +    def _is_numeric(value):
    +        vtype = type(value)
    +        return vtype in [int, float, np.float64, np.int64] or vtype.__name__ == 'long'
    +
    +    @staticmethod
    +    def _is_integer(value):
    +        if TypeConverters._is_numeric(value):
    +            value = float(value)
    +            return value.is_integer()
    +        else:
    +            return False
    +
    +    @staticmethod
    +    def _can_convert_to_list(value):
    +        vtype = type(value)
    +        return vtype == list or vtype == np.ndarray or isinstance(value, Vector)
    +
    +    @staticmethod
    +    def _is_string(value):
    +        vtype = type(value)
    +        return vtype in [str, np.unicode_, np.string_, np.str_] or type(value).__name__ == 'unicode'
    +
    +    @staticmethod
    +    def identity(value):
    +        """
    +        Dummy converter that just returns value.
    +        """
    +        return value
    +
    +    @staticmethod
    +    def toList(value):
    +        """
    +        Convert a value to a list, if possible.
    +        """
    +        if type(value) == list:
    +            return value
    +        elif type(value) == np.ndarray:
    +            return list(value)
    +        elif isinstance(value, Vector):
    +            return list(value.toArray())
    +        else:
    +            raise TypeError("Could not convert %s to list" % value)
    +
    +    @staticmethod
    +    def toListFloat(value):
    +        """
    +        Convert a value to list of floats, if possible.
    +        """
    +        if TypeConverters._can_convert_to_list(value):
    +            value = TypeConverters.toList(value)
    +            if all(map(lambda v: TypeConverters._is_numeric(v), value)):
    +                return list(map(lambda v: float(v), value))
    +        raise TypeError("Could not convert %s to list of floats" % value)
    +
    +    @staticmethod
    +    def toListInt(value):
    +        """
    +        Convert a value to list of ints, if possible.
    +        """
    +        if TypeConverters._can_convert_to_list(value):
    +            value = TypeConverters.toList(value)
    +            if all(map(lambda v: TypeConverters._is_integer(v), value)):
    +                return list(map(lambda v: int(v), value))
    +        raise TypeError("Could not convert %s to list of ints" % value)
    +
    +    @staticmethod
    +    def toListString(value):
    +        """
    +        Convert a value to list of strings, if possible.
    +        """
    +        if TypeConverters._can_convert_to_list(value):
    +            value = TypeConverters.toList(value)
    +            if all(map(lambda v: TypeConverters._is_string(v), value)):
    +                return list(map(lambda v: str(v), value))
    +        raise TypeError("Could not convert %s to list of strings" % value)
    +
    +    @staticmethod
    +    def toVector(value):
    +        """
    +        Convert a value to a MLlib Vector, if possible.
    +        """
    +        if isinstance(value, Vector):
    +            return value
    +        elif TypeConverters._can_convert_to_list(value):
    +            value = TypeConverters.toList(value)
    +            if all(map(lambda v: TypeConverters._is_numeric(v), value)):
    +                return DenseVector(value)
    +        raise TypeError("Could not convert %s to vector" % value)
    +
    +    @staticmethod
    +    def toFloat(value):
    +        """
    +        Convert a value to a float, if possible.
    +        """
    +        if TypeConverters._is_numeric(value):
    +            return float(value)
    +        else:
    +            raise TypeError("Could not convert %s to float" % value)
    +
    +    @staticmethod
    +    def toInt(value):
    +        """
    +        Convert a value to an int, if possible.
    +        """
    +        if TypeConverters._is_integer(value):
    +            return int(value)
    +        else:
    +            raise TypeError("Could not convert %s to int" % value)
    +
    +    @staticmethod
    +    def toString(value):
    +        """
    +        Convert a value to a string, if possible.
    +        """
    +        if TypeConverters._is_string(value):
    +            return str(value)
    +        else:
    +            raise TypeError("Could not convert %s to string" % value)
    +
    +    @staticmethod
    +    def toBoolean(value):
    +        """
    +        Convert a value to a boolean, if possible.
    +        """
    +        if type(value) == bool:
    +            return value
    +        else:
    +            raise TypeError("Could not convert %s to bool" % value)
    --- End diff --
    
    Since this is a bit strict, maybe we should say in the error: "Boolean Param requires value of type bool.  Found type %s"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r57024850
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -65,6 +75,146 @@ def __eq__(self, other):
                 return False
     
     
    +class TypeConverters(object):
    +    """
    +    .. note:: DeveloperApi
    +
    +    Factory methods for common type conversion functions for `Param.typeConverter`.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @staticmethod
    +    def _is_numeric(value):
    +        vtype = type(value)
    +        return vtype in [int, float, np.float64, np.int64] or vtype.__name__ == 'long'
    +
    +    @staticmethod
    +    def _is_integer(value):
    +        if TypeConverters._is_numeric(value):
    +            value = float(value)
    +            return value.is_integer()
    +        else:
    +            return False
    +
    +    @staticmethod
    +    def _can_convert_to_list(value):
    +        vtype = type(value)
    +        return vtype == list or vtype == np.ndarray or isinstance(value, Vector)
    +
    +    @staticmethod
    +    def _is_string(value):
    +        vtype = type(value)
    +        return vtype in [str, np.unicode_, np.string_, np.str_] or type(value).__name__ == 'unicode'
    +
    +    @staticmethod
    +    def identity(value):
    +        """
    +        Dummy converter that just returns value.
    +        """
    +        return value
    +
    +    @staticmethod
    +    def toList(value):
    +        """
    +        Convert a value to a list, if possible.
    +        """
    +        if type(value) == list:
    +            return value
    +        elif type(value) == np.ndarray:
    +            return list(value)
    +        elif isinstance(value, Vector):
    +            return list(value.toArray())
    +        else:
    +            raise TypeError("Could not convert %s to list" % value)
    +
    +    @staticmethod
    +    def toListFloat(value):
    +        """
    +        Convert a value to list of floats, if possible.
    +        """
    +        if TypeConverters._can_convert_to_list(value):
    +            value = TypeConverters.toList(value)
    +            if all(map(lambda v: TypeConverters._is_numeric(v), value)):
    +                return list(map(lambda v: float(v), value))
    +        raise TypeError("Could not convert %s to list of floats" % value)
    +
    +    @staticmethod
    +    def toListInt(value):
    +        """
    +        Convert a value to list of ints, if possible.
    +        """
    +        if TypeConverters._can_convert_to_list(value):
    +            value = TypeConverters.toList(value)
    +            if all(map(lambda v: TypeConverters._is_integer(v), value)):
    +                return list(map(lambda v: int(v), value))
    --- End diff --
    
    This should already be a list.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by sethah <gi...@git.apache.org>.

Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r56548612
  
    --- Diff: python/pyspark/ml/param/_shared_params_code_gen.py ---
    @@ -105,64 +104,71 @@ def get$Name(self):
     if __name__ == "__main__":
         print(header)
         print("\n# DO NOT MODIFY THIS FILE! It was generated by _shared_params_code_gen.py.\n")
    -    print("from pyspark.ml.param import Param, Params\n\n")
    +    print("from pyspark.ml.param import *\n\n")
         shared = [
    -        ("maxIter", "max number of iterations (>= 0).", None, int),
    -        ("regParam", "regularization parameter (>= 0).", None, float),
    -        ("featuresCol", "features column name.", "'features'", str),
    -        ("labelCol", "label column name.", "'label'", str),
    -        ("predictionCol", "prediction column name.", "'prediction'", str),
    +        ("maxIter", "max number of iterations (>= 0).", None, "TypeConverters.convertToInt"),
    +        ("regParam", "regularization parameter (>= 0).", None, "TypeConverters.convertToFloat"),
    +        ("featuresCol", "features column name.", "'features'", None),
    --- End diff --
    
    I added a `toString` and `toListString` converter.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by sethah <gi...@git.apache.org>.

Github user sethah commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-198008868
  
    Jenkins retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r56406644
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -65,6 +72,106 @@ def __eq__(self, other):
                 return False
     
     
    +class TypeConverters(object):
    +    """
    +    .. note:: DeveloperApi
    +
    +    Factory methods for common type conversion functions for `Param.typeConverter`.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @staticmethod
    +    def _is_numeric(value):
    +        vtype = type(value)
    +        return vtype == int or vtype == float or vtype == np.float64 \
    +            or vtype == np.int64 or vtype.__name__ == 'long'
    +
    +    @staticmethod
    +    def _can_convert_to_list(value):
    +        vtype = type(value)
    +        return vtype == list or vtype == np.ndarray or isinstance(value, Vector)
    +
    +    @staticmethod
    +    def identity(value):
    +        """
    +        Dummy converter that just returns value.
    +        """
    +        return value
    +
    +    @staticmethod
    +    def convertToList(value):
    +        """
    +        Convert a value to a list, if possible.
    +        """
    +        if type(value) == list:
    +            return value
    +        elif type(value) == np.ndarray:
    +            return list(value)
    +        elif isinstance(value, Vector):
    +            return value.toArray()
    +        else:
    +            raise TypeError("Could not convert %s to list" % value)
    +
    +    @staticmethod
    +    def convertToListFloat(value):
    +        """
    +        Convert a value to list of floats, if possible.
    +        """
    +        if TypeConverters._can_convert_to_list(value) and \
    +                all(map(lambda v: TypeConverters._is_numeric(v), value)):
    --- End diff --
    
    I don't think this will work for SparseVector.  Can you please add a unit test for that?  I'd recommend calling ```value = TypeConverters.convertToList(value)``` first and then converting values to float.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-200027324
  
    Build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-196399775
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53077/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r56406609
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -32,13 +35,17 @@ class Param(object):
         .. versionadded:: 1.3.0
         """
     
    -    def __init__(self, parent, name, doc, expectedType=None):
    +    def __init__(self, parent, name, doc, expectedType=None, typeConverter=None):
             if not isinstance(parent, Identifiable):
                 raise TypeError("Parent must be an Identifiable but got type %s." % type(parent))
             self.parent = parent.uid
             self.name = str(name)
             self.doc = str(doc)
             self.expectedType = expectedType
    +        if expectedType is not None:
    +            warnings.warn("expectedType is deprecated and will be removed in 2.1.0, " +
    +                          "use typeConverter instead.")
    --- End diff --
    
    "use typeConverter instead, as a keyword argument"
    
    Also, I'd put this same message in the docstring too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-200068226
  
    **[Test build #53835 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53835/consoleFull)** for PR 11663 at commit [`6a8d5f6`](https://github.com/apache/spark/commit/6a8d5f65c708f1d6418e16b0ef59945b2eabe8ba).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-199424805
  
    **[Test build #2654 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2654/consoleFull)** for PR 11663 at commit [`c483da8`](https://github.com/apache/spark/commit/c483da8a6b1c305a96ce2a4a25ac8a0a1f98e653).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r57024848
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -65,6 +75,146 @@ def __eq__(self, other):
                 return False
     
     
    +class TypeConverters(object):
    +    """
    +    .. note:: DeveloperApi
    +
    +    Factory methods for common type conversion functions for `Param.typeConverter`.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @staticmethod
    +    def _is_numeric(value):
    +        vtype = type(value)
    +        return vtype in [int, float, np.float64, np.int64] or vtype.__name__ == 'long'
    +
    +    @staticmethod
    +    def _is_integer(value):
    +        if TypeConverters._is_numeric(value):
    +            value = float(value)
    +            return value.is_integer()
    +        else:
    +            return False
    +
    +    @staticmethod
    +    def _can_convert_to_list(value):
    +        vtype = type(value)
    +        return vtype == list or vtype == np.ndarray or isinstance(value, Vector)
    +
    +    @staticmethod
    +    def _is_string(value):
    +        vtype = type(value)
    +        return vtype in [str, np.unicode_, np.string_, np.str_] or type(value).__name__ == 'unicode'
    +
    +    @staticmethod
    +    def identity(value):
    +        """
    +        Dummy converter that just returns value.
    +        """
    +        return value
    +
    +    @staticmethod
    +    def toList(value):
    +        """
    +        Convert a value to a list, if possible.
    +        """
    +        if type(value) == list:
    +            return value
    +        elif type(value) == np.ndarray:
    +            return list(value)
    +        elif isinstance(value, Vector):
    +            return list(value.toArray())
    +        else:
    +            raise TypeError("Could not convert %s to list" % value)
    +
    +    @staticmethod
    +    def toListFloat(value):
    +        """
    +        Convert a value to list of floats, if possible.
    +        """
    +        if TypeConverters._can_convert_to_list(value):
    +            value = TypeConverters.toList(value)
    +            if all(map(lambda v: TypeConverters._is_numeric(v), value)):
    +                return list(map(lambda v: float(v), value))
    --- End diff --
    
    This is already a list


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-196374260
  
    **[Test build #53076 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53076/consoleFull)** for PR 11663 at commit [`34e9925`](https://github.com/apache/spark/commit/34e99258fc405817e12cab73b29bbbb900ba8850).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-199431275
  
    **[Test build #2654 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2654/consoleFull)** for PR 11663 at commit [`c483da8`](https://github.com/apache/spark/commit/c483da8a6b1c305a96ce2a4a25ac8a0a1f98e653).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `    probabilityCol = Param(Params._dummy(), \"probabilityCol\", \"Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities.\", typeConverter=TypeConverters.toString)`
      * `    thresholds = Param(Params._dummy(), \"thresholds\", \"Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values >= 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class' threshold.\", typeConverter=TypeConverters.toListFloat)`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-200066395
  
    **[Test build #53832 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53832/consoleFull)** for PR 11663 at commit [`2c46076`](https://github.com/apache/spark/commit/2c4607688a458bfecbf40b604e8cd59a5fa4851b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r57024854
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -65,6 +75,146 @@ def __eq__(self, other):
                 return False
     
     
    +class TypeConverters(object):
    +    """
    +    .. note:: DeveloperApi
    +
    +    Factory methods for common type conversion functions for `Param.typeConverter`.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @staticmethod
    +    def _is_numeric(value):
    +        vtype = type(value)
    +        return vtype in [int, float, np.float64, np.int64] or vtype.__name__ == 'long'
    +
    +    @staticmethod
    +    def _is_integer(value):
    +        if TypeConverters._is_numeric(value):
    +            value = float(value)
    +            return value.is_integer()
    +        else:
    +            return False
    +
    +    @staticmethod
    +    def _can_convert_to_list(value):
    +        vtype = type(value)
    +        return vtype == list or vtype == np.ndarray or isinstance(value, Vector)
    +
    +    @staticmethod
    +    def _is_string(value):
    +        vtype = type(value)
    +        return vtype in [str, np.unicode_, np.string_, np.str_] or type(value).__name__ == 'unicode'
    +
    +    @staticmethod
    +    def identity(value):
    +        """
    +        Dummy converter that just returns value.
    +        """
    +        return value
    +
    +    @staticmethod
    +    def toList(value):
    +        """
    +        Convert a value to a list, if possible.
    +        """
    +        if type(value) == list:
    +            return value
    +        elif type(value) == np.ndarray:
    +            return list(value)
    +        elif isinstance(value, Vector):
    +            return list(value.toArray())
    +        else:
    +            raise TypeError("Could not convert %s to list" % value)
    +
    +    @staticmethod
    +    def toListFloat(value):
    +        """
    +        Convert a value to list of floats, if possible.
    +        """
    +        if TypeConverters._can_convert_to_list(value):
    +            value = TypeConverters.toList(value)
    +            if all(map(lambda v: TypeConverters._is_numeric(v), value)):
    +                return list(map(lambda v: float(v), value))
    +        raise TypeError("Could not convert %s to list of floats" % value)
    +
    +    @staticmethod
    +    def toListInt(value):
    +        """
    +        Convert a value to list of ints, if possible.
    +        """
    +        if TypeConverters._can_convert_to_list(value):
    +            value = TypeConverters.toList(value)
    +            if all(map(lambda v: TypeConverters._is_integer(v), value)):
    +                return list(map(lambda v: int(v), value))
    +        raise TypeError("Could not convert %s to list of ints" % value)
    +
    +    @staticmethod
    +    def toListString(value):
    +        """
    +        Convert a value to list of strings, if possible.
    +        """
    +        if TypeConverters._can_convert_to_list(value):
    +            value = TypeConverters.toList(value)
    +            if all(map(lambda v: TypeConverters._is_string(v), value)):
    +                return list(map(lambda v: str(v), value))
    --- End diff --
    
    already a list


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r56406653
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -65,6 +72,106 @@ def __eq__(self, other):
                 return False
     
     
    +class TypeConverters(object):
    +    """
    +    .. note:: DeveloperApi
    +
    +    Factory methods for common type conversion functions for `Param.typeConverter`.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @staticmethod
    +    def _is_numeric(value):
    +        vtype = type(value)
    +        return vtype == int or vtype == float or vtype == np.float64 \
    +            or vtype == np.int64 or vtype.__name__ == 'long'
    +
    +    @staticmethod
    +    def _can_convert_to_list(value):
    +        vtype = type(value)
    +        return vtype == list or vtype == np.ndarray or isinstance(value, Vector)
    +
    +    @staticmethod
    +    def identity(value):
    +        """
    +        Dummy converter that just returns value.
    +        """
    +        return value
    +
    +    @staticmethod
    +    def convertToList(value):
    +        """
    +        Convert a value to a list, if possible.
    +        """
    +        if type(value) == list:
    +            return value
    +        elif type(value) == np.ndarray:
    +            return list(value)
    +        elif isinstance(value, Vector):
    +            return value.toArray()
    +        else:
    +            raise TypeError("Could not convert %s to list" % value)
    +
    +    @staticmethod
    +    def convertToListFloat(value):
    +        """
    +        Convert a value to list of floats, if possible.
    +        """
    +        if TypeConverters._can_convert_to_list(value) and \
    +                all(map(lambda v: TypeConverters._is_numeric(v), value)):
    +            value = TypeConverters.convertToList(value)
    +            return list(map(lambda v: float(v), value))
    +        else:
    +            raise TypeError("Could not convert %s to list of floats" % value)
    +
    +    @staticmethod
    +    def convertToListInt(value):
    +        """
    +        Convert a value to list of ints, if possible.
    +        """
    +        if TypeConverters._can_convert_to_list(value) and \
    +                all(map(lambda v: TypeConverters._is_numeric(v), value)):
    +            value = TypeConverters.convertToList(value)
    +            return list(map(lambda v: int(v), value))
    +        else:
    +            raise TypeError("Could not convert %s to list of ints" % value)
    +
    +    @staticmethod
    +    def convertToVector(value):
    --- End diff --
    
    I would just use the existing ```_convert_to_vector``` method.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-196383179
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53076/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-200069161
  
    I also commented on 1 earlier item (about the conversions to str)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-195603471
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52960/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by sethah <gi...@git.apache.org>.

Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r56548734
  
    --- Diff: python/pyspark/ml/param/_shared_params_code_gen.py ---
    @@ -105,64 +104,71 @@ def get$Name(self):
     if __name__ == "__main__":
         print(header)
         print("\n# DO NOT MODIFY THIS FILE! It was generated by _shared_params_code_gen.py.\n")
    -    print("from pyspark.ml.param import Param, Params\n\n")
    +    print("from pyspark.ml.param import *\n\n")
         shared = [
    -        ("maxIter", "max number of iterations (>= 0).", None, int),
    -        ("regParam", "regularization parameter (>= 0).", None, float),
    -        ("featuresCol", "features column name.", "'features'", str),
    -        ("labelCol", "label column name.", "'label'", str),
    -        ("predictionCol", "prediction column name.", "'prediction'", str),
    +        ("maxIter", "max number of iterations (>= 0).", None, "TypeConverters.convertToInt"),
    +        ("regParam", "regularization parameter (>= 0).", None, "TypeConverters.convertToFloat"),
    +        ("featuresCol", "features column name.", "'features'", None),
    +        ("labelCol", "label column name.", "'label'", None),
    +        ("predictionCol", "prediction column name.", "'prediction'", None),
             ("probabilityCol", "Column name for predicted class conditional probabilities. " +
              "Note: Not all models output well-calibrated probability estimates! These probabilities " +
    -         "should be treated as confidences, not precise probabilities.", "'probability'", str),
    +         "should be treated as confidences, not precise probabilities.", "'probability'", None),
             ("rawPredictionCol", "raw prediction (a.k.a. confidence) column name.", "'rawPrediction'",
    -         str),
    -        ("inputCol", "input column name.", None, str),
    -        ("inputCols", "input column names.", None, None),
    -        ("outputCol", "output column name.", "self.uid + '__output'", str),
    -        ("numFeatures", "number of features.", None, int),
    +         None),
    +        ("inputCol", "input column name.", None, None),
    +        ("inputCols", "input column names.", None, "TypeConverters.convertToList"),
    +        ("outputCol", "output column name.", "self.uid + '__output'", None),
    +        ("numFeatures", "number of features.", None, "TypeConverters.convertToInt"),
             ("checkpointInterval", "set checkpoint interval (>= 1) or disable checkpoint (-1). " +
    -         "E.g. 10 means that the cache will get checkpointed every 10 iterations.", None, int),
    -        ("seed", "random seed.", "hash(type(self).__name__)", int),
    -        ("tol", "the convergence tolerance for iterative algorithms.", None, float),
    -        ("stepSize", "Step size to be used for each iteration of optimization.", None, float),
    +         "E.g. 10 means that the cache will get checkpointed every 10 iterations.", None,
    +         "TypeConverters.convertToInt"),
    +        ("seed", "random seed.", "hash(type(self).__name__)", "TypeConverters.convertToInt"),
    +        ("tol", "the convergence tolerance for iterative algorithms.", None,
    +         "TypeConverters.convertToFloat"),
    +        ("stepSize", "Step size to be used for each iteration of optimization.", None,
    +         "TypeConverters.convertToFloat"),
             ("handleInvalid", "how to handle invalid entries. Options are skip (which will filter " +
              "out rows with bad values), or error (which will throw an errror). More options may be " +
    -         "added later.", None, str),
    +         "added later.", None, None),
             ("elasticNetParam", "the ElasticNet mixing parameter, in range [0, 1]. For alpha = 0, " +
    -         "the penalty is an L2 penalty. For alpha = 1, it is an L1 penalty.", "0.0", float),
    -        ("fitIntercept", "whether to fit an intercept term.", "True", bool),
    +         "the penalty is an L2 penalty. For alpha = 1, it is an L1 penalty.", "0.0",
    +         "TypeConverters.convertToFloat"),
    +        ("fitIntercept", "whether to fit an intercept term.", "True", None),
    --- End diff --
    
    Added a `toBoolean` converter.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by sethah <gi...@git.apache.org>.

Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r56548536
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -65,6 +72,106 @@ def __eq__(self, other):
                 return False
     
     
    +class TypeConverters(object):
    +    """
    +    .. note:: DeveloperApi
    +
    +    Factory methods for common type conversion functions for `Param.typeConverter`.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @staticmethod
    +    def _is_numeric(value):
    +        vtype = type(value)
    +        return vtype == int or vtype == float or vtype == np.float64 \
    +            or vtype == np.int64 or vtype.__name__ == 'long'
    +
    +    @staticmethod
    +    def _can_convert_to_list(value):
    +        vtype = type(value)
    +        return vtype == list or vtype == np.ndarray or isinstance(value, Vector)
    +
    +    @staticmethod
    +    def identity(value):
    +        """
    +        Dummy converter that just returns value.
    +        """
    +        return value
    +
    +    @staticmethod
    +    def convertToList(value):
    +        """
    +        Convert a value to a list, if possible.
    +        """
    +        if type(value) == list:
    +            return value
    +        elif type(value) == np.ndarray:
    +            return list(value)
    +        elif isinstance(value, Vector):
    +            return value.toArray()
    +        else:
    +            raise TypeError("Could not convert %s to list" % value)
    +
    +    @staticmethod
    +    def convertToListFloat(value):
    +        """
    +        Convert a value to list of floats, if possible.
    +        """
    +        if TypeConverters._can_convert_to_list(value) and \
    +                all(map(lambda v: TypeConverters._is_numeric(v), value)):
    +            value = TypeConverters.convertToList(value)
    +            return list(map(lambda v: float(v), value))
    +        else:
    +            raise TypeError("Could not convert %s to list of floats" % value)
    +
    +    @staticmethod
    +    def convertToListInt(value):
    +        """
    +        Convert a value to list of ints, if possible.
    +        """
    +        if TypeConverters._can_convert_to_list(value) and \
    +                all(map(lambda v: TypeConverters._is_numeric(v), value)):
    +            value = TypeConverters.convertToList(value)
    +            return list(map(lambda v: int(v), value))
    +        else:
    +            raise TypeError("Could not convert %s to list of ints" % value)
    +
    +    @staticmethod
    +    def convertToVector(value):
    +        """
    +        Convert a value to a MLlib Vector, if possible.
    +        """
    +        if isinstance(value, Vector):
    +            return value
    +        elif TypeConverters._can_convert_to_list(value) and \
    +                all(map(lambda v: TypeConverters._is_numeric(v), value)):
    +            value = DenseVector(value)
    +        else:
    +            raise TypeError("Could not convert %s to vector" % value)
    +        return value
    +
    +    @staticmethod
    +    def convertToFloat(value):
    +        """
    +        Convert a value to a float, if possible.
    +        """
    +        if TypeConverters._is_numeric(value):
    +            return float(value)
    +        else:
    +            raise TypeError("Could not convert %s to float" % value)
    +
    +    @staticmethod
    +    def convertToInt(value):
    +        """
    +        Convert a value to an int, if possible.
    +        """
    +        if TypeConverters._is_numeric(value):
    --- End diff --
    
    Done.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-200066684
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-200421289
  
    **[Test build #53943 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53943/consoleFull)** for PR 11663 at commit [`3e55f83`](https://github.com/apache/spark/commit/3e55f8330b0d9c8b6ecd7174b911c69191e4f9bb).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-200027327
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53816/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r56406681
  
    --- Diff: python/pyspark/ml/param/_shared_params_code_gen.py ---
    @@ -105,64 +104,71 @@ def get$Name(self):
     if __name__ == "__main__":
         print(header)
         print("\n# DO NOT MODIFY THIS FILE! It was generated by _shared_params_code_gen.py.\n")
    -    print("from pyspark.ml.param import Param, Params\n\n")
    +    print("from pyspark.ml.param import *\n\n")
         shared = [
    -        ("maxIter", "max number of iterations (>= 0).", None, int),
    -        ("regParam", "regularization parameter (>= 0).", None, float),
    -        ("featuresCol", "features column name.", "'features'", str),
    -        ("labelCol", "label column name.", "'label'", str),
    -        ("predictionCol", "prediction column name.", "'prediction'", str),
    +        ("maxIter", "max number of iterations (>= 0).", None, "TypeConverters.convertToInt"),
    +        ("regParam", "regularization parameter (>= 0).", None, "TypeConverters.convertToFloat"),
    +        ("featuresCol", "features column name.", "'features'", None),
    +        ("labelCol", "label column name.", "'label'", None),
    +        ("predictionCol", "prediction column name.", "'prediction'", None),
             ("probabilityCol", "Column name for predicted class conditional probabilities. " +
              "Note: Not all models output well-calibrated probability estimates! These probabilities " +
    -         "should be treated as confidences, not precise probabilities.", "'probability'", str),
    +         "should be treated as confidences, not precise probabilities.", "'probability'", None),
             ("rawPredictionCol", "raw prediction (a.k.a. confidence) column name.", "'rawPrediction'",
    -         str),
    -        ("inputCol", "input column name.", None, str),
    -        ("inputCols", "input column names.", None, None),
    -        ("outputCol", "output column name.", "self.uid + '__output'", str),
    -        ("numFeatures", "number of features.", None, int),
    +         None),
    +        ("inputCol", "input column name.", None, None),
    +        ("inputCols", "input column names.", None, "TypeConverters.convertToList"),
    +        ("outputCol", "output column name.", "self.uid + '__output'", None),
    +        ("numFeatures", "number of features.", None, "TypeConverters.convertToInt"),
             ("checkpointInterval", "set checkpoint interval (>= 1) or disable checkpoint (-1). " +
    -         "E.g. 10 means that the cache will get checkpointed every 10 iterations.", None, int),
    -        ("seed", "random seed.", "hash(type(self).__name__)", int),
    -        ("tol", "the convergence tolerance for iterative algorithms.", None, float),
    -        ("stepSize", "Step size to be used for each iteration of optimization.", None, float),
    +         "E.g. 10 means that the cache will get checkpointed every 10 iterations.", None,
    +         "TypeConverters.convertToInt"),
    +        ("seed", "random seed.", "hash(type(self).__name__)", "TypeConverters.convertToInt"),
    +        ("tol", "the convergence tolerance for iterative algorithms.", None,
    +         "TypeConverters.convertToFloat"),
    +        ("stepSize", "Step size to be used for each iteration of optimization.", None,
    +         "TypeConverters.convertToFloat"),
             ("handleInvalid", "how to handle invalid entries. Options are skip (which will filter " +
              "out rows with bad values), or error (which will throw an errror). More options may be " +
    -         "added later.", None, str),
    +         "added later.", None, None),
             ("elasticNetParam", "the ElasticNet mixing parameter, in range [0, 1]. For alpha = 0, " +
    -         "the penalty is an L2 penalty. For alpha = 1, it is an L1 penalty.", "0.0", float),
    -        ("fitIntercept", "whether to fit an intercept term.", "True", bool),
    +         "the penalty is an L2 penalty. For alpha = 1, it is an L1 penalty.", "0.0",
    +         "TypeConverters.convertToFloat"),
    +        ("fitIntercept", "whether to fit an intercept term.", "True", None),
    --- End diff --
    
    Handle Boolean too?  For Boolean, I'd say only accept ```bool``` values (not 0/1).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by sethah <gi...@git.apache.org>.

Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r57189567
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -65,6 +80,144 @@ def __eq__(self, other):
                 return False
     
     
    +class TypeConverters(object):
    +    """
    +    .. note:: DeveloperApi
    +
    +    Factory methods for common type conversion functions for `Param.typeConverter`.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @staticmethod
    +    def _is_numeric(value):
    +        vtype = type(value)
    +        return vtype in [int, float, np.float64, np.int64] or vtype.__name__ == 'long'
    +
    +    @staticmethod
    +    def _is_integer(value):
    +        return TypeConverters._is_numeric(value) and float(value).is_integer()
    +
    +    @staticmethod
    +    def _can_convert_to_list(value):
    +        vtype = type(value)
    +        return vtype in [list, np.ndarray, tuple, xrange, array.array] or isinstance(value, Vector)
    +
    +    @staticmethod
    +    def _can_convert_to_string(value):
    +        vtype = type(value)
    +        is_string = isinstance(value, basestring) or vtype in [np.unicode_, np.string_, np.str_]
    +        return is_string and all(ord(c) < 128 for c in value)  # safe unicode to str
    +
    +    @staticmethod
    +    def identity(value):
    +        """
    +        Dummy converter that just returns value.
    +        """
    +        return value
    +
    +    @staticmethod
    +    def toList(value):
    +        """
    +        Convert a value to a list, if possible.
    +        """
    +        if type(value) == list:
    +            return value
    +        elif type(value) in [np.ndarray, tuple, xrange, array.array]:
    +            return list(value)
    +        elif isinstance(value, Vector):
    +            return list(value.toArray())
    +        else:
    +            raise TypeError("Could not convert %s to list" % value)
    +
    +    @staticmethod
    +    def toListFloat(value):
    +        """
    +        Convert a value to list of floats, if possible.
    +        """
    +        if TypeConverters._can_convert_to_list(value):
    +            value = TypeConverters.toList(value)
    +            if all(map(lambda v: TypeConverters._is_numeric(v), value)):
    +                return [float(v) for v in value]
    +        raise TypeError("Could not convert %s to list of floats" % value)
    +
    +    @staticmethod
    +    def toListInt(value):
    +        """
    +        Convert a value to list of ints, if possible.
    +        """
    +        if TypeConverters._can_convert_to_list(value):
    +            value = TypeConverters.toList(value)
    +            if all(map(lambda v: TypeConverters._is_integer(v), value)):
    +                return [int(v) for v in value]
    +        raise TypeError("Could not convert %s to list of ints" % value)
    +
    +    @staticmethod
    +    def toListString(value):
    +        """
    +        Convert a value to list of strings, if possible.
    +        """
    +        if TypeConverters._can_convert_to_list(value):
    +            value = TypeConverters.toList(value)
    +            if all(map(lambda v: TypeConverters._can_convert_to_string(v), value)):
    +                return [str(v) for v in value]
    +        raise TypeError("Could not convert %s to list of strings" % value)
    +
    +    @staticmethod
    +    def toVector(value):
    +        """
    +        Convert a value to a MLlib Vector, if possible.
    +        """
    +        if isinstance(value, Vector):
    +            return value
    +        elif TypeConverters._can_convert_to_list(value):
    +            value = TypeConverters.toList(value)
    +            if all(map(lambda v: TypeConverters._is_numeric(v), value)):
    +                return DenseVector(value)
    +        raise TypeError("Could not convert %s to vector" % value)
    +
    +    @staticmethod
    +    def toFloat(value):
    +        """
    +        Convert a value to a float, if possible.
    +        """
    +        if TypeConverters._is_numeric(value):
    +            return float(value)
    +        else:
    +            raise TypeError("Could not convert %s to float" % value)
    +
    +    @staticmethod
    +    def toInt(value):
    +        """
    +        Convert a value to an int, if possible.
    +        """
    +        if TypeConverters._is_integer(value):
    +            return int(value)
    +        else:
    +            raise TypeError("Could not convert %s to int" % value)
    +
    +    @staticmethod
    +    def toString(value):
    +        """
    +        Convert a value to a string, if possible.
    +        """
    +        if TypeConverters._can_convert_to_string(value):
    +            return str(value)
    +        else:
    +            raise TypeError("Could not convert value of type %s to string" % type(value).__name__)
    --- End diff --
    
    Done.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-195599201
  
    **[Test build #52960 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52960/consoleFull)** for PR 11663 at commit [`40b00dd`](https://github.com/apache/spark/commit/40b00dd2e3209fcf757ec3e3759ce4270096087e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r57081986
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -17,28 +17,38 @@
     
     from abc import ABCMeta
     import copy
    +import numpy as np
    +import warnings
     
     from pyspark import since
     from pyspark.ml.util import Identifiable
    +from pyspark.mllib.linalg import DenseVector, Vector, _convert_to_vector
     
     
    -__all__ = ['Param', 'Params']
    +__all__ = ['Param', 'Params', 'TypeConverters']
     
     
     class Param(object):
         """
         A param with self-contained documentation.
     
    +    Note: `expectedType` is deprecated and will be removed in 2.1.0, use typeConverter instead,
    --- End diff --
    
    Pretty much.  But there is still no period


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by sethah <gi...@git.apache.org>.

Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r56548501
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -65,6 +72,106 @@ def __eq__(self, other):
                 return False
     
     
    +class TypeConverters(object):
    +    """
    +    .. note:: DeveloperApi
    +
    +    Factory methods for common type conversion functions for `Param.typeConverter`.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @staticmethod
    +    def _is_numeric(value):
    +        vtype = type(value)
    +        return vtype == int or vtype == float or vtype == np.float64 \
    +            or vtype == np.int64 or vtype.__name__ == 'long'
    +
    +    @staticmethod
    +    def _can_convert_to_list(value):
    +        vtype = type(value)
    +        return vtype == list or vtype == np.ndarray or isinstance(value, Vector)
    +
    +    @staticmethod
    +    def identity(value):
    +        """
    +        Dummy converter that just returns value.
    +        """
    +        return value
    +
    +    @staticmethod
    +    def convertToList(value):
    +        """
    +        Convert a value to a list, if possible.
    +        """
    +        if type(value) == list:
    +            return value
    +        elif type(value) == np.ndarray:
    +            return list(value)
    +        elif isinstance(value, Vector):
    +            return value.toArray()
    +        else:
    +            raise TypeError("Could not convert %s to list" % value)
    +
    +    @staticmethod
    +    def convertToListFloat(value):
    +        """
    +        Convert a value to list of floats, if possible.
    +        """
    +        if TypeConverters._can_convert_to_list(value) and \
    +                all(map(lambda v: TypeConverters._is_numeric(v), value)):
    +            value = TypeConverters.convertToList(value)
    +            return list(map(lambda v: float(v), value))
    +        else:
    +            raise TypeError("Could not convert %s to list of floats" % value)
    +
    +    @staticmethod
    +    def convertToListInt(value):
    +        """
    +        Convert a value to list of ints, if possible.
    +        """
    +        if TypeConverters._can_convert_to_list(value) and \
    +                all(map(lambda v: TypeConverters._is_numeric(v), value)):
    +            value = TypeConverters.convertToList(value)
    +            return list(map(lambda v: int(v), value))
    +        else:
    +            raise TypeError("Could not convert %s to list of ints" % value)
    +
    +    @staticmethod
    +    def convertToVector(value):
    --- End diff --
    
    I haven't changed anything here yet. The issue that I had was that the existing `_convert_to_vector` method doesn't check the contents of the iterable. And so if there is something like a `str` in the iterable, you get a `ValueError` instead of a `TypeError` and the message is that str cannot be converted to float. So, in a addition to not knowing what error will be thrown, this doesn't provide a great error message. I can work around this, but then it ends up being like a custom function anyway.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-200024545
  
    **[Test build #53816 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53816/consoleFull)** for PR 11663 at commit [`ff7f94d`](https://github.com/apache/spark/commit/ff7f94d249fb54c0b7dc7b48dca5e268ace090ce).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r57024870
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -275,23 +425,9 @@ def _set(self, **kwargs):
             """
             for param, value in kwargs.items():
                 p = getattr(self, param)
    -            if p.expectedType is None or type(value) == p.expectedType or value is None:
    -                self._paramMap[getattr(self, param)] = value
    -            else:
    -                try:
    -                    # Try and do "safe" conversions that don't lose information
    -                    if p.expectedType == float:
    -                        self._paramMap[getattr(self, param)] = float(value)
    -                    # Python 3 unified long & int
    -                    elif p.expectedType == int and type(value).__name__ == 'long':
    -                        self._paramMap[getattr(self, param)] = value
    -                    else:
    -                        raise Exception(
    -                            "Provided type {0} incompatible with type {1} for param {2}"
    -                            .format(type(value), p.expectedType, p))
    -                except ValueError:
    -                    raise Exception(("Failed to convert {0} to type {1} for param {2}"
    -                                     .format(type(value), p.expectedType, p)))
    +            if value is not None:
    +                value = p.typeConverter(value)
    --- End diff --
    
    It would be nice to catch the error and augment it with a nicer message saying which Param value was invalid.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r57024806
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -17,28 +17,38 @@
     
     from abc import ABCMeta
     import copy
    +import numpy as np
    +import warnings
     
     from pyspark import since
     from pyspark.ml.util import Identifiable
    +from pyspark.mllib.linalg import DenseVector, Vector, _convert_to_vector
     
     
    -__all__ = ['Param', 'Params']
    +__all__ = ['Param', 'Params', 'TypeConverters']
     
     
     class Param(object):
         """
         A param with self-contained documentation.
     
    +    Note: `expectedType` is deprecated and will be removed in 2.1.0, use typeConverter instead,
    --- End diff --
    
    Make language clearer: "... removed in 2.1.  Use typeConverter ..."


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r57024815
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -65,6 +75,146 @@ def __eq__(self, other):
                 return False
     
     
    +class TypeConverters(object):
    +    """
    +    .. note:: DeveloperApi
    +
    +    Factory methods for common type conversion functions for `Param.typeConverter`.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @staticmethod
    +    def _is_numeric(value):
    +        vtype = type(value)
    +        return vtype in [int, float, np.float64, np.int64] or vtype.__name__ == 'long'
    +
    +    @staticmethod
    +    def _is_integer(value):
    +        if TypeConverters._is_numeric(value):
    --- End diff --
    
    Simplify:
    ```
    TypeConverters._is_numeric(value) and float(value).is_integer()
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by sethah <gi...@git.apache.org>.

Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r57189492
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -65,6 +80,144 @@ def __eq__(self, other):
                 return False
     
     
    +class TypeConverters(object):
    +    """
    +    .. note:: DeveloperApi
    +
    +    Factory methods for common type conversion functions for `Param.typeConverter`.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @staticmethod
    +    def _is_numeric(value):
    +        vtype = type(value)
    +        return vtype in [int, float, np.float64, np.int64] or vtype.__name__ == 'long'
    +
    +    @staticmethod
    +    def _is_integer(value):
    +        return TypeConverters._is_numeric(value) and float(value).is_integer()
    +
    +    @staticmethod
    +    def _can_convert_to_list(value):
    +        vtype = type(value)
    +        return vtype in [list, np.ndarray, tuple, xrange, array.array] or isinstance(value, Vector)
    +
    +    @staticmethod
    +    def _can_convert_to_string(value):
    --- End diff --
    
    I changed the string conversions to handle unicode.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r56406626
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -65,6 +72,106 @@ def __eq__(self, other):
                 return False
     
     
    +class TypeConverters(object):
    +    """
    +    .. note:: DeveloperApi
    +
    +    Factory methods for common type conversion functions for `Param.typeConverter`.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @staticmethod
    +    def _is_numeric(value):
    +        vtype = type(value)
    +        return vtype == int or vtype == float or vtype == np.float64 \
    +            or vtype == np.int64 or vtype.__name__ == 'long'
    +
    +    @staticmethod
    +    def _can_convert_to_list(value):
    +        vtype = type(value)
    +        return vtype == list or vtype == np.ndarray or isinstance(value, Vector)
    +
    +    @staticmethod
    +    def identity(value):
    +        """
    +        Dummy converter that just returns value.
    +        """
    +        return value
    +
    +    @staticmethod
    +    def convertToList(value):
    --- End diff --
    
    Naming: How about everything be called ```toX``` for type ```X```?  It's clear it's for converting b/c of the class name.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by sethah <gi...@git.apache.org>.

Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r57083431
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -17,28 +17,38 @@
     
     from abc import ABCMeta
     import copy
    +import numpy as np
    +import warnings
     
     from pyspark import since
     from pyspark.ml.util import Identifiable
    +from pyspark.mllib.linalg import DenseVector, Vector, _convert_to_vector
     
     
    -__all__ = ['Param', 'Params']
    +__all__ = ['Param', 'Params', 'TypeConverters']
     
     
     class Param(object):
         """
         A param with self-contained documentation.
     
    +    Note: `expectedType` is deprecated and will be removed in 2.1.0, use typeConverter instead,
    --- End diff --
    
    I changed it in one place and not another, sorry about that!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by sethah <gi...@git.apache.org>.

Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r57068396
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -65,6 +75,146 @@ def __eq__(self, other):
                 return False
     
     
    +class TypeConverters(object):
    +    """
    +    .. note:: DeveloperApi
    +
    +    Factory methods for common type conversion functions for `Param.typeConverter`.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @staticmethod
    +    def _is_numeric(value):
    +        vtype = type(value)
    +        return vtype in [int, float, np.float64, np.int64] or vtype.__name__ == 'long'
    +
    +    @staticmethod
    +    def _is_integer(value):
    +        if TypeConverters._is_numeric(value):
    +            value = float(value)
    +            return value.is_integer()
    +        else:
    +            return False
    +
    +    @staticmethod
    +    def _can_convert_to_list(value):
    +        vtype = type(value)
    +        return vtype == list or vtype == np.ndarray or isinstance(value, Vector)
    +
    +    @staticmethod
    +    def _is_string(value):
    +        vtype = type(value)
    +        return vtype in [str, np.unicode_, np.string_, np.str_] or type(value).__name__ == 'unicode'
    +
    +    @staticmethod
    +    def identity(value):
    +        """
    +        Dummy converter that just returns value.
    +        """
    +        return value
    +
    +    @staticmethod
    +    def toList(value):
    +        """
    +        Convert a value to a list, if possible.
    +        """
    +        if type(value) == list:
    +            return value
    +        elif type(value) == np.ndarray:
    +            return list(value)
    +        elif isinstance(value, Vector):
    +            return list(value.toArray())
    +        else:
    +            raise TypeError("Could not convert %s to list" % value)
    +
    +    @staticmethod
    +    def toListFloat(value):
    +        """
    +        Convert a value to list of floats, if possible.
    +        """
    +        if TypeConverters._can_convert_to_list(value):
    +            value = TypeConverters.toList(value)
    +            if all(map(lambda v: TypeConverters._is_numeric(v), value)):
    +                return list(map(lambda v: float(v), value))
    --- End diff --
    
    This was necessary because python3 map function returns a "map object" and not a list. But I just changed it to use a list comprehension instead, as it's less confusing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r57024837
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -65,6 +75,146 @@ def __eq__(self, other):
                 return False
     
     
    +class TypeConverters(object):
    +    """
    +    .. note:: DeveloperApi
    +
    +    Factory methods for common type conversion functions for `Param.typeConverter`.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @staticmethod
    +    def _is_numeric(value):
    +        vtype = type(value)
    +        return vtype in [int, float, np.float64, np.int64] or vtype.__name__ == 'long'
    +
    +    @staticmethod
    +    def _is_integer(value):
    +        if TypeConverters._is_numeric(value):
    +            value = float(value)
    +            return value.is_integer()
    +        else:
    +            return False
    +
    +    @staticmethod
    +    def _can_convert_to_list(value):
    +        vtype = type(value)
    +        return vtype == list or vtype == np.ndarray or isinstance(value, Vector)
    +
    +    @staticmethod
    +    def _is_string(value):
    --- End diff --
    
    I'd follow existing conventions.  Look at, e.g., the beginning of file feature.py:
    ```
    import sys
    if sys.version > '3':
        basestring = str
    ```
    Then you can use ```isinstance(blah, basestring)```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-196399772
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by sethah <gi...@git.apache.org>.

Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r56547314
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -65,6 +72,106 @@ def __eq__(self, other):
                 return False
     
     
    +class TypeConverters(object):
    +    """
    +    .. note:: DeveloperApi
    +
    +    Factory methods for common type conversion functions for `Param.typeConverter`.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @staticmethod
    +    def _is_numeric(value):
    +        vtype = type(value)
    +        return vtype == int or vtype == float or vtype == np.float64 \
    +            or vtype == np.int64 or vtype.__name__ == 'long'
    +
    +    @staticmethod
    +    def _can_convert_to_list(value):
    +        vtype = type(value)
    +        return vtype == list or vtype == np.ndarray or isinstance(value, Vector)
    +
    +    @staticmethod
    +    def identity(value):
    +        """
    +        Dummy converter that just returns value.
    +        """
    +        return value
    +
    +    @staticmethod
    +    def convertToList(value):
    --- End diff --
    
    Changed all methods to `toX` instead of `convertToX`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-197997442
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by sethah <gi...@git.apache.org>.

Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r57068522
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -275,23 +425,9 @@ def _set(self, **kwargs):
             """
             for param, value in kwargs.items():
                 p = getattr(self, param)
    -            if p.expectedType is None or type(value) == p.expectedType or value is None:
    -                self._paramMap[getattr(self, param)] = value
    -            else:
    -                try:
    -                    # Try and do "safe" conversions that don't lose information
    -                    if p.expectedType == float:
    -                        self._paramMap[getattr(self, param)] = float(value)
    -                    # Python 3 unified long & int
    -                    elif p.expectedType == int and type(value).__name__ == 'long':
    -                        self._paramMap[getattr(self, param)] = value
    -                    else:
    -                        raise Exception(
    -                            "Provided type {0} incompatible with type {1} for param {2}"
    -                            .format(type(value), p.expectedType, p))
    -                except ValueError:
    -                    raise Exception(("Failed to convert {0} to type {1} for param {2}"
    -                                     .format(type(value), p.expectedType, p)))
    +            if value is not None:
    +                value = p.typeConverter(value)
    --- End diff --
    
    Added a prefix to the error message to specify the param that was invalid.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r57024794
  
    --- Diff: python/pyspark/ml/feature.py ---
    @@ -2502,7 +2524,7 @@ def __init__(self, numTopFeatures=50, featuresCol="features", outputCol=None, la
                 Param(self, "numTopFeatures",
    --- End diff --
    
    This whole line could be removed.  This must be a leftover from when we used to have to specify Params twice for Python.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-200052418
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53827/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/11663


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by sethah <gi...@git.apache.org>.

Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r56547465
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -65,6 +72,106 @@ def __eq__(self, other):
                 return False
     
     
    +class TypeConverters(object):
    +    """
    +    .. note:: DeveloperApi
    +
    +    Factory methods for common type conversion functions for `Param.typeConverter`.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @staticmethod
    +    def _is_numeric(value):
    +        vtype = type(value)
    +        return vtype == int or vtype == float or vtype == np.float64 \
    +            or vtype == np.int64 or vtype.__name__ == 'long'
    +
    +    @staticmethod
    +    def _can_convert_to_list(value):
    +        vtype = type(value)
    +        return vtype == list or vtype == np.ndarray or isinstance(value, Vector)
    +
    +    @staticmethod
    +    def identity(value):
    +        """
    +        Dummy converter that just returns value.
    +        """
    +        return value
    +
    +    @staticmethod
    +    def convertToList(value):
    +        """
    +        Convert a value to a list, if possible.
    +        """
    +        if type(value) == list:
    +            return value
    +        elif type(value) == np.ndarray:
    +            return list(value)
    +        elif isinstance(value, Vector):
    +            return value.toArray()
    --- End diff --
    
    I added a unit test for the `toList` conversion function.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-196383174
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-200052411
  
    **[Test build #53827 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53827/consoleFull)** for PR 11663 at commit [`99eed51`](https://github.com/apache/spark/commit/99eed51519b690e422d1cc2aec3fcef95c0c3a9d).
     * This patch **fails R style tests**.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-195603467
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by sethah <gi...@git.apache.org>.

Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r56547567
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -65,6 +72,106 @@ def __eq__(self, other):
                 return False
     
     
    +class TypeConverters(object):
    +    """
    +    .. note:: DeveloperApi
    +
    +    Factory methods for common type conversion functions for `Param.typeConverter`.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @staticmethod
    +    def _is_numeric(value):
    +        vtype = type(value)
    +        return vtype == int or vtype == float or vtype == np.float64 \
    +            or vtype == np.int64 or vtype.__name__ == 'long'
    +
    +    @staticmethod
    +    def _can_convert_to_list(value):
    +        vtype = type(value)
    +        return vtype == list or vtype == np.ndarray or isinstance(value, Vector)
    +
    +    @staticmethod
    +    def identity(value):
    +        """
    +        Dummy converter that just returns value.
    +        """
    +        return value
    +
    +    @staticmethod
    +    def convertToList(value):
    +        """
    +        Convert a value to a list, if possible.
    +        """
    +        if type(value) == list:
    +            return value
    +        elif type(value) == np.ndarray:
    +            return list(value)
    +        elif isinstance(value, Vector):
    +            return value.toArray()
    +        else:
    +            raise TypeError("Could not convert %s to list" % value)
    +
    +    @staticmethod
    +    def convertToListFloat(value):
    +        """
    +        Convert a value to list of floats, if possible.
    +        """
    +        if TypeConverters._can_convert_to_list(value) and \
    +                all(map(lambda v: TypeConverters._is_numeric(v), value)):
    --- End diff --
    
    Done. Added SparseVectors to unit tests.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-198012665
  
    **[Test build #53446 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53446/consoleFull)** for PR 11663 at commit [`c483da8`](https://github.com/apache/spark/commit/c483da8a6b1c305a96ce2a4a25ac8a0a1f98e653).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `    probabilityCol = Param(Params._dummy(), \"probabilityCol\", \"Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities.\", typeConverter=TypeConverters.toString)`
      * `    thresholds = Param(Params._dummy(), \"thresholds\", \"Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values >= 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class' threshold.\", typeConverter=TypeConverters.toListFloat)`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-200427992
  
    **[Test build #53943 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53943/consoleFull)** for PR 11663 at commit [`3e55f83`](https://github.com/apache/spark/commit/3e55f8330b0d9c8b6ecd7174b911c69191e4f9bb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-197994412
  
    **[Test build #53438 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53438/consoleFull)** for PR 11663 at commit [`c483da8`](https://github.com/apache/spark/commit/c483da8a6b1c305a96ce2a4a25ac8a0a1f98e653).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-199906155
  
    I made another pass.  I only had minor comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r57024800
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -17,28 +17,38 @@
     
     from abc import ABCMeta
     import copy
    +import numpy as np
    +import warnings
     
     from pyspark import since
     from pyspark.ml.util import Identifiable
    +from pyspark.mllib.linalg import DenseVector, Vector, _convert_to_vector
    --- End diff --
    
    no longer need _convert_to_vector


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r57084286
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -65,6 +80,144 @@ def __eq__(self, other):
                 return False
     
     
    +class TypeConverters(object):
    +    """
    +    .. note:: DeveloperApi
    +
    +    Factory methods for common type conversion functions for `Param.typeConverter`.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @staticmethod
    +    def _is_numeric(value):
    +        vtype = type(value)
    +        return vtype in [int, float, np.float64, np.int64] or vtype.__name__ == 'long'
    +
    +    @staticmethod
    +    def _is_integer(value):
    +        return TypeConverters._is_numeric(value) and float(value).is_integer()
    +
    +    @staticmethod
    +    def _can_convert_to_list(value):
    +        vtype = type(value)
    +        return vtype in [list, np.ndarray, tuple, xrange, array.array] or isinstance(value, Vector)
    +
    +    @staticmethod
    +    def _can_convert_to_string(value):
    +        vtype = type(value)
    +        is_string = isinstance(value, basestring) or vtype in [np.unicode_, np.string_, np.str_]
    +        return is_string and all(ord(c) < 128 for c in value)  # safe unicode to str
    +
    +    @staticmethod
    +    def identity(value):
    +        """
    +        Dummy converter that just returns value.
    +        """
    +        return value
    +
    +    @staticmethod
    +    def toList(value):
    +        """
    +        Convert a value to a list, if possible.
    +        """
    +        if type(value) == list:
    +            return value
    +        elif type(value) in [np.ndarray, tuple, xrange, array.array]:
    +            return list(value)
    +        elif isinstance(value, Vector):
    +            return list(value.toArray())
    +        else:
    +            raise TypeError("Could not convert %s to list" % value)
    +
    +    @staticmethod
    +    def toListFloat(value):
    +        """
    +        Convert a value to list of floats, if possible.
    +        """
    +        if TypeConverters._can_convert_to_list(value):
    +            value = TypeConverters.toList(value)
    +            if all(map(lambda v: TypeConverters._is_numeric(v), value)):
    +                return [float(v) for v in value]
    +        raise TypeError("Could not convert %s to list of floats" % value)
    +
    +    @staticmethod
    +    def toListInt(value):
    +        """
    +        Convert a value to list of ints, if possible.
    +        """
    +        if TypeConverters._can_convert_to_list(value):
    +            value = TypeConverters.toList(value)
    +            if all(map(lambda v: TypeConverters._is_integer(v), value)):
    +                return [int(v) for v in value]
    +        raise TypeError("Could not convert %s to list of ints" % value)
    +
    +    @staticmethod
    +    def toListString(value):
    +        """
    +        Convert a value to list of strings, if possible.
    +        """
    +        if TypeConverters._can_convert_to_list(value):
    +            value = TypeConverters.toList(value)
    +            if all(map(lambda v: TypeConverters._can_convert_to_string(v), value)):
    +                return [str(v) for v in value]
    +        raise TypeError("Could not convert %s to list of strings" % value)
    +
    +    @staticmethod
    +    def toVector(value):
    +        """
    +        Convert a value to a MLlib Vector, if possible.
    +        """
    +        if isinstance(value, Vector):
    +            return value
    +        elif TypeConverters._can_convert_to_list(value):
    +            value = TypeConverters.toList(value)
    +            if all(map(lambda v: TypeConverters._is_numeric(v), value)):
    +                return DenseVector(value)
    +        raise TypeError("Could not convert %s to vector" % value)
    +
    +    @staticmethod
    +    def toFloat(value):
    +        """
    +        Convert a value to a float, if possible.
    +        """
    +        if TypeConverters._is_numeric(value):
    +            return float(value)
    +        else:
    +            raise TypeError("Could not convert %s to float" % value)
    +
    +    @staticmethod
    +    def toInt(value):
    +        """
    +        Convert a value to an int, if possible.
    +        """
    +        if TypeConverters._is_integer(value):
    +            return int(value)
    +        else:
    +            raise TypeError("Could not convert %s to int" % value)
    +
    +    @staticmethod
    +    def toString(value):
    +        """
    +        Convert a value to a string, if possible.
    +        """
    +        if TypeConverters._can_convert_to_string(value):
    +            return str(value)
    +        else:
    +            raise TypeError("Could not convert value of type %s to string" % type(value).__name__)
    --- End diff --
    
    I actually like not having ```__name__``` since it's nice to see the module name as well.  I guess you could write ```Could not convert %s to string``` to avoid having "type" appear twice.
    
    Same for other uses of ```__name__```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by sethah <gi...@git.apache.org>.

Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r57068235
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -65,6 +75,146 @@ def __eq__(self, other):
                 return False
     
     
    +class TypeConverters(object):
    +    """
    +    .. note:: DeveloperApi
    +
    +    Factory methods for common type conversion functions for `Param.typeConverter`.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @staticmethod
    +    def _is_numeric(value):
    +        vtype = type(value)
    +        return vtype in [int, float, np.float64, np.int64] or vtype.__name__ == 'long'
    +
    +    @staticmethod
    +    def _is_integer(value):
    +        if TypeConverters._is_numeric(value):
    +            value = float(value)
    +            return value.is_integer()
    +        else:
    +            return False
    +
    +    @staticmethod
    +    def _can_convert_to_list(value):
    --- End diff --
    
    Added support for these and added them to unit tests.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-200040354
  
    **[Test build #53821 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53821/consoleFull)** for PR 11663 at commit [`e1e4643`](https://github.com/apache/spark/commit/e1e46433582b0740be3ed2977e2da86e8414f6af).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r56406675
  
    --- Diff: python/pyspark/ml/param/_shared_params_code_gen.py ---
    @@ -105,64 +104,71 @@ def get$Name(self):
     if __name__ == "__main__":
         print(header)
         print("\n# DO NOT MODIFY THIS FILE! It was generated by _shared_params_code_gen.py.\n")
    -    print("from pyspark.ml.param import Param, Params\n\n")
    +    print("from pyspark.ml.param import *\n\n")
         shared = [
    -        ("maxIter", "max number of iterations (>= 0).", None, int),
    -        ("regParam", "regularization parameter (>= 0).", None, float),
    -        ("featuresCol", "features column name.", "'features'", str),
    -        ("labelCol", "label column name.", "'label'", str),
    -        ("predictionCol", "prediction column name.", "'prediction'", str),
    +        ("maxIter", "max number of iterations (>= 0).", None, "TypeConverters.convertToInt"),
    +        ("regParam", "regularization parameter (>= 0).", None, "TypeConverters.convertToFloat"),
    +        ("featuresCol", "features column name.", "'features'", None),
    +        ("labelCol", "label column name.", "'label'", None),
    +        ("predictionCol", "prediction column name.", "'prediction'", None),
             ("probabilityCol", "Column name for predicted class conditional probabilities. " +
              "Note: Not all models output well-calibrated probability estimates! These probabilities " +
    -         "should be treated as confidences, not precise probabilities.", "'probability'", str),
    +         "should be treated as confidences, not precise probabilities.", "'probability'", None),
             ("rawPredictionCol", "raw prediction (a.k.a. confidence) column name.", "'rawPrediction'",
    -         str),
    -        ("inputCol", "input column name.", None, str),
    -        ("inputCols", "input column names.", None, None),
    -        ("outputCol", "output column name.", "self.uid + '__output'", str),
    -        ("numFeatures", "number of features.", None, int),
    +         None),
    +        ("inputCol", "input column name.", None, None),
    +        ("inputCols", "input column names.", None, "TypeConverters.convertToList"),
    --- End diff --
    
    Handle list of str too?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-196388562
  
    **[Test build #53077 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53077/consoleFull)** for PR 11663 at commit [`40b00dd`](https://github.com/apache/spark/commit/40b00dd2e3209fcf757ec3e3759ce4270096087e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by sethah <gi...@git.apache.org>.

Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r57068173
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -17,28 +17,38 @@
     
     from abc import ABCMeta
     import copy
    +import numpy as np
    +import warnings
     
     from pyspark import since
     from pyspark.ml.util import Identifiable
    +from pyspark.mllib.linalg import DenseVector, Vector, _convert_to_vector
     
     
    -__all__ = ['Param', 'Params']
    +__all__ = ['Param', 'Params', 'TypeConverters']
     
     
     class Param(object):
         """
         A param with self-contained documentation.
     
    +    Note: `expectedType` is deprecated and will be removed in 2.1.0, use typeConverter instead,
    --- End diff --
    
    I basically just added a period. I think that's all you were suggesting?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-196399588
  
    **[Test build #53077 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53077/consoleFull)** for PR 11663 at commit [`40b00dd`](https://github.com/apache/spark/commit/40b00dd2e3209fcf757ec3e3759ce4270096087e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r56406671
  
    --- Diff: python/pyspark/ml/param/_shared_params_code_gen.py ---
    @@ -105,64 +104,71 @@ def get$Name(self):
     if __name__ == "__main__":
         print(header)
         print("\n# DO NOT MODIFY THIS FILE! It was generated by _shared_params_code_gen.py.\n")
    -    print("from pyspark.ml.param import Param, Params\n\n")
    +    print("from pyspark.ml.param import *\n\n")
         shared = [
    -        ("maxIter", "max number of iterations (>= 0).", None, int),
    -        ("regParam", "regularization parameter (>= 0).", None, float),
    -        ("featuresCol", "features column name.", "'features'", str),
    -        ("labelCol", "label column name.", "'label'", str),
    -        ("predictionCol", "prediction column name.", "'prediction'", str),
    +        ("maxIter", "max number of iterations (>= 0).", None, "TypeConverters.convertToInt"),
    +        ("regParam", "regularization parameter (>= 0).", None, "TypeConverters.convertToFloat"),
    +        ("featuresCol", "features column name.", "'features'", None),
    --- End diff --
    
    This removes the check for the value being a string.  How about creating a TypeConverter for str which requires it be an instance of basestring, and then convert to str?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-200062615
  
    **[Test build #53832 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53832/consoleFull)** for PR 11663 at commit [`2c46076`](https://github.com/apache/spark/commit/2c4607688a458bfecbf40b604e8cd59a5fa4851b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by jkbradley <gi...@git.apache.org>.

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11663#discussion_r56406655
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -65,6 +72,106 @@ def __eq__(self, other):
                 return False
     
     
    +class TypeConverters(object):
    +    """
    +    .. note:: DeveloperApi
    +
    +    Factory methods for common type conversion functions for `Param.typeConverter`.
    +
    +    .. versionadded:: 2.0.0
    +    """
    +
    +    @staticmethod
    +    def _is_numeric(value):
    +        vtype = type(value)
    +        return vtype == int or vtype == float or vtype == np.float64 \
    +            or vtype == np.int64 or vtype.__name__ == 'long'
    +
    +    @staticmethod
    +    def _can_convert_to_list(value):
    +        vtype = type(value)
    +        return vtype == list or vtype == np.ndarray or isinstance(value, Vector)
    +
    +    @staticmethod
    +    def identity(value):
    +        """
    +        Dummy converter that just returns value.
    +        """
    +        return value
    +
    +    @staticmethod
    +    def convertToList(value):
    +        """
    +        Convert a value to a list, if possible.
    +        """
    +        if type(value) == list:
    +            return value
    +        elif type(value) == np.ndarray:
    +            return list(value)
    +        elif isinstance(value, Vector):
    +            return value.toArray()
    +        else:
    +            raise TypeError("Could not convert %s to list" % value)
    +
    +    @staticmethod
    +    def convertToListFloat(value):
    +        """
    +        Convert a value to list of floats, if possible.
    +        """
    +        if TypeConverters._can_convert_to_list(value) and \
    +                all(map(lambda v: TypeConverters._is_numeric(v), value)):
    +            value = TypeConverters.convertToList(value)
    +            return list(map(lambda v: float(v), value))
    +        else:
    +            raise TypeError("Could not convert %s to list of floats" % value)
    +
    +    @staticmethod
    +    def convertToListInt(value):
    +        """
    +        Convert a value to list of ints, if possible.
    +        """
    +        if TypeConverters._can_convert_to_list(value) and \
    +                all(map(lambda v: TypeConverters._is_numeric(v), value)):
    +            value = TypeConverters.convertToList(value)
    +            return list(map(lambda v: int(v), value))
    +        else:
    +            raise TypeError("Could not convert %s to list of ints" % value)
    +
    +    @staticmethod
    +    def convertToVector(value):
    +        """
    +        Convert a value to a MLlib Vector, if possible.
    +        """
    +        if isinstance(value, Vector):
    +            return value
    +        elif TypeConverters._can_convert_to_list(value) and \
    +                all(map(lambda v: TypeConverters._is_numeric(v), value)):
    +            value = DenseVector(value)
    +        else:
    +            raise TypeError("Could not convert %s to vector" % value)
    +        return value
    +
    +    @staticmethod
    +    def convertToFloat(value):
    +        """
    +        Convert a value to a float, if possible.
    +        """
    +        if TypeConverters._is_numeric(value):
    +            return float(value)
    +        else:
    +            raise TypeError("Could not convert %s to float" % value)
    +
    +    @staticmethod
    +    def convertToInt(value):
    +        """
    +        Convert a value to an int, if possible.
    +        """
    +        if TypeConverters._is_numeric(value):
    --- End diff --
    
    check for integer value


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-200051224
  
    **[Test build #53827 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53827/consoleFull)** for PR 11663 at commit [`99eed51`](https://github.com/apache/spark/commit/99eed51519b690e422d1cc2aec3fcef95c0c3a9d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11663#issuecomment-197997448
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53438/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org