You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by hhbyyh <gi...@git.apache.org> on 2017/10/28 19:32:29 UTC

[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...

GitHub user hhbyyh opened a pull request:

    https://github.com/apache/spark/pull/19599

    [SPARK-22381] [ML] Add StringParam that supports valid options

    ## What changes were proposed in this pull request?
    
    jira: https://issues.apache.org/jira/browse/SPARK-22381
    
    During test with https://issues.apache.org/jira/browse/SPARK-22331, I found it might be a good idea to include the possible options in a StringParam.
    A `StringParam` extends `Param[String]` and allow user to specify the valid options in Array[String] (case insensitive).
    
    So far it can help achieve three goals:
    
    1.  Make the StringParam aware of its possible options and support native validations.
    2.  StringParam can list the supported options in exception message when user input wrong value.
         before: `$parent parameter $name given invalid value $value.`
         after: `$parent parameter $name given invalid value $value. Supported options: auto, normal, l-bfgs`
        
    3.  Allow automatic unit test coverage for case-insensitive String param. (built into checkParam)
    
    and IMO it also decrease the code redundancy and make it more maintainable.
    
    The StringParam is designed to be completely compatible with existing Param[String], just with the extra logic for supporting options, which means we don't need to convert all Param[String] to StringParam until we feel comfortable to do that.
    
    The PR contains the minimum changes required and use NaiveBayes and LinearRegression as examples. They're intentionally built with some difference to demo different options when comparing String Params.
    
    The PR intentionally leave SharedParamsCodeGen untouched for now, which can be easily added if StringParam sounds good. Also more unit tests can be added.
    
    ## How was this patch tested?
    
    Strengthen existing unit tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hhbyyh/spark stringparam

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19599.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19599
    
----
commit f6b918eda3448df054edd81a077fa1a3d996d5c1
Author: Yuhao Yang <yu...@intel.com>
Date:   2017-10-28T18:44:46Z

    add StringParam

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19599
  
    **[Test build #83199 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83199/testReport)** for PR 19599 at commit [`01e7d3d`](https://github.com/apache/spark/commit/01e7d3d5f9b0ae278ebce60635e5c2568d3d0cf3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on the issue:

    https://github.com/apache/spark/pull/19599
  
    Please advice if this is a good feature to add. If not I'll close it. Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19599#discussion_r156555847
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
    @@ -435,6 +435,43 @@ class BooleanParam(parent: String, name: String, doc: String) // No need for isV
       }
     }
     
    +/**
    + * :: DeveloperApi ::
    + * Specialized version of `Param[String]` for Java.
    + */
    +@DeveloperApi
    +class StringParam(parent: Params, name: String, doc: String, isValid: String => Boolean)
    +  extends Param[String](parent, name, doc, isValid) {
    +
    +  private var options: Option[Array[String]] = None
    --- End diff --
    
    I would gladly make it a val. But met some problem for assigning the value in the constructor. `this.options = Some(options)`, any advice?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19599
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...

Posted by smurakozi <gi...@git.apache.org>.
Github user smurakozi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19599#discussion_r156605360
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
    @@ -435,6 +435,43 @@ class BooleanParam(parent: String, name: String, doc: String) // No need for isV
       }
     }
     
    +/**
    + * :: DeveloperApi ::
    + * Specialized version of `Param[String]` for Java.
    + */
    +@DeveloperApi
    +class StringParam(parent: Params, name: String, doc: String, isValid: String => Boolean)
    +  extends Param[String](parent, name, doc, isValid) {
    +
    +  private var options: Option[Array[String]] = None
    --- End diff --
    
    What about this?
    ```
    class StringParam(parent: Params, name: String, doc: String, isValid: String => Boolean, 
                      options: Option[Array[String]] = None)
      extends Param[String](parent, name, doc, isValid) {
    ...
      def this(parent: Params, name: String, doc: String, options: Array[String]) = {
        this(parent, name, doc + s" Supported options (case-insensitive): ${options.mkString(", ")}.",
          s => options.exists(s.equalsIgnoreCase), Some(options))
      }```
    
    This solves the options as val problem, but highlights another one: 
    why do we need the possibility to give an explicit isValid? Why not always expect options only?
    
    I agree with @attilapiros that these params are enum-like. If so the only reasonable validation is to check if one of the acceptable values are given (ignoring case). I don't remember ever seeing a custom validator doing anything else. Removing these custom validators would decrease complexity and code duplication.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19599
  
    **[Test build #84698 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84698/testReport)** for PR 19599 at commit [`3f6e472`](https://github.com/apache/spark/commit/3f6e472ebff8327e20dcdb0041f525a3856244b0).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...

Posted by smurakozi <gi...@git.apache.org>.
Github user smurakozi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19599#discussion_r156339663
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
    @@ -435,6 +435,43 @@ class BooleanParam(parent: String, name: String, doc: String) // No need for isV
       }
     }
     
    +/**
    + * :: DeveloperApi ::
    + * Specialized version of `Param[String]` for Java.
    + */
    +@DeveloperApi
    +class StringParam(parent: Params, name: String, doc: String, isValid: String => Boolean)
    +  extends Param[String](parent, name, doc, isValid) {
    +
    +  private var options: Option[Array[String]] = None
    +
    +  def this(parent: Params, name: String, doc: String) =
    +    this(parent, name, doc, ParamValidators.alwaysTrue)
    +
    +  /** construct a StringParam with limited options (case-insensitive) */
    +  def this(parent: Params, name: String, doc: String, options: Array[String]) = {
    +    this(parent, name, doc + s" Supported options (case-insensitive): ${options.mkString(", ")}.",
    --- End diff --
    
    Is this tested somewhere?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19599
  
    **[Test build #84699 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84699/testReport)** for PR 19599 at commit [`f55b22d`](https://github.com/apache/spark/commit/f55b22dacef04b3d3e6f0cb05acad3e9997c133d).
     * This patch **fails to generate documentation**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19599
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84699/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19599#discussion_r156556262
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -224,8 +222,8 @@ class LinearRegression @Since("1.3.0") (@Since("1.3.0") override val uid: String
           elasticNetParam, fitIntercept, maxIter, regParam, standardization, aggregationDepth)
         instr.logNumFeatures(numFeatures)
     
    -    if (($(solver) == Auto &&
    -      numFeatures <= WeightedLeastSquares.MAX_NUM_FEATURES) || $(solver) == Normal) {
    +    if (($(solver).equalsIgnoreCase(Auto) && numFeatures <= WeightedLeastSquares.MAX_NUM_FEATURES)
    --- End diff --
    
    I'll reply this in a comment. Thanks


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19599#discussion_r147562530
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
    @@ -440,6 +440,43 @@ class BooleanParam(parent: String, name: String, doc: String) // No need for isV
      * Specialized version of `Param[Array[String]]` for Java.
    --- End diff --
    
    Eh, this need to be updated.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...

Posted by smurakozi <gi...@git.apache.org>.
Github user smurakozi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19599#discussion_r156609625
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
    @@ -435,6 +435,43 @@ class BooleanParam(parent: String, name: String, doc: String) // No need for isV
       }
     }
     
    +/**
    + * :: DeveloperApi ::
    + * Specialized version of `Param[String]` for Java.
    + */
    +@DeveloperApi
    +class StringParam(parent: Params, name: String, doc: String, isValid: String => Boolean)
    +  extends Param[String](parent, name, doc, isValid) {
    +
    +  private var options: Option[Array[String]] = None
    +
    +  def this(parent: Params, name: String, doc: String) =
    +    this(parent, name, doc, ParamValidators.alwaysTrue)
    +
    +  /** construct a StringParam with limited options (case-insensitive) */
    +  def this(parent: Params, name: String, doc: String, options: Array[String]) = {
    +    this(parent, name, doc + s" Supported options (case-insensitive): ${options.mkString(", ")}.",
    +      s => options.exists(s.equalsIgnoreCase))
    +    this.options = Some(options)
    +  }
    +
    +  private[spark] def getOptions: Option[Array[String]] = options
    +
    +  /** Creates a param pair with given value (for Java). */
    +  override def w(value: String): ParamPair[String] = super.w(value)
    +
    +  override def validate(value: String): Unit = {
    --- End diff --
    
    should be private[param] 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

Posted by attilapiros <gi...@git.apache.org>.
Github user attilapiros commented on the issue:

    https://github.com/apache/spark/pull/19599
  
    As I see inside the StringParam the options even can be mapped to enum values. This way ignoring cases at the matching part would be no problem at all (and it would be very efficient). Going further this could be a type safe construct by making StringParam a generic where the generic parameter is the enum type used as the target of the mapping: like StringParam[NaiveBayesModelType].
    What is your opinion? 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19599
  
    **[Test build #84887 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84887/testReport)** for PR 19599 at commit [`b79d8db`](https://github.com/apache/spark/commit/b79d8db9406fbd29ef46c8a74f8591d2aace45ee).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on the issue:

    https://github.com/apache/spark/pull/19599
  
    Updated, use $lc and add a new unit test for doc and exception.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...

Posted by smurakozi <gi...@git.apache.org>.
Github user smurakozi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19599#discussion_r155782239
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/param/ParamsSuite.scala ---
    @@ -139,6 +139,17 @@ class ParamsSuite extends SparkFunSuite {
           }
         }
     
    +    { // StringArrayParam
    --- End diff --
    
    It should be StringParam


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on the issue:

    https://github.com/apache/spark/pull/19599
  
    I used two ways to switch String params among different options:
    1. In NaiveBayes: convert StringParam and String constants to lowercase.
    2. in LinearRegression: .equalsIgnoreCase
    
    Currently most Spark algorithms use 1. 
    



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19599#discussion_r156771298
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
    @@ -435,6 +435,43 @@ class BooleanParam(parent: String, name: String, doc: String) // No need for isV
       }
     }
     
    +/**
    + * :: DeveloperApi ::
    + * Specialized version of `Param[String]` for Java.
    + */
    +@DeveloperApi
    +class StringParam(parent: Params, name: String, doc: String, isValid: String => Boolean)
    +  extends Param[String](parent, name, doc, isValid) {
    +
    +  private var options: Option[Array[String]] = None
    +
    +  def this(parent: Params, name: String, doc: String) =
    +    this(parent, name, doc, ParamValidators.alwaysTrue)
    +
    +  /** construct a StringParam with limited options (case-insensitive) */
    +  def this(parent: Params, name: String, doc: String, options: Array[String]) = {
    +    this(parent, name, doc + s" Supported options (case-insensitive): ${options.mkString(", ")}.",
    +      s => options.exists(s.equalsIgnoreCase))
    +    this.options = Some(options)
    +  }
    +
    +  private[spark] def getOptions: Option[Array[String]] = options
    +
    +  /** Creates a param pair with given value (for Java). */
    +  override def w(value: String): ParamPair[String] = super.w(value)
    +
    +  override def validate(value: String): Unit = {
    --- End diff --
    
    Thanks, that's the reason why doc generation failed. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19599#discussion_r147562645
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -224,8 +222,8 @@ class LinearRegression @Since("1.3.0") (@Since("1.3.0") override val uid: String
           elasticNetParam, fitIntercept, maxIter, regParam, standardization, aggregationDepth)
         instr.logNumFeatures(numFeatures)
     
    -    if (($(solver) == Auto &&
    -      numFeatures <= WeightedLeastSquares.MAX_NUM_FEATURES) || $(solver) == Normal) {
    +    if (($(solver).equalsIgnoreCase(Auto) && numFeatures <= WeightedLeastSquares.MAX_NUM_FEATURES)
    +      || $(solver).equalsIgnoreCase(Normal)) {
    --- End diff --
    
    Option 2: Compare StringParam and candidates using `equalsIgnoreCase`. This may be harder for match cases.
    str match {
      case s if str.equalsIgnoreCase("HELLO") =>
    }


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19599
  
    **[Test build #83199 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83199/testReport)** for PR 19599 at commit [`01e7d3d`](https://github.com/apache/spark/commit/01e7d3d5f9b0ae278ebce60635e5c2568d3d0cf3).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19599#discussion_r156772807
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
    @@ -435,6 +435,43 @@ class BooleanParam(parent: String, name: String, doc: String) // No need for isV
       }
     }
     
    +/**
    + * :: DeveloperApi ::
    + * Specialized version of `Param[String]` for Java.
    + */
    +@DeveloperApi
    +class StringParam(parent: Params, name: String, doc: String, isValid: String => Boolean)
    +  extends Param[String](parent, name, doc, isValid) {
    +
    +  private var options: Option[Array[String]] = None
    +
    +  def this(parent: Params, name: String, doc: String) =
    +    this(parent, name, doc, ParamValidators.alwaysTrue)
    +
    +  /** construct a StringParam with limited options (case-insensitive) */
    +  def this(parent: Params, name: String, doc: String, options: Array[String]) = {
    +    this(parent, name, doc + s" Supported options (case-insensitive): ${options.mkString(", ")}.",
    --- End diff --
    
    I see. We have embedded the test into `checkParams` but that only checks the legal cases. 
    I'll add a unit test for illegal cases, and we can check the doc and exception message.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19599
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19599
  
    **[Test build #84698 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84698/testReport)** for PR 19599 at commit [`3f6e472`](https://github.com/apache/spark/commit/3f6e472ebff8327e20dcdb0041f525a3856244b0).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class StringParam(parent: Params, name: String, doc: String, isValid: String => Boolean)`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...

Posted by smurakozi <gi...@git.apache.org>.
Github user smurakozi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19599#discussion_r156339373
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
    @@ -224,8 +222,8 @@ class LinearRegression @Since("1.3.0") (@Since("1.3.0") override val uid: String
           elasticNetParam, fitIntercept, maxIter, regParam, standardization, aggregationDepth)
         instr.logNumFeatures(numFeatures)
     
    -    if (($(solver) == Auto &&
    -      numFeatures <= WeightedLeastSquares.MAX_NUM_FEATURES) || $(solver) == Normal) {
    +    if (($(solver).equalsIgnoreCase(Auto) && numFeatures <= WeightedLeastSquares.MAX_NUM_FEATURES)
    --- End diff --
    
    I think using equalsIgnoreCase all over the places is problematic: you need to check and potentially modify code at many places in the code. Everywhere when $() is used on a StringParam you may need this change.  
    It's pretty easy to miss some and that would lead to subtle bugs.
    
    A safer approach would be to work with lowercase param values everywhere.
    You could convert to lowercase in the constructor and require that possible param values are given as lowercase. This way all comparisons, pattern matches, etc. would work just fine. 
    The downside of this approach would be that everywhere where you show these values to the users you would present the lowercase value. It might cause issues e.g. if you log it and some external log parser expects the original (non-lowercase) version. In the very broad sense, it would break compatibility but I guess that would be acceptable.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19599
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83199/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19599
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19599
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84887/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19599
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19599#discussion_r156556136
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
    @@ -435,6 +435,43 @@ class BooleanParam(parent: String, name: String, doc: String) // No need for isV
       }
     }
     
    +/**
    + * :: DeveloperApi ::
    + * Specialized version of `Param[String]` for Java.
    + */
    +@DeveloperApi
    +class StringParam(parent: Params, name: String, doc: String, isValid: String => Boolean)
    +  extends Param[String](parent, name, doc, isValid) {
    +
    +  private var options: Option[Array[String]] = None
    +
    +  def this(parent: Params, name: String, doc: String) =
    +    this(parent, name, doc, ParamValidators.alwaysTrue)
    +
    +  /** construct a StringParam with limited options (case-insensitive) */
    +  def this(parent: Params, name: String, doc: String, options: Array[String]) = {
    +    this(parent, name, doc + s" Supported options (case-insensitive): ${options.mkString(", ")}.",
    --- End diff --
    
    It's currently not. Do you mean we should test the `doc` correctly include all the options? or there's any special case that should be covered.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...

Posted by smurakozi <gi...@git.apache.org>.
Github user smurakozi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19599#discussion_r156086625
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
    @@ -435,6 +435,43 @@ class BooleanParam(parent: String, name: String, doc: String) // No need for isV
       }
     }
     
    +/**
    + * :: DeveloperApi ::
    + * Specialized version of `Param[String]` for Java.
    + */
    +@DeveloperApi
    +class StringParam(parent: Params, name: String, doc: String, isValid: String => Boolean)
    +  extends Param[String](parent, name, doc, isValid) {
    +
    +  private var options: Option[Array[String]] = None
    --- End diff --
    
    It should rather be a val. That way you would not need def getOptions()


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...

Posted by attilapiros <gi...@git.apache.org>.
Github user attilapiros commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19599#discussion_r156913602
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
    @@ -435,6 +435,43 @@ class BooleanParam(parent: String, name: String, doc: String) // No need for isV
       }
     }
     
    +/**
    + * :: DeveloperApi ::
    + * Specialized version of `Param[String]` for Java.
    + */
    +@DeveloperApi
    +class StringParam(parent: Params, name: String, doc: String, isValid: String => Boolean)
    +  extends Param[String](parent, name, doc, isValid) {
    +
    +  private var options: Option[Array[String]] = None
    --- End diff --
    
    I see your point regarding a use case to support "free text" strings among to support strings from a fixed set of options. But to support these two different use cases I would go for to have two different string params. One for enum like strings and one for free text strings this way the type safety is still at our side. Like EnumStringParam and FreeTextStringParam or any other better name we come up with.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on the issue:

    https://github.com/apache/spark/pull/19599
  
    Many thanks for the review @smurakozi and @attilapiros.
    
    > The PR is not complete (did not convert all Param[String] instances to StringParam consistently) so it should be marked as WIP.
    
    StringParam is compatible with existing Param[String] so we don't need to update all the Param[String] together. Usually I would wait for some time to confirm the change will not bring any potential issue. After that, we can try to apply it to a wider range. 
    And I really don't want to update all the instances of Param[String] before we agree what's the best practice for some implementation details.
    
    



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...

Posted by smurakozi <gi...@git.apache.org>.
Github user smurakozi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19599#discussion_r156609525
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
    @@ -435,6 +435,43 @@ class BooleanParam(parent: String, name: String, doc: String) // No need for isV
       }
     }
     
    +/**
    + * :: DeveloperApi ::
    + * Specialized version of `Param[String]` for Java.
    + */
    +@DeveloperApi
    +class StringParam(parent: Params, name: String, doc: String, isValid: String => Boolean)
    +  extends Param[String](parent, name, doc, isValid) {
    +
    +  private var options: Option[Array[String]] = None
    +
    +  def this(parent: Params, name: String, doc: String) =
    +    this(parent, name, doc, ParamValidators.alwaysTrue)
    +
    +  /** construct a StringParam with limited options (case-insensitive) */
    +  def this(parent: Params, name: String, doc: String, options: Array[String]) = {
    +    this(parent, name, doc + s" Supported options (case-insensitive): ${options.mkString(", ")}.",
    --- End diff --
    
    I missed one additional line when highlighting, sorry for that :)
    I meant to test the case-insensitive validation in the next line.
    
    Doc could be tested too, I think it's rather a nice to have than a must.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19599
  
    **[Test build #84699 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84699/testReport)** for PR 19599 at commit [`f55b22d`](https://github.com/apache/spark/commit/f55b22dacef04b3d3e6f0cb05acad3e9997c133d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19599
  
    **[Test build #84887 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84887/testReport)** for PR 19599 at commit [`b79d8db`](https://github.com/apache/spark/commit/b79d8db9406fbd29ef46c8a74f8591d2aace45ee).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19599
  
    **[Test build #83174 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83174/testReport)** for PR 19599 at commit [`f6b918e`](https://github.com/apache/spark/commit/f6b918eda3448df054edd81a077fa1a3d996d5c1).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19599#discussion_r156770135
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
    @@ -435,6 +435,43 @@ class BooleanParam(parent: String, name: String, doc: String) // No need for isV
       }
     }
     
    +/**
    + * :: DeveloperApi ::
    + * Specialized version of `Param[String]` for Java.
    + */
    +@DeveloperApi
    +class StringParam(parent: Params, name: String, doc: String, isValid: String => Boolean)
    +  extends Param[String](parent, name, doc, isValid) {
    +
    +  private var options: Option[Array[String]] = None
    --- End diff --
    
    Besides enum-like String param, I think it's possible to have other kinds of String param in the future, like initialModelPath or modelName. So I'd like to keep the flexibility that String Param is able to support both limited options and custom validation.
    
    And IMO making the `options` private already provides the required isolation even it's a `var`.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on the issue:

    https://github.com/apache/spark/pull/19599
  
    > One option that came to my mind was that $ returns lowercase, so this is used at most places but when you really need it you can access the original (not necessarily lowercase) value.
    
    As in the previous commits of this PR, I tried to add a new `$lc` function in trait `Params`, which works with Param[String] and return the lowercased version. But that breaks the binary compatibility and I guess that's  the bigger evil. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19599#discussion_r147562600
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
    @@ -440,6 +440,43 @@ class BooleanParam(parent: String, name: String, doc: String) // No need for isV
      * Specialized version of `Param[Array[String]]` for Java.
      */
     @DeveloperApi
    +private[ml] class StringParam(parent: Params, name: String, doc: String, isValid: String => Boolean)
    +  extends Param[String](parent, name, doc, isValid) {
    +
    +  private var options: Option[Array[String]] = None
    +
    +  def this(parent: Params, name: String, doc: String) =
    +    this(parent, name, doc, ParamValidators.alwaysTrue)
    +
    +  /** construct a StringParam with limited options (case-insensitive) */
    +  def this(parent: Params, name: String, doc: String, options: Array[String]) = {
    +    this(parent, name, doc + s" Supported options (case-insensitive): ${options.mkString(", ")}.",
    +      s => options.exists(s.equalsIgnoreCase))
    --- End diff --
    
    Though `options.exists(s.equalsIgnoreCase))` and 
    `options.map(_.toLowerCase(Locale.ROOT)).contains(s.toLowerCase(Locale.ROOT))` should be essentially the same. IMO we still need to stick to just one of them thoroughly, which also depends on how to do param comparison (switch) in Estimators.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19599
  
    **[Test build #83174 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83174/testReport)** for PR 19599 at commit [`f6b918e`](https://github.com/apache/spark/commit/f6b918eda3448df054edd81a077fa1a3d996d5c1).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19599
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83174/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19599
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19599: [SPARK-22381] [ML] Add StringParam that supports ...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19599#discussion_r147562823
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/NaiveBayes.scala ---
    @@ -133,7 +134,7 @@ class NaiveBayes @Since("1.5.0") (
             s" numClasses=$numClasses, but thresholds has length ${$(thresholds).length}")
         }
     
    -    val modelTypeValue = $(modelType)
    +    val modelTypeValue = $lc(modelType)
    --- End diff --
    
    option 1: Compare StringParam and candidates after converting them to lower case. This requires that all the String Constants like `NaiveBayes.Multinomial` to be lower case.
    
    I don't like writing xxx.toLowerCase(Locale.ROOT) everywhere, so I created the $lc in Params to centralize it.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19599: [SPARK-22381] [ML] Add StringParam that supports valid o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19599
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84698/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org