You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by dongjinleekr <gi...@git.apache.org> on 2018/06/06 07:45:47 UTC

[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

GitHub user dongjinleekr opened a pull request:

    https://github.com/apache/spark/pull/21501

    [SPARK-15064][ML] Locale support in StopWordsRemover

    ## What changes were proposed in this pull request?
    
    Add locale support for `StopWordsRemover`.
    
    ## How was this patch tested?
    
    [Scala|Python] unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjinleekr/spark feature/SPARK-15064

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21501.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21501
    
----
commit 8379d88360651e0571698412e4aecc305ae7fe60
Author: Lee Dongjin <do...@...>
Date:   2018-03-21T15:03:17Z

    Implement SPARK-15064

commit 682df4afe6859119d01de1fa953f95ebaacb4889
Author: Lee Dongjin <do...@...>
Date:   2018-06-05T11:00:17Z

    Add test cases for StopWordsRemoverSuite

commit 67ae6ebaebbb537f24db51be1f96759d5c6463ff
Author: Lee Dongjin <do...@...>
Date:   2018-06-06T07:43:45Z

    Implement pyspark parity

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r194298446
  
    --- Diff: python/pyspark/ml/feature.py ---
    @@ -2610,6 +2610,9 @@ def setParams(self, inputCol=None, outputCol=None, stopWords=None, caseSensitive
             Sets params for this StopWordRemover.
             """
             kwargs = self._input_kwargs
    +        if locale is None:
    +            sc = SparkContext._active_spark_context
    +            kwargs['locale'] = sc._gateway.jvm.org.spark.ml.util.LocaleUtils.getDefaultLocale()
    --- End diff --
    
    I was meant that if `sc._gateway.jvm.org.spark.ml.util.LocaleUtils.getDefaultLocale()` returns the same every time, shall we cache it and reuse it? But after re-thinking it, maybe we won't call this method with `locale is None` frequently. So it maybe fine.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r194341627
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -84,7 +86,28 @@ class StopWordsRemover @Since("1.5.0") (@Since("1.5.0") override val uid: String
       @Since("1.5.0")
       def getCaseSensitive: Boolean = $(caseSensitive)
     
    -  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
    +  /**
    +   * Locale of the input for case insensitive matching. Ignored when [[caseSensitive]]
    +   * is true.
    +   * Default: Locale.getDefault.toString
    +   * @see `StopWordsRemover.loadDefaultStopWords()`
    --- End diff --
    
    You meant `getDefaultLocale` here?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    **[Test build #91648 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91648/testReport)** for PR 21501 at commit [`b4249c3`](https://github.com/apache/spark/commit/b4249c342a92dc840a1f1d5290c24a5fe165417d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r193963482
  
    --- Diff: python/pyspark/ml/feature.py ---
    @@ -2582,25 +2582,31 @@ class StopWordsRemover(JavaTransformer, HasInputCol, HasOutputCol, JavaMLReadabl
                           typeConverter=TypeConverters.toListString)
         caseSensitive = Param(Params._dummy(), "caseSensitive", "whether to do a case sensitive " +
                               "comparison over the stop words", typeConverter=TypeConverters.toBoolean)
    +    locale = Param(Params._dummy(), "locale", "locale of the input. ignored when case sensitive " +
    +                   "is true", typeConverter=TypeConverters.toString)
     
         @keyword_only
    -    def __init__(self, inputCol=None, outputCol=None, stopWords=None, caseSensitive=False):
    +    def __init__(self, inputCol=None, outputCol=None, stopWords=None, caseSensitive=False,
    +                 locale=None):
             """
    -        __init__(self, inputCol=None, outputCol=None, stopWords=None, caseSensitive=false)
    +        __init__(self, inputCol=None, outputCol=None, stopWords=None, caseSensitive=false,
    +        locale=None)
             """
             super(StopWordsRemover, self).__init__()
             self._java_obj = self._new_java_obj("org.apache.spark.ml.feature.StopWordsRemover",
                                                 self.uid)
             self._setDefault(stopWords=StopWordsRemover.loadDefaultStopWords("english"),
    -                         caseSensitive=False)
    +                         caseSensitive=False, locale=None)
    --- End diff --
    
    If we set default as None, will `self.getOrDefault(self.locale)` below just return `None` instead of the default value in Scala side?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by dongjinleekr <gi...@git.apache.org>.
Github user dongjinleekr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r194623958
  
    --- Diff: python/pyspark/ml/feature.py ---
    @@ -2582,25 +2582,31 @@ class StopWordsRemover(JavaTransformer, HasInputCol, HasOutputCol, JavaMLReadabl
                           typeConverter=TypeConverters.toListString)
         caseSensitive = Param(Params._dummy(), "caseSensitive", "whether to do a case sensitive " +
                               "comparison over the stop words", typeConverter=TypeConverters.toBoolean)
    +    locale = Param(Params._dummy(), "locale", "locale of the input. ignored when case sensitive " +
    +                   "is true", typeConverter=TypeConverters.toString)
    --- End diff --
    
    Thank you for the comment but... is that necessary?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    **[Test build #91553 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91553/testReport)** for PR 21501 at commit [`fc0fe20`](https://github.com/apache/spark/commit/fc0fe20a37e2e40a5675c972352ca3dcc259b32d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    ok to test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    **[Test build #91595 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91595/testReport)** for PR 21501 at commit [`6983c91`](https://github.com/apache/spark/commit/6983c9167b2e48673a59f1c50d9a8e1a1d6381ca).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r194293510
  
    --- Diff: python/pyspark/ml/feature.py ---
    @@ -2610,6 +2610,9 @@ def setParams(self, inputCol=None, outputCol=None, stopWords=None, caseSensitive
             Sets params for this StopWordRemover.
             """
             kwargs = self._input_kwargs
    +        if locale is None:
    +            sc = SparkContext._active_spark_context
    +            kwargs['locale'] = sc._gateway.jvm.org.spark.ml.util.LocaleUtils.getDefaultLocale()
    --- End diff --
    
    We can keep this default local, and use it many times instead of call to JVM form Python every time.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r193493040
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -84,7 +86,36 @@ class StopWordsRemover @Since("1.5.0") (@Since("1.5.0") override val uid: String
       @Since("1.5.0")
       def getCaseSensitive: Boolean = $(caseSensitive)
     
    -  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
    +  /**
    +   * [[https://docs.oracle.com/javase/8/docs/api/java/util/Locale.html Locale]] of the input.
    --- End diff --
    
    Locale of the input for case insensitive matching.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by dongjinleekr <gi...@git.apache.org>.
Github user dongjinleekr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r194296251
  
    --- Diff: python/pyspark/ml/feature.py ---
    @@ -2610,6 +2610,9 @@ def setParams(self, inputCol=None, outputCol=None, stopWords=None, caseSensitive
             Sets params for this StopWordRemover.
             """
             kwargs = self._input_kwargs
    +        if locale is None:
    +            sc = SparkContext._active_spark_context
    +            kwargs['locale'] = sc._gateway.jvm.org.spark.ml.util.LocaleUtils.getDefaultLocale()
    --- End diff --
    
    @viirya You mean... `locale=SparkContext._active_spark_context.(...)`over `locale=None` with ugly if statement, right?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r194244754
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -84,7 +86,28 @@ class StopWordsRemover @Since("1.5.0") (@Since("1.5.0") override val uid: String
       @Since("1.5.0")
       def getCaseSensitive: Boolean = $(caseSensitive)
     
    -  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
    +  /**
    +   * Locale of the input for case insensitive matching. Ignored when [[caseSensitive]]
    +   * is true.
    +   * Default: Locale.getDefault.toString
    --- End diff --
    
    Locale is usually configured at system level. Setting the default locale to `en` is a breaking change to users who set system locale to `tr` and have the following code:
    
    ~~~
    val remover = new StopWordsRemover()
      .setStopWords(loadDefaultStopWords("turkish")
    ~~~


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    **[Test build #91689 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91689/testReport)** for PR 21501 at commit [`bbd167b`](https://github.com/apache/spark/commit/bbd167b79a073f6eca67b57012d936d692f7d7c8).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    **[Test build #91657 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91657/testReport)** for PR 21501 at commit [`675d7aa`](https://github.com/apache/spark/commit/675d7aa955cc54fbbce79ccb8294ef81afcd3fb5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r194606981
  
    --- Diff: python/pyspark/ml/feature.py ---
    @@ -2582,25 +2582,31 @@ class StopWordsRemover(JavaTransformer, HasInputCol, HasOutputCol, JavaMLReadabl
                           typeConverter=TypeConverters.toListString)
         caseSensitive = Param(Params._dummy(), "caseSensitive", "whether to do a case sensitive " +
                               "comparison over the stop words", typeConverter=TypeConverters.toBoolean)
    +    locale = Param(Params._dummy(), "locale", "locale of the input. ignored when case sensitive " +
    +                   "is true", typeConverter=TypeConverters.toString)
     
         @keyword_only
    -    def __init__(self, inputCol=None, outputCol=None, stopWords=None, caseSensitive=False):
    +    def __init__(self, inputCol=None, outputCol=None, stopWords=None, caseSensitive=False,
    +                 locale=None):
             """
    -        __init__(self, inputCol=None, outputCol=None, stopWords=None, caseSensitive=false)
    +        __init__(self, inputCol=None, outputCol=None, stopWords=None, caseSensitive=false,
    +        locale=None)
             """
             super(StopWordsRemover, self).__init__()
             self._java_obj = self._new_java_obj("org.apache.spark.ml.feature.StopWordsRemover",
                                                 self.uid)
             self._setDefault(stopWords=StopWordsRemover.loadDefaultStopWords("english"),
    -                         caseSensitive=False)
    +                         caseSensitive=False, locale=StopWordsRemover.defaultLocale())
             kwargs = self._input_kwargs
             self.setParams(**kwargs)
     
         @keyword_only
         @since("1.6.0")
    -    def setParams(self, inputCol=None, outputCol=None, stopWords=None, caseSensitive=False):
    +    def setParams(self, inputCol=None, outputCol=None, stopWords=None, caseSensitive=False,
    +                  locale=None):
             """
    -        setParams(self, inputCol=None, outputCol=None, stopWords=None, caseSensitive=false)
    +        setParams(self, inputCol=None, outputCol=None, stopWords=None, caseSensitive=false,
    +        locale=None)
    --- End diff --
    
    ditto


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r193954052
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -95,8 +121,7 @@ class StopWordsRemover @Since("1.5.0") (@Since("1.5.0") override val uid: String
             terms.filter(s => !stopWordsSet.contains(s))
           }
         } else {
    -      // TODO: support user locale (SPARK-15064)
    -      val toLower = (s: String) => if (s != null) s.toLowerCase else s
    +      val toLower = (s: String) => if (s != null) s.toLowerCase(new Locale($(locale))) else s
    --- End diff --
    
    Maybe move `new Locale($(locale))` out of `toLower`, so we don't need to create it every time.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r193494337
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/StopWordsRemoverSuite.scala ---
    @@ -65,6 +67,56 @@ class StopWordsRemoverSuite extends MLTest with DefaultReadWriteTest {
         testStopWordsRemover(remover, dataSet)
       }
     
    +  test("StopWordsRemover with localed input (case insensitive)") {
    +    val stopWords = Array("milk", "cookie")
    +    val remover = new StopWordsRemover()
    +      .setInputCol("raw")
    +      .setOutputCol("filtered")
    +      .setStopWords(stopWords)
    +      .setLocale(new Locale("tr"))
    --- End diff --
    
    Could you leave an inline comment explaining which part is special to tr locale?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Congratulation, @dongjinleekr ! :)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r193332595
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -84,7 +86,36 @@ class StopWordsRemover @Since("1.5.0") (@Since("1.5.0") override val uid: String
       @Since("1.5.0")
       def getCaseSensitive: Boolean = $(caseSensitive)
     
    -  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
    +  /**
    +   * [[https://docs.oracle.com/javase/8/docs/api/java/util/Locale.html Locale]] of the input.
    +   * Ignored when [[caseSensitive]] is false.
    +   * Default: Locale.English
    +   * @see `StopWordsRemover.loadDefaultStopWords()`
    +   * @group param
    +   */
    +  @Since("2.5.0")
    --- End diff --
    
    It should be 2.4.0 :-).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    **[Test build #91693 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91693/testReport)** for PR 21501 at commit [`bbd167b`](https://github.com/apache/spark/commit/bbd167b79a073f6eca67b57012d936d692f7d7c8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r193492802
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -84,7 +86,36 @@ class StopWordsRemover @Since("1.5.0") (@Since("1.5.0") override val uid: String
       @Since("1.5.0")
       def getCaseSensitive: Boolean = $(caseSensitive)
     
    -  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
    +  /**
    +   * [[https://docs.oracle.com/javase/8/docs/api/java/util/Locale.html Locale]] of the input.
    +   * Ignored when [[caseSensitive]] is false.
    --- End diff --
    
    Ignored when caseSensitive is true?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91595/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    **[Test build #91637 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91637/testReport)** for PR 21501 at commit [`3636329`](https://github.com/apache/spark/commit/363632981138b9e1fe84b4d31f0cea3bc047e241).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    **[Test build #91648 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91648/testReport)** for PR 21501 at commit [`b4249c3`](https://github.com/apache/spark/commit/b4249c342a92dc840a1f1d5290c24a5fe165417d).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21501


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    **[Test build #91556 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91556/testReport)** for PR 21501 at commit [`6983c91`](https://github.com/apache/spark/commit/6983c9167b2e48673a59f1c50d9a8e1a1d6381ca).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r194098947
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -84,7 +86,28 @@ class StopWordsRemover @Since("1.5.0") (@Since("1.5.0") override val uid: String
       @Since("1.5.0")
       def getCaseSensitive: Boolean = $(caseSensitive)
     
    -  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
    +  /**
    +   * Locale of the input for case insensitive matching. Ignored when [[caseSensitive]]
    +   * is true.
    +   * Default: Locale.getDefault.toString
    --- End diff --
    
    I understand that Locale.getDefault is the current behavior, yet since we're using english as default stopwords, I'm leaning towards using English as default locale.
    Also using Locale.getDefault means that the same code will behave differently on different nodes, which adds extra complexity for trouble-shooting. 



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by dongjinleekr <gi...@git.apache.org>.
Github user dongjinleekr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r194222392
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -84,7 +86,28 @@ class StopWordsRemover @Since("1.5.0") (@Since("1.5.0") override val uid: String
       @Since("1.5.0")
       def getCaseSensitive: Boolean = $(caseSensitive)
     
    -  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
    +  /**
    +   * Locale of the input for case insensitive matching. Ignored when [[caseSensitive]]
    +   * is true.
    +   * Default: Locale.getDefault.toString
    --- End diff --
    
    As far as I am understanding, the Locale makes difference how the lowercase string would be, not related to the stopwords. @mengxr It is the reason why you recommended me to change the default locale, right?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    LGTM. Merged into master. @dongjinleekr Do you have a JIRA user? Could you ping me at https://issues.apache.org/jira/browse/SPARK-15064 so I can assign the ticket to you. Thanks for your contribution and @viirya for review!
    
    Btw, I found there is an issue with python doc generation on my local. setParams doc is generated correctly but class doc still shows `(*args, **kwargs)`. I will create a bug JIRA.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by dongjinleekr <gi...@git.apache.org>.
Github user dongjinleekr commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    @mengxr Thank you very much for your kind guidance. If you have some time, please have a look at my another PR, #21525. I found this issue while I was reading [an issue on HashingTF](https://issues.apache.org/jira/browse/SPARK-13998).
    @viirya Thank you again for your comments. Without you I could never complete this feature.
    @dongjoon-hyun 😄


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r194099298
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -84,7 +86,28 @@ class StopWordsRemover @Since("1.5.0") (@Since("1.5.0") override val uid: String
       @Since("1.5.0")
       def getCaseSensitive: Boolean = $(caseSensitive)
     
    -  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
    +  /**
    +   * Locale of the input for case insensitive matching. Ignored when [[caseSensitive]]
    +   * is true.
    +   * Default: Locale.getDefault.toString
    +   * @see `StopWordsRemover.loadDefaultStopWords()`
    +   * @group param
    +   */
    +  @Since("2.4.0")
    +  val locale: Param[String] = new Param[String](this, "locale",
    +    "Locale of the input for case insensitive matching. Ignored when caseSensitive is true.",
    +    ParamValidators.inArray[String](Locale.getAvailableLocales.map(_.toString)))
    +
    +  /** @group setParam */
    +  @Since("2.4.0")
    +  def setLocale(value: String): this.type = set(locale, value)
    --- End diff --
    
    how about adding another setter that takes a Locale parameter?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    **[Test build #91636 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91636/testReport)** for PR 21501 at commit [`7d0631e`](https://github.com/apache/spark/commit/7d0631e5f755f227ae31446db764de91a2f7b74b).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    @dongjinleekr You are welcome and thanks for your contribution!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91635/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91657/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91506/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by dongjinleekr <gi...@git.apache.org>.
Github user dongjinleekr commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Finally, I made it work by adding `StopWordsRemover.getDefaultLocale`! How about this approach? @mengxr @viirya


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    retest this please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    jenkins test this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    **[Test build #91693 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91693/testReport)** for PR 21501 at commit [`bbd167b`](https://github.com/apache/spark/commit/bbd167b79a073f6eca67b57012d936d692f7d7c8).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r194626092
  
    --- Diff: python/pyspark/ml/feature.py ---
    @@ -2582,25 +2582,31 @@ class StopWordsRemover(JavaTransformer, HasInputCol, HasOutputCol, JavaMLReadabl
                           typeConverter=TypeConverters.toListString)
         caseSensitive = Param(Params._dummy(), "caseSensitive", "whether to do a case sensitive " +
                               "comparison over the stop words", typeConverter=TypeConverters.toBoolean)
    +    locale = Param(Params._dummy(), "locale", "locale of the input. ignored when case sensitive " +
    +                   "is true", typeConverter=TypeConverters.toString)
    --- End diff --
    
    And also don't forget to mention default value is JVM default locale.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    **[Test build #91527 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91527/testReport)** for PR 21501 at commit [`dde0e79`](https://github.com/apache/spark/commit/dde0e795e0c3db220d558f2c90ff349119c898a0).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r194346755
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/StopWordsRemoverSuite.scala ---
    @@ -65,6 +65,56 @@ class StopWordsRemoverSuite extends MLTest with DefaultReadWriteTest {
         testStopWordsRemover(remover, dataSet)
       }
     
    +  test("StopWordsRemover with localed input (case insensitive)") {
    +    val stopWords = Array("milk", "cookie")
    +    val remover = new StopWordsRemover()
    +      .setInputCol("raw")
    +      .setOutputCol("filtered")
    +      .setStopWords(stopWords)
    +      .setLocale("tr")  // Turkish alphabet: has no Q, W, X but has dotted and dotless 'I's.
    +    val dataSet = Seq(
    +      // scalastyle:off
    +      (Seq("mÄ°lk", "and", "nuts"), Seq("and", "nuts")),
    +      // scalastyle:on
    +      (Seq("cookIe", "and", "nuts"), Seq("cookIe", "and", "nuts")),
    +      (Seq(null), Seq(null)),
    +      (Seq(), Seq())
    +    ).toDF("raw", "expected")
    +
    +    testStopWordsRemover(remover, dataSet)
    +  }
    +
    +  test("StopWordsRemover with localed input (case sensitive)") {
    +    val stopWords = Array("milk", "cookie")
    +    val remover = new StopWordsRemover()
    +      .setInputCol("raw")
    +      .setOutputCol("filtered")
    +      .setStopWords(stopWords)
    +      .setCaseSensitive(true)
    +      .setLocale("tr")  // Turkish alphabet: has no Q, W, X but has dotted and dotless 'I's.
    +    val dataSet = Seq(
    +      // scalastyle:off
    +      (Seq("mÄ°lk", "and", "nuts"), Seq("mÄ°lk", "and", "nuts")),
    +      // scalastyle:on
    +      (Seq("cookIe", "and", "nuts"), Seq("cookIe", "and", "nuts")),
    +      (Seq(null), Seq(null)),
    +      (Seq(), Seq())
    +    ).toDF("raw", "expected")
    +
    +    testStopWordsRemover(remover, dataSet)
    +  }
    +
    +  test("StopWordsRemover with invalid locale") {
    +    intercept[IllegalArgumentException] {
    +      val stopWords = Array("test", "a", "an", "the")
    +      val _ = new StopWordsRemover()
    --- End diff --
    
    Do we need `val _`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r194346431
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/StopWordsRemoverSuite.scala ---
    @@ -65,6 +65,56 @@ class StopWordsRemoverSuite extends MLTest with DefaultReadWriteTest {
         testStopWordsRemover(remover, dataSet)
       }
     
    +  test("StopWordsRemover with localed input (case insensitive)") {
    +    val stopWords = Array("milk", "cookie")
    +    val remover = new StopWordsRemover()
    +      .setInputCol("raw")
    +      .setOutputCol("filtered")
    +      .setStopWords(stopWords)
    +      .setLocale("tr")  // Turkish alphabet: has no Q, W, X but has dotted and dotless 'I's.
    --- End diff --
    
    Lets explicitly call  `.setCaseSensitive(false)` here.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91527/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by dongjinleekr <gi...@git.apache.org>.
Github user dongjinleekr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r194342986
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -84,7 +86,28 @@ class StopWordsRemover @Since("1.5.0") (@Since("1.5.0") override val uid: String
       @Since("1.5.0")
       def getCaseSensitive: Boolean = $(caseSensitive)
     
    -  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
    +  /**
    +   * Locale of the input for case insensitive matching. Ignored when [[caseSensitive]]
    +   * is true.
    +   * Default: Locale.getDefault.toString
    +   * @see `StopWordsRemover.loadDefaultStopWords()`
    --- End diff --
    
    ... OMG; please wait.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    **[Test build #91689 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91689/testReport)** for PR 21501 at commit [`bbd167b`](https://github.com/apache/spark/commit/bbd167b79a073f6eca67b57012d936d692f7d7c8).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r194606928
  
    --- Diff: python/pyspark/ml/feature.py ---
    @@ -2582,25 +2582,31 @@ class StopWordsRemover(JavaTransformer, HasInputCol, HasOutputCol, JavaMLReadabl
                           typeConverter=TypeConverters.toListString)
         caseSensitive = Param(Params._dummy(), "caseSensitive", "whether to do a case sensitive " +
                               "comparison over the stop words", typeConverter=TypeConverters.toBoolean)
    +    locale = Param(Params._dummy(), "locale", "locale of the input. ignored when case sensitive " +
    +                   "is true", typeConverter=TypeConverters.toString)
     
         @keyword_only
    -    def __init__(self, inputCol=None, outputCol=None, stopWords=None, caseSensitive=False):
    +    def __init__(self, inputCol=None, outputCol=None, stopWords=None, caseSensitive=False,
    +                 locale=None):
             """
    -        __init__(self, inputCol=None, outputCol=None, stopWords=None, caseSensitive=false)
    +        __init__(self, inputCol=None, outputCol=None, stopWords=None, caseSensitive=false,
    +        locale=None)
    --- End diff --
    
    Please add \ to the end of L2592 and use the right indentation here. Unfortunately, we need this to make the doc correctly displayed. See https://github.com/apache/spark/blob/master/python/pyspark/ml/feature.py#L3112.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91551/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    jenkins test this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91689/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r194343283
  
    --- Diff: python/pyspark/ml/feature.py ---
    @@ -2582,25 +2582,31 @@ class StopWordsRemover(JavaTransformer, HasInputCol, HasOutputCol, JavaMLReadabl
                           typeConverter=TypeConverters.toListString)
         caseSensitive = Param(Params._dummy(), "caseSensitive", "whether to do a case sensitive " +
                               "comparison over the stop words", typeConverter=TypeConverters.toBoolean)
    +    locale = Param(Params._dummy(), "locale", "locale of the input. ignored when case sensitive " +
    +                   "is true", typeConverter=TypeConverters.toString)
    --- End diff --
    
    Refer to Java `Locale.getAvailableLocales` for available values.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    **[Test build #91553 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91553/testReport)** for PR 21501 at commit [`fc0fe20`](https://github.com/apache/spark/commit/fc0fe20a37e2e40a5675c972352ca3dcc259b32d).
     * This patch **fails to generate documentation**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    **[Test build #91652 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91652/testReport)** for PR 21501 at commit [`80d664f`](https://github.com/apache/spark/commit/80d664ffe9d182bae3a71ea5e2bf0e66b04b6d62).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91636/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    **[Test build #91506 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91506/testReport)** for PR 21501 at commit [`be38fd8`](https://github.com/apache/spark/commit/be38fd82bb711ab21e94b14cff1256c3c48f4c76).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r193492568
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -84,7 +86,36 @@ class StopWordsRemover @Since("1.5.0") (@Since("1.5.0") override val uid: String
       @Since("1.5.0")
       def getCaseSensitive: Boolean = $(caseSensitive)
     
    -  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
    +  /**
    +   * [[https://docs.oracle.com/javase/8/docs/api/java/util/Locale.html Locale]] of the input.
    +   * Ignored when [[caseSensitive]] is false.
    +   * Default: Locale.English
    --- End diff --
    
    The default should be the same as `toLowerCase` default local, which is system default (Locale.getDefault).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    **[Test build #91634 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91634/testReport)** for PR 21501 at commit [`df7cd11`](https://github.com/apache/spark/commit/df7cd11b686720af4626d6226f88542cc550b754).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    **[Test build #91556 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91556/testReport)** for PR 21501 at commit [`6983c91`](https://github.com/apache/spark/commit/6983c9167b2e48673a59f1c50d9a8e1a1d6381ca).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91556/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by dongjinleekr <gi...@git.apache.org>.
Github user dongjinleekr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r194222407
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -84,7 +86,28 @@ class StopWordsRemover @Since("1.5.0") (@Since("1.5.0") override val uid: String
       @Since("1.5.0")
       def getCaseSensitive: Boolean = $(caseSensitive)
     
    -  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
    +  /**
    +   * Locale of the input for case insensitive matching. Ignored when [[caseSensitive]]
    +   * is true.
    +   * Default: Locale.getDefault.toString
    +   * @see `StopWordsRemover.loadDefaultStopWords()`
    +   * @group param
    +   */
    +  @Since("2.4.0")
    +  val locale: Param[String] = new Param[String](this, "locale",
    +    "Locale of the input for case insensitive matching. Ignored when caseSensitive is true.",
    +    ParamValidators.inArray[String](Locale.getAvailableLocales.map(_.toString)))
    +
    +  /** @group setParam */
    +  @Since("2.4.0")
    +  def setLocale(value: String): this.type = set(locale, value)
    --- End diff --
    
    @mengxr How do you think? I think supporting two setters is reasonable, but it can introduce some asymmetrical signatures.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    **[Test build #91635 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91635/testReport)** for PR 21501 at commit [`967d3c1`](https://github.com/apache/spark/commit/967d3c10a18b27e01c62d5254d188fb1e74de08d).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    **[Test build #91549 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91549/testReport)** for PR 21501 at commit [`f572684`](https://github.com/apache/spark/commit/f572684e0df29f522bc0d5b1521e7dc5356bae49).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91652/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r193604620
  
    --- Diff: python/pyspark/ml/feature.py ---
    @@ -2582,25 +2582,27 @@ class StopWordsRemover(JavaTransformer, HasInputCol, HasOutputCol, JavaMLReadabl
                           typeConverter=TypeConverters.toListString)
         caseSensitive = Param(Params._dummy(), "caseSensitive", "whether to do a case sensitive " +
                               "comparison over the stop words", typeConverter=TypeConverters.toBoolean)
    +    locale = Param(Params._dummy(), "locale", "locale of the input. ignored when case sensitive is false",
    +                      typeConverter=TypeConverters.toString)
    --- End diff --
    
    nit: indentation


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    **[Test build #91636 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91636/testReport)** for PR 21501 at commit [`7d0631e`](https://github.com/apache/spark/commit/7d0631e5f755f227ae31446db764de91a2f7b74b).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91549/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r194625679
  
    --- Diff: python/pyspark/ml/feature.py ---
    @@ -2582,25 +2582,31 @@ class StopWordsRemover(JavaTransformer, HasInputCol, HasOutputCol, JavaMLReadabl
                           typeConverter=TypeConverters.toListString)
         caseSensitive = Param(Params._dummy(), "caseSensitive", "whether to do a case sensitive " +
                               "comparison over the stop words", typeConverter=TypeConverters.toBoolean)
    +    locale = Param(Params._dummy(), "locale", "locale of the input. ignored when case sensitive " +
    +                   "is true", typeConverter=TypeConverters.toString)
    --- End diff --
    
    I'm not sure if users are familiar with available locale setting values here.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r194244784
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -84,7 +86,28 @@ class StopWordsRemover @Since("1.5.0") (@Since("1.5.0") override val uid: String
       @Since("1.5.0")
       def getCaseSensitive: Boolean = $(caseSensitive)
     
    -  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
    +  /**
    +   * Locale of the input for case insensitive matching. Ignored when [[caseSensitive]]
    +   * is true.
    +   * Default: Locale.getDefault.toString
    +   * @see `StopWordsRemover.loadDefaultStopWords()`
    +   * @group param
    +   */
    +  @Since("2.4.0")
    +  val locale: Param[String] = new Param[String](this, "locale",
    +    "Locale of the input for case insensitive matching. Ignored when caseSensitive is true.",
    +    ParamValidators.inArray[String](Locale.getAvailableLocales.map(_.toString)))
    +
    +  /** @group setParam */
    +  @Since("2.4.0")
    +  def setLocale(value: String): this.type = set(locale, value)
    --- End diff --
    
    Not sure if it is necessary. If users already have a `Locale` instance, they can use `toString` to provide the value. It is simpler if setter/getter have consistent types.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r193493884
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -84,7 +86,36 @@ class StopWordsRemover @Since("1.5.0") (@Since("1.5.0") override val uid: String
       @Since("1.5.0")
       def getCaseSensitive: Boolean = $(caseSensitive)
     
    -  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
    +  /**
    +   * [[https://docs.oracle.com/javase/8/docs/api/java/util/Locale.html Locale]] of the input.
    +   * Ignored when [[caseSensitive]] is false.
    +   * Default: Locale.English
    +   * @see `StopWordsRemover.loadDefaultStopWords()`
    +   * @group param
    +   */
    +  @Since("2.4.0")
    +  val locale: Param[Locale] = new Param[Locale](this, "locale",
    +    "Locale of the input. Ignored when caseSensitive is false.")
    +
    +  /**
    +   * @group setParam
    +   * @throws IllegalArgumentException if `value` is not an available locale.
    +   */
    +  @Since("2.4.0")
    +  def setLocale(value: Locale): this.type = {
    +    if (Locale.getAvailableLocales contains value) {
    +      set(locale, value)
    +    } else {
    +      throw new IllegalArgumentException(s"Unsupported locale: $value")
    +    }
    +  }
    +
    +  /** @group getParam */
    +  @Since("2.4.0")
    +  def getLocale: Locale = $(locale)
    +
    +  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"),
    +    caseSensitive -> false, locale -> Locale.ENGLISH)
    --- End diff --
    
    `Locale.getDefault`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    **[Test build #91595 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91595/testReport)** for PR 21501 at commit [`6983c91`](https://github.com/apache/spark/commit/6983c9167b2e48673a59f1c50d9a8e1a1d6381ca).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r193635361
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -84,7 +86,36 @@ class StopWordsRemover @Since("1.5.0") (@Since("1.5.0") override val uid: String
       @Since("1.5.0")
       def getCaseSensitive: Boolean = $(caseSensitive)
     
    -  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
    +  /**
    +   * [[https://docs.oracle.com/javase/8/docs/api/java/util/Locale.html Locale]] of the input.
    +   * Ignored when [[caseSensitive]] is false.
    +   * Default: Locale.English
    +   * @see `StopWordsRemover.loadDefaultStopWords()`
    +   * @group param
    +   */
    +  @Since("2.4.0")
    +  val locale: Param[Locale] = new Param[Locale](this, "locale",
    --- End diff --
    
    +1


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91648/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r194244802
  
    --- Diff: python/pyspark/ml/feature.py ---
    @@ -2582,25 +2582,31 @@ class StopWordsRemover(JavaTransformer, HasInputCol, HasOutputCol, JavaMLReadabl
                           typeConverter=TypeConverters.toListString)
         caseSensitive = Param(Params._dummy(), "caseSensitive", "whether to do a case sensitive " +
                               "comparison over the stop words", typeConverter=TypeConverters.toBoolean)
    +    locale = Param(Params._dummy(), "locale", "locale of the input. ignored when case sensitive " +
    +                   "is true", typeConverter=TypeConverters.toString)
     
         @keyword_only
    -    def __init__(self, inputCol=None, outputCol=None, stopWords=None, caseSensitive=False):
    +    def __init__(self, inputCol=None, outputCol=None, stopWords=None, caseSensitive=False,
    +                 locale=None):
             """
    -        __init__(self, inputCol=None, outputCol=None, stopWords=None, caseSensitive=false)
    +        __init__(self, inputCol=None, outputCol=None, stopWords=None, caseSensitive=false,
    +        locale=None)
             """
             super(StopWordsRemover, self).__init__()
             self._java_obj = self._new_java_obj("org.apache.spark.ml.feature.StopWordsRemover",
                                                 self.uid)
             self._setDefault(stopWords=StopWordsRemover.loadDefaultStopWords("english"),
    -                         caseSensitive=False)
    +                         caseSensitive=False, locale=None)
    --- End diff --
    
    I think we just need to explain how we handle the default value in the doc.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    **[Test build #91527 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91527/testReport)** for PR 21501 at commit [`dde0e79`](https://github.com/apache/spark/commit/dde0e795e0c3db220d558f2c90ff349119c898a0).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    **[Test build #91551 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91551/testReport)** for PR 21501 at commit [`f63cb26`](https://github.com/apache/spark/commit/f63cb26ee6242ce7f7eb975e6d4ce10496993526).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r193781216
  
    --- Diff: python/pyspark/ml/feature.py ---
    @@ -2582,25 +2582,27 @@ class StopWordsRemover(JavaTransformer, HasInputCol, HasOutputCol, JavaMLReadabl
                           typeConverter=TypeConverters.toListString)
         caseSensitive = Param(Params._dummy(), "caseSensitive", "whether to do a case sensitive " +
                               "comparison over the stop words", typeConverter=TypeConverters.toBoolean)
    +    locale = Param(Params._dummy(), "locale", "locale of the input. ignored when case sensitive is false",
    +                   typeConverter=TypeConverters.toString)
     
         @keyword_only
    -    def __init__(self, inputCol=None, outputCol=None, stopWords=None, caseSensitive=False):
    +    def __init__(self, inputCol=None, outputCol=None, stopWords=None, caseSensitive=False, locale="en"):
    --- End diff --
    
    We should keep the default consistent. I'm not sure if python's [`locale.getdefaultlocale`](https://docs.python.org/2/library/locale.html#locale.getdefaultlocale) is compatible with Java's. But we can always make the default `None` in Python and leave it to JVM to set the default.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    **[Test build #91551 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91551/testReport)** for PR 21501 at commit [`f63cb26`](https://github.com/apache/spark/commit/f63cb26ee6242ce7f7eb975e6d4ce10496993526).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r193493663
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -84,7 +86,36 @@ class StopWordsRemover @Since("1.5.0") (@Since("1.5.0") override val uid: String
       @Since("1.5.0")
       def getCaseSensitive: Boolean = $(caseSensitive)
     
    -  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
    +  /**
    +   * [[https://docs.oracle.com/javase/8/docs/api/java/util/Locale.html Locale]] of the input.
    +   * Ignored when [[caseSensitive]] is false.
    +   * Default: Locale.English
    +   * @see `StopWordsRemover.loadDefaultStopWords()`
    +   * @group param
    +   */
    +  @Since("2.4.0")
    +  val locale: Param[Locale] = new Param[Locale](this, "locale",
    +    "Locale of the input. Ignored when caseSensitive is false.")
    +
    +  /**
    +   * @group setParam
    +   * @throws IllegalArgumentException if `value` is not an available locale.
    +   */
    +  @Since("2.4.0")
    +  def setLocale(value: Locale): this.type = {
    +    if (Locale.getAvailableLocales contains value) {
    --- End diff --
    
    Could you add the check as a param validator? See https://github.com/dongjinleekr/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/param/params.scala#L167. This is because there are more than one way to set params besides this setter.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r194607278
  
    --- Diff: python/pyspark/ml/feature.py ---
    @@ -2582,25 +2582,31 @@ class StopWordsRemover(JavaTransformer, HasInputCol, HasOutputCol, JavaMLReadabl
                           typeConverter=TypeConverters.toListString)
         caseSensitive = Param(Params._dummy(), "caseSensitive", "whether to do a case sensitive " +
                               "comparison over the stop words", typeConverter=TypeConverters.toBoolean)
    +    locale = Param(Params._dummy(), "locale", "locale of the input. ignored when case sensitive " +
    +                   "is true", typeConverter=TypeConverters.toString)
     
         @keyword_only
    -    def __init__(self, inputCol=None, outputCol=None, stopWords=None, caseSensitive=False):
    +    def __init__(self, inputCol=None, outputCol=None, stopWords=None, caseSensitive=False,
    +                 locale=None):
             """
    -        __init__(self, inputCol=None, outputCol=None, stopWords=None, caseSensitive=false)
    +        __init__(self, inputCol=None, outputCol=None, stopWords=None, caseSensitive=false,
    +        locale=None)
             """
             super(StopWordsRemover, self).__init__()
             self._java_obj = self._new_java_obj("org.apache.spark.ml.feature.StopWordsRemover",
                                                 self.uid)
             self._setDefault(stopWords=StopWordsRemover.loadDefaultStopWords("english"),
    -                         caseSensitive=False)
    +                         caseSensitive=False, locale=StopWordsRemover.defaultLocale())
    --- End diff --
    
    You already have the `_java_obj`, call `_java_object.getLocale()` would give you the default locale. And then Python users only need `stopWordsRemover.getLocale()` to get the default value. In the param doc, we should make it clear that the default would be the JVM default locale.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91553/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91693/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    jenkins add to whitelist


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    **[Test build #91635 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91635/testReport)** for PR 21501 at commit [`967d3c1`](https://github.com/apache/spark/commit/967d3c10a18b27e01c62d5254d188fb1e74de08d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r194606418
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -84,7 +86,28 @@ class StopWordsRemover @Since("1.5.0") (@Since("1.5.0") override val uid: String
       @Since("1.5.0")
       def getCaseSensitive: Boolean = $(caseSensitive)
     
    -  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
    +  /**
    +   * Locale of the input for case insensitive matching. Ignored when [[caseSensitive]]
    +   * is true.
    +   * Default: Locale.getDefault.toString
    +   * @see `StopWordsRemover.getDefaultLocale()`
    --- End diff --
    
    I feel it is unnecessary to expose it as a public API. This is the same as `Locale.getDefault.toString` or `stopWordsRemover.getLocale` when nothing is set. See my comments on the Python API.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    **[Test build #91549 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91549/testReport)** for PR 21501 at commit [`f572684`](https://github.com/apache/spark/commit/f572684e0df29f522bc0d5b1521e7dc5356bae49).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r193779131
  
    --- Diff: python/pyspark/ml/feature.py ---
    @@ -2582,25 +2582,27 @@ class StopWordsRemover(JavaTransformer, HasInputCol, HasOutputCol, JavaMLReadabl
                           typeConverter=TypeConverters.toListString)
         caseSensitive = Param(Params._dummy(), "caseSensitive", "whether to do a case sensitive " +
                               "comparison over the stop words", typeConverter=TypeConverters.toBoolean)
    +    locale = Param(Params._dummy(), "locale", "locale of the input. ignored when case sensitive is false",
    --- End diff --
    
    `false` -> `true`. (Copy the param doc from Scala)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    **[Test build #91657 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91657/testReport)** for PR 21501 at commit [`675d7aa`](https://github.com/apache/spark/commit/675d7aa955cc54fbbce79ccb8294ef81afcd3fb5).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r193777474
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -84,7 +86,31 @@ class StopWordsRemover @Since("1.5.0") (@Since("1.5.0") override val uid: String
       @Since("1.5.0")
       def getCaseSensitive: Boolean = $(caseSensitive)
     
    -  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
    +  /**
    +   * [[https://docs.oracle.com/javase/8/docs/api/java/util/Locale.html Locale]] of the input for case insensitive
    +   * matching. Ignored when [[caseSensitive]] is true.
    +   * Default: Locale.getDefault.toString
    +   * @see `StopWordsRemover.loadDefaultStopWords()`
    +   * @group param
    +   */
    +  @Since("2.4.0")
    +  val locale: Param[String] = new Param[String](this, "locale",
    +    "Locale of the input for case insensitive matching. Ignored when caseSensitive is false.",
    --- End diff --
    
    `false` -> `true`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91637/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    **[Test build #91506 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91506/testReport)** for PR 21501 at commit [`be38fd8`](https://github.com/apache/spark/commit/be38fd82bb711ab21e94b14cff1256c3c48f4c76).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r193493174
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
    @@ -84,7 +86,36 @@ class StopWordsRemover @Since("1.5.0") (@Since("1.5.0") override val uid: String
       @Since("1.5.0")
       def getCaseSensitive: Boolean = $(caseSensitive)
     
    -  setDefault(stopWords -> StopWordsRemover.loadDefaultStopWords("english"), caseSensitive -> false)
    +  /**
    +   * [[https://docs.oracle.com/javase/8/docs/api/java/util/Locale.html Locale]] of the input.
    +   * Ignored when [[caseSensitive]] is false.
    +   * Default: Locale.English
    +   * @see `StopWordsRemover.loadDefaultStopWords()`
    +   * @group param
    +   */
    +  @Since("2.4.0")
    +  val locale: Param[Locale] = new Param[Locale](this, "locale",
    --- End diff --
    
    Please use `Param[String]` so we can add Python/R APIs easily.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    **[Test build #91634 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91634/testReport)** for PR 21501 at commit [`df7cd11`](https://github.com/apache/spark/commit/df7cd11b686720af4626d6226f88542cc550b754).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    **[Test build #91652 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91652/testReport)** for PR 21501 at commit [`80d664f`](https://github.com/apache/spark/commit/80d664ffe9d182bae3a71ea5e2bf0e66b04b6d62).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    **[Test build #91637 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91637/testReport)** for PR 21501 at commit [`3636329`](https://github.com/apache/spark/commit/363632981138b9e1fe84b4d31f0cea3bc047e241).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21501
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91634/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org