Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/07/15 00:26:40 UTC

[GitHub] [spark] HyukjinKwon edited a comment on issue #25133: [SPARK-28365][ML] Fallback locale to en_US in StopWordsRemover if system default locale isn't in available locales in JVM

URL: https://github.com/apache/spark/pull/25133#issuecomment-511247541
 
 
   Seems like some locales like `en-TW` or `pl-US` are not available in Java - https://www.oracle.com/technetwork/java/javase/java8locales-2095355.html . Seems like not all locales are supported, and in these cases the locale seems to be an undefined locale:
   
   ```scala
   scala> val locale = java.util.Locale.forLanguageTag("a")
   locale: java.util.Locale =
   
   scala> java.text.NumberFormat.getInstance(locale).format(12345)
   res1: String = 12,345
   ```
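   
   One way to check this (a sketch; the exact output depends on the JVM vendor and version) is to test whether the locale appears in `java.util.Locale.getAvailableLocales`:
   
   ```scala
   scala> val locale = java.util.Locale.forLanguageTag("en-TW")
   locale: java.util.Locale = en_TW
   
   scala> java.util.Locale.getAvailableLocales.contains(locale)
   res2: Boolean = false
   ```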
   
   If the locale isn't available in the JVM, users have to manually change the system or JVM locale, or access a private property in PySpark (`_jvm`). For instance, `en-TW` specifies "an English-speaking, Taiwanese locale", which I believe is a legitimate locale but is not available in the JVM, so it is not going to work. I found one [Stack Overflow question](https://stackoverflow.com/questions/55246080/pyspark-stopwordsremover-parameter-locale-given-invalid-value) about `pl-US`. I also found a similar fix (https://github.com/godotengine/godot/pull/6910) for this kind of case.
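   
   A minimal sketch of the fallback this PR proposes (the helper name `defaultOrUS` below is illustrative, not the actual patch):
   
   ```scala
   import java.util.Locale
   
   // Use the system default locale only when the JVM actually supports it;
   // otherwise fall back to en_US so the locale is always well-defined.
   def defaultOrUS: Locale = {
     if (Locale.getAvailableLocales.contains(Locale.getDefault)) {
       Locale.getDefault
     } else {
       Locale.US
     }
   }
   ```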
   
