Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/07/15 00:26:40 UTC
[GitHub] [spark] HyukjinKwon edited a comment on issue #25133:
[SPARK-28365][ML] Fallback locale to en_US in StopWordsRemover if system
default locale isn't in available locales in JVM
URL: https://github.com/apache/spark/pull/25133#issuecomment-511247541
Seems like some locales such as `en-TW` or `pl-US` are not available in Java - https://www.oracle.com/technetwork/java/javase/java8locales-2095355.html . Not all locales are supported, and in these cases the result seems to be an undefined (empty) locale:
```scala
scala> val locale = java.util.Locale.forLanguageTag("a")
locale: java.util.Locale =
scala> java.text.NumberFormat.getInstance(locale).format(12345)
res1: String = 12,345
```
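To illustrate the `en-TW` case specifically: `Locale.forLanguageTag` accepts any well-formed BCP 47 tag, so the locale object is constructed without complaint, but that does not mean the JVM can actually localize for it. A small sketch (my own illustration, not Spark code):

```scala
import java.util.Locale

// `forLanguageTag` parses any well-formed BCP 47 tag, so `en-TW` is
// accepted even though the JVM has no localization data for it.
val enTW = Locale.forLanguageTag("en-TW")
println(s"${enTW.getLanguage}-${enTW.getCountry}")  // prints "en-TW"

// `Locale.getAvailableLocales` lists what the JVM actually supports;
// `en-TW` is typically absent from it.
println(Locale.getAvailableLocales.contains(enTW))
```

On the JVMs I am aware of, the last line prints `false`, which is exactly why a validity check against the available locales rejects it.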
If the locale isn't available in the JVM, users have to manually change the system or JVM locale, or access a private property in PySpark (`_jvm`). For instance, a locale such as `en-TW` specifies "an English-speaking, Taiwanese locale", which I believe is a legitimate locale but is not available in the JVM, so it does not seem to work. I found one [StackOverflow question](https://stackoverflow.com/questions/55246080/pyspark-stopwordsremover-parameter-locale-given-invalid-value) about `pl-US`. In addition, I found one similar fix (`https://github.com/godotengine/godot/pull/6910`) for this case.
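The fallback in the PR title could be sketched roughly as below (a minimal sketch with a hypothetical `resolveLocale` helper, not the actual Spark change): use the system default locale only when the JVM actually supports it, otherwise fall back to `en_US`.

```scala
import java.util.Locale

// Hypothetical helper sketching the proposed fallback: keep the JVM's
// default locale when it is among the available locales, otherwise fall
// back to en_US so StopWordsRemover still has a usable default.
def resolveLocale(): Locale = {
  val default = Locale.getDefault
  if (Locale.getAvailableLocales.contains(default)) default
  else Locale.US  // en_US fallback
}

println(resolveLocale().toLanguageTag)
```

This way users on systems with an unsupported default locale get a working `StopWordsRemover` out of the box instead of having to change the JVM locale themselves.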
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org