Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/01/21 08:19:43 UTC

[GitHub] [lucene-solr] dweiss commented on a change in pull request #2224: LUCENE-9681: Hunspell spellchecker: support numbers with separators

dweiss commented on a change in pull request #2224:
URL: https://github.com/apache/lucene-solr/pull/2224#discussion_r561677984



##########
File path: lucene/analysis/common/src/java/org/apache/lucene/analysis/hunspell/SpellChecker.java
##########
@@ -51,6 +57,28 @@ public boolean spell(String word) {
     return false;
   }
 
+  private static boolean isNumber(String s) {
+    int i = 0;
+    while (i < s.length()) {

Review comment:
       Should this follow Hunspell's rules exactly? I ask because it could be made more general by scanning Unicode code points and checking each one's attributes (whether it is numeric or not). I know this covers only a tiny fraction of use cases, but Unicode has all sorts of odd characters that are decimal digits:
   
   https://www.fileformat.info/info/unicode/category/Nd/list.htm
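
A minimal sketch of the code-point approach suggested above, not the code from this pull request. The class name `UnicodeNumberSketch` is hypothetical, and the separator rule (a single `.` or `,` allowed only between digits) is an assumption based on the test cases rather than a confirmed reading of Hunspell's rules. `Character.isDigit(int)` accepts any code point in the Unicode Nd category, including supplementary ones.

```java
// Hypothetical helper class; not part of Lucene.
final class UnicodeNumberSketch {
  private UnicodeNumberSketch() {}

  // Returns true if s consists of Unicode decimal digits (category Nd),
  // optionally split by single '.' or ',' separators that each sit
  // between two digits. The separator set is an assumption.
  static boolean isNumber(String s) {
    boolean lastWasDigit = false;
    int i = 0;
    while (i < s.length()) {
      // Read a full code point so that digits outside the BMP are handled.
      int cp = s.codePointAt(i);
      if (Character.isDigit(cp)) {
        lastWasDigit = true;
      } else if (lastWasDigit && (cp == '.' || cp == ',')) {
        // A separator is accepted only directly after a digit.
        lastWasDigit = false;
      } else {
        return false;
      }
      i += Character.charCount(cp);
    }
    // Rejects the empty string and any input ending in a separator.
    return lastWasDigit;
  }
}
```

Under these assumptions, `isNumber` accepts ASCII inputs such as `1234567.123456` and `4,2` as well as non-ASCII digit strings, and rejects inputs with leading, trailing, or doubled separators.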

##########
File path: lucene/analysis/common/src/test/org/apache/lucene/analysis/hunspell/i53643.good
##########
@@ -0,0 +1,21 @@
+1
+12
+123
+1234
+12345
+123456
+1234567
+1.1
+1.12
+1.123
+1.1234
+1.12345
+1.123456
+12.1
+123.12
+1234.123
+12345.1234
+123456.12345
+1234567.123456
+4,2

Review comment:
       Yeah... this is really locale-specific. It can't be solved accurately without context (the locale). Even with the locale, people use grouping and fraction separators inconsistently.
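
To illustrate the ambiguity, here is a small sketch using the JDK's `java.text.NumberFormat`, which is unrelated to how Hunspell or this pull request treats the input. The class and method names are hypothetical. With the JDK's default lenient parsing, the same string `4,2` should be read as forty-two under `Locale.US` (comma as grouping separator) and as four point two under `Locale.GERMANY` (comma as fraction separator).

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

// Hypothetical helper class; not part of Lucene.
final class LocaleSeparatorSketch {
  private LocaleSeparatorSketch() {}

  // Parses text with the given locale's number format and returns null
  // unless the whole string was consumed by the parser.
  static Number parseWhole(String text, Locale locale) {
    NumberFormat format = NumberFormat.getNumberInstance(locale);
    ParsePosition pos = new ParsePosition(0);
    Number value = format.parse(text, pos);
    return pos.getIndex() == text.length() ? value : null;
  }
}
```

A spellchecker has no such locale to consult for an isolated token, which is why accepting both separators between digits is a pragmatic compromise rather than an accurate parse.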



