You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@avro.apache.org by GitBox <gi...@apache.org> on 2022/08/01 07:30:08 UTC

[GitHub] [avro] clesaec opened a new pull request, #1798: Avro 3532 naming in c

clesaec opened a new pull request, #1798:
URL: https://github.com/apache/avro/pull/1798

   [AVRO-3532](https://issues.apache.org/jira/browse/AVRO-3532) : Enlarge naming rules for C
   In Java, implemented naming rules accept accent or Chinese alphabet ...
   The aim of JIRA is to officially accept this implementation, the aim of this PR is to accept it for C lang.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] KalleOlaviNiemitalo commented on a diff in pull request #1798: Avro 3532 naming in c

Posted by GitBox <gi...@apache.org>.
KalleOlaviNiemitalo commented on code in PR #1798:
URL: https://github.com/apache/avro/pull/1798#discussion_r934773751


##########
lang/c/src/schema.c:
##########
@@ -48,26 +51,50 @@ static void avro_schema_init(avro_schema_t schema, avro_type_t type)
 
 static int is_avro_id(const char *name)
 {
-	size_t i, len;
 	if (name) {
-		len = strlen(name);
-		if (len < 1) {
-			return 0;
-		}
-		for (i = 0; i < len; i++) {
-			if (!(isalpha(name[i])
-			      || name[i] == '_' || (i && isdigit(name[i])))) {
+		size_t len = strlen(name);
+    	if (len < 1) {
+    		return 0;
+    	}
+
+    	locale_t loc = newlocale(LC_ALL_MASK, "en_US.UTF-8", (locale_t) 0);
+    	locale_t currentLoc = (locale_t) 0;
+    	if (loc) {
+            currentLoc = uselocale(loc);
+        }
+        else {
+            setlocale(LC_ALL, "en_US.UTF-8");
+        }
+
+	    size_t mbslen = mbstowcs(NULL, name, 0);
+	    wchar_t  wsName[mbslen + 1];
+        mbstowcs(wsName, name, mbslen + 1);

Review Comment:
   `UCPTRIE_FAST_U8_NEXT` appears to be such a macro.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] KalleOlaviNiemitalo commented on a diff in pull request #1798: Avro 3532 naming in c

Posted by GitBox <gi...@apache.org>.
KalleOlaviNiemitalo commented on code in PR #1798:
URL: https://github.com/apache/avro/pull/1798#discussion_r935514425


##########
lang/c/src/schema.c:
##########
@@ -67,15 +67,24 @@ static int is_avro_id(const char *name)
         }
 
 	    size_t mbslen = mbstowcs(NULL, name, 0);
-	    wchar_t  wsName[mbslen + 1];
+#ifdef __STDC_NO_VLA__
+        // compiler does not support variable length arrays
+	    wchar_t  *wsName = calloc(mbslen + 1, sizeof(wchar_t));
+#else
+        wchar_t  wsName[mbslen + 1];
+#endif
         mbstowcs(wsName, name, mbslen + 1);
         size_t i;
         for (i = 0; i < mbslen; i++) {
             if (!(u_isalpha(wsName[i])
-                 || wsName[i] == L'_' || (i && isdigit(wsName[i])))) {
+                 || wsName[i] == L'_' || (i && u_isdigit(wsName[i])))) {
 				return 0;

Review Comment:
   Might need to `free(wsName)` before this `return`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] clesaec commented on a diff in pull request #1798: Avro 3532 naming in c

Posted by GitBox <gi...@apache.org>.
clesaec commented on code in PR #1798:
URL: https://github.com/apache/avro/pull/1798#discussion_r935281673


##########
lang/c/src/schema.c:
##########
@@ -48,20 +50,25 @@ static void avro_schema_init(avro_schema_t schema, avro_type_t type)
 
 static int is_avro_id(const char *name)
 {
-	size_t i, len;
+    setlocale(LC_ALL, "en_US.UTF-8");
 	if (name) {
-		len = strlen(name);
-		if (len < 1) {
-			return 0;
-		}
-		for (i = 0; i < len; i++) {
-			if (!(isalpha(name[i])
-			      || name[i] == '_' || (i && isdigit(name[i])))) {
+		size_t len = strlen(name);
+    	if (len < 1) {
+    		return 0;
+    	}
+	    size_t mbslen = mbstowcs(NULL, name, 0);
+	    wchar_t  wsName[mbslen + 1];

Review Comment:
   ok, code updated.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] KalleOlaviNiemitalo commented on a diff in pull request #1798: Avro 3532 naming in c

Posted by GitBox <gi...@apache.org>.
KalleOlaviNiemitalo commented on code in PR #1798:
URL: https://github.com/apache/avro/pull/1798#discussion_r934745779


##########
lang/c/src/schema.c:
##########
@@ -48,20 +51,25 @@ static void avro_schema_init(avro_schema_t schema, avro_type_t type)
 
 static int is_avro_id(const char *name)
 {
-	size_t i, len;
+    setlocale(LC_ALL, "en_US.UTF-8");

Review Comment:
   Microsoft's UCRT appears to support both. Tested by outputting `localeconv()->int_curr_symbol`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] KalleOlaviNiemitalo commented on a diff in pull request #1798: Avro 3532 naming in c

Posted by GitBox <gi...@apache.org>.
KalleOlaviNiemitalo commented on code in PR #1798:
URL: https://github.com/apache/avro/pull/1798#discussion_r934756361


##########
lang/c/src/schema.c:
##########
@@ -48,20 +50,25 @@ static void avro_schema_init(avro_schema_t schema, avro_type_t type)
 
 static int is_avro_id(const char *name)
 {
-	size_t i, len;
+    setlocale(LC_ALL, "en_US.UTF-8");
 	if (name) {
-		len = strlen(name);
-		if (len < 1) {
-			return 0;
-		}
-		for (i = 0; i < len; i++) {
-			if (!(isalpha(name[i])
-			      || name[i] == '_' || (i && isdigit(name[i])))) {
+		size_t len = strlen(name);
+    	if (len < 1) {
+    		return 0;
+    	}
+	    size_t mbslen = mbstowcs(NULL, name, 0);
+	    wchar_t  wsName[mbslen + 1];

Review Comment:
   Microsoft's C compiler does not implement VLAs, and it defines `__STDC_NO_VLA__` to indicate that.
   
   <https://godbolt.org/z/Kz9j4xWsP>
   <https://docs.microsoft.com/en-us/cpp/preprocessor/predefined-macros?view=msvc-170#standard-predefined-macros>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] KalleOlaviNiemitalo commented on a diff in pull request #1798: Avro 3532 naming in c

Posted by GitBox <gi...@apache.org>.
KalleOlaviNiemitalo commented on code in PR #1798:
URL: https://github.com/apache/avro/pull/1798#discussion_r934298785


##########
lang/c/src/schema.c:
##########
@@ -48,20 +50,25 @@ static void avro_schema_init(avro_schema_t schema, avro_type_t type)
 
 static int is_avro_id(const char *name)
 {
-	size_t i, len;
+    setlocale(LC_ALL, "en_US.UTF-8");
 	if (name) {
-		len = strlen(name);
-		if (len < 1) {
-			return 0;
-		}
-		for (i = 0; i < len; i++) {
-			if (!(isalpha(name[i])
-			      || name[i] == '_' || (i && isdigit(name[i])))) {
+		size_t len = strlen(name);
+    	if (len < 1) {
+    		return 0;
+    	}
+	    size_t mbslen = mbstowcs(NULL, name, 0);
+	    wchar_t  wsName[mbslen + 1];
+        mbstowcs(wsName, name, mbslen + 1);
+        size_t i;
+        for (i = 0; i < mbslen; i++) {
+            if (!(iswalpha(wsName[i])
+                 || wsName[i] == '_' || (i && isdigit(wsName[i])))) {

Review Comment:
   ```suggestion
                    || wsName[i] == L'_' || (i && iswdigit(wsName[i])))) {
   ```



##########
lang/c/src/schema.c:
##########
@@ -48,20 +50,25 @@ static void avro_schema_init(avro_schema_t schema, avro_type_t type)
 
 static int is_avro_id(const char *name)
 {
-	size_t i, len;
+    setlocale(LC_ALL, "en_US.UTF-8");
 	if (name) {
-		len = strlen(name);
-		if (len < 1) {
-			return 0;
-		}
-		for (i = 0; i < len; i++) {
-			if (!(isalpha(name[i])
-			      || name[i] == '_' || (i && isdigit(name[i])))) {
+		size_t len = strlen(name);
+    	if (len < 1) {
+    		return 0;
+    	}
+	    size_t mbslen = mbstowcs(NULL, name, 0);
+	    wchar_t  wsName[mbslen + 1];

Review Comment:
   IIRC, variable length arrays are not supported by all compilers.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] KalleOlaviNiemitalo commented on a diff in pull request #1798: Avro 3532 naming in c

Posted by GitBox <gi...@apache.org>.
KalleOlaviNiemitalo commented on code in PR #1798:
URL: https://github.com/apache/avro/pull/1798#discussion_r934295721


##########
lang/c/src/schema.c:
##########
@@ -48,20 +51,25 @@ static void avro_schema_init(avro_schema_t schema, avro_type_t type)
 
 static int is_avro_id(const char *name)
 {
-	size_t i, len;
+    setlocale(LC_ALL, "en_US.UTF-8");

Review Comment:
   Also, should this be a hyphen "en-US" rather than an underscore "en_US"?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] KalleOlaviNiemitalo commented on a diff in pull request #1798: Avro 3532 naming in c

Posted by GitBox <gi...@apache.org>.
KalleOlaviNiemitalo commented on code in PR #1798:
URL: https://github.com/apache/avro/pull/1798#discussion_r935514927


##########
lang/c/src/schema.c:
##########
@@ -67,15 +67,24 @@ static int is_avro_id(const char *name)
         }
 
 	    size_t mbslen = mbstowcs(NULL, name, 0);
-	    wchar_t  wsName[mbslen + 1];
+#ifdef __STDC_NO_VLA__
+        // compiler does not support variable length arrays
+	    wchar_t  *wsName = calloc(mbslen + 1, sizeof(wchar_t));
+#else
+        wchar_t  wsName[mbslen + 1];
+#endif
         mbstowcs(wsName, name, mbslen + 1);
         size_t i;
         for (i = 0; i < mbslen; i++) {
             if (!(u_isalpha(wsName[i])
-                 || wsName[i] == L'_' || (i && isdigit(wsName[i])))) {
+                 || wsName[i] == L'_' || (i && u_isdigit(wsName[i])))) {
 				return 0;

Review Comment:
   And restore the locale as well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] KalleOlaviNiemitalo commented on a diff in pull request #1798: Avro 3532 naming in c

Posted by GitBox <gi...@apache.org>.
KalleOlaviNiemitalo commented on code in PR #1798:
URL: https://github.com/apache/avro/pull/1798#discussion_r934770067


##########
lang/c/src/schema.c:
##########
@@ -48,26 +51,50 @@ static void avro_schema_init(avro_schema_t schema, avro_type_t type)
 
 static int is_avro_id(const char *name)
 {
-	size_t i, len;
 	if (name) {
-		len = strlen(name);
-		if (len < 1) {
-			return 0;
-		}
-		for (i = 0; i < len; i++) {
-			if (!(isalpha(name[i])
-			      || name[i] == '_' || (i && isdigit(name[i])))) {
+		size_t len = strlen(name);
+    	if (len < 1) {
+    		return 0;
+    	}
+
+    	locale_t loc = newlocale(LC_ALL_MASK, "en_US.UTF-8", (locale_t) 0);
+    	locale_t currentLoc = (locale_t) 0;
+    	if (loc) {
+            currentLoc = uselocale(loc);
+        }
+        else {
+            setlocale(LC_ALL, "en_US.UTF-8");
+        }
+
+	    size_t mbslen = mbstowcs(NULL, name, 0);
+	    wchar_t  wsName[mbslen + 1];
+        mbstowcs(wsName, name, mbslen + 1);

Review Comment:
   <https://unicode-org.github.io/icu/userguide/strings/properties.html#enumerated-property-over-string> mentions "UTF-8 macros" that would apparently let you look up properties of a UTF-8 encoded character without first recoding to wchar_t. If you could use that and rely solely on ICU rather than set up a locale, then the code would be more easily portable to Windows.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] clesaec commented on a diff in pull request #1798: Avro 3532 naming in c

Posted by GitBox <gi...@apache.org>.
clesaec commented on code in PR #1798:
URL: https://github.com/apache/avro/pull/1798#discussion_r935282140


##########
lang/c/src/schema.c:
##########
@@ -48,20 +50,25 @@ static void avro_schema_init(avro_schema_t schema, avro_type_t type)
 
 static int is_avro_id(const char *name)
 {
-	size_t i, len;
+    setlocale(LC_ALL, "en_US.UTF-8");
 	if (name) {
-		len = strlen(name);
-		if (len < 1) {
-			return 0;
-		}
-		for (i = 0; i < len; i++) {
-			if (!(isalpha(name[i])
-			      || name[i] == '_' || (i && isdigit(name[i])))) {
+		size_t len = strlen(name);
+    	if (len < 1) {
+    		return 0;
+    	}
+	    size_t mbslen = mbstowcs(NULL, name, 0);
+	    wchar_t  wsName[mbslen + 1];
+        mbstowcs(wsName, name, mbslen + 1);
+        size_t i;
+        for (i = 0; i < mbslen; i++) {
+            if (!(iswalpha(wsName[i])
+                 || wsName[i] == '_' || (i && isdigit(wsName[i])))) {

Review Comment:
   usage of u_isdigit to be consistent with u_isalpha



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] KalleOlaviNiemitalo commented on a diff in pull request #1798: Avro 3532 naming in c

Posted by GitBox <gi...@apache.org>.
KalleOlaviNiemitalo commented on code in PR #1798:
URL: https://github.com/apache/avro/pull/1798#discussion_r934288947


##########
lang/c/src/schema.c:
##########
@@ -48,20 +51,25 @@ static void avro_schema_init(avro_schema_t schema, avro_type_t type)
 
 static int is_avro_id(const char *name)
 {
-	size_t i, len;
+    setlocale(LC_ALL, "en_US.UTF-8");

Review Comment:
   Please don't call `setlocale` from the Avro C library. It affects all threads and its effects persist after `is_avro_id` returns. Instead, use functions that create and query a locale object without affecting global state:
   - [`newlocale`](https://man7.org/linux/man-pages/man3/newlocale.3p.html), [`iswalpha_l`](https://man7.org/linux/man-pages/man3/iswalpha.3p.html), and [`freelocale`](https://man7.org/linux/man-pages/man3/freelocale.3p.html) on POSIX
   - [`_create_locale`](https://docs.microsoft.com/cpp/c-runtime-library/reference/create-locale-wcreate-locale?view=msvc-170), [`_iswalpha_l`](https://docs.microsoft.com/cpp/c-runtime-library/reference/isalpha-iswalpha-isalpha-l-iswalpha-l?view=msvc-170), and [`_free_locale`](https://docs.microsoft.com/cpp/c-runtime-library/reference/free-locale?view=msvc-170) on Windows



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] KalleOlaviNiemitalo commented on pull request #1798: Avro 3532 naming in c

Posted by GitBox <gi...@apache.org>.
KalleOlaviNiemitalo commented on PR #1798:
URL: https://github.com/apache/avro/pull/1798#issuecomment-1200932300

   A compromise could be to let the application provide a callback function that replaces is_avro_id. Then the application developer would be responsible for any locale or ICU dependencies. The library could provide a default function that implements the traditional A-Za-z0-9_ rules. An application that wants to minimize its dependencies and trusts that the schemata are valid could define a trivial callback that just accepts everything.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] KalleOlaviNiemitalo commented on a diff in pull request #1798: Avro 3532 naming in c

Posted by GitBox <gi...@apache.org>.
KalleOlaviNiemitalo commented on code in PR #1798:
URL: https://github.com/apache/avro/pull/1798#discussion_r934780937


##########
lang/c/src/schema.c:
##########
@@ -48,26 +51,50 @@ static void avro_schema_init(avro_schema_t schema, avro_type_t type)
 
 static int is_avro_id(const char *name)
 {
-	size_t i, len;
 	if (name) {
-		len = strlen(name);
-		if (len < 1) {
-			return 0;
-		}
-		for (i = 0; i < len; i++) {
-			if (!(isalpha(name[i])
-			      || name[i] == '_' || (i && isdigit(name[i])))) {
+		size_t len = strlen(name);
+    	if (len < 1) {
+    		return 0;
+    	}
+
+    	locale_t loc = newlocale(LC_ALL_MASK, "en_US.UTF-8", (locale_t) 0);
+    	locale_t currentLoc = (locale_t) 0;
+    	if (loc) {
+            currentLoc = uselocale(loc);
+        }
+        else {
+            setlocale(LC_ALL, "en_US.UTF-8");
+        }
+
+	    size_t mbslen = mbstowcs(NULL, name, 0);
+	    wchar_t  wsName[mbslen + 1];
+        mbstowcs(wsName, name, mbslen + 1);

Review Comment:
   Or use [U8_NEXT](https://github.com/unicode-org/icu/blob/242ad4723ed579bd1f317846b2c04d96b05750cd/icu4c/source/common/unicode/utf8.h#L332-L352) to decode UTF-8 one `UChar32` at a time.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] clesaec commented on a diff in pull request #1798: Avro 3532 naming in c

Posted by GitBox <gi...@apache.org>.
clesaec commented on code in PR #1798:
URL: https://github.com/apache/avro/pull/1798#discussion_r934645423


##########
lang/c/src/schema.c:
##########
@@ -48,20 +50,25 @@ static void avro_schema_init(avro_schema_t schema, avro_type_t type)
 
 static int is_avro_id(const char *name)
 {
-	size_t i, len;
+    setlocale(LC_ALL, "en_US.UTF-8");
 	if (name) {
-		len = strlen(name);
-		if (len < 1) {
-			return 0;
-		}
-		for (i = 0; i < len; i++) {
-			if (!(isalpha(name[i])
-			      || name[i] == '_' || (i && isdigit(name[i])))) {
+		size_t len = strlen(name);
+    	if (len < 1) {
+    		return 0;
+    	}
+	    size_t mbslen = mbstowcs(NULL, name, 0);
+	    wchar_t  wsName[mbslen + 1];
+        mbstowcs(wsName, name, mbslen + 1);
+        size_t i;
+        for (i = 0; i < mbslen; i++) {
+            if (!(iswalpha(wsName[i])
+                 || wsName[i] == '_' || (i && isdigit(wsName[i])))) {

Review Comment:
   fix for (L'_'),  for iswdigit : Do we accept other than 0 to 9 ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] clesaec commented on a diff in pull request #1798: Avro 3532 naming in c

Posted by GitBox <gi...@apache.org>.
clesaec commented on code in PR #1798:
URL: https://github.com/apache/avro/pull/1798#discussion_r934648417


##########
lang/c/src/schema.c:
##########
@@ -48,20 +50,25 @@ static void avro_schema_init(avro_schema_t schema, avro_type_t type)
 
 static int is_avro_id(const char *name)
 {
-	size_t i, len;
+    setlocale(LC_ALL, "en_US.UTF-8");
 	if (name) {
-		len = strlen(name);
-		if (len < 1) {
-			return 0;
-		}
-		for (i = 0; i < len; i++) {
-			if (!(isalpha(name[i])
-			      || name[i] == '_' || (i && isdigit(name[i])))) {
+		size_t len = strlen(name);
+    	if (len < 1) {
+    		return 0;
+    	}
+	    size_t mbslen = mbstowcs(NULL, name, 0);
+	    wchar_t  wsName[mbslen + 1];

Review Comment:
   Wow ! i always see that even with 30 years old C compiler :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] RyanSkraba commented on pull request #1798: AVRO-3532: Naming in c

Posted by GitBox <gi...@apache.org>.
RyanSkraba commented on PR #1798:
URL: https://github.com/apache/avro/pull/1798#issuecomment-1303941698

   I took the liberty of changing these two PRs (https://github.com/apache/avro/pull/1787 and https://github.com/apache/avro/pull/1798) to draft status, just to prevent any accidents!
   
   These behaviour changes should be merged after a change to the specification, and we really should have a stronger consensus around the whether this is the right thing to do before changing the spec.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] KalleOlaviNiemitalo commented on pull request #1798: Avro 3532 naming in c

Posted by GitBox <gi...@apache.org>.
KalleOlaviNiemitalo commented on PR #1798:
URL: https://github.com/apache/avro/pull/1798#issuecomment-1200939079

   On Windows, another pain point is that `iswalpha` cannot recognize supplementary characters because `wchar_t` is UTF-16. To fix that, I think one needs to use ICU instead. [Microsoft integrated ICU into Windows 10 Creators Update](https://docs.microsoft.com/en-us/windows/win32/intl/international-components-for-unicode--icu-), so if the Avro tools use ICU and target at least this version of Windows, then they don't need to get ICU libraries from elsewhere.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] clesaec commented on pull request #1798: Avro 3532 naming in c

Posted by GitBox <gi...@apache.org>.
clesaec commented on PR #1798:
URL: https://github.com/apache/avro/pull/1798#issuecomment-1201344237

   So, i used ICU and add possibility to provide a call back function instead of is_avro_id. 
   (_For the second, users have to be very careful, it can lead to errors_)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] KalleOlaviNiemitalo commented on a diff in pull request #1798: Avro 3532 naming in c

Posted by GitBox <gi...@apache.org>.
KalleOlaviNiemitalo commented on code in PR #1798:
URL: https://github.com/apache/avro/pull/1798#discussion_r935502711


##########
lang/c/src/schema.c:
##########
@@ -48,20 +51,25 @@ static void avro_schema_init(avro_schema_t schema, avro_type_t type)
 
 static int is_avro_id(const char *name)
 {
-	size_t i, len;
+    setlocale(LC_ALL, "en_US.UTF-8");

Review Comment:
   Anyhow, getting "USD" from `localeconv()->int_curr_symbol` shows that Microsoft's `setlocale` recognizes the "US" country code in both "en_US.UTF-8" and "en-US.UTF-8", even though [UCRT Locale names, Languages, and Country/Region strings](https://docs.microsoft.com/en-us/cpp/c-runtime-library/locale-names-languages-and-country-region-strings?view=msvc-170) recommends the <var>locale-name</var> form with a hyphen.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] clesaec commented on a diff in pull request #1798: Avro 3532 naming in c

Posted by GitBox <gi...@apache.org>.
clesaec commented on code in PR #1798:
URL: https://github.com/apache/avro/pull/1798#discussion_r935548089


##########
lang/c/src/schema.c:
##########
@@ -67,15 +67,24 @@ static int is_avro_id(const char *name)
         }
 
 	    size_t mbslen = mbstowcs(NULL, name, 0);
-	    wchar_t  wsName[mbslen + 1];
+#ifdef __STDC_NO_VLA__
+        // compiler does not support variable length arrays
+	    wchar_t  *wsName = calloc(mbslen + 1, sizeof(wchar_t));
+#else
+        wchar_t  wsName[mbslen + 1];
+#endif
         mbstowcs(wsName, name, mbslen + 1);
         size_t i;
         for (i = 0; i < mbslen; i++) {
             if (!(u_isalpha(wsName[i])
-                 || wsName[i] == L'_' || (i && isdigit(wsName[i])))) {
+                 || wsName[i] == L'_' || (i && u_isdigit(wsName[i])))) {
 				return 0;

Review Comment:
   Indeed :+1: 
   Done 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] KalleOlaviNiemitalo commented on a diff in pull request #1798: Avro 3532 naming in c

Posted by GitBox <gi...@apache.org>.
KalleOlaviNiemitalo commented on code in PR #1798:
URL: https://github.com/apache/avro/pull/1798#discussion_r934747453


##########
lang/c/src/schema.c:
##########
@@ -48,20 +50,25 @@ static void avro_schema_init(avro_schema_t schema, avro_type_t type)
 
 static int is_avro_id(const char *name)
 {
-	size_t i, len;
+    setlocale(LC_ALL, "en_US.UTF-8");
 	if (name) {
-		len = strlen(name);
-		if (len < 1) {
-			return 0;
-		}
-		for (i = 0; i < len; i++) {
-			if (!(isalpha(name[i])
-			      || name[i] == '_' || (i && isdigit(name[i])))) {
+		size_t len = strlen(name);
+    	if (len < 1) {
+    		return 0;
+    	}
+	    size_t mbslen = mbstowcs(NULL, name, 0);
+	    wchar_t  wsName[mbslen + 1];
+        mbstowcs(wsName, name, mbslen + 1);
+        size_t i;
+        for (i = 0; i < mbslen; i++) {
+            if (!(iswalpha(wsName[i])
+                 || wsName[i] == '_' || (i && isdigit(wsName[i])))) {

Review Comment:
   `isdigit` is not correct here anyway, because it requires that the input is either `EOF` or an `unsigned char`. If you give it an out-of-range value, then it may read beyond the end of a lookup table.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] KalleOlaviNiemitalo commented on a diff in pull request #1798: Avro 3532 naming in c

Posted by GitBox <gi...@apache.org>.
KalleOlaviNiemitalo commented on code in PR #1798:
URL: https://github.com/apache/avro/pull/1798#discussion_r935512088


##########
lang/c/src/schema.c:
##########
@@ -48,26 +51,50 @@ static void avro_schema_init(avro_schema_t schema, avro_type_t type)
 
 static int is_avro_id(const char *name)
 {
-	size_t i, len;
 	if (name) {
-		len = strlen(name);
-		if (len < 1) {
-			return 0;
-		}
-		for (i = 0; i < len; i++) {
-			if (!(isalpha(name[i])
-			      || name[i] == '_' || (i && isdigit(name[i])))) {
+		size_t len = strlen(name);
+    	if (len < 1) {
+    		return 0;
+    	}
+
+    	locale_t loc = newlocale(LC_ALL_MASK, "en_US.UTF-8", (locale_t) 0);
+    	locale_t currentLoc = (locale_t) 0;
+    	if (loc) {
+            currentLoc = uselocale(loc);
+        }
+        else {
+            setlocale(LC_ALL, "en_US.UTF-8");
+        }
+
+	    size_t mbslen = mbstowcs(NULL, name, 0);
+	    wchar_t  wsName[mbslen + 1];
+        mbstowcs(wsName, name, mbslen + 1);

Review Comment:
   That code works fine here, after this initialization:
   
   ```C
   const uint8_t name[] = u8"pr\u00E9nom";
   int32_t len = (sizeof name) - 1;
   ```
   
   The resulting values of `c` are: 112, 114, 233, 110, 111, 109



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] clesaec commented on a diff in pull request #1798: Avro 3532 naming in c

Posted by GitBox <gi...@apache.org>.
clesaec commented on code in PR #1798:
URL: https://github.com/apache/avro/pull/1798#discussion_r934646444


##########
lang/c/src/schema.c:
##########
@@ -48,20 +51,25 @@ static void avro_schema_init(avro_schema_t schema, avro_type_t type)
 
 static int is_avro_id(const char *name)
 {
-	size_t i, len;
+    setlocale(LC_ALL, "en_US.UTF-8");

Review Comment:
   newlocale / uselocale used when possible.
   Unit test does not work with "en-US", so i keep "en_US" (at least on my laptop)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] clesaec commented on a diff in pull request #1798: Avro 3532 naming in c

Posted by GitBox <gi...@apache.org>.
clesaec commented on code in PR #1798:
URL: https://github.com/apache/avro/pull/1798#discussion_r935269582


##########
lang/c/src/schema.c:
##########
@@ -48,20 +51,25 @@ static void avro_schema_init(avro_schema_t schema, avro_type_t type)
 
 static int is_avro_id(const char *name)
 {
-	size_t i, len;
+    setlocale(LC_ALL, "en_US.UTF-8");

Review Comment:
   [lconv](https://en.cppreference.com/w/c/locale/lconv) is for monetary & number format, not for language or charset ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org