You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/05/27 15:36:50 UTC

[GitHub] [arrow] bkietz opened a new pull request #7292: ARROW-8471: [C++][Integration] Represent 64 bit integers as strings

bkietz opened a new pull request #7292:
URL: https://github.com/apache/arrow/pull/7292


   Some 64 bit integer values are not representable as JSON numbers, so store these as strings.
   
   This is issue is a regression; original fix was #5267


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] bkietz commented on a change in pull request #7292: ARROW-8471: [C++][Integration] Represent 64 bit integers as strings

Posted by GitBox <gi...@apache.org>.
bkietz commented on a change in pull request #7292:
URL: https://github.com/apache/arrow/pull/7292#discussion_r433904059



##########
File path: dev/archery/archery/integration/tester_go.py
##########
@@ -26,7 +26,8 @@ class GoTester(Tester):
     CONSUMER = True
 
     # FIXME(sbinet): revisit for Go modules
-    GOPATH = os.getenv('GOPATH', '~/go')
+    HOME = os.getenv('HOME', '~')

Review comment:
       I don't know why `~` failed resolve to `$HOME` locally but this seemed like a way to save someone the next person from the frustration of reading the script to figure out that `GOPATH` needs set.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7292: ARROW-8471: [C++][Integration] Represent 64 bit integers as strings

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7292:
URL: https://github.com/apache/arrow/pull/7292#issuecomment-639231033


   This patch is failing some of the time in Appveyor due to integer narrowing. I'm fixing the problems in #7352 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #7292: ARROW-8471: [C++][Integration] Represent 64 bit integers as strings

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7292:
URL: https://github.com/apache/arrow/pull/7292#issuecomment-634752661


   https://issues.apache.org/jira/browse/ARROW-8471


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #7292: ARROW-8471: [C++][Integration] Represent 64 bit integers as strings

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7292:
URL: https://github.com/apache/arrow/pull/7292#issuecomment-638975904


   @sbinet Could you validate the Go changes here?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] bkietz commented on a change in pull request #7292: ARROW-8471: [C++][Integration] Represent 64 bit integers as strings

Posted by GitBox <gi...@apache.org>.
bkietz commented on a change in pull request #7292:
URL: https://github.com/apache/arrow/pull/7292#discussion_r434064638



##########
File path: cpp/src/arrow/type_traits.h
##########
@@ -644,6 +644,14 @@ template <typename T, typename R = void>
 using enable_if_physical_unsigned_integer =
     enable_if_t<is_physical_unsigned_integer_type<T>::value, R>;
 
+template <typename T>
+using is_physical_integer_type =
+    std::integral_constant<bool, is_physical_unsigned_integer_type<T>::value ||
+                                     is_physical_signed_integer_type<T>::value>;

Review comment:
       C++ indentation is deferred to `clang-format`. We have a CI job which ensures that running `clang-format` is a no-op. As for explaining why it made this choice... I'm not familiar enough with its implementation to say, sorry




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #7292: ARROW-8471: [C++][Integration] Represent 64 bit integers as strings

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7292:
URL: https://github.com/apache/arrow/pull/7292#issuecomment-638975534


   I just looked at the Java code: for Int64 and UInt64, it uses `jackson.core.JsonParser::getValueAsString`, which accepts both number and string tokens.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] fsaintjacques commented on pull request #7292: ARROW-8471: [C++][Integration] Represent 64 bit integers as strings

Posted by GitBox <gi...@apache.org>.
fsaintjacques commented on pull request #7292:
URL: https://github.com/apache/arrow/pull/7292#issuecomment-638363587


   @bkietz one thing, you didn't change anything in Java, so either it was already behaving like this, or the test is disabled for it? If this is disabled, that should be a blocker for 1.0, parsing (u)int64 seems like a blocking requirement.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kiszk commented on a change in pull request #7292: ARROW-8471: [C++][Integration] Represent 64 bit integers as strings

Posted by GitBox <gi...@apache.org>.
kiszk commented on a change in pull request #7292:
URL: https://github.com/apache/arrow/pull/7292#discussion_r433975329



##########
File path: cpp/src/arrow/type_traits.h
##########
@@ -644,6 +644,14 @@ template <typename T, typename R = void>
 using enable_if_physical_unsigned_integer =
     enable_if_t<is_physical_unsigned_integer_type<T>::value, R>;
 
+template <typename T>
+using is_physical_integer_type =
+    std::integral_constant<bool, is_physical_unsigned_integer_type<T>::value ||
+                                     is_physical_signed_integer_type<T>::value>;

Review comment:
       not sure 100%. but, is this indentation correct?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #7292: ARROW-8471: [C++][Integration] Represent 64 bit integers as strings

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7292:
URL: https://github.com/apache/arrow/pull/7292#issuecomment-638973178


   > @bkietz one thing, you didn't change anything in Java, so either it was already behaving like this, or the test is disabled for it?
   
   Java probably accepts both forms, because AFAICT the test isn't disabled for it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] fsaintjacques closed pull request #7292: ARROW-8471: [C++][Integration] Represent 64 bit integers as strings

Posted by GitBox <gi...@apache.org>.
fsaintjacques closed pull request #7292:
URL: https://github.com/apache/arrow/pull/7292


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] bkietz commented on a change in pull request #7292: ARROW-8471: [C++][Integration] Represent 64 bit integers as strings

Posted by GitBox <gi...@apache.org>.
bkietz commented on a change in pull request #7292:
URL: https://github.com/apache/arrow/pull/7292#discussion_r433904243



##########
File path: cpp/src/arrow/ipc/json_integration_test.cc
##########
@@ -272,7 +272,7 @@ static const char* JSON_EXAMPLE = R"example(
         {
           "name": "foo",
           "count": 5,
-          "DATA": [1, 2, 3, 4, 5],
+          "DATA": ["1", "2", "3", "4", "5"],

Review comment:
       Will do




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #7292: ARROW-8471: [C++][Integration] Represent 64 bit integers as strings

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #7292:
URL: https://github.com/apache/arrow/pull/7292#discussion_r433805733



##########
File path: go/arrow/internal/arrjson/arrjson.go
##########
@@ -963,19 +963,23 @@ func i32ToJSON(arr *array.Int32) []interface{} {
 func i64FromJSON(vs []interface{}) []int64 {
 	o := make([]int64, len(vs))
 	for i, v := range vs {
-		vv, err := v.(json.Number).Int64()
+		vv, err := strconv.ParseInt(v.(string), 10, 64)
 		if err != nil {
 			panic(err)
 		}
-		o[i] = int64(vv)
+		o[i] = vv
 	}
 	return o
 }
 
 func i64ToJSON(arr *array.Int64) []interface{} {
 	o := make([]interface{}, arr.Len())
 	for i := range o {
-		o[i] = arr.Value(i)
+    if arr.IsValid(i) {

Review comment:
       Indentation problem here?

##########
File path: go/arrow/internal/arrjson/arrjson.go
##########
@@ -1330,7 +1350,11 @@ func durationFromJSON(vs []interface{}) []arrow.Duration {
 func durationToJSON(arr *array.Duration) []interface{} {
 	o := make([]interface{}, arr.Len())
 	for i := range o {
-		o[i] = arr.Value(i)
+    if arr.IsValid(i) {

Review comment:
       And here.

##########
File path: go/arrow/internal/arrjson/arrjson.go
##########
@@ -1205,7 +1213,11 @@ func date64FromJSON(vs []interface{}) []arrow.Date64 {
 func date64ToJSON(arr *array.Date64) []interface{} {
 	o := make([]interface{}, arr.Len())
 	for i := range o {
-		o[i] = int64(arr.Value(i))
+    if arr.IsValid(i) {

Review comment:
       And here as well.

##########
File path: go/arrow/internal/arrjson/arrjson.go
##########
@@ -1265,7 +1281,11 @@ func timestampFromJSON(vs []interface{}) []arrow.Timestamp {
 func timestampToJSON(arr *array.Timestamp) []interface{} {
 	o := make([]interface{}, arr.Len())
 	for i := range o {
-		o[i] = int64(arr.Value(i))
+    if arr.IsValid(i) {

Review comment:
       And here.

##########
File path: go/arrow/internal/arrjson/arrjson.go
##########
@@ -1245,15 +1257,19 @@ func time64FromJSON(vs []interface{}) []arrow.Time64 {
 func time64ToJSON(arr *array.Time64) []interface{} {
 	o := make([]interface{}, arr.Len())
 	for i := range o {
-		o[i] = int64(arr.Value(i))
+    if arr.IsValid(i) {

Review comment:
       And here.

##########
File path: go/arrow/internal/arrjson/arrjson.go
##########
@@ -1043,19 +1047,23 @@ func u32ToJSON(arr *array.Uint32) []interface{} {
 func u64FromJSON(vs []interface{}) []uint64 {
 	o := make([]uint64, len(vs))
 	for i, v := range vs {
-		vv, err := strconv.ParseUint(v.(json.Number).String(), 10, 64)
+		vv, err := strconv.ParseUint(v.(string), 10, 64)
 		if err != nil {
 			panic(err)
 		}
-		o[i] = uint64(vv)
+		o[i] = vv
 	}
 	return o
 }
 
 func u64ToJSON(arr *array.Uint64) []interface{} {
 	o := make([]interface{}, arr.Len())
 	for i := range o {
-		o[i] = arr.Value(i)
+    if arr.IsValid(i) {

Review comment:
       Indentation problem here as well?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] fsaintjacques commented on pull request #7292: ARROW-8471: [C++][Integration] Represent 64 bit integers as strings

Posted by GitBox <gi...@apache.org>.
fsaintjacques commented on pull request #7292:
URL: https://github.com/apache/arrow/pull/7292#issuecomment-638994177


   I say we move on and accept the go changes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] fsaintjacques commented on a change in pull request #7292: ARROW-8471: [C++][Integration] Represent 64 bit integers as strings

Posted by GitBox <gi...@apache.org>.
fsaintjacques commented on a change in pull request #7292:
URL: https://github.com/apache/arrow/pull/7292#discussion_r432665221



##########
File path: cpp/src/arrow/ipc/json_integration_test.cc
##########
@@ -255,7 +255,7 @@ static const char* JSON_EXAMPLE = R"example(
     "fields": [
       {
         "name": "foo",
-        "type": {"name": "int", "isSigned": true, "bitWidth": 32},
+        "type": {"name": "int", "isSigned": true, "bitWidth": 64},

Review comment:
       Add a i32 column so it shows both behavior.

##########
File path: cpp/src/arrow/ipc/json_integration_test.cc
##########
@@ -272,7 +272,7 @@ static const char* JSON_EXAMPLE = R"example(
         {
           "name": "foo",
           "count": 5,
-          "DATA": [1, 2, 3, 4, 5],
+          "DATA": ["1", "2", "3", "4", "5"],

Review comment:
       add literal for INT64_MIN and INT64_MAX.

##########
File path: dev/archery/archery/integration/tester_go.py
##########
@@ -26,7 +26,8 @@ class GoTester(Tester):
     CONSUMER = True
 
     # FIXME(sbinet): revisit for Go modules
-    GOPATH = os.getenv('GOPATH', '~/go')
+    HOME = os.getenv('HOME', '~')

Review comment:
       Why the change, just set `GOPATH` if you have something else?

##########
File path: dev/archery/archery/integration/datagen.py
##########
@@ -179,8 +179,8 @@ def generate_column(self, size, name=None):
         return self.generate_range(size, lower_bound, upper_bound, name=name)
 
     def generate_range(self, size, lower, upper, name=None):
-        values = [int(x) for x in
-                  np.random.randint(lower, upper, size=size)]
+        values = list(map(int if self.bit_width < 64 else str,
+                          np.random.randint(lower, upper, size=size)))

Review comment:
       Append the extreme values so we test all integrations.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] bkietz commented on a change in pull request #7292: ARROW-8471: [C++][Integration] Represent 64 bit integers as strings

Posted by GitBox <gi...@apache.org>.
bkietz commented on a change in pull request #7292:
URL: https://github.com/apache/arrow/pull/7292#discussion_r434063268



##########
File path: go/arrow/internal/arrjson/arrjson.go
##########
@@ -963,19 +963,23 @@ func i32ToJSON(arr *array.Int32) []interface{} {
 func i64FromJSON(vs []interface{}) []int64 {
 	o := make([]int64, len(vs))
 	for i, v := range vs {
-		vv, err := v.(json.Number).Int64()
+		vv, err := strconv.ParseInt(v.(string), 10, 64)
 		if err != nil {
 			panic(err)
 		}
-		o[i] = int64(vv)
+		o[i] = vv
 	}
 	return o
 }
 
 func i64ToJSON(arr *array.Int64) []interface{} {
 	o := make([]interface{}, arr.Len())
 	for i := range o {
-		o[i] = arr.Value(i)
+    if arr.IsValid(i) {

Review comment:
       vim was expanding tabs to spaces. Thanks




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] bkietz commented on a change in pull request #7292: ARROW-8471: [C++][Integration] Represent 64 bit integers as strings

Posted by GitBox <gi...@apache.org>.
bkietz commented on a change in pull request #7292:
URL: https://github.com/apache/arrow/pull/7292#discussion_r433904553



##########
File path: dev/archery/archery/integration/datagen.py
##########
@@ -179,8 +179,8 @@ def generate_column(self, size, name=None):
         return self.generate_range(size, lower_bound, upper_bound, name=name)
 
     def generate_range(self, size, lower, upper, name=None):
-        values = [int(x) for x in
-                  np.random.randint(lower, upper, size=size)]
+        values = list(map(int if self.bit_width < 64 else str,
+                          np.random.randint(lower, upper, size=size)))

Review comment:
       alright




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org