You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "Folyd (via GitHub)" <gi...@apache.org> on 2023/06/16 10:53:54 UTC

[GitHub] [arrow-rs] Folyd opened a new pull request, #4420: fix: infer unsigned int as u64

Folyd opened a new pull request, #4420:
URL: https://github.com/apache/arrow-rs/pull/4420

   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #.
   
   # Rationale for this change
    
   Referring an `u64` integer which is bigger than `i64::Max` as a `DataType::Float64` isn't quite accurate. For example:
   
   ```rust
   let span_id: u64 = ThreadRng::default().gen();
   ```
   
   Where the `span_id` is less than `i64::Max`, it is referred to as a `DataType::Int64`, while it is bigger than `i64::Max`, it is mis-inferred to `DataType::Float64`.
   
   However, `serde_json::Number` loses the necessary concrete type (`u8`, `u16`, `u32`, etc), otherwise, we can refer integer more accurately.
   <!--
   Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
   Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.
   -->
   
   # What changes are included in this PR?
   
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   # Are there any user-facing changes?
   
   Not sure.
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!---
   If there are any breaking changes to public APIs, please add the `breaking change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] Folyd commented on a diff in pull request #4420: fix: infer unsigned int as u64

Posted by "Folyd (via GitHub)" <gi...@apache.org>.
Folyd commented on code in PR #4420:
URL: https://github.com/apache/arrow-rs/pull/4420#discussion_r1232432307


##########
arrow-json/src/reader/schema.rs:
##########
@@ -442,7 +442,9 @@ fn collect_field_types_from_object(
                 // inferring
             }
             Value::Number(n) => {
-                if n.is_i64() {
+                if n.is_u64() {
+                    set_object_scalar_field_type(field_types, k, DataType::UInt64)?;
+                } else if n.is_i64() {
                     set_object_scalar_field_type(field_types, k, DataType::Int64)?;

Review Comment:
   Yes, I prefer the unsigned one since `u64` has a wider positive-integer range than `i64`. If the number is negative, we treat it as an `i64`. This is a breaking change, would be allowed if we release the next major version?
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] fix(json reader): infer unsigned int as u64 [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold closed pull request #4420: fix(json reader): infer unsigned int as u64
URL: https://github.com/apache/arrow-rs/pull/4420


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #4420: fix: infer unsigned int as u64

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #4420:
URL: https://github.com/apache/arrow-rs/pull/4420#discussion_r1232329581


##########
arrow-json/src/reader/schema.rs:
##########
@@ -442,7 +442,9 @@ fn collect_field_types_from_object(
                 // inferring
             }
             Value::Number(n) => {
-                if n.is_i64() {
+                if n.is_u64() {
+                    set_object_scalar_field_type(field_types, k, DataType::UInt64)?;
+                } else if n.is_i64() {
                     set_object_scalar_field_type(field_types, k, DataType::Int64)?;

Review Comment:
   This will make unsigned integer as preferred one over signed? To reduce behavior change, I guess we should be making signed integer as preferred?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on a diff in pull request #4420: fix(json reader): infer unsigned int as u64

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #4420:
URL: https://github.com/apache/arrow-rs/pull/4420#discussion_r1238663197


##########
arrow-json/src/reader/schema.rs:
##########
@@ -476,6 +487,10 @@ fn collect_field_types_from_object(
 /// Infer the fields of a JSON file by reading all items from the JSON Value Iterator.
 ///
 /// The following type coercion logic is implemented:
+/// * Unsigned integer are converted to `UInt64`
+/// * Signed integer are converted to `Int64`
+/// * `Int64` and `UInt64` are converted to `Int64`

Review Comment:
   I thought this PR changed it so that unsigned ints are inferred to be `UInt64` 🤔 



##########
arrow-json/src/reader/schema.rs:
##########
@@ -476,6 +487,10 @@ fn collect_field_types_from_object(
 /// Infer the fields of a JSON file by reading all items from the JSON Value Iterator.
 ///
 /// The following type coercion logic is implemented:
+/// * Unsigned integer are converted to `UInt64`
+/// * Signed integer are converted to `Int64`

Review Comment:
   ```suggestion
   /// * Signed integer smaller than 64 bits are converted to `Int64`
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] Folyd commented on a diff in pull request #4420: fix(json reader): infer unsigned int as u64

Posted by "Folyd (via GitHub)" <gi...@apache.org>.
Folyd commented on code in PR #4420:
URL: https://github.com/apache/arrow-rs/pull/4420#discussion_r1238755348


##########
arrow-json/src/reader/schema.rs:
##########
@@ -476,6 +487,10 @@ fn collect_field_types_from_object(
 /// Infer the fields of a JSON file by reading all items from the JSON Value Iterator.
 ///
 /// The following type coercion logic is implemented:
+/// * Unsigned integer are converted to `UInt64`
+/// * Signed integer are converted to `Int64`
+/// * `Int64` and `UInt64` are converted to `Int64`

Review Comment:
   I mean the logic in the `coerce_data_type()` function, any suggestion to describe it accurately?
   ```rust
   (DataType::Int64, DataType::UInt64) | (DataType::UInt64, DataType::Int64) => {
       DataType::Int64
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #4420: fix(json reader): infer unsigned int as u64

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #4420:
URL: https://github.com/apache/arrow-rs/pull/4420#issuecomment-1611281397

   Marking as draft to signify this isn't awaiting review currently, please feel to mark ready for review when you would like me to take another look


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org