Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/29 23:00:25 UTC

[GitHub] [arrow] seddonm1 opened a new pull request #9366: ARROW-11434: [Rust][DataFusion] Rename length kernel to octet_length

seddonm1 opened a new pull request #9366:
URL: https://github.com/apache/arrow/pull/9366


   This PR renames the `length` kernel to `octet_length` to make clear what it returns and to distinguish it from `character_length`. The term `bytes` could have been used instead of `octet`, but `octet` was chosen because ANSI SQL defines an `octet_length` function.
   
   I have created the correct `character_length` function as part of https://github.com/apache/arrow/pull/9243.
   
   **Issue**
   The Rust `length` kernel currently counts the number of bytes (octets), which may or may not equal the number of characters because Arrow strings are UTF-8 encoded. For example, the `length` kernel returns 5 (bytes) for a string like `josé`, not 4 (characters).
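
   The difference can be reproduced with the Rust standard library alone. The sketch below does not use the Arrow kernels; the function names `octet_length` and `character_length` are borrowed from the SQL functions purely for illustration, and "character" is taken to mean a Unicode scalar value (Rust `char`), with `é` written as the single precomposed code point U+00E9.

```rust
/// Number of bytes in the UTF-8 encoding of `s` (illustrative helper, not the Arrow kernel).
fn octet_length(s: &str) -> usize {
    s.len()
}

/// Number of Unicode scalar values in `s` (illustrative helper, not the Arrow kernel).
fn character_length(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    // "josé" with a precomposed é (U+00E9), which takes two bytes in UTF-8.
    let s = "jos\u{e9}";
    println!("octet_length = {}", octet_length(s)); // prints "octet_length = 5"
    println!("character_length = {}", character_length(s)); // prints "character_length = 4"
}
```

   For pure ASCII input the two counts coincide, which is why the distinction is easy to overlook.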


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #9366: ARROW-11434: [Rust][DataFusion] Rename length kernel to octet_length

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #9366:
URL: https://github.com/apache/arrow/pull/9366#issuecomment-770159697


   I agree that it may be misleading, but from Rust's perspective, it is not "incorrect" to use `length` to denote the number of bytes in a string: `String::len` uses the same convention, and you need to use `s.chars().count()` to get the number of characters.
   
   This also collides with #9353, where `length` is extended to support `ListArray` and `BinaryArray`.
   
   One idea is to keep the name as is on the arrow crate, but name it `octet_length` on DataFusion's SQL and API (to be consistent with Postgres).
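
   As a rough illustration of the two semantics applied to a nullable string column, here is a sketch in plain Rust. It is not the arrow crate or DataFusion API: the column is modeled as a slice of `Option<&str>`, and the function names `octet_lengths` and `character_lengths` are hypothetical.

```rust
/// Byte length of each value, following the `String::len` convention; nulls stay null.
/// Hypothetical helper over a plain slice, not an Arrow kernel.
fn octet_lengths(column: &[Option<&str>]) -> Vec<Option<usize>> {
    column.iter().map(|v| v.map(|s| s.len())).collect()
}

/// Count of Unicode scalar values in each value; nulls stay null.
/// Hypothetical helper over a plain slice, not an Arrow kernel.
fn character_lengths(column: &[Option<&str>]) -> Vec<Option<usize>> {
    column.iter().map(|v| v.map(|s| s.chars().count())).collect()
}

fn main() {
    // "josé" is written with a precomposed é (U+00E9).
    let column = [Some("jos\u{e9}"), None, Some("arrow")];
    println!("{:?}", octet_lengths(&column)); // prints "[Some(5), None, Some(5)]"
    println!("{:?}", character_lengths(&column)); // prints "[Some(4), None, Some(5)]"
}
```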





[GitHub] [arrow] github-actions[bot] commented on pull request #9366: ARROW-11434: [Rust][DataFusion] Rename length kernel to octet_length

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9366:
URL: https://github.com/apache/arrow/pull/9366#issuecomment-770094759


   https://issues.apache.org/jira/browse/ARROW-11434





[GitHub] [arrow] seddonm1 commented on pull request #9366: ARROW-11434: [Rust][DataFusion] Rename length kernel to octet_length

Posted by GitBox <gi...@apache.org>.
seddonm1 commented on pull request #9366:
URL: https://github.com/apache/arrow/pull/9366#issuecomment-770161899


   No problem. I will close this PR and raise a new one that updates the function comments to clarify its intended behavior.





[GitHub] [arrow] seddonm1 closed pull request #9366: ARROW-11434: [Rust][DataFusion] Rename length kernel to octet_length

Posted by GitBox <gi...@apache.org>.
seddonm1 closed pull request #9366:
URL: https://github.com/apache/arrow/pull/9366


   

