You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/20 18:50:11 UTC

[GitHub] [arrow-rs] ritchie46 opened a new pull request #335: respect offset in utf8 and list casts

ritchie46 opened a new pull request #335:
URL: https://github.com/apache/arrow-rs/pull/335


   # Which issue does this PR close?
   
   This is a fix for #334.
   
   Offsets were not taking into account in casts:
   
   ```
       list <-> large-list
       utf8 <-> large-utf8
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] codecov-commenter commented on pull request #335: respect offset in utf8 and list casts

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #335:
URL: https://github.com/apache/arrow-rs/pull/335#issuecomment-845487855


   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/335?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#335](https://codecov.io/gh/apache/arrow-rs/pull/335?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (144a7bd) into [master](https://codecov.io/gh/apache/arrow-rs/commit/e18b356b1310d5b178f7b6290993b920d3fe7edd?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (e18b356) will **increase** coverage by `0.01%`.
   > The diff coverage is `100.00%`.
   
   > :exclamation: Current head 144a7bd differs from pull request most recent head d0e540f. Consider uploading reports for the commit d0e540f to get more accurate results
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-rs/pull/335/graphs/tree.svg?width=650&height=150&src=pr&token=pq9V9qWZ1N&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-rs/pull/335?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #335      +/-   ##
   ==========================================
   + Coverage   82.52%   82.53%   +0.01%     
   ==========================================
     Files         162      162              
     Lines       44021    44037      +16     
   ==========================================
   + Hits        36328    36348      +20     
   + Misses       7693     7689       -4     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/335?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [arrow/src/compute/kernels/cast.rs](https://codecov.io/gh/apache/arrow-rs/pull/335/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2NvbXB1dGUva2VybmVscy9jYXN0LnJz) | `94.53% <100.00%> (+0.04%)` | :arrow_up: |
   | [arrow/src/array/equal/list.rs](https://codecov.io/gh/apache/arrow-rs/pull/335/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2VxdWFsL2xpc3QucnM=) | `89.55% <0.00%> (+5.97%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/335?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/335?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [e18b356...d0e540f](https://codecov.io/gh/apache/arrow-rs/pull/335?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] jorgecarleitao commented on pull request #335: respect offset in utf8 and list casts

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #335:
URL: https://github.com/apache/arrow-rs/pull/335#issuecomment-846408890


   Thanks a lot @ritchie46 . I can't fully evaluate this because I am uncertain about the spec here.
   
   I am a bit uncertain about the spec here: do we require the offset buffer to start at 0, or is the requirement that the last value minus the first value must be equal to the length of the values buffer? @nevi-me , do you know what is the spec here? 
   
   I am asking because IPC does not support the `ArrayData::offset`. Do we need to "de-offset" sliced arrays when passing tor IPC, or what is the deal here? (last resort, I will bring this to the mailing list)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb merged pull request #335: respect offset in utf8 and list casts

Posted by GitBox <gi...@apache.org>.
alamb merged pull request #335:
URL: https://github.com/apache/arrow-rs/pull/335


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] nevi-me commented on pull request #335: respect offset in utf8 and list casts

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #335:
URL: https://github.com/apache/arrow-rs/pull/335#issuecomment-846448353


   > I am a bit uncertain about the spec here: do we require the offset buffer to start at 0, or is the requirement that the last value minus the first value must be equal to the length of the values buffer? @nevi-me , do you know what is the spec here?
   
   This might be a bit tricky. AFAIK the spec doesn't prescribe what happens when a compute kernel interacts with sliced data.
   
   > Generally the first value in the offsets array is 0, and the last slot is the length of the values array. When serializing this layout, we recommend normalizing the offsets to start at 0. [0]
   
   I would interpret "generally" as, 'most implementations will expect to start at 0'. In which case, I would prefer a solution that carries the offset of the array, and starts the offset buffer at 0.
   
   I think carrying the offset of the input into the output in this case, is the most performant and compatible solution. Otherwise we'd have to racalculate the offsets to make sure that they start from 0.
   
   [0] https://arrow.apache.org/docs/format/Columnar.html#variable-size-binary-layout


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] ritchie46 commented on pull request #335: respect offset in utf8 and list casts

Posted by GitBox <gi...@apache.org>.
ritchie46 commented on pull request #335:
URL: https://github.com/apache/arrow-rs/pull/335#issuecomment-846439340


   > I am a bit uncertain about the spec here: do we require the offset buffer to start at 0, or is the requirement that the last value minus the first value must be equal to the length of the values buffer? @nevi-me , do you know what is the spec here?
   
   Yes, I simply respected the API at this point. We throw an error if [offsets do not start at zero](https://github.com/apache/arrow-rs/blob/a25cafb5e1e5cbd371433b0ff251386279c0aa6c/arrow/src/array/array_list.rs#L238), but we may be stricter than the spec is. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org