Posted to github@arrow.apache.org by "tustvold (via GitHub)" <gi...@apache.org> on 2023/05/22 10:39:57 UTC
[GitHub] [arrow-rs] tustvold opened a new issue, #4253: Prototype ArrayView Types
tustvold opened a new issue, #4253:
URL: https://github.com/apache/arrow-rs/issues/4253
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
There is ongoing discussion of introducing an ArrayView type to the format - https://lists.apache.org/thread/r28rw5n39jwtvn08oljl09d4q2c1ysvb
We should explore the design space around this, in particular to gather some empirical data as to the impact of introducing such a type.
**Describe the solution you'd like**
I would like to prototype an implementation of StringView and explore integrating it into the parquet reader, where it could potentially yield some non-trivial performance improvements.
**Describe alternatives you've considered**
**Additional context**
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] tustvold commented on issue #4253: Prototype ArrayView Types
Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on issue #4253:
URL: https://github.com/apache/arrow-rs/issues/4253#issuecomment-1590798436
My current feelings on this matter are summarized in https://lists.apache.org/thread/1j0hdbfd0q2636zs9z0x19fkcn87gjhf
TL;DR: I think improving the support for sparse dictionaries may be sufficient to support this use case.
[GitHub] [arrow-rs] tustvold commented on issue #4253: Prototype ArrayView Types
Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on issue #4253:
URL: https://github.com/apache/arrow-rs/issues/4253#issuecomment-1580849891
The benchmarks in #4378 show half the execution time being spent rewriting data to remove empty array slices. This likely could be optimised, and it is unclear how realistic the benchmark is, but I thought it was an interesting data point.
![image](https://github.com/apache/arrow-rs/assets/1781103/4e413a3a-1f99-484d-99e4-d05a93ac5a3f)
Theoretically, ArrayView types would remove the need for this, whilst also removing the memcpy when decoding byte arrays. I'd anticipate roughly a 2x return, with bigger returns for more heavily nested data.
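To make the memcpy point concrete, here is a minimal sketch contrasting an offset-based layout with a view-based one. All names (`View`, `ViewArray`, `OffsetArray`) are hypothetical and are not arrow-rs APIs, and the view here is deliberately simpler than whatever layout the format discussion settles on. It only illustrates the idea that a view can point into an already-decoded buffer, and can skip over unwanted bytes, instead of copying values into a fresh contiguous buffer.

```rust
use std::sync::Arc;

/// Hypothetical, simplified view: which buffer, where in it, and how long.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct View {
    buffer: usize,
    offset: usize,
    len: usize,
}

/// Hypothetical view-based byte array: elements point into shared buffers.
struct ViewArray {
    buffers: Vec<Arc<[u8]>>,
    views: Vec<View>,
}

impl ViewArray {
    fn new() -> Self {
        Self { buffers: Vec::new(), views: Vec::new() }
    }

    /// Register an existing buffer by reference count only; returns its index.
    fn add_buffer(&mut self, buf: Arc<[u8]>) -> usize {
        self.buffers.push(buf);
        self.buffers.len() - 1
    }

    /// Append an element that refers to `len` bytes at `offset` in `buffer`.
    fn push_view(&mut self, buffer: usize, offset: usize, len: usize) {
        assert!(offset + len <= self.buffers[buffer].len(), "view out of bounds");
        self.views.push(View { buffer, offset, len });
    }

    fn value(&self, i: usize) -> &[u8] {
        let v = self.views[i];
        &self.buffers[v.buffer][v.offset..v.offset + v.len]
    }

    fn len(&self) -> usize {
        self.views.len()
    }
}

/// Offset-based layout for comparison: every value is copied into one
/// contiguous values buffer.
struct OffsetArray {
    offsets: Vec<usize>,
    values: Vec<u8>,
}

impl OffsetArray {
    fn new() -> Self {
        Self { offsets: vec![0], values: Vec::new() }
    }

    fn push(&mut self, value: &[u8]) {
        self.values.extend_from_slice(value); // this is the copy a view avoids
        self.offsets.push(self.values.len());
    }

    fn value(&self, i: usize) -> &[u8] {
        &self.values[self.offsets[i]..self.offsets[i + 1]]
    }
}
```

In this sketch, a decoder holding a page buffer such as `helloXXworld` could register it once with `add_buffer` and push views `(0, 5)` and `(7, 5)`, leaving the two unwanted bytes in place, whereas the offset-based array has to copy both values to make them contiguous.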
Re: [I] Prototype ArrayView Types [arrow-rs]
Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb closed issue #4253: Prototype ArrayView Types
URL: https://github.com/apache/arrow-rs/issues/4253
Re: [I] Prototype ArrayView Types [arrow-rs]
Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #4253:
URL: https://github.com/apache/arrow-rs/issues/4253#issuecomment-1939030402
From my perspective, the prototype was completed in https://github.com/apache/arrow-rs/pull/4585 and follow-on work is tracked in https://github.com/apache/arrow-rs/issues/5374, so I am closing this ticket; let's track it there.
Re: [I] Prototype ArrayView Types [arrow-rs]
Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #4253:
URL: https://github.com/apache/arrow-rs/issues/4253#issuecomment-1933233333
Filed https://github.com/apache/arrow-rs/issues/5374 to track implementing what was added to the spec
[GitHub] [arrow-rs] tustvold closed issue #4253: Prototype ArrayView Types
Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold closed issue #4253: Prototype ArrayView Types
URL: https://github.com/apache/arrow-rs/issues/4253
Re: [I] Prototype ArrayView Types [arrow-rs]
Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #4253:
URL: https://github.com/apache/arrow-rs/issues/4253#issuecomment-1939022297
Here was a draft PR: https://github.com/apache/arrow-rs/pull/4585