Posted to github@arrow.apache.org by "tustvold (via GitHub)" <gi...@apache.org> on 2023/05/22 10:39:57 UTC

[GitHub] [arrow-rs] tustvold opened a new issue, #4253: Prototype ArrayView Types

tustvold opened a new issue, #4253:
URL: https://github.com/apache/arrow-rs/issues/4253

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   There is ongoing discussion of introducing an ArrayView type to the format - https://lists.apache.org/thread/r28rw5n39jwtvn08oljl09d4q2c1ysvb
   
   We should explore the design space around this, in particular to gather some empirical data as to the impact of introducing such a type.
   
   **Describe the solution you'd like**
   
   I would like to prototype an implementation of StringView and explore integrating it into the parquet reader, where it could potentially yield non-trivial performance improvements.
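For readers unfamiliar with the idea, here is a minimal sketch of a 16-byte string view, modelled on the variable-size binary view layout that was later added to the Arrow columnar spec: a 4-byte length, followed either by the string itself (when it is 12 bytes or fewer) or by a 4-byte prefix plus a buffer index and an offset into an existing data buffer. This is standalone Rust, not the arrow-rs API; the `View` type, its methods and the little-endian encoding of the integer fields are assumptions made for illustration.

```rust
/// Hypothetical 16-byte string view, sketched after the Arrow
/// variable-size binary view layout. Not the arrow-rs API.
#[derive(Clone, Copy, Debug, PartialEq)]
struct View([u8; 16]);

/// Values of at most this many bytes are stored inline in the view.
const INLINE_LEN: usize = 12;

/// Read a little-endian u32 from the first four bytes of `b`.
fn read_u32(b: &[u8]) -> u32 {
    u32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

impl View {
    /// Build a view of `s`. Short values are copied inline; longer values
    /// keep only a 4-byte prefix plus (buffer_index, offset), which must
    /// locate `s` inside an existing data buffer, so nothing else is copied.
    fn new(s: &[u8], buffer_index: u32, offset: u32) -> View {
        let mut v = [0u8; 16];
        // The spec uses signed 32-bit fields; u32 is used here for brevity.
        v[0..4].copy_from_slice(&(s.len() as u32).to_le_bytes());
        if s.len() <= INLINE_LEN {
            v[4..4 + s.len()].copy_from_slice(s);
        } else {
            v[4..8].copy_from_slice(&s[..4]);
            v[8..12].copy_from_slice(&buffer_index.to_le_bytes());
            v[12..16].copy_from_slice(&offset.to_le_bytes());
        }
        View(v)
    }

    /// Length in bytes of the viewed value.
    fn len(&self) -> usize {
        read_u32(&self.0[0..4]) as usize
    }

    /// Resolve the view to its bytes, borrowing from `buffers` for long values.
    fn bytes<'a>(&'a self, buffers: &'a [Vec<u8>]) -> &'a [u8] {
        let len = self.len();
        if len <= INLINE_LEN {
            &self.0[4..4 + len]
        } else {
            let idx = read_u32(&self.0[8..12]) as usize;
            let off = read_u32(&self.0[12..16]) as usize;
            &buffers[idx][off..off + len]
        }
    }
}

fn main() {
    // A single data buffer, e.g. a decoded page that long values point into.
    let buffers = vec![b"hello, a fairly long string".to_vec()];
    let short = View::new(b"hi", 0, 0);
    // A view of a sub-range of the buffer: no bytes are copied out of it.
    let tail = View::new(&buffers[0][7..], 0, 7);
    assert_eq!(short.bytes(&buffers), b"hi");
    println!("{}", String::from_utf8_lossy(tail.bytes(&buffers)));
}
```

The appeal for a reader is that a long value costs a fixed 16-byte view plus a reference into a buffer that already exists, rather than a copy into a new contiguous values buffer.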
   
   **Describe alternatives you've considered**
   
   **Additional context**
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on issue #4253: Prototype ArrayView Types

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on issue #4253:
URL: https://github.com/apache/arrow-rs/issues/4253#issuecomment-1590798436

   My current feelings on this matter are summarized in https://lists.apache.org/thread/1j0hdbfd0q2636zs9z0x19fkcn87gjhf
   
   TL;DR: I think improving support for sparse dictionaries may be sufficient for this use case.
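One possible reading of the sparse-dictionary alternative is sketched below; the linked thread is not reproduced here, so this interpretation is an assumption. The sketch takes a sparse dictionary to mean a dictionary-encoded array whose `values` are an entire decoded dictionary (for example a parquet dictionary page) shared by every batch, while each batch's `keys` reference only a subset of those values. Only the keys are written per batch; the string data is neither copied nor compacted. The `DictArray` type is hypothetical and is not the arrow-rs `DictionaryArray` API.

```rust
use std::rc::Rc;

/// Hypothetical dictionary-encoded string array: `keys` index into
/// `values`, which may hold many entries that no key refers to.
struct DictArray {
    keys: Vec<u32>,
    values: Rc<Vec<String>>,
}

impl DictArray {
    /// Look up the string for row `i` through its key.
    fn value(&self, i: usize) -> &str {
        &self.values[self.keys[i] as usize]
    }

    /// Fraction of dictionary entries referenced by at least one key;
    /// a value well below 1.0 means the dictionary is sparse.
    fn referenced_fraction(&self) -> f64 {
        if self.values.is_empty() {
            return 0.0;
        }
        let mut used = vec![false; self.values.len()];
        for &k in &self.keys {
            used[k as usize] = true;
        }
        used.iter().filter(|&&u| u).count() as f64 / self.values.len() as f64
    }
}

fn main() {
    // Stand-in for a decoded dictionary page shared by every batch.
    let dict: Rc<Vec<String>> = Rc::new(
        ["red", "green", "blue", "cyan"].iter().map(|s| s.to_string()).collect(),
    );
    // Two batches share the same values; only their keys differ.
    let a = DictArray { keys: vec![0, 0, 2], values: Rc::clone(&dict) };
    let b = DictArray { keys: vec![3, 1], values: Rc::clone(&dict) };
    assert_eq!(a.value(2), "blue");
    assert_eq!(b.value(0), "cyan");
    println!("batch a references {:.0}% of the dictionary", a.referenced_fraction() * 100.0);
}
```

Under this reading, "improving support" would mean making kernels cope well when most dictionary entries are unreferenced, instead of first rewriting the dictionary down to the entries a batch actually uses.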


[GitHub] [arrow-rs] tustvold commented on issue #4253: Prototype ArrayView Types

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on issue #4253:
URL: https://github.com/apache/arrow-rs/issues/4253#issuecomment-1580849891

   The benchmarks in #4378 show half the execution time being spent rewriting data to remove empty array slices. This could likely be optimised, and it is unclear how realistic the benchmark is, but I thought it was an interesting data point.
   
   ![image](https://github.com/apache/arrow-rs/assets/1781103/4e413a3a-1f99-484d-99e4-d05a93ac5a3f)
   
   In theory, ArrayView types would remove the need for this, whilst also eliminating the memcpy when decoding byte arrays. I'd anticipate roughly a 2x improvement, with larger gains for more heavily nested data.
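To illustrate the memcpy point, the sketch below decodes parquet PLAIN-encoded byte arrays (each value is a 4-byte little-endian length followed by its bytes) in two ways: into offsets plus a freshly copied values buffer, and into views that record where each value already sits in the page. The views are simplified to (offset, length) pairs without the inline/prefix details, all names are hypothetical, and the sketch assumes the page buffer is kept alive alongside the views. It does not reproduce the arrow-rs parquet reader.

```rust
/// Simplified view: where a value lives inside the page buffer.
#[derive(Debug, PartialEq)]
struct SimpleView {
    offset: u32,
    len: u32,
}

/// Encode values using parquet's PLAIN encoding for BYTE_ARRAY:
/// a 4-byte little-endian length followed by the raw bytes.
fn encode_plain(values: &[&str]) -> Vec<u8> {
    let mut page = Vec::new();
    for v in values {
        page.extend_from_slice(&(v.len() as u32).to_le_bytes());
        page.extend_from_slice(v.as_bytes());
    }
    page
}

/// Read the 4-byte little-endian length prefix starting at `pos`.
fn read_len(page: &[u8], pos: usize) -> usize {
    u32::from_le_bytes([page[pos], page[pos + 1], page[pos + 2], page[pos + 3]]) as usize
}

/// Offsets-based decoding: each value's bytes are copied into a new buffer.
fn decode_offsets(page: &[u8]) -> (Vec<u32>, Vec<u8>) {
    let (mut offsets, mut values) = (vec![0u32], Vec::new());
    let mut pos = 0;
    while pos < page.len() {
        let len = read_len(page, pos);
        pos += 4;
        values.extend_from_slice(&page[pos..pos + len]); // the memcpy
        offsets.push(values.len() as u32);
        pos += len;
    }
    (offsets, values)
}

/// View-based decoding: only positions are recorded; no value bytes move.
fn decode_views(page: &[u8]) -> Vec<SimpleView> {
    let mut views = Vec::new();
    let mut pos = 0;
    while pos < page.len() {
        let len = read_len(page, pos);
        pos += 4;
        views.push(SimpleView { offset: pos as u32, len: len as u32 });
        pos += len;
    }
    views
}

/// Resolve a view against the page it was decoded from.
fn view_bytes<'a>(page: &'a [u8], v: &SimpleView) -> &'a [u8] {
    &page[v.offset as usize..(v.offset + v.len) as usize]
}

fn main() {
    let page = encode_plain(&["foo", "", "a longer value"]);
    let (offsets, values) = decode_offsets(&page);
    let views = decode_views(&page);
    for (i, v) in views.iter().enumerate() {
        let copied = &values[offsets[i] as usize..offsets[i + 1] as usize];
        // Both decodings yield the same bytes; only the first copied them.
        assert_eq!(copied, view_bytes(&page, v));
    }
    println!("{} values: {} bytes copied vs 0", views.len(), values.len());
}
```

The trade-off is that the view-based form keeps the whole page alive for as long as any view into it exists, in exchange for skipping the copy.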


Re: [I] Prototype ArrayView Types [arrow-rs]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb closed issue #4253: Prototype ArrayView Types
URL: https://github.com/apache/arrow-rs/issues/4253


Re: [I] Prototype ArrayView Types [arrow-rs]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #4253:
URL: https://github.com/apache/arrow-rs/issues/4253#issuecomment-1939030402

   From my perspective, the prototype was completed in https://github.com/apache/arrow-rs/pull/4585 and follow-on work is tracked in https://github.com/apache/arrow-rs/issues/5374, so I am closing this ticket; let's track the remaining work there.


Re: [I] Prototype ArrayView Types [arrow-rs]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #4253:
URL: https://github.com/apache/arrow-rs/issues/4253#issuecomment-1933233333

   Filed https://github.com/apache/arrow-rs/issues/5374 to track implementing what was added to the spec


[GitHub] [arrow-rs] tustvold closed issue #4253: Prototype ArrayView Types

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold closed issue #4253: Prototype ArrayView Types
URL: https://github.com/apache/arrow-rs/issues/4253


Re: [I] Prototype ArrayView Types [arrow-rs]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #4253:
URL: https://github.com/apache/arrow-rs/issues/4253#issuecomment-1939022297

   Here was a draft PR: https://github.com/apache/arrow-rs/pull/4585

