Posted to jira@arrow.apache.org by "Eduardo Ponce (Jira)" <ji...@apache.org> on 2021/09/01 07:30:00 UTC

[jira] [Updated] (ARROW-13570) [C++][Compute] Additional scalar ASCII kernels can reuse original offsets buffer

     [ https://issues.apache.org/jira/browse/ARROW-13570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eduardo Ponce updated ARROW-13570:
----------------------------------
    Description: 
Some ASCII scalar string kernels can reuse the input's offsets buffer, so the output offsets are not preallocated (these kernels use *MemAllocation::NO_PREALLOCATE* during registration). Currently, only kernels that transform each character independently via [StringDataTransform|https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_string.cc#L590-L631] support this no-preallocation policy. However, there are additional string kernels that do not modify the length (and therefore the offsets) of the input strings but apply scalar transforms that depend on neighboring characters.

This issue should extend *StringDataTransform*, or create a variant of it, to take multiple input transforms so that additional scalar ASCII kernels (e.g., _ascii_title_) can support the *MemAllocation::NO_PREALLOCATE* policy.

  was:
Some ASCII scalar string kernels can reuse the input's offsets buffer, so the output offsets are not preallocated (these kernels use *MemAllocation::NO_PREALLOCATE* during registration). Currently, only kernels that transform each character independently via [StringDataTransform|https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_string.cc#L590-L631] support this no-preallocation policy. However, there are additional string kernels that do not modify the length (and therefore the offsets) of the input strings but apply different transforms throughout the characters.

This issue should extend *StringDataTransform*, or create a variant of it, to take multiple input transforms so that additional scalar ASCII kernels (e.g., _ascii_title_) can support the *MemAllocation::NO_PREALLOCATE* policy.


> [C++][Compute] Additional scalar ASCII kernels can reuse original offsets buffer
> --------------------------------------------------------------------------------
>
>                 Key: ARROW-13570
>                 URL: https://issues.apache.org/jira/browse/ARROW-13570
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Eduardo Ponce
>            Priority: Major
>             Fix For: 6.0.0
>
>
> Some ASCII scalar string kernels can reuse the input's offsets buffer, so the output offsets are not preallocated (these kernels use *MemAllocation::NO_PREALLOCATE* during registration). Currently, only kernels that transform each character independently via [StringDataTransform|https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_string.cc#L590-L631] support this no-preallocation policy. However, there are additional string kernels that do not modify the length (and therefore the offsets) of the input strings but apply scalar transforms that depend on neighboring characters.
> This issue should extend *StringDataTransform*, or create a variant of it, to take multiple input transforms so that additional scalar ASCII kernels (e.g., _ascii_title_) can support the *MemAllocation::NO_PREALLOCATE* policy.
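
To illustrate why a neighbor-dependent transform such as title-casing can still share the input offsets, here is a minimal standalone sketch. It does not use Arrow: StringColumn, TitleCaseOne and AsciiTitleSketch are hypothetical names invented for this example, and the word rule (a word is a run of ASCII letters) is an assumption rather than a statement of how _ascii_title_ is implemented.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a string array: a shared offsets buffer plus one
// contiguous byte buffer. This is NOT Arrow's ArrayData.
struct StringColumn {
  std::shared_ptr<const std::vector<int32_t>> offsets;  // size = num_strings + 1
  std::string data;                                     // concatenated bytes
};

inline bool IsAsciiAlpha(char c) {
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}
inline char AsciiToUpper(char c) {
  return (c >= 'a' && c <= 'z') ? static_cast<char>(c - ('a' - 'A')) : c;
}
inline char AsciiToLower(char c) {
  return (c >= 'A' && c <= 'Z') ? static_cast<char>(c + ('a' - 'A')) : c;
}

// Title-cases one string. The result for each byte depends on the previous
// byte (is it a letter?), yet exactly n bytes are written for n bytes read.
void TitleCaseOne(const char* in, int64_t n, char* out) {
  bool prev_alpha = false;  // reset per string, so state never leaks across rows
  for (int64_t i = 0; i < n; ++i) {
    const char c = in[i];
    if (IsAsciiAlpha(c)) {
      out[i] = prev_alpha ? AsciiToLower(c) : AsciiToUpper(c);
      prev_alpha = true;
    } else {
      out[i] = c;
      prev_alpha = false;
    }
  }
}

// Because every string keeps its length, the output shares the input offsets
// instead of allocating and filling a new offsets buffer.
StringColumn AsciiTitleSketch(const StringColumn& input) {
  StringColumn output;
  output.offsets = input.offsets;  // reused, not preallocated
  output.data.resize(input.data.size());
  const std::vector<int32_t>& offs = *input.offsets;
  for (std::size_t i = 0; i + 1 < offs.size(); ++i) {
    TitleCaseOne(input.data.data() + offs[i], offs[i + 1] - offs[i],
                 &output.data[0] + offs[i]);
  }
  return output;
}
```

The offsets are still read to find string boundaries, since the "previous character" state has to reset at the start of each string; they are only never rewritten.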



--
This message was sent by Atlassian Jira
(v8.3.4#803005)