You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@arrow.apache.org by "bkietz (via GitHub)" <gi...@apache.org> on 2023/05/13 13:47:51 UTC

[GitHub] [arrow] bkietz opened a new issue, #35581: [C++] ArraySpan::FillFromScalar is unsafe

bkietz opened a new issue, #35581:
URL: https://github.com/apache/arrow/issues/35581

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   `ArraySpan::FillFromScalar` can store pointers into the structure itself (specifically, into `ArraySpan::scratch_space`) which produces an ArraySpan which easily be unsafely moved and copied:
   
   ```c++
   ArraySpan span;
   span.FillFromScalar(scalar);
   
   UseSpan(span); // Fine; the original span is still alive.
   
   return span; // Undefined Behavior; the returned copy views
                // a stack variable whose lifetime has ended.
   ```
   
   The capability to view a Scalar as an ArraySpan can be preserved and made safer by restricting access to the span to an explicitly delineated scope:
   
   ```c++
   ArraySpan::FillFromScalar(scalar, [&](ArraySpan span) {
     UseSpan(span); // Fine; the viewed scratch space is alive inside this scope.
   });
   ```
   
   This has the pleasant side effect of reducing the size of ArraySpan by 16 bytes.
   
   ### Component(s)
   
   C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] bkietz commented on issue #35581: [C++] ArraySpan::FillFromScalar is unsafe

Posted by "bkietz (via GitHub)" <gi...@apache.org>.
bkietz commented on issue #35581:
URL: https://github.com/apache/arrow/issues/35581#issuecomment-1563469290

   It's more serious than that since the ArraySpan *does* own the scratch space, which means that copies are viewing data in a scalar and also data in the original span (which may not exist anymore)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on issue #35581: [C++] ArraySpan::FillFromScalar is unsafe

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on issue #35581:
URL: https://github.com/apache/arrow/issues/35581#issuecomment-1561530772

   Given that `ArraySpan` is non-owning, isn't it part of the API contract that you have to make sure the backing object does not fall out of scope?
   
   We should probably make this clearer in the docstrings, though.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] felipecrv commented on issue #35581: [C++] ArraySpan::FillFromScalar is unsafe

Posted by "felipecrv (via GitHub)" <gi...@apache.org>.
felipecrv commented on issue #35581:
URL: https://github.com/apache/arrow/issues/35581#issuecomment-1553268646

   ```cpp
   return span; // Undefined Behavior; the returned copy views
                // a stack variable whose lifetime has ended.
   ```
   
   We can check if the buffers are pointing to scratch space and re-wire them if they are pointing to the previous value.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] felipecrv commented on issue #35581: [C++] ArraySpan::FillFromScalar is unsafe

Posted by "felipecrv (via GitHub)" <gi...@apache.org>.
felipecrv commented on issue #35581:
URL: https://github.com/apache/arrow/issues/35581#issuecomment-1553330957

   ```cpp
   ArraySpan span;
   span.FillFromScalar(scalar);
   
   UseSpan(span); // Fine; the original span is still alive.
   
   return span; // Undefined Behavior; the returned copy views
                // a stack variable whose lifetime has ended.
    ```
    
    Due to RVO, the returned `ArraySpan` is constructed at the caller's stack and this is not biting us, I guess. I think this is a matter of always using destination-passing style when dealing with `ArraySpan` and never copying or moving it.
    
    Copy and Move constructors haven't been `= delete;`d in the class, I wonder if they could be still. I think we are allowed to do so as it's laveled `EXPERIMENTAL`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] raulcd commented on issue #35581: [C++] ArraySpan::FillFromScalar is unsafe

Posted by "raulcd (via GitHub)" <gi...@apache.org>.
raulcd commented on issue #35581:
URL: https://github.com/apache/arrow/issues/35581#issuecomment-1630887934

   I am moving this to 14.0.0 as it is not marked as a blocker for the 13.0.0 release, let me know if it should be part of the release though.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou closed issue #35581: [C++] ArraySpan::FillFromScalar is unsafe

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou closed issue #35581: [C++] ArraySpan::FillFromScalar is unsafe
URL: https://github.com/apache/arrow/issues/35581


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] felipecrv commented on issue #35581: [C++] ArraySpan::FillFromScalar is unsafe

Posted by "felipecrv (via GitHub)" <gi...@apache.org>.
felipecrv commented on issue #35581:
URL: https://github.com/apache/arrow/issues/35581#issuecomment-1554722053

   Scary. And as I thought more about my proposal of deleting the copy ctor of `ArraySpan`, I realized that completely defeats the point of having a non-owning span type.
   
   If we really care about small buffers being cheap, we should instead consider changing the `Buffer` class to have an optimization similar to the small-string optimization that `std::string` does. That can be made safe by correct implementation of the move/copy constructors of `Buffer` 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] bkietz commented on issue #35581: [C++] ArraySpan::FillFromScalar is unsafe

Posted by "bkietz (via GitHub)" <gi...@apache.org>.
bkietz commented on issue #35581:
URL: https://github.com/apache/arrow/issues/35581#issuecomment-1553635755

   This is not biting us currently because AFAICT the only place we use FillFromScalar is https://github.com/apache/arrow/blob/cd6e2a4d2b9373b942da18b4cc82cb41431764d9/cpp/src/arrow/compute/exec.cc#L319-L330 where they are in a `std::vector` from which they are never moved. This is subtle too: as long as the original ArraySpan remains valid copies will be valid too, so
   
   ```c++
   ArraySpan Thing::get(int i) { return vector_of_spans_[i]; }
   ```
   
   Might not segfault if the `Thing` happens to be kept alive.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org