Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/30 15:42:28 UTC

[GitHub] [arrow-rs] tustvold opened a new issue, #1981: Copy-On-Write Support

tustvold opened a new issue, #1981:
URL: https://github.com/apache/arrow-rs/issues/1981

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   Certain use-cases may benefit from being able to mutate arrays in place, without copying them. Fortunately, we have most of the pieces to support this; it just requires some plumbing work.
   
   **Describe the solution you'd like**
   
   A high-level outline can be found below:
   
   * Add COW conversion from `Buffer` to `MutableBuffer`, likely using [`Arc::try_unwrap`](https://doc.rust-lang.org/std/sync/struct.Arc.html#method.try_unwrap)
   * Add COW conversion from `T: Array` to their corresponding builder, e.g. `PrimitiveArray::into_builder`
   * Add conversion from `ArrayRef` to `Arc<T>` - https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=abb7fcf97e33ba83d44b17f6c89e8f0b
   
   It should then be possible to do something like the following (likely encapsulated in a nicer interface):
   
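   ```
   fn cow_builder(col: ArrayRef) -> Int32Builder {
       // `downcast_array` is the proposed `ArrayRef` -> `Arc<T>` conversion from the outline above
       let col: Arc<Int32Array> = downcast_array(col).unwrap();
       // Take ownership if this is the only reference; otherwise clone the still-shared array.
       // `try_unwrap` consumes `col`, so the fallback must use the `Arc` returned in `Err`.
       let col: Int32Array = Arc::try_unwrap(col).unwrap_or_else(|shared| shared.as_ref().clone());
       col.into_builder()
   }
   ```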
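   
   The same downcast-then-unwrap pattern can be tried without Arrow. The following is a minimal sketch using only the standard library: `ColumnRef` and `cow_values` are hypothetical names, `Arc<dyn Any + Send + Sync>` stands in for `ArrayRef`, and `Vec<i32>` stands in for `Int32Array`. It is not the arrow-rs API.
   
   ```rust
   use std::any::Any;
   use std::sync::Arc;
   
   /// Hypothetical stand-in for `ArrayRef`: a shared, type-erased column.
   type ColumnRef = Arc<dyn Any + Send + Sync>;
   
   /// Downcast the shared column to a concrete `Vec<i32>`, then take ownership
   /// without copying if this is the only reference, or clone otherwise.
   /// Returns `None` if the column does not hold a `Vec<i32>`.
   fn cow_values(col: ColumnRef) -> Option<Vec<i32>> {
       // `Arc::downcast` hands the original `Arc` back in `Err` on a type mismatch.
       let col: Arc<Vec<i32>> = col.downcast::<Vec<i32>>().ok()?;
       // `Arc::try_unwrap` hands the `Arc` back in `Err` when it is still shared,
       // so the fallback clones through that returned handle.
       Some(Arc::try_unwrap(col).unwrap_or_else(|shared| shared.as_ref().clone()))
   }
   ```
   
   When the caller holds the only reference, the returned `Vec<i32>` reuses the original allocation; when another `Arc` clone is still alive, the caller gets an independent copy and the shared data is left untouched.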
   
   **Additional context**
   
   Arrow2 recently merged a form of support for this.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on issue #1981: Copy-On-Write Support to Support Mutating/Updating Arrays in Place

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #1981:
URL: https://github.com/apache/arrow-rs/issues/1981#issuecomment-1340510175

   Sorry, I meant `into_builder` for `ByteArray`




[GitHub] [arrow-rs] tustvold commented on issue #1981: Copy-On-Write Support to Support Mutating/Updating Arrays in Place

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #1981:
URL: https://github.com/apache/arrow-rs/issues/1981#issuecomment-1315703265

   I think that is a separate feature. Would you mind creating a new issue along with the intended use-case? I suspect it concerns implementing some sort of append-only memtable, something I have thoughts on, having implemented something similar for IOx (spoiler: memory copies are far from the bottleneck).




[GitHub] [arrow-rs] alamb commented on issue #1981: Copy-On-Write Support to Support Mutating/Updating Arrays in Place

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #1981:
URL: https://github.com/apache/arrow-rs/issues/1981#issuecomment-1368432907

   🎉  thank you @viirya  -- cc @avantgardnerio 




[GitHub] [arrow-rs] viirya closed issue #1981: Copy-On-Write Support to Support Mutating/Updating Arrays in Place

Posted by GitBox <gi...@apache.org>.
viirya closed issue #1981: Copy-On-Write Support to Support Mutating/Updating Arrays in Place
URL: https://github.com/apache/arrow-rs/issues/1981




[GitHub] [arrow-rs] tustvold commented on issue #1981: Copy-On-Write Support to Support Mutating/Updating Arrays in Place

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #1981:
URL: https://github.com/apache/arrow-rs/issues/1981#issuecomment-1339096729

   Theoretically it might be nice to provide `into_buffer` for `ByteArray`, but otherwise I think we can close this




[GitHub] [arrow-rs] avantgardnerio commented on issue #1981: Copy-On-Write Support to Support Mutating/Updating Arrays in Place

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on issue #1981:
URL: https://github.com/apache/arrow-rs/issues/1981#issuecomment-1320923710

   @tustvold @alamb @andygrove I filed an issue with PoC implementation: https://github.com/apache/arrow-rs/issues/3142




[GitHub] [arrow-rs] alamb commented on issue #1981: Copy-On-Write Support

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #1981:
URL: https://github.com/apache/arrow-rs/issues/1981#issuecomment-1172745566

   Related arrow2 code: Example https://github.com/jorgecarleitao/arrow2/blob/main/examples/cow.rs#L23
   
   Looks like it was added by @jorgecarleitao in https://github.com/jorgecarleitao/arrow2/pull/1042 (which was a bit hard to find as it is described in terms of the implementation rather than the end user API change) 👍 




[GitHub] [arrow-rs] viirya commented on issue #1981: Copy-On-Write Support to Support Mutating/Updating Arrays in Place

Posted by GitBox <gi...@apache.org>.
viirya commented on issue #1981:
URL: https://github.com/apache/arrow-rs/issues/1981#issuecomment-1338109082

   For COW, I think we now have implementations corresponding to each item in the high-level outline. Is there anything I missed?




[GitHub] [arrow-rs] avantgardnerio commented on issue #1981: Copy-On-Write Support to Support Mutating/Updating Arrays in Place

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on issue #1981:
URL: https://github.com/apache/arrow-rs/issues/1981#issuecomment-1315682551

   Instead of copying-on-write (or maybe in addition to), how would we feel about an `AppendOnlyArrayBuilder` that never calls any method that would cause `MutableBuffer` to `realloc()` and move any pointers?
   
   If we had that, we could have `AppendableRecordBatch`s, and a `borrow() -> &RecordBatch` (as opposed to `.finish()`) that would allow us to append (but not mutate) data, while still being able to do all the fun stuff we can do with `RecordBatch`es today (query in DataFusion, export to parquet, etc).
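   
   To make the idea concrete, here is a rough sketch of a fixed-capacity, append-only buffer using only the standard library. `AppendOnlyBuffer`, `try_append`, and `borrow` are hypothetical names chosen for illustration, not arrow-rs types, and a plain `Vec<T>` stands in for `MutableBuffer`. Storage is reserved once up front, and appends past that capacity are rejected rather than triggering a reallocation.
   
   ```rust
   /// Hypothetical fixed-capacity, append-only buffer (not an arrow-rs type).
   struct AppendOnlyBuffer<T> {
       values: Vec<T>,
       capacity: usize,
   }
   
   impl<T> AppendOnlyBuffer<T> {
       /// Reserve all storage up front so later appends never reallocate.
       fn with_capacity(capacity: usize) -> Self {
           Self { values: Vec::with_capacity(capacity), capacity }
       }
   
       /// Append a value, or hand it back in `Err` if the buffer is full.
       /// Refusing instead of growing keeps existing elements at a stable address.
       fn try_append(&mut self, value: T) -> Result<(), T> {
           if self.values.len() >= self.capacity {
               return Err(value);
           }
           self.values.push(value);
           Ok(())
       }
   
       /// Read-only view of everything appended so far; nothing is finished or consumed.
       fn borrow(&self) -> &[T] {
           &self.values
       }
   }
   ```
   
   Note that in this sketch the borrow checker still prevents calling `try_append` while a slice returned by `borrow` is alive, so reading during appends would need an additional mechanism that is not shown here.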




[GitHub] [arrow-rs] viirya commented on issue #1981: Copy-On-Write Support to Support Mutating/Updating Arrays in Place

Posted by GitBox <gi...@apache.org>.
viirya commented on issue #1981:
URL: https://github.com/apache/arrow-rs/issues/1981#issuecomment-1340497465

   For `into_buffer`, do you mean a `MutableBuffer` of a `ByteArray` so we can write `ByteArrayType` data into it?

