Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/06/29 16:30:57 UTC

[GitHub] [arrow-rs] alamb opened a new issue, #4466: Request: a way to copy a `Row` to `Rows`

alamb opened a new issue, #4466:
URL: https://github.com/apache/arrow-rs/issues/4466

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   I am implementing GroupByHash in DataFusion https://github.com/apache/arrow-datafusion/issues/4973
   
   We use the `RowFormat` to store grouping keys, which is awesome.
   
   The grouping operation computes the `Row` format for each input row, checks whether that row has been seen before, and if not, stores the newly seen `Row`. The only way I know of to store it today is to copy each new row individually using [`owned()`](https://docs.rs/arrow-row/42.0.0/arrow_row/struct.Row.html#method.owned):
   
   ```
   ┌──────────────────────────────────┐                                                            
   │ ┌───────────────────────────────┐│                                                            
   │ │               A               ││                                                            
   │ ├───────────────────────────────┤│                                                            
   │ │               B               │├────────────┐                                               
   │ ├───────────────────────────────┤│            │                                               
   │ │               A               ││            │                                               
   │ ├───────────────────────────────┤│            │                                               
   │ │               A               ││            │           ┌──────────────────────────────────┐
   │ ├───────────────────────────────┤│            │           │ ┌───────────────────────────────┐│
   │ │               C               ││            │           │ │               A               ││
   │ ├───────────────────────────────┤│            │           │ └───────────────────────────────┘│
   │ │               B               ││            │           │ ┌───────────────────────────────┐│
   │ ├───────────────────────────────┤│            └───────────┼▶│               B               ││
   │ │               A               ││                        │ └───────────────────────────────┘│
   │ ├───────────────────────────────┤│  to add a new row, I   │                                  │
   │ │               A               ││  currently do          │                                  │
   │ └───────────────────────────────┘│  `Row::owned()` to     │                                  │
   │  group keys for input batch      │  get a copy            │   distinct group keys seen in    │
   │  often many repeated values      │                        │   previous batches               │
   │                                  │                        │                                  │
   └──────────────────────────────────┘                        └──────────────────────────────────┘
                                                                                                   
        arrow_row::Rows                                         Vec<arrow_row::OwnedRow>           
                                                                                                   
   ```
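   The pattern in the diagram above can be sketched roughly as follows. This is a stdlib-only illustration, not the `arrow_row` API: plain byte slices stand in for encoded `Row`s, `Box<[u8]>` stands in for `OwnedRow`, and the function name `distinct_owned` is hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical model of the current approach: every newly seen key is
/// copied into its own heap allocation (the role `Row::owned()` plays).
fn distinct_owned<'a>(batch: &[&'a [u8]]) -> Vec<Box<[u8]>> {
    // Tracks which encoded keys have already been seen in this batch.
    let mut seen: HashSet<&'a [u8]> = HashSet::new();
    let mut keys: Vec<Box<[u8]>> = Vec::new();
    for &row in batch {
        if seen.insert(row) {
            // One separate allocation per distinct key.
            keys.push(Box::from(row));
        }
    }
    keys
}
```

   In this model, calling `distinct_owned` on a batch with many repeated keys yields one boxed copy per distinct key, in first-seen order.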
   
   **Describe the solution you'd like**
   I would like to be able to append a `Row` directly to a `Rows`:
   
   ```
   ┌──────────────────────────────────┐                                                            
   │ ┌───────────────────────────────┐│                                                            
   │ │               A               ││                                                            
   │ ├───────────────────────────────┤│                                                            
   │ │               B               │├────────────┐                                               
   │ ├───────────────────────────────┤│            │                                               
   │ │               A               ││            │                                               
   │ ├───────────────────────────────┤│            │                                               
   │ │               A               ││            │           ┌──────────────────────────────────┐
   │ ├───────────────────────────────┤│            │           │ ┌───────────────────────────────┐│
   │ │               C               ││            │           │ │               A               ││
   │ ├───────────────────────────────┤│            │           │ ├───────────────────────────────┤│
   │ │               B               ││            └───────────┼▶│               B               ││
   │ ├───────────────────────────────┤│                        │ └───────────────────────────────┘│
   │ │               A               ││                        │                                  │
   │ ├───────────────────────────────┤│  Copying a new Row     │                                  │
   │ │               A               ││  would just copy       │                                  │
   │ └───────────────────────────────┘│  some bytes to the     │                                  │
   │  group keys for input batch      │  other Rows            │   distinct group keys seen in    │
   │  often many repeated values      │                        │   previous batches               │
   │                                  │                        │                                  │
   └──────────────────────────────────┘                        └──────────────────────────────────┘
                                                                                                   
      arrow_row::Rows                                            arrow_row::Rows                   
                                                                                                   
   ```
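   A minimal sketch of what "just copy some bytes" could look like, assuming the destination keeps all rows in one contiguous buffer plus a list of offsets. The type `PackedRows` and its methods are hypothetical names for illustration only; they are not part of `arrow_row`.

```rust
/// Hypothetical contiguous row store: all keys share one byte buffer.
struct PackedRows {
    buffer: Vec<u8>,
    /// `offsets[i]..offsets[i + 1]` is the byte range of row `i`.
    offsets: Vec<usize>,
}

impl PackedRows {
    fn new() -> Self {
        Self { buffer: Vec::new(), offsets: vec![0] }
    }

    /// Append one encoded row by copying its bytes onto the end of the
    /// shared buffer; no per-row allocation is made.
    fn push(&mut self, row: &[u8]) {
        self.buffer.extend_from_slice(row);
        self.offsets.push(self.buffer.len());
    }

    /// Number of rows stored so far.
    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Borrow the bytes of row `i`.
    fn row(&self, i: usize) -> &[u8] {
        &self.buffer[self.offsets[i]..self.offsets[i + 1]]
    }
}
```

   With a layout like this, appending a new group key only grows two vectors (amortized), rather than allocating once per key.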
   
   **Describe alternatives you've considered**
   
   Currently my POC code uses `Vec<OwnedRow>`, which adds an extra allocation for each row 😢
   
   **Additional context**
   https://github.com/apache/arrow-datafusion/issues/4973


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold closed issue #4466: Request: a way to copy a `Row` to `Rows`

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold closed issue #4466: Request: a way to copy a `Row` to `Rows`
URL: https://github.com/apache/arrow-rs/issues/4466

