Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/02/03 11:41:37 UTC

[GitHub] [arrow-datafusion] alamb commented on issue #1708: Introduce a `Vec` based row-wise representation for DataFusion

alamb commented on issue #1708:
URL: https://github.com/apache/arrow-datafusion/issues/1708#issuecomment-1028906123


   💯  with what @Dandandan  and @houqp  said; Thank you for writing this up @yjshen ❤️ 
   
   > I am wondering if for certain operations, e.g. hash aggregate, I feel fixed size input the data is stored better in a columnar format (mutable array, with offsets),
   
   I agree with @Dandandan  that for HashAggregate this would be super helpful, as the group keys and aggregates could be computed "in place" in the row, so producing the output would be essentially free.
   
   Sorting is indeed different because the sort key differs from what appears in the output. For example, `SELECT a, b, c ... ORDER BY a+b` needs to compare on `a+b`, but still produces tuples of `(a, b, c)`.
   
   For grouping, by contrast, the grouping values themselves are produced in the output. For example, `SELECT a+b, sum(c) ... GROUP BY a+b` produces tuples of `(a+b, sum)`.
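   
   To make the difference concrete, here is a minimal sketch using only the Rust standard library. The `Row` alias and both function names are hypothetical, made up for illustration; none of this is DataFusion API. A `BTreeMap` stands in for the hash table purely so the output order is deterministic.
   
   ```rust
   use std::collections::BTreeMap;
   
   /// Hypothetical input row `(a, b, c)`; not a DataFusion type.
   type Row = (i64, i64, i64);
   
   /// `SELECT a, b, c ... ORDER BY a+b`: the key `a+b` is used only for
   /// comparison; the full input tuples are what gets emitted.
   fn order_by_a_plus_b(mut rows: Vec<Row>) -> Vec<Row> {
       rows.sort_by_key(|&(a, b, _)| a + b);
       rows
   }
   
   /// `SELECT a+b, sum(c) ... GROUP BY a+b`: the key is itself part of the
   /// output and sits next to the aggregate state, which is updated in place.
   /// (Assumption: a BTreeMap replaces the hash table for stable ordering.)
   fn group_by_a_plus_b(rows: &[Row]) -> Vec<(i64, i64)> {
       let mut groups: BTreeMap<i64, i64> = BTreeMap::new();
       for &(a, b, c) in rows {
           *groups.entry(a + b).or_insert(0) += c;
       }
       // The map entries already have the output shape `(a+b, sum)`.
       groups.into_iter().collect()
   }
   ```
   
   With input rows `(3, 4, 10)`, `(1, 1, 20)`, `(2, 0, 30)`, the sort returns the original tuples reordered as `(1, 1, 20)`, `(2, 0, 30)`, `(3, 4, 10)`, while the grouping returns `(2, 50)` and `(7, 10)`: the key is discarded in one case and emitted in the other.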
   
   
   P.S. For what it is worth, I think DuckDB has a short-string optimization, so the key may look something more like:
   
   
   ```text
   Table A (bool a, char b, int c, string d) row_value (true, 'W', 59, "XYZ")                
                                                                                             
                                                                                             
                                                                                             
          ┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐       
          │ 0F │ 1  │ W  │ 00 │ 00 │ 00 │ 3B │ 03 │ 00 │ 00 │ 00 │ 00 │ X  │ Y  │ Z  │       
          └────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘       
                                                                                             
                                                  8                                          
                                                                                             
                                                                                             
                                                                                             
   Table A (bool a, char b, int c, string d) row_value (true, 'W', 59, "XYZXYZXYZ")          
                                                                                             
          ┌────┬────┬────┬────┬────┬────┬────┬─────────────────────────────────────────────┐ 
          │ 0F │ 1  │ W  │ 00 │ 00 │ 00 │ 3B │                     PTR                     │ 
          └────┴────┴────┴────┴────┴────┴────┴─────────────────────────────────────────────┘ 
                                                                    │                        
                                                  8                 └───┐                    
                                                                        ▼                    
                                                                                             
                                                                   "XYZXYZXYZ"               
                                                                                             
   ```
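   
   As a rough sketch of the idea in the diagram (not DuckDB's actual layout, which I have not checked), a string slot in a row could store short values inline and spill long ones to a side buffer. Everything here is an assumption made for illustration: the `StrSlot` type, the `encode`/`decode` helpers, the 8-byte `INLINE_CAP` threshold, and the use of an offset into a `Vec<u8>` where the diagram shows `PTR`.
   
   ```rust
   /// Bytes available for an inline string; a made-up threshold for illustration.
   const INLINE_CAP: usize = 8;
   
   /// Hypothetical string slot stored inside a row.
   #[derive(Debug, PartialEq)]
   enum StrSlot {
       /// Short string: length plus bytes stored directly in the row.
       Inline { len: u8, bytes: [u8; INLINE_CAP] },
       /// Long string: the row holds only a reference into a side buffer.
       Spilled { offset: usize, len: usize },
   }
   
   /// Stores `s` inline if it fits, otherwise appends it to `heap`.
   fn encode(s: &str, heap: &mut Vec<u8>) -> StrSlot {
       let src = s.as_bytes();
       if src.len() <= INLINE_CAP {
           let mut bytes = [0u8; INLINE_CAP];
           bytes[..src.len()].copy_from_slice(src);
           StrSlot::Inline { len: src.len() as u8, bytes }
       } else {
           let offset = heap.len();
           heap.extend_from_slice(src);
           StrSlot::Spilled { offset, len: src.len() }
       }
   }
   
   /// Reads the string back, following the reference for spilled values.
   fn decode<'a>(slot: &'a StrSlot, heap: &'a [u8]) -> &'a str {
       let raw = match slot {
           StrSlot::Inline { len, bytes } => &bytes[..*len as usize],
           StrSlot::Spilled { offset, len } => &heap[*offset..*offset + *len],
       };
       std::str::from_utf8(raw).expect("slot was built from a valid &str")
   }
   ```
   
   Under these assumptions, `"XYZ"` stays inside the row, so comparing or hashing it needs no extra memory access, while `"XYZXYZXYZ"` costs one indirection into the side buffer.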
   

