Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/10 10:54:16 UTC

[GitHub] [arrow-datafusion] jhorstmann opened a new pull request #1975: Avoid an Arc::clone per row in benchmark

jhorstmann opened a new pull request #1975:
URL: https://github.com/apache/arrow-datafusion/pull/1975


   # Which issue does this PR close?
   
   
   Closes #1973.
   
   # Rationale for this change
   
   Slightly improves the performance of writing rows.
   
   # What changes are included in this PR?
   
   To avoid cloning the `SchemaRef` for every row, the schema is now passed in as a separate parameter. I also marked the benchmark functions as `inline(never)` so that they stand out more in the profiler. Since they operate on large chunks of data, this should not add any measurable overhead.
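   
   A minimal sketch of the idea, not the actual DataFusion code: `Schema`, `Batch`, `row_width`, and both `write_rows_*` functions are hypothetical stand-ins. It assumes the original loop obtained an owned `SchemaRef` (an `Arc`) once per row, and contrasts that with borrowing the schema once for the whole loop.
   
   ```rust
   use std::sync::Arc;
   
   // Hypothetical stand-ins for the real schema and batch types.
   pub struct Schema {
       pub field_widths: Vec<usize>,
   }
   pub type SchemaRef = Arc<Schema>;
   
   pub struct Batch {
       pub schema: SchemaRef,
       pub num_rows: usize,
   }
   
   impl Batch {
       // Returns an owned handle, so every call bumps the atomic reference count.
       pub fn schema(&self) -> SchemaRef {
           Arc::clone(&self.schema)
       }
   }
   
   // Placeholder for the per-row work that needs to look at the schema.
   fn row_width(schema: &Schema) -> usize {
       schema.field_widths.iter().sum()
   }
   
   // Before: one Arc::clone (atomic increment, then decrement on drop) per row.
   #[inline(never)] // keeps the function visible as its own frame in a profiler
   pub fn write_rows_cloning(batch: &Batch) -> usize {
       let mut bytes = 0;
       for _ in 0..batch.num_rows {
           let schema = batch.schema();
           bytes += row_width(&schema);
       }
       bytes
   }
   
   // After: the caller passes the schema once, borrowed for the whole loop.
   #[inline(never)]
   pub fn write_rows_borrowing(batch: &Batch, schema: &Schema) -> usize {
       let mut bytes = 0;
       for _ in 0..batch.num_rows {
           bytes += row_width(schema);
       }
       bytes
   }
   ```
   
   Both functions compute the same result. The second only removes the per-row atomic reference-count traffic, which is the kind of saving the benchmark numbers below would reflect.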
   
   Benchmark results on i7-10510U, run with `$ RUSTFLAGS="-C target-cpu=skylake" cargo bench --features row,jit --bench jit`:
   
   master branch:
   ```
   row serializer          time:   [2.0518 s 2.0745 s 2.1029 s]                              
   row serializer jit      time:   [1.8530 s 1.8626 s 1.8723 s]                                  
   ```
   
   this branch:
   
   ```
   row serializer          time:   [1.6923 s 1.7042 s 1.7161 s]                              
   row serializer jit      time:   [1.8468 s 1.8562 s 1.8657 s]                                  
   ```
   
   If I understand the code correctly, the JIT calls the same `write_field_xyz` functions as the Rust version but is not able to inline them. So it avoids the type dispatch but makes several more function calls than the Rust code, which is able to inline some of the `write_field` functions.
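   
   The trade-off can be illustrated with a rough analogy in plain Rust; this is not cranelift code, and `FieldType`, the `write_*` helpers, `compile`, and both `write_row_*` functions are hypothetical. The dispatching version matches on the type for every field but lets the compiler inline the writers. The "compiled" version resolves the types once per schema but then goes through opaque function pointers, which stands in for JIT code calling external functions it cannot inline.
   
   ```rust
   // Hypothetical field types; the real row format supports more.
   #[derive(Clone, Copy)]
   pub enum FieldType {
       U8,
       I32,
       I64,
   }
   
   // Hypothetical writers standing in for the `write_field_xyz` functions.
   fn write_u8(out: &mut Vec<u8>, v: i64) {
       out.push(v as u8);
   }
   fn write_i32(out: &mut Vec<u8>, v: i64) {
       out.extend_from_slice(&(v as i32).to_le_bytes());
   }
   fn write_i64(out: &mut Vec<u8>, v: i64) {
       out.extend_from_slice(&v.to_le_bytes());
   }
   
   // Interpreted path: a type dispatch per field, but the writers are
   // direct calls that the compiler is free to inline into the match arms.
   pub fn write_row_dispatch(types: &[FieldType], values: &[i64], out: &mut Vec<u8>) {
       for (t, v) in types.iter().zip(values) {
           match t {
               FieldType::U8 => write_u8(out, *v),
               FieldType::I32 => write_i32(out, *v),
               FieldType::I64 => write_i64(out, *v),
           }
       }
   }
   
   type Writer = fn(&mut Vec<u8>, i64);
   
   // "Compile" step: resolve the type dispatch once per schema.
   pub fn compile(types: &[FieldType]) -> Vec<Writer> {
       types
           .iter()
           .map(|t| match t {
               FieldType::U8 => write_u8 as Writer,
               FieldType::I32 => write_i32 as Writer,
               FieldType::I64 => write_i64 as Writer,
           })
           .collect()
   }
   
   // Compiled path: no per-field match, but every field costs an indirect call.
   pub fn write_row_compiled(writers: &[Writer], values: &[i64], out: &mut Vec<u8>) {
       for (w, v) in writers.iter().zip(values) {
           w(out, *v);
       }
   }
   ```
   
   Both paths produce identical bytes. Which one is faster depends on whether the dispatch or the non-inlined calls dominate, which matches the roughly equal timings reported above.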


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] houqp commented on pull request #1975: Avoid an Arc::clone per row in benchmark

Posted by GitBox <gi...@apache.org>.
houqp commented on pull request #1975:
URL: https://github.com/apache/arrow-datafusion/pull/1975#issuecomment-1064787260


   Good catch :+1: 





[GitHub] [arrow-datafusion] houqp merged pull request #1975: Avoid an Arc::clone per row in benchmark

Posted by GitBox <gi...@apache.org>.
houqp merged pull request #1975:
URL: https://github.com/apache/arrow-datafusion/pull/1975


   





[GitHub] [arrow-datafusion] yjshen commented on pull request #1975: Avoid an Arc::clone per row in benchmark

Posted by GitBox <gi...@apache.org>.
yjshen commented on pull request #1975:
URL: https://github.com/apache/arrow-datafusion/pull/1975#issuecomment-1064724348


   Cc @alamb @houqp You may also be interested in this.





[GitHub] [arrow-datafusion] yjshen commented on pull request #1975: Avoid an Arc::clone per row in benchmark

Posted by GitBox <gi...@apache.org>.
yjshen commented on pull request #1975:
URL: https://github.com/apache/arrow-datafusion/pull/1975#issuecomment-1067481669


   After searching and discussing with @houqp, it seems complicated to get `cranelift` to [inline Rust functions into JIT code](https://bytecodealliance.zulipchat.com/#narrow/stream/217117-cranelift/topic/Inlining.20external.20rust.20functions/near/237438866). I want to try out LLVM, with its assembly and IR inlining capabilities. I will report back here if I make progress.
   
   Quoting the Postgres JIT docs:
   
   > One big advantage of JITing expressions is that it can significantly
   reduce the overhead of PostgreSQL's extensible function/operator
   mechanism, by inlining the body of called functions/operators.
   
   > It obviously is undesirable to maintain a second implementation of
   commonly used functions, just for inlining purposes. Instead we take
   advantage of the fact that the Clang compiler can emit LLVM IR.
   
   > The ability to do so allows us to get the LLVM IR for all operators
   (e.g. int8eq, float8pl etc), without maintaining two copies.  These
   bitcode files get installed into the server's
     $pkglibdir/bitcode/postgres/
   Using existing LLVM functionality (for parallel LTO compilation),
   additionally an index is over these is stored to
   $pkglibdir/bitcode/postgres.index.bc
   
   https://github.com/postgres/postgres/blob/7e12256b478b89518ff410f29192af21de37d070/src/backend/jit/README#L192-L219

