You are viewing a plain text version of this content. The canonical link for it is here.

Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/09/05 07:27:09 UTC

[GitHub] [arrow] jorgecarleitao opened a new pull request #8116: ARROW-9919: [Rust][DataFusion] Speedup math operations by 15%+

jorgecarleitao opened a new pull request #8116:
URL: https://github.com/apache/arrow/pull/8116


   This PR improves speed of all math operations by 15% (on default batch size), and changes the scaling (to faster) of how these operations scale with batch size.
   
   See main issue here: ARROW-9918
   
   This has a big contribution from @yordan-pavlov, that initially reported the issue on ARROW-8908 for literal arrays.
   
   Benchmarks (on my computer, before and after the second commit):
   
   ```
   sqrt_20_12              time:   [34.422 ms 34.503 ms 34.584 ms]                       
                           change: [-16.333% -16.055% -15.806%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   sqrt_22_12              time:   [150.13 ms 150.79 ms 151.42 ms]                       
                           change: [-16.281% -15.488% -14.779%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   sqrt_22_14              time:   [151.45 ms 151.68 ms 151.90 ms]                       
                           change: [-18.233% -16.919% -15.888%] (p = 0.00 < 0.05)
                           Performance has improved.
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] nevi-me commented on a change in pull request #8116: ARROW-9919: [Rust][DataFusion] Speedup math operations by 15%+

Posted by GitBox <gi...@apache.org>.

nevi-me commented on a change in pull request #8116:
URL: https://github.com/apache/arrow/pull/8116#discussion_r484750341



##########
File path: rust/datafusion/src/physical_plan/math_expressions.rs
##########
@@ -19,23 +19,25 @@
 
 use crate::error::{ExecutionError, Result};
 
-use arrow::array::{Array, ArrayRef, Float32Array, Float64Array, Float64Builder};
+use arrow::array::{Array, ArrayRef, Float32Array, Float64Array};
 
 use arrow::datatypes::DataType;
 
 use std::sync::Arc;
 
 macro_rules! compute_op {
     ($ARRAY:expr, $FUNC:ident, $TYPE:ident) => {{
-        let mut builder = Float64Builder::new($ARRAY.len());
-        for i in 0..$ARRAY.len() {
-            if $ARRAY.is_null(i) {
-                builder.append_null()?;
-            } else {
-                builder.append_value($ARRAY.value(i).$FUNC() as f64)?;
-            }
-        }
-        Ok(Arc::new(builder.finish()))
+        let result: Float64Array = (0..$ARRAY.len())
+            .map(|i| {

Review comment:
       Done, found a cleaner way of getting the null buffer in the end




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] jorgecarleitao commented on pull request #8116: ARROW-9919: [Rust][DataFusion] Speedup math operations by 15%+

Posted by GitBox <gi...@apache.org>.

jorgecarleitao commented on pull request #8116:
URL: https://github.com/apache/arrow/pull/8116#issuecomment-688034629


   @paddyhoran , thanks a lot for the feedback, it really helps to understand a bit better the context, as I am not aware of those practices.
   
   I do think that both builders and `from` use `memory.rs`. For primitive types, 
   
   * the builder uses a `BufferBuilder` and `BooleanBufferBuilder`, which in turn uses `memory` [here](https://github.com/apache/arrow/blob/master/rust/arrow/src/array/builder.rs#L463)
   * the `from` uses `Buffer`, which uses `memory` [here](https://github.com/apache/arrow/blob/master/rust/arrow/src/array/array.rs#L737)
   
   There are two main differences between the two:
   
   * A) `from` requires an intermediary allocation (to vec), while the builder does not (uses a MutableBuffer).
   * B) `builder` assess whether it needs to change capacity on every item, while `from` does not.
   
   Specifically, `from` uses `Buffer::from(data.to_byte_slice())`, while the `builder` uses `for datum in data builder.append(datum); builder.finish()`. Therefore, `builders` perform checks on its capacity to ensure that each `append` does not require a `resize` (see impl of `append`, `BufferBuilderTrait<T> for BufferBuilder`, that has a `reserve(1)?` on it, and `advance`, used in nulls, that has a `resize` on it).
   
   My hypothesis for this performance change is that B) is more CPU intensive than A).
   
   One the solution is to offer an API to the `Builder`s to not perform those checks when the user already set the capacity. This would address `B)` and would avoid an intermediary allocation.
   
   In this direction, one idea is to support appending from an `ExactSizeIterator` (e.g. `append_iter`) and use `size_hint` to not perform bound checks on every item.
   
   -----------------------
   
   A common operation we have in the readers and compute is
   
   ```
   rows: &[Value];
   
   for row in rows {
       match some_operation(row) {
            Some(item) => builder.append(item)
            None => builder.append_null()
       }
   }
   ```
   
   which is a basically an `ExactSizeIterator`. However, in the example above, the `builder` has no way of knowing that the new append won't go beyond the reserved size (and thus it checks).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] nevi-me commented on a change in pull request #8116: ARROW-9919: [Rust][DataFusion] Speedup math operations by 15%+

Posted by GitBox <gi...@apache.org>.

nevi-me commented on a change in pull request #8116:
URL: https://github.com/apache/arrow/pull/8116#discussion_r484493616



##########
File path: rust/datafusion/src/physical_plan/math_expressions.rs
##########
@@ -19,23 +19,25 @@
 
 use crate::error::{ExecutionError, Result};
 
-use arrow::array::{Array, ArrayRef, Float32Array, Float64Array, Float64Builder};
+use arrow::array::{Array, ArrayRef, Float32Array, Float64Array};
 
 use arrow::datatypes::DataType;
 
 use std::sync::Arc;
 
 macro_rules! compute_op {
     ($ARRAY:expr, $FUNC:ident, $TYPE:ident) => {{
-        let mut builder = Float64Builder::new($ARRAY.len());
-        for i in 0..$ARRAY.len() {
-            if $ARRAY.is_null(i) {
-                builder.append_null()?;
-            } else {
-                builder.append_value($ARRAY.value(i).$FUNC() as f64)?;
-            }
-        }
-        Ok(Arc::new(builder.finish()))
+        let result: Float64Array = (0..$ARRAY.len())
+            .map(|i| {

Review comment:
       Here's what I'm getting if I remove the loop. A further 85-90% improvement on top of your change (on Ryzen 3700x).
   
   The change requires making 1 function and 1 field public though, so maybe we can find some way of working around that.
   
   ```rust
   sqrt_20_12              time:   [2.5142 ms 2.6041 ms 2.6987 ms]
                           change: [-90.546% -90.184% -89.809%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) high mild
   
   sqrt_22_12              time:   [15.254 ms 15.532 ms 15.817 ms]
                           change: [-85.989% -85.707% -85.436%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   sqrt_22_14              time:   [12.698 ms 12.919 ms 13.158 ms]
                           change: [-87.515% -87.306% -87.069%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 4 outliers among 100 measurements (4.00%)
     4 (4.00%) high mild
   ```
   
   Code
   
   ```rust
   macro_rules! compute_op {
       ($ARRAY:expr, $FUNC:ident, $TYPE:ident) => {{
           let len = $ARRAY.len();
           let result = (0..len)
               .map(|i| $ARRAY.value(i).$FUNC() as f64)
               .collect::<Vec<f64>>();
           let data = ArrayData::new(
               DataType::Float64,
               len,
               None,
               // this is nasty, not sure of how to do it better, maybe we could introduce a fn that does this?
               $ARRAY
                   .data()
                   .null_bitmap()
                   .clone()
                   .map(|bitmap| bitmap.bits),
               0,
               vec![Buffer::from(result.to_byte_slice())],
               vec![],
           );
           Ok(make_array(Arc::new(data)))
       }};
   }
   ``` 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] andygrove commented on pull request #8116: ARROW-9919: [Rust][DataFusion] Speedup math operations by 15%+

Posted by GitBox <gi...@apache.org>.

andygrove commented on pull request #8116:
URL: https://github.com/apache/arrow/pull/8116#issuecomment-687635427


   Wow, very cool.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8116: ARROW-9919: [Rust][DataFusion] Speedup math operations by 15%+

Posted by GitBox <gi...@apache.org>.

jorgecarleitao commented on a change in pull request #8116:
URL: https://github.com/apache/arrow/pull/8116#discussion_r483971257



##########
File path: rust/datafusion/benches/math_query_sql.rs
##########
@@ -0,0 +1,100 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#[macro_use]
+extern crate criterion;
+use criterion::Criterion;
+
+use std::sync::Arc;
+
+extern crate arrow;
+extern crate datafusion;
+
+use arrow::{
+    array::{Float32Array, Float64Array},
+    datatypes::{DataType, Field, Schema},
+    record_batch::RecordBatch,
+};
+use datafusion::error::Result;
+
+use datafusion::datasource::MemTable;
+use datafusion::execution::context::ExecutionContext;
+
+fn query(ctx: &mut ExecutionContext, sql: &str) {
+    // execute the query
+    let df = ctx.sql(&sql).unwrap();
+    let results = df.collect().unwrap();
+
+    // display the relation
+    for _batch in results {}
+}
+
+fn create_context(array_len: usize, batch_size: usize) -> Result<ExecutionContext> {
+    // define a schema.
+    let schema = Arc::new(Schema::new(vec![
+        Field::new("f32", DataType::Float32, false),
+        Field::new("f64", DataType::Float64, false),
+    ]));
+
+    // define data.
+    let batches = (0..array_len / batch_size)
+        .map(|i| {
+            RecordBatch::try_new(
+                schema.clone(),
+                vec![
+                    Arc::new(Float32Array::from(vec![i as f32; batch_size])),
+                    Arc::new(Float64Array::from(vec![i as f64; batch_size])),
+                ],
+            )
+            .unwrap()
+        })
+        .collect::<Vec<_>>();
+
+    let mut ctx = ExecutionContext::new();
+
+    // declare a table in memory. In spark API, this corresponds to createDataFrame(...).
+    let provider = MemTable::new(schema, vec![batches])?;
+    ctx.register_table("t", Box::new(provider));
+
+    Ok(ctx)
+}
+
+fn criterion_benchmark(c: &mut Criterion) {
+    c.bench_function("sqrt_20_12", |b| {

Review comment:
       I agree. I will reduce sizes. I needed them for the perform the scaling, which often only shows up in larger due to fixed costs.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] paddyhoran commented on pull request #8116: ARROW-9919: [Rust][DataFusion] Speedup math operations by 15%+

Posted by GitBox <gi...@apache.org>.

paddyhoran commented on pull request #8116:
URL: https://github.com/apache/arrow/pull/8116#issuecomment-687998129


   I mentioned this over on [ARROW-9921](https://github.com/apache/arrow/pull/8117) but I don't think we intended to use `From<Vec<_>>` for anything other that testing originally.
   
   I thought that we needed to use the functions from [memory](https://github.com/apache/arrow/blob/master/rust/arrow/src/memory.rs) to allocate (control alignment and padding) but this allows the `Vec` to allocate (via `collect`).  I think you would want to ensure that all arrays are allocated with a consistent alignment, i.e. use memory.rs.
   
   From the spec:
   > Implementations are recommended to allocate memory on aligned addresses (multiple of 8- or 64-bytes) and pad (overallocate) to a length that is a multiple of 8 or 64 bytes. When serializing Arrow data for interprocess communication, these alignment and padding requirements are enforced
   
   This approach might be fine for an application in the wild (that won't use IPC) but DataFusion is part of the Arrow project itself and so *should* follow the rules/recommendations, thoughts?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] jorgecarleitao edited a comment on pull request #8116: ARROW-9919: [Rust][DataFusion] Speedup math operations by 15%+

Posted by GitBox <gi...@apache.org>.

jorgecarleitao edited a comment on pull request #8116:
URL: https://github.com/apache/arrow/pull/8116#issuecomment-688034629


   @paddyhoran , thanks a lot for the feedback, it really helps to understand a bit better the context, as I am not aware of those practices.
   
   I do think that both builders and `from` use `memory.rs`. For primitive types, 
   
   * the builder uses a `BufferBuilder` and `BooleanBufferBuilder`, which in turn uses `memory` [here](https://github.com/apache/arrow/blob/master/rust/arrow/src/array/builder.rs#L463)
   * the `from` uses `Buffer`, which uses `memory` [here](https://github.com/apache/arrow/blob/master/rust/arrow/src/array/array.rs#L737)
   
   There are two main differences between the two:
   
   * A) `from` requires an intermediary allocation (to vec), while the builder does not (uses a MutableBuffer).
   * B) `builder` assess whether it needs to change capacity on every item, while `from` does not.
   
   Specifically, `from` uses `Buffer::from(data.to_byte_slice())`, while the `builder` uses `for datum in data builder.append(datum); builder.finish()`. Therefore, `builders` perform checks on its capacity to ensure that each `append` does not require a `resize` (see impl of `append`, `BufferBuilderTrait<T> for BufferBuilder`, that has a `reserve(1)?` on it, and `advance`, used in nulls, that has a `resize` on it).
   
   My hypothesis for this performance change is that B) is more CPU intensive than A).
   
   One solution is to offer an API to the `Builder`s to not perform those checks when the user already set the capacity. This would address `B)` and would avoid an intermediary allocation.
   
   In this direction, one idea is to support appending from an `ExactSizeIterator` (e.g. `append_iter`) and use `size_hint` to not perform bound checks on every item.
   
   -----------------------
   
   A common operation we have in the readers and compute is
   
   ```
   rows: &[Value];
   
   for row in rows {
       match some_operation(row) {
            Some(item) => builder.append(item)
            None => builder.append_null()
       }
   }
   ```
   
   which is a basically an `ExactSizeIterator`. However, in the example above, the `builder` has no way of knowing that the new append won't go beyond the reserved size (and thus it checks).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8116: ARROW-9919: [Rust][DataFusion] Speedup math operations by 15%+

Posted by GitBox <gi...@apache.org>.

jorgecarleitao commented on a change in pull request #8116:
URL: https://github.com/apache/arrow/pull/8116#discussion_r484498652



##########
File path: rust/datafusion/src/physical_plan/math_expressions.rs
##########
@@ -19,23 +19,25 @@
 
 use crate::error::{ExecutionError, Result};
 
-use arrow::array::{Array, ArrayRef, Float32Array, Float64Array, Float64Builder};
+use arrow::array::{Array, ArrayRef, Float32Array, Float64Array};
 
 use arrow::datatypes::DataType;
 
 use std::sync::Arc;
 
 macro_rules! compute_op {
     ($ARRAY:expr, $FUNC:ident, $TYPE:ident) => {{
-        let mut builder = Float64Builder::new($ARRAY.len());
-        for i in 0..$ARRAY.len() {
-            if $ARRAY.is_null(i) {
-                builder.append_null()?;
-            } else {
-                builder.append_value($ARRAY.value(i).$FUNC() as f64)?;
-            }
-        }
-        Ok(Arc::new(builder.finish()))
+        let result: Float64Array = (0..$ARRAY.len())
+            .map(|i| {

Review comment:
       ![Shut up and take my money](https://static.wikia.nocookie.net/memepediadankmemes/images/8/80/Acb.jpg/revision/latest/scale-to-width-down/340?cb=20180822064733)
   
   Now, seriously, I think that we should merge this change instead of mine (maybe keep the benchmark commit).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8116: ARROW-9919: [Rust][DataFusion] Speedup math operations by 15%+

Posted by GitBox <gi...@apache.org>.

jorgecarleitao commented on a change in pull request #8116:
URL: https://github.com/apache/arrow/pull/8116#discussion_r483971257



##########
File path: rust/datafusion/benches/math_query_sql.rs
##########
@@ -0,0 +1,100 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#[macro_use]
+extern crate criterion;
+use criterion::Criterion;
+
+use std::sync::Arc;
+
+extern crate arrow;
+extern crate datafusion;
+
+use arrow::{
+    array::{Float32Array, Float64Array},
+    datatypes::{DataType, Field, Schema},
+    record_batch::RecordBatch,
+};
+use datafusion::error::Result;
+
+use datafusion::datasource::MemTable;
+use datafusion::execution::context::ExecutionContext;
+
+fn query(ctx: &mut ExecutionContext, sql: &str) {
+    // execute the query
+    let df = ctx.sql(&sql).unwrap();
+    let results = df.collect().unwrap();
+
+    // display the relation
+    for _batch in results {}
+}
+
+fn create_context(array_len: usize, batch_size: usize) -> Result<ExecutionContext> {
+    // define a schema.
+    let schema = Arc::new(Schema::new(vec![
+        Field::new("f32", DataType::Float32, false),
+        Field::new("f64", DataType::Float64, false),
+    ]));
+
+    // define data.
+    let batches = (0..array_len / batch_size)
+        .map(|i| {
+            RecordBatch::try_new(
+                schema.clone(),
+                vec![
+                    Arc::new(Float32Array::from(vec![i as f32; batch_size])),
+                    Arc::new(Float64Array::from(vec![i as f64; batch_size])),
+                ],
+            )
+            .unwrap()
+        })
+        .collect::<Vec<_>>();
+
+    let mut ctx = ExecutionContext::new();
+
+    // declare a table in memory. In spark API, this corresponds to createDataFrame(...).
+    let provider = MemTable::new(schema, vec![batches])?;
+    ctx.register_table("t", Box::new(provider));
+
+    Ok(ctx)
+}
+
+fn criterion_benchmark(c: &mut Criterion) {
+    c.bench_function("sqrt_20_12", |b| {

Review comment:
       I agree. I will reduce samples. I needed them for the perform the scaling, which often only shows up in larger due to fixed costs.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] nevi-me closed pull request #8116: ARROW-9919: [Rust][DataFusion] Speedup math operations by 15%+

Posted by GitBox <gi...@apache.org>.

nevi-me closed pull request #8116:
URL: https://github.com/apache/arrow/pull/8116


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] nevi-me commented on pull request #8116: ARROW-9919: [Rust][DataFusion] Speedup math operations by 15%+

Posted by GitBox <gi...@apache.org>.

nevi-me commented on pull request #8116:
URL: https://github.com/apache/arrow/pull/8116#issuecomment-688394965


   @paddyhoran @jorgecarleitao @andygrove  PTAL at my suggestion before merging (https://github.com/apache/arrow/pull/8116#discussion_r484493616)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] paddyhoran commented on pull request #8116: ARROW-9919: [Rust][DataFusion] Speedup math operations by 15%+

Posted by GitBox <gi...@apache.org>.

paddyhoran commented on pull request #8116:
URL: https://github.com/apache/arrow/pull/8116#issuecomment-688252444


   Hey @jorgecarleitao,
   
   Sorry, my info was wrong.  As you said `<From<Vec<_>>` does do a full copy of the underlying data which uses memory.rs [here](https://github.com/apache/arrow/blob/master/rust/arrow/src/buffer.rs#L274).  I was not aware of this.  
   
   I think you are right that there are a lot of improvements to be made to the builders, we should be able to get better performance out of them as they should be able to act like a `Vec` where the correct alignment is enforced so that no copy is needed.
   
   Anyway, sorry for the noise and thank you for the detailed explanation.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8116: ARROW-9919: [Rust][DataFusion] Speedup math operations by 15%+

Posted by GitBox <gi...@apache.org>.

jorgecarleitao commented on a change in pull request #8116:
URL: https://github.com/apache/arrow/pull/8116#discussion_r484110960



##########
File path: rust/datafusion/src/physical_plan/math_expressions.rs
##########
@@ -19,23 +19,25 @@
 
 use crate::error::{ExecutionError, Result};
 
-use arrow::array::{Array, ArrayRef, Float32Array, Float64Array, Float64Builder};
+use arrow::array::{Array, ArrayRef, Float32Array, Float64Array};
 
 use arrow::datatypes::DataType;
 
 use std::sync::Arc;
 
 macro_rules! compute_op {
     ($ARRAY:expr, $FUNC:ident, $TYPE:ident) => {{
-        let mut builder = Float64Builder::new($ARRAY.len());
-        for i in 0..$ARRAY.len() {
-            if $ARRAY.is_null(i) {
-                builder.append_null()?;
-            } else {
-                builder.append_value($ARRAY.value(i).$FUNC() as f64)?;
-            }
-        }
-        Ok(Arc::new(builder.finish()))
+        let result: Float64Array = (0..$ARRAY.len())
+            .map(|i| {

Review comment:
       @nevi-me, doesn't the performance depend on the density of nulls? For low density, vectorise wins, for high density, loop wins.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8116: ARROW-9919: [Rust][DataFusion] Speedup math operations by 15%+

Posted by GitBox <gi...@apache.org>.

jorgecarleitao commented on a change in pull request #8116:
URL: https://github.com/apache/arrow/pull/8116#discussion_r484544619



##########
File path: rust/datafusion/src/physical_plan/math_expressions.rs
##########
@@ -19,23 +19,25 @@
 
 use crate::error::{ExecutionError, Result};
 
-use arrow::array::{Array, ArrayRef, Float32Array, Float64Array, Float64Builder};
+use arrow::array::{Array, ArrayRef, Float32Array, Float64Array};
 
 use arrow::datatypes::DataType;
 
 use std::sync::Arc;
 
 macro_rules! compute_op {
     ($ARRAY:expr, $FUNC:ident, $TYPE:ident) => {{
-        let mut builder = Float64Builder::new($ARRAY.len());
-        for i in 0..$ARRAY.len() {
-            if $ARRAY.is_null(i) {
-                builder.append_null()?;
-            } else {
-                builder.append_value($ARRAY.value(i).$FUNC() as f64)?;
-            }
-        }
-        Ok(Arc::new(builder.finish()))
+        let result: Float64Array = (0..$ARRAY.len())
+            .map(|i| {

Review comment:
       Sure, Let's do it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] nevi-me commented on a change in pull request #8116: ARROW-9919: [Rust][DataFusion] Speedup math operations by 15%+

Posted by GitBox <gi...@apache.org>.

nevi-me commented on a change in pull request #8116:
URL: https://github.com/apache/arrow/pull/8116#discussion_r484534336



##########
File path: rust/datafusion/src/physical_plan/math_expressions.rs
##########
@@ -19,23 +19,25 @@
 
 use crate::error::{ExecutionError, Result};
 
-use arrow::array::{Array, ArrayRef, Float32Array, Float64Array, Float64Builder};
+use arrow::array::{Array, ArrayRef, Float32Array, Float64Array};
 
 use arrow::datatypes::DataType;
 
 use std::sync::Arc;
 
 macro_rules! compute_op {
     ($ARRAY:expr, $FUNC:ident, $TYPE:ident) => {{
-        let mut builder = Float64Builder::new($ARRAY.len());
-        for i in 0..$ARRAY.len() {
-            if $ARRAY.is_null(i) {
-                builder.append_null()?;
-            } else {
-                builder.append_value($ARRAY.value(i).$FUNC() as f64)?;
-            }
-        }
-        Ok(Arc::new(builder.finish()))
+        let result: Float64Array = (0..$ARRAY.len())
+            .map(|i| {

Review comment:
       I can push my changes if you don't mind cleaning them up




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] jorgecarleitao edited a comment on pull request #8116: ARROW-9919: [Rust][DataFusion] Speedup math operations by 15%+

Posted by GitBox <gi...@apache.org>.

jorgecarleitao edited a comment on pull request #8116:
URL: https://github.com/apache/arrow/pull/8116#issuecomment-688034629


   @paddyhoran , thanks a lot for the feedback, it really helps to understand a bit better the context, as I am not aware of those practices.
   
   I do think that both builders and `from` use `memory.rs`. For primitive types, 
   
   * the builder uses a `BufferBuilder` and `BooleanBufferBuilder`, which in turn uses `memory` [here](https://github.com/apache/arrow/blob/master/rust/arrow/src/array/builder.rs#L463)
   * the `from` uses `Buffer`, which uses `memory` [here](https://github.com/apache/arrow/blob/master/rust/arrow/src/array/array.rs#L737)
   
   There are two main differences between the two:
   
   * A) `from` requires an intermediary allocation (to vec), while the builder does not (uses a MutableBuffer).
   * B) `builder` assess whether it needs to change capacity on every item, while `from` does not.
   
   Specifically, `from` uses `Buffer::from(data.to_byte_slice())`, while the `builder` uses `for datum in data builder.append(datum); builder.finish()`. Therefore, `builders` perform checks on its capacity to ensure that each `append` does not require a `resize` (see impl of `append`, `BufferBuilderTrait<T> for BufferBuilder`, that has a `reserve(1)?` on it, and `advance`, used in nulls, that has a `resize` on it).
   
   My hypothesis for this performance change is that B) is more CPU intensive than A).
   
   One solution is to offer an API to the `Builder`s to not perform those checks when the user already set the capacity. This would address `B)` and would avoid an intermediary allocation.
   
   In this direction, one idea is to support appending from an `ExactSizeIterator` (e.g. `append_iter`) and use `size_hint` to not perform bound checks on every item.
   
   -----------------------
   
   A common operation we have in the readers is
   
   ```
   rows: &[Value];
   
   for row in rows {
       match some_operation(row) {
            Some(item) => builder.append(item)
            None => builder.append_null()
       }
   }
   ```
   
   in compute
   
   ```
   rows: &[Value];
   
   for row in rows {
       match row {
            Some(item) => builder.append(some_operation(item))
            None => builder.append_null()
       }
   }
   ```
   
   (with a different syntax, but the logic is this one)
   
   which is a basically an `ExactSizeIterator`. However, in the example above, the `builder` has no way of knowing that the new append won't go beyond the reserved size (and thus it checks).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] alamb commented on a change in pull request #8116: ARROW-9919: [Rust][DataFusion] Speedup math operations by 15%+

Posted by GitBox <gi...@apache.org>.

alamb commented on a change in pull request #8116:
URL: https://github.com/apache/arrow/pull/8116#discussion_r483940437



##########
File path: rust/datafusion/benches/math_query_sql.rs
##########
@@ -0,0 +1,100 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#[macro_use]
+extern crate criterion;
+use criterion::Criterion;
+
+use std::sync::Arc;
+
+extern crate arrow;
+extern crate datafusion;
+
+use arrow::{
+    array::{Float32Array, Float64Array},
+    datatypes::{DataType, Field, Schema},
+    record_batch::RecordBatch,
+};
+use datafusion::error::Result;
+
+use datafusion::datasource::MemTable;
+use datafusion::execution::context::ExecutionContext;
+
+fn query(ctx: &mut ExecutionContext, sql: &str) {
+    // execute the query
+    let df = ctx.sql(&sql).unwrap();
+    let results = df.collect().unwrap();
+
+    // display the relation
+    for _batch in results {}
+}
+
+fn create_context(array_len: usize, batch_size: usize) -> Result<ExecutionContext> {
+    // define a schema.
+    let schema = Arc::new(Schema::new(vec![
+        Field::new("f32", DataType::Float32, false),
+        Field::new("f64", DataType::Float64, false),
+    ]));
+
+    // define data.
+    let batches = (0..array_len / batch_size)
+        .map(|i| {
+            RecordBatch::try_new(
+                schema.clone(),
+                vec![
+                    Arc::new(Float32Array::from(vec![i as f32; batch_size])),
+                    Arc::new(Float64Array::from(vec![i as f64; batch_size])),
+                ],
+            )
+            .unwrap()
+        })
+        .collect::<Vec<_>>();
+
+    let mut ctx = ExecutionContext::new();
+
+    // declare a table in memory. In spark API, this corresponds to createDataFrame(...).
+    let provider = MemTable::new(schema, vec![batches])?;
+    ctx.register_table("t", Box::new(provider));
+
+    Ok(ctx)
+}
+
+fn criterion_benchmark(c: &mut Criterion) {
+    c.bench_function("sqrt_20_12", |b| {

Review comment:
       BTW these benchmarks were taking ~ 200s on my laptop -- I think you can probably show a similar improvement with smaller dataset sizes (aka I think we should reduce the number of iterations and/or datasize so the benchmarks run in ~10sec - 20sec)

##########
File path: rust/datafusion/benches/math_query_sql.rs
##########
@@ -0,0 +1,100 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#[macro_use]
+extern crate criterion;
+use criterion::Criterion;
+
+use std::sync::Arc;
+
+extern crate arrow;
+extern crate datafusion;
+
+use arrow::{
+    array::{Float32Array, Float64Array},
+    datatypes::{DataType, Field, Schema},
+    record_batch::RecordBatch,
+};
+use datafusion::error::Result;
+
+use datafusion::datasource::MemTable;
+use datafusion::execution::context::ExecutionContext;
+
+fn query(ctx: &mut ExecutionContext, sql: &str) {
+    // execute the query
+    let df = ctx.sql(&sql).unwrap();
+    let results = df.collect().unwrap();
+
+    // display the relation
+    for _batch in results {}
+}
+
+fn create_context(array_len: usize, batch_size: usize) -> Result<ExecutionContext> {
+    // define a schema.
+    let schema = Arc::new(Schema::new(vec![
+        Field::new("f32", DataType::Float32, false),
+        Field::new("f64", DataType::Float64, false),
+    ]));
+
+    // define data.
+    let batches = (0..array_len / batch_size)
+        .map(|i| {
+            RecordBatch::try_new(
+                schema.clone(),
+                vec![
+                    Arc::new(Float32Array::from(vec![i as f32; batch_size])),
+                    Arc::new(Float64Array::from(vec![i as f64; batch_size])),
+                ],
+            )
+            .unwrap()
+        })
+        .collect::<Vec<_>>();
+
+    let mut ctx = ExecutionContext::new();
+
+    // declare a table in memory. In spark API, this corresponds to createDataFrame(...).
+    let provider = MemTable::new(schema, vec![batches])?;
+    ctx.register_table("t", Box::new(provider));
+
+    Ok(ctx)
+}
+
+fn criterion_benchmark(c: &mut Criterion) {
+    c.bench_function("sqrt_20_12", |b| {

Review comment:
       ```
   alamb@ip-192-168-1-129:~/Software/arrow/rust$ ARROW_TEST_DATA=`pwd`/../testing/data cargo bench --bench math_query_sql
       Finished bench [optimized] target(s) in 0.15s
        Running target/release/deps/math_query_sql-8428878a7376985f
   Gnuplot not found, using plotters backend
   Benchmarking sqrt_20_12: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 179.8s or reduce sample count to 10.
   Benchmarking sqrt_20_12: Collecting 100 samples in estimated 179.78 s (5050 iterations)
   ...
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] github-actions[bot] commented on pull request #8116: ARROW-9919: [Rust][DataFusion] Speedup math operations by 15%+

Posted by GitBox <gi...@apache.org>.

github-actions[bot] commented on pull request #8116:
URL: https://github.com/apache/arrow/pull/8116#issuecomment-687567384


   https://issues.apache.org/jira/browse/ARROW-9919


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] nevi-me commented on a change in pull request #8116: ARROW-9919: [Rust][DataFusion] Speedup math operations by 15%+

Posted by GitBox <gi...@apache.org>.

nevi-me commented on a change in pull request #8116:
URL: https://github.com/apache/arrow/pull/8116#discussion_r483954060



##########
File path: rust/datafusion/src/physical_plan/math_expressions.rs
##########
@@ -19,23 +19,25 @@
 
 use crate::error::{ExecutionError, Result};
 
-use arrow::array::{Array, ArrayRef, Float32Array, Float64Array, Float64Builder};
+use arrow::array::{Array, ArrayRef, Float32Array, Float64Array};
 
 use arrow::datatypes::DataType;
 
 use std::sync::Arc;
 
 macro_rules! compute_op {
     ($ARRAY:expr, $FUNC:ident, $TYPE:ident) => {{
-        let mut builder = Float64Builder::new($ARRAY.len());
-        for i in 0..$ARRAY.len() {
-            if $ARRAY.is_null(i) {
-                builder.append_null()?;
-            } else {
-                builder.append_value($ARRAY.value(i).$FUNC() as f64)?;
-            }
-        }
-        Ok(Arc::new(builder.finish()))
+        let result: Float64Array = (0..$ARRAY.len())
+            .map(|i| {

Review comment:
       You could likely improve performance further by computing without checking nullity of array, and then building the array using the input bitmask.
   
   I'm away from home, so I don't have a PC to check with; but look at the pattern that most of the compute functions follow.
   
   Removing the if/else allows the compiler to vectorise the loop.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] nevi-me commented on a change in pull request #8116: ARROW-9919: [Rust][DataFusion] Speedup math operations by 15%+

Posted by GitBox <gi...@apache.org>.

nevi-me commented on a change in pull request #8116:
URL: https://github.com/apache/arrow/pull/8116#discussion_r484477474



##########
File path: rust/datafusion/src/physical_plan/math_expressions.rs
##########
@@ -19,23 +19,25 @@
 
 use crate::error::{ExecutionError, Result};
 
-use arrow::array::{Array, ArrayRef, Float32Array, Float64Array, Float64Builder};
+use arrow::array::{Array, ArrayRef, Float32Array, Float64Array};
 
 use arrow::datatypes::DataType;
 
 use std::sync::Arc;
 
 macro_rules! compute_op {
     ($ARRAY:expr, $FUNC:ident, $TYPE:ident) => {{
-        let mut builder = Float64Builder::new($ARRAY.len());
-        for i in 0..$ARRAY.len() {
-            if $ARRAY.is_null(i) {
-                builder.append_null()?;
-            } else {
-                builder.append_value($ARRAY.value(i).$FUNC() as f64)?;
-            }
-        }
-        Ok(Arc::new(builder.finish()))
+        let result: Float64Array = (0..$ARRAY.len())
+            .map(|i| {

Review comment:
       > @nevi-me, doesn't the performance depend on the density of nulls? For low density, vectorise wins, for high density, loop wins.
   
   I'd think that it wouldn't depend on nullness as you'd be removing the branching altogether. I'll have a look




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org