Posted to issues@arrow.apache.org by "Andy Grove (Jira)" <ji...@apache.org> on 2019/09/17 23:45:00 UTC

[jira] [Commented] (ARROW-6583) [Rust] Question and Request for Examples of Array Operations

    [ https://issues.apache.org/jira/browse/ARROW-6583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931920#comment-16931920 ] 

Andy Grove commented on ARROW-6583:
-----------------------------------

Hi Arthur,

Your approach looks functionally correct, but rather than building a boolean array and then calling the filter method, it might be simpler and more efficient to build the new Float64Array directly in your code, since that avoids allocating the intermediate mask. Which is better depends on your use case, though.
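To make the trade-off concrete, here is a minimal sketch using only the Rust standard library. It assumes a nullable Float64 column can be modeled as {{&[Option<f64>]}} (a stand-in for {{Float64Array}}, where {{None}} marks a null slot) and that null slots should be dropped. The names {{mask_then_filter}} and {{filter_directly}} are hypothetical and not part of the arrow crate. The first function mirrors the two-pass approach from the question (build a mask, then filter); the second builds the output in one pass with no intermediate mask.

```rust
/// Two-pass approach: build a boolean mask, then keep values whose mask bit is set.
/// `column` models a nullable Float64 column; `None` is a null slot.
fn mask_then_filter(column: &[Option<f64>], threshold: f64) -> Vec<f64> {
    // Pass 1: null slots never match, like `!is_null(i) && value(i) >= threshold`.
    let mask: Vec<bool> = column
        .iter()
        .map(|v| matches!(v, Some(x) if *x >= threshold))
        .collect();

    // Pass 2: apply the mask, keeping only the values whose bit is set.
    column
        .iter()
        .zip(mask.iter())
        .filter_map(|(v, &keep)| if keep { *v } else { None })
        .collect()
}

/// Single-pass approach: push matching values straight into the output,
/// without allocating an intermediate mask.
fn filter_directly(column: &[Option<f64>], threshold: f64) -> Vec<f64> {
    let mut out = Vec::with_capacity(column.len());
    // `flatten` skips `None` (null) slots and yields `&f64` for the rest.
    for v in column.iter().flatten() {
        if *v >= threshold {
            out.push(*v);
        }
    }
    out
}
```

With the arrow crate itself, the single-pass version would append to a {{Float64Builder}} instead of a {{Vec}}; the mask route can still pay off if the same mask is reused to filter several columns.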

You might also be interested in the code in {{rust/arrow/src/compute/kernels/comparison.rs}}, which has methods that take advantage of SIMD for comparing two arrays (though not yet for comparing an array to a literal). For example, {{>=}} is implemented by this method:

 
{code:java}
/// Perform `left >= right` operation on two arrays. Non-null values are greater than null
/// values.
pub fn gt_eq<T>(
    left: &PrimitiveArray<T>,
    right: &PrimitiveArray<T>,
) -> Result<BooleanArray>
 {code}
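The null rule in that doc comment ("non-null values are greater than null values") can be illustrated without the arrow crate. The sketch below is a simplified model, not the crate's implementation (which uses SIMD where available): it represents each nullable array as {{&[Option<f64>]}} and returns a {{Vec<bool>}} instead of a {{BooleanArray}}. It relies on Rust's {{Option}} ordering, which already places {{None}} below every {{Some}}, so {{>=}} on {{Option<f64>}} gives the stated rule. The doc comment does not say what happens when both sides are null; under this model {{None >= None}} is {{true}}. The names {{gt_eq_model}} and {{gt_eq_scalar_model}} are hypothetical; the latter sketches the array-versus-literal comparison that is not provided yet.

```rust
/// Model of `left >= right` over two nullable arrays, where `None` is a null slot.
/// Rust orders `None` below every `Some`, matching the rule
/// "non-null values are greater than null values".
fn gt_eq_model(left: &[Option<f64>], right: &[Option<f64>]) -> Result<Vec<bool>, String> {
    // Element-wise comparison only makes sense for arrays of equal length.
    if left.len() != right.len() {
        return Err(format!(
            "arrays have different lengths: {} vs {}",
            left.len(),
            right.len()
        ));
    }
    Ok(left.iter().zip(right).map(|(l, r)| l >= r).collect())
}

/// Hypothetical array-versus-literal variant: the literal is treated as non-null,
/// so a null slot never satisfies `value >= literal`.
fn gt_eq_scalar_model(left: &[Option<f64>], literal: f64) -> Vec<bool> {
    left.iter().map(|l| *l >= Some(literal)).collect()
}
```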
To answer your last question, I would say the goals of the Rust project are:
 # Allow interop with other Arrow implementations
 # Provide efficient compute kernels for various operations (some basic ones exist already but I think more will be added over time)

In addition to the core Arrow implementation in Rust, there is also the DataFusion crate, which implements a SQL query engine on top of Arrow and supports executing queries against CSV and Parquet files.

I hope that helps.

> [Rust] Question and Request for Examples of Array Operations
> ------------------------------------------------------------
>
>                 Key: ARROW-6583
>                 URL: https://issues.apache.org/jira/browse/ARROW-6583
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Rust
>            Reporter: Arthur Maciejewicz
>            Priority: Minor
>
> Hi all, thank you for your excellent work on Arrow.
> As I was going through the examples for the Rust Arrow implementation (specifically the read_csv example [https://github.com/apache/arrow/blob/master/rust/arrow/examples/read_csv.rs]), as well as the generated Rustdocs and unit tests, it was not quite clear what the intended usage is for operations such as filtering and masking over Arrays.
> One particular use case I'm interested in is finding all values x in an Array such that x >= N. I came across arrow::compute::array_ops::filter, which seems similar to what I want, but it expects a mask to be constructed before the filter operation is performed. It was also not easy to find in the documentation, which led me to believe this might not be idiomatic usage.
> More generally, is the expectation for Arrays on the Rust side that they are just simple data abstractions, without exposing higher-order methods such as filtering/masking? Is the intent to leave that to users? If I missed some piece of documentation, please let me know. For my use case I ended up trying something like:
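> {code:java}
> use arrow::array::{Array, BooleanBuilder, Float64Array};
> use arrow::compute::array_ops::filter;
>
> // Downcast the first column of the RecordBatch to a concrete Float64Array
> let column = batch.column(0).as_any().downcast_ref::<Float64Array>().unwrap();
> let mut builder = BooleanBuilder::new(batch.num_rows());
> let n = 5.0;
> for i in 0..batch.num_rows() {
>    // value(i) returns an f64 directly (no unwrap needed); null slots never match
>    let keep = !column.is_null(i) && column.value(i) >= n;
>    builder.append_value(keep).unwrap();
> }
> let mask = builder.finish();
> // filter borrows the mask and returns a Result wrapping the filtered array
> let filtered_column = filter(column, &mask).unwrap();{code}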
> If possible, could you provide examples of intended usage of Arrays? Thank you!
>  
>  


