Posted to issues@arrow.apache.org by "Paul Taylor (Jira)" <ji...@apache.org> on 2020/04/08 01:19:00 UTC

[jira] [Commented] (ARROW-8053) [JS] Improve performance of filtering

    [ https://issues.apache.org/jira/browse/ARROW-8053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077728#comment-17077728 ] 

Paul Taylor commented on ARROW-8053:
------------------------------------

[~hulettbh] did the predicates stuff. It could certainly be optimized further if it were JIT'd into a flat JS function.
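
To make the "JIT into a flat JS function" idea concrete, here is a minimal sketch. It does not use Arrow's actual predicate classes: the node helpers (col, lit, cmp, and, or), compilePredicate, and filterIndices are all hypothetical names, and the columns are assumed to be plain arrays or typed arrays keyed by name. The tree is turned into one expression string and compiled once with new Function, so the per-row check is a single flat function call instead of a walk over predicate objects.

```javascript
// Hypothetical predicate-tree builders (NOT Arrow's predicate API).
const col = (name) => ({ kind: 'col', name });
const lit = (value) => ({ kind: 'lit', value });
const cmp = (op, left, right) => ({ kind: 'cmp', op, left, right });
const and = (...args) => ({ kind: 'and', args });
const or = (...args) => ({ kind: 'or', args });

// Only whitelisted operators are ever pasted into generated source.
const ALLOWED_OPS = new Set(['===', '!==', '<', '<=', '>', '>=']);

// Emit a JS expression string for one node of the tree.
function emit(node) {
  switch (node.kind) {
    case 'col': return `cols[${JSON.stringify(node.name)}][i]`;
    case 'lit': return JSON.stringify(node.value);
    case 'cmp':
      if (!ALLOWED_OPS.has(node.op)) throw new Error(`unsupported operator: ${node.op}`);
      return `(${emit(node.left)} ${node.op} ${emit(node.right)})`;
    case 'and': return `(${node.args.map(emit).join(' && ') || 'true'})`;
    case 'or': return `(${node.args.map(emit).join(' || ') || 'false'})`;
    default: throw new Error(`unknown node kind: ${node.kind}`);
  }
}

// Compile the tree once into a flat function (cols, i) => boolean.
function compilePredicate(tree) {
  return new Function('cols', 'i', `return ${emit(tree)};`);
}

// Scan row indices with the compiled test and collect the matches.
function filterIndices(cols, length, test) {
  const keys = [];
  for (let i = 0; i < length; ++i) if (test(cols, i)) keys.push(i);
  return keys;
}

// Usage with made-up data shaped like the policy example in this thread.
const cols = {
  proto: Uint8Array.of(6, 17, 6, 6),
  startPort: Uint16Array.of(10, 10, 49152, 300),
  endPort: Uint16Array.of(100, 100, 49152, 400),
  isActive: [true, true, true, false],
};
const tree = and(
  cmp('===', col('proto'), lit(6)),
  or(
    and(cmp('>', col('startPort'), lit(0)), cmp('<', col('endPort'), lit(200))),
    cmp('===', col('startPort'), lit(49152))
  ),
  cmp('===', col('isActive'), lit(true))
);
const matches = filterIndices(cols, 4, compilePredicate(tree)); // [0, 2]
```

Literals go through JSON.stringify, so this sketch only handles numbers, strings, booleans, and null. Whether this beats a hand-written closure depends on the engine and the data, so measure before relying on it.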

An apples-to-apples comparison would be to filter the rows individually:

{code:javascript}
import { DictionaryVector, Int32 } from 'apache-arrow';

function filterStruct(struct, predicate) {
  // Collect the indices of the rows that pass the predicate.
  let keys = [], i = -1, j = -1;
  for (let row of struct) if (predicate(row, ++i)) keys[++j] = i;
  // Wrap the original rows in a dictionary keyed by the matching indices.
  return DictionaryVector.from(struct, new Int32(), keys);
}

function predicate(policy) {
  return policy.proto === 6
      && ((policy.startPort > 0 && policy.endPort < 200) || policy.startPort === 49152)
      && policy.isActive === true;
}

const count = filterStruct(policiesTable, predicate).length;
{code}

I generally agree with [~lmeyerov] though: don't do inline scans and reductions if you care about performance. Use WASM/web workers to distribute the work across CPU cores, or (better yet) WebGL TransformFeedback on the GPU. Both work in node and browsers, and neither requires non-JS dependencies.
Arrow excels at both of these strategies.
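
As a rough illustration of the "distribute across CPU cores" strategy, the sketch below shows only the partitioning and merging logic, run synchronously. The names partition, scanRange, and filterInChunks are hypothetical, and the actual worker plumbing (spawning workers, sharing the column buffer, posting ranges, awaiting results) is left out. In a real setup each call to scanRange would run inside a worker, and the test would have to live in the worker or be sent as source, since functions cannot be posted to workers.

```javascript
// Split [0, length) into at most `parts` contiguous [start, end) ranges.
function partition(length, parts) {
  const size = Math.ceil(length / parts);
  const ranges = [];
  for (let start = 0; start < length; start += size) {
    ranges.push([start, Math.min(start + size, length)]);
  }
  return ranges;
}

// The unit of work a single worker would run over its slice of the column.
function scanRange(column, [start, end], test) {
  const keys = [];
  for (let i = start; i < end; ++i) if (test(column[i])) keys.push(i);
  return keys;
}

// Synchronous stand-in for "post each range to a worker, then merge".
// Ranges are contiguous and ordered, so concatenation keeps indices sorted.
function filterInChunks(column, parts, test) {
  return partition(column.length, parts)
    .map((range) => scanRange(column, range, test))
    .flat();
}

// Usage with a made-up protocol column split three ways.
const proto = Int32Array.of(6, 17, 6, 6, 1, 6, 17);
const tcpRows = filterInChunks(proto, 3, (value) => value === 6); // [0, 2, 3, 5]
```

Because Arrow columns are backed by contiguous typed arrays, a range like this can in principle be handed to a worker without copying when the buffer is shared, which is what makes this strategy a natural fit.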

> [JS] Improve performance of filtering
> -------------------------------------
>
>                 Key: ARROW-8053
>                 URL: https://issues.apache.org/jira/browse/ARROW-8053
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: JavaScript
>            Reporter: Will Strimling
>            Priority: Major
>
> A series of observable notebooks have shown quite convincingly that arrow doesn't compete with other libraries or JavaScript when it comes to filtering performance. Has there been any discussion or roadmaps established for improving it?
> Most convincing Observables:
>  * [https://observablehq.com/@duaneatat/apache-arrow-filtering-vs-array-filter]
>  * [https://observablehq.com/@robertleeplummerjr/array-filtering-apache-arrow-vs-gpu-js-textures-vs-array-fil]


