Posted to issues@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2017/03/17 21:45:41 UTC

[jira] [Commented] (ARROW-649) Explore a Weld/Arrow converter

    [ https://issues.apache.org/jira/browse/ARROW-649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15930778#comment-15930778 ] 

Wes McKinney commented on ARROW-649:
------------------------------------

From https://github.com/weld-project/weld/tree/master/python/grizzly, it appears that Weld knows how to operate on contiguous C memory, but I'll have to dig deeper to understand all the details. If that's the case, then building a bridge in C to pass contiguous memory held in Arrow C++ arrays should not be complicated.

One logistical concern is missing data: Weld may not yet be able to interact with Arrow's validity bitmaps. We'll want to make sure that there's a primitive operator in the Weld DSL (or a plan to implement one) that can handle bitmap propagation in operations.
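
To illustrate what bitmap propagation means for a binary operation, here is a minimal sketch in plain Python. It is not Weld or Arrow API: the function names get_bit and add_with_validity are hypothetical, and the only thing assumed from Arrow is its bitmap convention (one bit per slot, least-significant bit first, 1 = valid).

```python
def get_bit(bitmap: bytes, i: int) -> bool:
    """Return True if slot i is valid (least-significant-bit numbering)."""
    return bool((bitmap[i // 8] >> (i % 8)) & 1)


def add_with_validity(left, left_valid, right, right_valid):
    """Elementwise add; a result slot is valid only if both inputs are valid.

    Hypothetical helper for illustration, not part of Weld or Arrow.
    """
    # Bitwise AND of the two bitmaps propagates nulls one byte at a time.
    out_valid = bytes(a & b for a, b in zip(left_valid, right_valid))
    # Values in null slots are unspecified; this sketch writes 0 there.
    out = [
        left[i] + right[i] if get_bit(out_valid, i) else 0
        for i in range(len(left))
    ]
    return out, out_valid


values_a = [1, 2, 3, 4]
valid_a = bytes([0b00001101])  # slot 1 is null
values_b = [10, 20, 30, 40]
valid_b = bytes([0b00000111])  # slot 3 is null

result, result_valid = add_with_validity(values_a, valid_a, values_b, valid_b)
```

The point of the sketch is that the validity computation is independent of the value computation, so a single bitwise-AND primitive over the bitmap buffers would cover most elementwise operations.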

Looks like Weld does not support null data yet (https://github.com/weld-project/weld/blob/master/python/grizzly/grizzly_impl.py#L285), so the benchmarks presented aren't exactly apples to apples: handling missing data in all pandas operations comes at a high cost.

I'm also interested in enabling Weld to understand Arrow's string memory layout (offsets + data buffers).
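
For reference, the offsets + data layout can be sketched in plain Python as follows. This assumes the standard Arrow string layout (n + 1 little-endian int32 offsets into one contiguous UTF-8 data buffer); the helper names build_string_buffers and string_at are hypothetical and not part of any Arrow or Weld API.

```python
import struct


def build_string_buffers(strings):
    """Pack strings into (offsets, data) buffers: n + 1 int32 offsets, UTF-8 data."""
    encoded = [s.encode("utf-8") for s in strings]
    ends = [0]
    for chunk in encoded:
        ends.append(ends[-1] + len(chunk))
    offsets = struct.pack("<%di" % len(ends), *ends)
    data = b"".join(encoded)
    return offsets, data


def string_at(offsets: bytes, data: bytes, i: int) -> str:
    """Read slot i: its bytes are data[offsets[i]:offsets[i + 1]]."""
    start, end = struct.unpack_from("<2i", offsets, 4 * i)
    return data[start:end].decode("utf-8")


offsets, data = build_string_buffers(["weld", "", "arrow"])
third = string_at(offsets, data, 2)
```

Because both buffers are contiguous, a consumer only needs two pointers and a length to address any string in the array without copying.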

> Explore a Weld/Arrow converter
> ------------------------------
>
>                 Key: ARROW-649
>                 URL: https://issues.apache.org/jira/browse/ARROW-649
>             Project: Apache Arrow
>          Issue Type: New Feature
>            Reporter: Jacques Nadeau
>
> [~matei] and the Stanford team have just open sourced Weld. It would be interesting to evaluate how we could move Arrow data to Weld's internal representation.
> Weld is here: https://github.com/weld-project/weld



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)