Posted to dev@arrow.apache.org by Wes McKinney <we...@gmail.com> on 2016/12/13 18:23:42 UTC

Re: Weld

Looks like there will be a talk about Weld at Strata next Spring:

http://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/57646

It would be nice to see if there's any work that we can be doing in
Arrow to try to collaborate on such closely-related efforts.

- Wes

On Sat, Nov 26, 2016 at 3:53 PM, Wes McKinney <we...@gmail.com> wrote:
> I've always contended that building a sort of "runtime" for in-memory
> column expressions was a natural next step after hardening Arrow as a
> data structure and memory exchange mechanism. I'm hopeful that we'll
> see some cross-pollination between projects like Weld and Arrow to
> produce open source solutions that help drive more collaboration and
> consolidation around shared low-level needs for building analytical
> systems. We still have significant work to do to solidify Arrow as a
> standard -- anything anyone reading can do to help work against
> fragmentation in data structures / memory representations would be
> very much appreciated!
>
> - Wes
>
> On Wed, Nov 23, 2016 at 11:35 AM, Donald Foss <do...@gmail.com> wrote:
>> I had that in my queue, and your watching and thoughts about it mean I need
>> to watch it today ;)
>>
>> Thanks for the link to where you've played with abstraction before. I'm
>> going to fork a copy and do some playing myself. I'm very interested in how
>> universal it can be, and a library to go from expression to universal
>> intermediate, then output the code in the language of one's choice. The
>> optimizations could be performed in the intermediate layer, the output
>> later, or both if there are specific code optimizations available after
>> translation.
>>
>> -Donald
>>
>> On Nov 22, 2016 2:31 PM, Julien Le Dem
>> wrote:
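
Donald's idea quoted above (expression -> universal intermediate -> code in
the language of one's choice, with optimizations done in the intermediate
layer) can be illustrated with a toy sketch. Everything in it is hypothetical
and invented for illustration: the Col/Const/BinOp node types, the fold pass,
and the emit_python/emit_c helpers are not part of Arrow or Weld, and constant
folding merely stands in for whatever optimizations a real intermediate layer
might perform.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Col:
    """Reference to a named input column (hypothetical IR node)."""
    name: str


@dataclass(frozen=True)
class Const:
    """Floating-point literal (hypothetical IR node)."""
    value: float


@dataclass(frozen=True)
class BinOp:
    """Binary arithmetic node; op is one of '+', '-', '*'."""
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Col, Const, BinOp]

_FOLDERS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def fold(e: Expr) -> Expr:
    """Optimization in the intermediate layer: fold constant subtrees."""
    if isinstance(e, BinOp):
        left, right = fold(e.left), fold(e.right)
        if isinstance(left, Const) and isinstance(right, Const):
            return Const(_FOLDERS[e.op](left.value, right.value))
        return BinOp(e.op, left, right)
    return e


def emit_scalar(e: Expr) -> str:
    """Render the per-row expression; this infix form is valid in Python and C."""
    if isinstance(e, Col):
        return f"{e.name}[i]"
    if isinstance(e, Const):
        return repr(e.value)
    return f"({emit_scalar(e.left)} {e.op} {emit_scalar(e.right)})"


def emit_python(e: Expr, cols: List[str]) -> str:
    """Output stage, Python target: a function over equal-length lists."""
    args = ", ".join(cols)
    body = f"[{emit_scalar(e)} for i in range(len({cols[0]}))]"
    return f"def kernel({args}):\n    return {body}\n"


def emit_c(e: Expr, cols: List[str]) -> str:
    """Output stage, C target: a loop over double arrays of length n."""
    params = ", ".join(f"const double *{c}" for c in cols)
    return (
        f"void kernel({params}, double *out, long n) {{\n"
        f"    for (long i = 0; i < n; i++) out[i] = {emit_scalar(e)};\n"
        f"}}\n"
    )


# x * (2 + 3) + y: the constant subtree is folded before any code is emitted.
expr = BinOp("+", BinOp("*", Col("x"), BinOp("+", Const(2.0), Const(3.0))), Col("y"))
folded = fold(expr)

c_source = emit_c(folded, ["x", "y"])

# Compile the generated Python source and grab the resulting function.
namespace = {}
exec(emit_python(folded, ["x", "y"]), namespace)
kernel = namespace["kernel"]
```

Here the optimization runs once on the intermediate form, and both emitters
benefit from it; target-specific optimizations "after translation" would live
inside emit_python or emit_c instead.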

Re: CLang, LLVM and optimization during conversion

Posted by Wes McKinney <we...@gmail.com>.
We've been developing and testing the Arrow C++ library with Clang,
but haven't done any work on LLVM or code generation with it. It would
be interesting to make it easy to create custom UDFs in LLVM and
evaluate them on Arrow data -- a bit of glue necessary to bridge the
Arrow world and the LLVM world.

On Wed, Dec 14, 2016 at 9:52 AM, Donald Foss <do...@gmail.com> wrote:
> Changing the subject line to (try to) avoid thread confusion.
>
> Has anyone done any work or tests with Arrow and Clang? The way the LLVM
> code conversions work *could* be a natural path, considering that the wheel
> is already round. Just yesterday I was looking at how the LLVM converter to
> JavaScript on V8 produced optimized code that caused fewer GC events, thus
> fewer hits to the render engine in-browser. Just a thought filed under
> Things That Make You Go Hmmm.
>
> --Donald
>
> On Dec 13, 2016 1:23 PM, Wes McKinney
> wrote:

CLang, LLVM and optimization during conversion

Posted by Donald Foss <do...@gmail.com>.
Changing the subject line to (try to) avoid thread confusion.

Has anyone done any work or tests with Arrow and Clang? The way the LLVM
code conversions work *could* be a natural path, considering that the wheel
is already round. Just yesterday I was looking at how the LLVM converter to
JavaScript on V8 produced optimized code that caused fewer GC events, thus
fewer hits to the render engine in-browser. Just a thought filed under
Things That Make You Go Hmmm.

--Donald

On Dec 13, 2016 1:23 PM, Wes McKinney
wrote:

Re: Weld

Posted by Julian Hyde <jh...@apache.org>.
Here's the Weld paper: http://cidrdb.org/cidr2017/papers/p127-palkar-cidr17.pdf

Julian


On Thu, Dec 15, 2016 at 2:06 PM, Wes McKinney <we...@gmail.com> wrote:
> Sounds great, thanks.
>
> A number of people are looking at using Arrow for faster serialization
> for PySpark, so now that the Java and C++ libraries are compatible
> (cf: integration tests) we can make this a reality.
>
> On Thu, Dec 15, 2016 at 4:54 PM, Julien Le Dem <ju...@dremio.com> wrote:
>> I'm happy to reach out to Matei.
>> Reynold is on this list and the Arrow PMC as well.
>> Wes, I can start with an email and CC you.
>>
>>
>>
>> On Thu, Dec 15, 2016 at 11:03 AM, Mark Hamstra <ma...@clearstorydata.com>
>> wrote:
>>
>>> I already made sure that Matei is aware of this thread.  He seemed
>>> interested in talking with key Arrow developers.
>>>
>>> On Thu, Dec 15, 2016 at 10:49 AM, Julian Hyde <jh...@apache.org> wrote:
>>>
>>> > I think someone should reach out to Matei and Shoumik, and see if they
>>> > would like to collaborate. Wes, would you like to do that?
>>> >
>>> > Also, reach out to the Spark community. Are they aware of Arrow? Are they
>>> > planning to use it, or are they developing an alternative?
>>> >
>>> > Julian
>>> >
>>> >
>>> >
>>> > > On Dec 13, 2016, at 10:23 AM, Wes McKinney <we...@gmail.com> wrote:
>>> > >
>>> > > Looks like there will be a talk about Weld at Strata next Spring:
>>> > >
>>> > > http://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/57646
>>> > >
>>> > > It would be nice to see if there's any work that we can be doing in
>>> > > Arrow to try to collaborate on such closely-related efforts.
>>> > >
>>> > > - Wes
>>> > >
>>> > > On Sat, Nov 26, 2016 at 3:53 PM, Wes McKinney <we...@gmail.com> wrote:
>>> > >> I've always contended that building a sort of "runtime" for in-memory
>>> > >> column expressions was a natural next step after hardening Arrow as a
>>> > >> data structure and memory exchange mechanism. I'm hopeful that we'll
>>> > >> see some cross-pollination between projects like Weld and Arrow to
>>> > >> produce open source solutions that help drive more collaboration and
>>> > >> consolidation around shared low-level needs for building analytical
>>> > >> systems. We still have significant work to do to solidify Arrow as a
>>> > >> standard -- anything anyone reading can do to help work against
>>> > >> fragmentation in data structures / memory representations would be
>>> > >> very much appreciated!
>>> > >>
>>> > >> - Wes
>>> > >>
>>> > >> On Wed, Nov 23, 2016 at 11:35 AM, Donald Foss <do...@gmail.com> wrote:
>>> > >>> I had that in my queue, and your watching and thoughts about it mean I need
>>> > >>> to watch it today ;)
>>> > >>>
>>> > >>> Thanks for the link to where you've played with abstraction before. I'm
>>> > >>> going to fork a copy and do some playing myself. I'm very interested in how
>>> > >>> universal it can be, and a library to go from expression to universal
>>> > >>> intermediate, then output the code in the language of one's choice. The
>>> > >>> optimizations could be performed in the intermediate layer, the output
>>> > >>> later, or both if there are specific code optimizations available after
>>> > >>> translation.
>>> > >>>
>>> > >>> -Donald
>>> > >>>
>>> > >>> On Nov 22, 2016 2:31 PM, Julien Le Dem
>>> > >>> wrote:
>>> >
>>> >
>>>
>>
>>
>>
>> --
>> Julien

Re: Weld

Posted by Wes McKinney <we...@gmail.com>.
Sounds great, thanks.

A number of people are looking at using Arrow for faster serialization
for PySpark, so now that the Java and C++ libraries are compatible
(cf: integration tests) we can make this a reality.

On Thu, Dec 15, 2016 at 4:54 PM, Julien Le Dem <ju...@dremio.com> wrote:
> I'm happy to reach out to Matei.
> Reynold is on this list and the Arrow PMC as well.
> Wes, I can start with an email and CC you.
>
>
>
> On Thu, Dec 15, 2016 at 11:03 AM, Mark Hamstra <ma...@clearstorydata.com>
> wrote:
>
>> I already made sure that Matei is aware of this thread.  He seemed
>> interested in talking with key Arrow developers.
>>
>> On Thu, Dec 15, 2016 at 10:49 AM, Julian Hyde <jh...@apache.org> wrote:
>>
>> > I think someone should reach out to Matei and Shoumik, and see if they
>> > would like to collaborate. Wes, would you like to do that?
>> >
>> > Also, reach out to the Spark community. Are they aware of Arrow? Are they
>> > planning to use it, or are they developing an alternative?
>> >
>> > Julian
>> >
>> >
>> >
>> > > On Dec 13, 2016, at 10:23 AM, Wes McKinney <we...@gmail.com> wrote:
>> > >
>> > > Looks like there will be a talk about Weld at Strata next Spring:
>> > >
>> > > http://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/57646
>> > >
>> > > It would be nice to see if there's any work that we can be doing in
>> > > Arrow to try to collaborate on such closely-related efforts.
>> > >
>> > > - Wes
>> > >
>> > > On Sat, Nov 26, 2016 at 3:53 PM, Wes McKinney <we...@gmail.com> wrote:
>> > >> I've always contended that building a sort of "runtime" for in-memory
>> > >> column expressions was a natural next step after hardening Arrow as a
>> > >> data structure and memory exchange mechanism. I'm hopeful that we'll
>> > >> see some cross-pollination between projects like Weld and Arrow to
>> > >> produce open source solutions that help drive more collaboration and
>> > >> consolidation around shared low-level needs for building analytical
>> > >> systems. We still have significant work to do to solidify Arrow as a
>> > >> standard -- anything anyone reading can do to help work against
>> > >> fragmentation in data structures / memory representations would be
>> > >> very much appreciated!
>> > >>
>> > >> - Wes
>> > >>
>> > >> On Wed, Nov 23, 2016 at 11:35 AM, Donald Foss <do...@gmail.com> wrote:
>> > >>> I had that in my queue, and your watching and thoughts about it mean I need
>> > >>> to watch it today ;)
>> > >>>
>> > >>> Thanks for the link to where you've played with abstraction before. I'm
>> > >>> going to fork a copy and do some playing myself. I'm very interested in how
>> > >>> universal it can be, and a library to go from expression to universal
>> > >>> intermediate, then output the code in the language of one's choice. The
>> > >>> optimizations could be performed in the intermediate layer, the output
>> > >>> later, or both if there are specific code optimizations available after
>> > >>> translation.
>> > >>>
>> > >>> -Donald
>> > >>>
>> > >>> On Nov 22, 2016 2:31 PM, Julien Le Dem
>> > >>> wrote:
>> >
>> >
>>
>
>
>
> --
> Julien

Re: Weld

Posted by Julien Le Dem <ju...@dremio.com>.
I'm happy to reach out to Matei.
Reynold is on this list and the Arrow PMC as well.
Wes, I can start with an email and CC you.



On Thu, Dec 15, 2016 at 11:03 AM, Mark Hamstra <ma...@clearstorydata.com>
wrote:

> I already made sure that Matei is aware of this thread.  He seemed
> interested in talking with key Arrow developers.
>
> On Thu, Dec 15, 2016 at 10:49 AM, Julian Hyde <jh...@apache.org> wrote:
>
> > I think someone should reach out to Matei and Shoumik, and see if they
> > would like to collaborate. Wes, would you like to do that?
> >
> > Also, reach out to the Spark community. Are they aware of Arrow? Are they
> > planning to use it, or are they developing an alternative?
> >
> > Julian
> >
> >
> >
> > > On Dec 13, 2016, at 10:23 AM, Wes McKinney <we...@gmail.com> wrote:
> > >
> > > Looks like there will be a talk about Weld at Strata next Spring:
> > >
> > > http://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/57646
> > >
> > > It would be nice to see if there's any work that we can be doing in
> > > Arrow to try to collaborate on such closely-related efforts.
> > >
> > > - Wes
> > >
> > > On Sat, Nov 26, 2016 at 3:53 PM, Wes McKinney <we...@gmail.com> wrote:
> > >> I've always contended that building a sort of "runtime" for in-memory
> > >> column expressions was a natural next step after hardening Arrow as a
> > >> data structure and memory exchange mechanism. I'm hopeful that we'll
> > >> see some cross-pollination between projects like Weld and Arrow to
> > >> produce open source solutions that help drive more collaboration and
> > >> consolidation around shared low-level needs for building analytical
> > >> systems. We still have significant work to do to solidify Arrow as a
> > >> standard -- anything anyone reading can do to help work against
> > >> fragmentation in data structures / memory representations would be
> > >> very much appreciated!
> > >>
> > >> - Wes
> > >>
> > >> On Wed, Nov 23, 2016 at 11:35 AM, Donald Foss <do...@gmail.com> wrote:
> > >>> I had that in my queue, and your watching and thoughts about it mean I need
> > >>> to watch it today ;)
> > >>>
> > >>> Thanks for the link to where you've played with abstraction before. I'm
> > >>> going to fork a copy and do some playing myself. I'm very interested in how
> > >>> universal it can be, and a library to go from expression to universal
> > >>> intermediate, then output the code in the language of one's choice. The
> > >>> optimizations could be performed in the intermediate layer, the output
> > >>> later, or both if there are specific code optimizations available after
> > >>> translation.
> > >>>
> > >>> -Donald
> > >>>
> > >>> On Nov 22, 2016 2:31 PM, Julien Le Dem
> > >>> wrote:
> >
> >
>



-- 
Julien

Re: Weld

Posted by Mark Hamstra <ma...@clearstorydata.com>.
I already made sure that Matei is aware of this thread.  He seemed
interested in talking with key Arrow developers.

On Thu, Dec 15, 2016 at 10:49 AM, Julian Hyde <jh...@apache.org> wrote:

> I think someone should reach out to Matei and Shoumik, and see if they
> would like to collaborate. Wes, would you like to do that?
>
> Also, reach out to the Spark community. Are they aware of Arrow? Are they
> planning to use it, or are they developing an alternative?
>
> Julian
>
>
>
> > On Dec 13, 2016, at 10:23 AM, Wes McKinney <we...@gmail.com> wrote:
> >
> > Looks like there will be a talk about Weld at Strata next Spring:
> >
> > http://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/57646
> >
> > It would be nice to see if there's any work that we can be doing in
> > Arrow to try to collaborate on such closely-related efforts.
> >
> > - Wes
> >
> > On Sat, Nov 26, 2016 at 3:53 PM, Wes McKinney <we...@gmail.com> wrote:
> >> I've always contended that building a sort of "runtime" for in-memory
> >> column expressions was a natural next step after hardening Arrow as a
> >> data structure and memory exchange mechanism. I'm hopeful that we'll
> >> see some cross-pollination between projects like Weld and Arrow to
> >> produce open source solutions that help drive more collaboration and
> >> consolidation around shared low-level needs for building analytical
> >> systems. We still have significant work to do to solidify Arrow as a
> >> standard -- anything anyone reading can do to help work against
> >> fragmentation in data structures / memory representations would be
> >> very much appreciated!
> >>
> >> - Wes
> >>
> >> On Wed, Nov 23, 2016 at 11:35 AM, Donald Foss <do...@gmail.com> wrote:
> >>> I had that in my queue, and your watching and thoughts about it mean I need
> >>> to watch it today ;)
> >>>
> >>> Thanks for the link to where you've played with abstraction before. I'm
> >>> going to fork a copy and do some playing myself. I'm very interested in how
> >>> universal it can be, and a library to go from expression to universal
> >>> intermediate, then output the code in the language of one's choice. The
> >>> optimizations could be performed in the intermediate layer, the output
> >>> later, or both if there are specific code optimizations available after
> >>> translation.
> >>>
> >>> -Donald
> >>>
> >>> On Nov 22, 2016 2:31 PM, Julien Le Dem
> >>> wrote:
>
>

Re: Weld

Posted by Holden Karau <ho...@pigscanfly.ca>.
The PySpark community is aware of Arrow, but more outreach to the Spark SQL
devs could certainly help get us all on the same page :)

On Thu, Dec 15, 2016 at 10:49 AM Julian Hyde <jh...@apache.org> wrote:

> I think someone should reach out to Matei and Shoumik, and see if they
> would like to collaborate. Wes, would you like to do that?
>
> Also, reach out to the Spark community. Are they aware of Arrow? Are they
> planning to use it, or are they developing an alternative?
>
> Julian
>
> > On Dec 13, 2016, at 10:23 AM, Wes McKinney <we...@gmail.com> wrote:
> >
> > Looks like there will be a talk about Weld at Strata next Spring:
> >
> > http://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/57646
> >
> > It would be nice to see if there's any work that we can be doing in
> > Arrow to try to collaborate on such closely-related efforts.
> >
> > - Wes
> >
> > On Sat, Nov 26, 2016 at 3:53 PM, Wes McKinney <we...@gmail.com> wrote:
> >> I've always contended that building a sort of "runtime" for in-memory
> >> column expressions was a natural next step after hardening Arrow as a
> >> data structure and memory exchange mechanism. I'm hopeful that we'll
> >> see some cross-pollination between projects like Weld and Arrow to
> >> produce open source solutions that help drive more collaboration and
> >> consolidation around shared low-level needs for building analytical
> >> systems. We still have significant work to do to solidify Arrow as a
> >> standard -- anything anyone reading can do to help work against
> >> fragmentation in data structures / memory representations would be
> >> very much appreciated!
> >>
> >> - Wes
> >>
> >> On Wed, Nov 23, 2016 at 11:35 AM, Donald Foss <do...@gmail.com> wrote:
> >>> I had that in my queue, and your watching and thoughts about it mean I need
> >>> to watch it today ;)
> >>>
> >>> Thanks for the link to where you've played with abstraction before. I'm
> >>> going to fork a copy and do some playing myself. I'm very interested in how
> >>> universal it can be, and a library to go from expression to universal
> >>> intermediate, then output the code in the language of one's choice. The
> >>> optimizations could be performed in the intermediate layer, the output
> >>> later, or both if there are specific code optimizations available after
> >>> translation.
> >>>
> >>> -Donald
> >>>
> >>> On Nov 22, 2016 2:31 PM, Julien Le Dem
> >>> wrote:

Re: Weld

Posted by Julian Hyde <jh...@apache.org>.
I think someone should reach out to Matei and Shoumik, and see if they would like to collaborate. Wes, would you like to do that?

Also, reach out to the Spark community. Are they aware of Arrow? Are they planning to use it, or are they developing an alternative?

Julian



> On Dec 13, 2016, at 10:23 AM, Wes McKinney <we...@gmail.com> wrote:
> 
> Looks like there will be a talk about Weld at Strata next Spring:
> 
> http://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/57646
> 
> It would be nice to see if there's any work that we can be doing in
> Arrow to try to collaborate on such closely-related efforts.
> 
> - Wes
> 
> On Sat, Nov 26, 2016 at 3:53 PM, Wes McKinney <we...@gmail.com> wrote:
>> I've always contended that building a sort of "runtime" for in-memory
>> column expressions was a natural next step after hardening Arrow as a
>> data structure and memory exchange mechanism. I'm hopeful that we'll
>> see some cross-pollination between projects like Weld and Arrow to
>> produce open source solutions that help drive more collaboration and
>> consolidation around shared low-level needs for building analytical
>> systems. We still have significant work to do to solidify Arrow as a
>> standard -- anything anyone reading can do to help work against
>> fragmentation in data structures / memory representations would be
>> very much appreciated!
>> 
>> - Wes
>> 
>> On Wed, Nov 23, 2016 at 11:35 AM, Donald Foss <do...@gmail.com> wrote:
>>> I had that in my queue, and your watching and thoughts about it mean I need
>>> to watch it today ;)
>>> 
>>> Thanks for the link to where you've played with abstraction before. I'm
>>> going to fork a copy and do some playing myself. I'm very interested in how
>>> universal it can be, and a library to go from expression to universal
>>> intermediate, then output the code in the language of one's choice. The
>>> optimizations could be performed in the intermediate layer, the output
>>> later, or both if there are specific code optimizations available after
>>> translation.
>>> 
>>> -Donald
>>> 
>>> On Nov 22, 2016 2:31 PM, Julien Le Dem
>>> wrote: