Posted to dev@arrow.apache.org by Wes McKinney <we...@gmail.com> on 2017/08/26 20:11:55 UTC

Apache Arrow at JupyterCon

hi all,

In case folks here are interested, I gave a keynote this week at
JupyterCon explaining my motivations for being involved in Apache
Arrow and how I see it fitting in with the data science ecosystem long
term:

https://www.youtube.com/watch?v=wdmf1msbtVs

I also gave an interview going a little deeper into some of the topics
from the talk:

https://www.youtube.com/watch?v=Q7y9l-L8yiU

I believe we have an exciting journey ahead of us, but it's certainly
going to take a lot of collaboration and community development.

- Wes

Re: Apache Arrow at JupyterCon

Posted by "Gang(Gary) Wang" <ga...@apache.org>.

Yes, performance is critical for most big data applications, and it is
one of the key success factors for both Arrow and Mnemonic. A
performance-oriented engineer might even go against fundamental design
patterns for the sake of performance, so the question is how we can make
their lives easier. From my point of view, we should allow developers to
freely choose the implementation they want to use, as with IoC/DI or
services, e.g. the Mnemonic memory services (nvml, pmalloc, sysmem,
pure...). Another choice could be where the user computation happens. I
think both Arrow and Mnemonic are able to support C/C++ code bindings for
their data, but they use different approaches to achieve that. Mnemonic
allows the user to push down their C code as a durable computing service.
For example, a developer can create a computing service to sort a native
list created on the Java side; this way we can get a significant
performance improvement if a performance-oriented developer is willing to
implement their algorithms in C. Arrow can definitely take advantage of
this capability that Mnemonic provides if ArrowBuf is backed by Mnemonic.
That's why I think this integration might add some value to Apache Arrow.
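
A minimal Java sketch of the pluggable-backend idea above, under stated
assumptions: MemoryService, MemoryServices, ComputingService and their
implementations are hypothetical names, not Mnemonic or Arrow APIs; the
"heap" and "direct" backends stand in for real services such as pmalloc
or sysmem; and the name-keyed registry is only a stand-in for an IoC/DI
container. A native ComputingService would presumably call C code through
JNI; the pure-Java fallback here just copies the ints out, sorts them, and
writes them back.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Supplier;

// Hypothetical SPI: one implementation per memory backend.
interface MemoryService {
    String name();

    // Returns a zero-initialized, little-endian buffer of the given capacity.
    ByteBuffer allocate(int capacity);
}

// Stand-in backend on the Java heap.
final class HeapMemoryService implements MemoryService {
    public String name() { return "heap"; }

    public ByteBuffer allocate(int capacity) {
        return ByteBuffer.allocate(capacity).order(ByteOrder.LITTLE_ENDIAN);
    }
}

// Stand-in backend on off-heap (direct) memory.
final class DirectMemoryService implements MemoryService {
    public String name() { return "direct"; }

    public ByteBuffer allocate(int capacity) {
        return ByteBuffer.allocateDirect(capacity).order(ByteOrder.LITTLE_ENDIAN);
    }
}

// Minimal name-keyed registry playing the role of an IoC/DI container.
final class MemoryServices {
    private static final Map<String, Supplier<MemoryService>> REGISTRY = new TreeMap<>();

    static {
        register("heap", HeapMemoryService::new);
        register("direct", DirectMemoryService::new);
    }

    private MemoryServices() {}

    static void register(String name, Supplier<MemoryService> factory) {
        REGISTRY.put(name, factory);
    }

    static MemoryService lookup(String name) {
        Supplier<MemoryService> factory = REGISTRY.get(name);
        if (factory == null) {
            throw new IllegalArgumentException(
                "unknown memory service '" + name + "', known: " + REGISTRY.keySet());
        }
        return factory.get();
    }
}

// Hypothetical push-down hook: a native implementation could run C code via JNI.
interface ComputingService {
    // Sorts the first `count` ints stored in `buffer` (absolute offsets from 0).
    void sortInts(ByteBuffer buffer, int count);
}

// Pure-Java fallback: copy out, sort, write back.
final class JavaComputingService implements ComputingService {
    public void sortInts(ByteBuffer buffer, int count) {
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = buffer.getInt(i * Integer.BYTES);
        }
        Arrays.sort(values);
        for (int i = 0; i < count; i++) {
            buffer.putInt(i * Integer.BYTES, values[i]);
        }
    }
}
```

For example, MemoryServices.lookup("direct").allocate(16) yields a 16-byte
off-heap buffer, and new JavaComputingService().sortInts(buf, 4) sorts the
four ints it holds in place. Callers depend only on the interfaces, so the
backend can be swapped by name without touching calling code.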




On Thu, Sep 7, 2017 at 7:53 AM, Jacques Nadeau <ja...@apache.org> wrote:

> Our general goal (which hasn't always been successfully implemented) is what
> I'd describe as "fractured subclassing". You can see our use of this where
> ArrowBuf extends various Netty classes but interacts directly with memory
> addresses for all the hot-path get/set operations (not delegating through
> the various layers of the hierarchy) [1], while still using delegation for
> cooler paths such as [2]. I'd like to try that approach before adding the
> multiple implementations you propose.
>
> Note that the value vector accessors currently fail to use a fractured
> subclassing approach, which causes the performance penalty that others
> have commented on with regard to ARROW-1463 [3].
>
> [1] https://github.com/apache/arrow/blob/master/java/memory/src/main/java/io/netty/buffer/ArrowBuf.java#L580
> [2] https://github.com/apache/arrow/blob/master/java/memory/src/main/java/io/netty/buffer/ArrowBuf.java#L399
> [3] https://issues.apache.org/jira/browse/ARROW-1463?focusedCommentId=16154874&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16154874

Re: Apache Arrow at JupyterCon

Posted by Jacques Nadeau <ja...@apache.org>.

Our general goal (which hasn't always been successfully implemented) is what
I'd describe as "fractured subclassing". You can see our use of this where
ArrowBuf extends various Netty classes but interacts directly with memory
addresses for all the hot-path get/set operations (not delegating through
the various layers of the hierarchy) [1], while still using delegation for
cooler paths such as [2]. I'd like to try that approach before adding the
multiple implementations you propose.

Note that the value vector accessors currently fail to use a fractured
subclassing approach, which causes the performance penalty that others
have commented on with regard to ARROW-1463 [3].

[1] https://github.com/apache/arrow/blob/master/java/memory/src/main/java/io/netty/buffer/ArrowBuf.java#L580
[2] https://github.com/apache/arrow/blob/master/java/memory/src/main/java/io/netty/buffer/ArrowBuf.java#L399
[3] https://issues.apache.org/jira/browse/ARROW-1463?focusedCommentId=16154874&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16154874
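
A minimal Java sketch of the "fractured subclassing" idea, with loud
assumptions: FracturedBuf, BufferBookkeeping and RefCountedBookkeeping are
hypothetical names rather than Arrow classes, and a byte[] plus a base
offset stands in for the raw memory address that ArrowBuf works with. The
hot-path getInt/setInt compute positions and decode little-endian bytes
directly inside a final class, while cold-path calls (describe, release)
go through an interface to a delegate.

```java
// Cold-path operations: fine to reach through an interface or delegate.
interface BufferBookkeeping {
    String describe();

    void release();
}

// Hypothetical delegate holding lifecycle state (stands in for accounting layers).
final class RefCountedBookkeeping implements BufferBookkeeping {
    private int refCount = 1;

    public String describe() { return "refCount=" + refCount; }

    public void release() {
        if (refCount <= 0) {
            throw new IllegalStateException("buffer already released");
        }
        refCount--;
    }

    int refCount() { return refCount; }
}

// Hot paths touch memory directly; cold paths delegate.
final class FracturedBuf {
    private final byte[] memory;   // stand-in for native memory
    private final int base;        // stand-in for the buffer's start address
    private final int capacity;    // usable bytes starting at base
    private final BufferBookkeeping bookkeeping;

    FracturedBuf(byte[] memory, int base, int capacity, BufferBookkeeping bookkeeping) {
        if (base < 0 || capacity < 0 || base > memory.length - capacity) {
            throw new IllegalArgumentException("region out of bounds");
        }
        this.memory = memory;
        this.base = base;
        this.capacity = capacity;
        this.bookkeeping = bookkeeping;
    }

    // Hot path: little-endian read at base + index, no virtual hops.
    int getInt(int index) {
        checkIndex(index, Integer.BYTES);
        int p = base + index;
        return (memory[p] & 0xFF)
                | (memory[p + 1] & 0xFF) << 8
                | (memory[p + 2] & 0xFF) << 16
                | (memory[p + 3] & 0xFF) << 24;
    }

    // Hot path: little-endian write at base + index.
    void setInt(int index, int value) {
        checkIndex(index, Integer.BYTES);
        int p = base + index;
        memory[p] = (byte) value;
        memory[p + 1] = (byte) (value >>> 8);
        memory[p + 2] = (byte) (value >>> 16);
        memory[p + 3] = (byte) (value >>> 24);
    }

    private void checkIndex(int index, int width) {
        if (index < 0 || index > capacity - width) {
            throw new IndexOutOfBoundsException(
                "index " + index + ", width " + width + ", capacity " + capacity);
        }
    }

    // Cold path: delegated to the bookkeeping object.
    String describe() {
        return "FracturedBuf[capacity=" + capacity + ", " + bookkeeping.describe() + "]";
    }

    void release() { bookkeeping.release(); }
}
```

The point of the split is that the frequently called accessors never hop
through an interface or superclass, so the JIT has a single target to
inline, while rarely called lifecycle operations keep the flexibility of
delegation.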


On Thu, Sep 7, 2017 at 12:33 AM, Gonzalo Ortiz Jaureguizar <golthiryus@gmail.com> wrote:

> In a library like Arrow, it is also very important to have as few dynamic
> method calls as possible on the critical paths (gets/puts). If it is
> decided to support other memory systems, it is important to minimize those
> calls as much as possible. If there is a single vector class that supports
> both systems (by calling an interface, for example), the JVM will try to
> optimize the dynamic calls heuristically. If only one of the two
> implementations is used in a given JVM (let's say a program only uses the
> Netty implementation), then the impact should be negligible. Conversely,
> if more than one is used, there are going to be problems.
>
> Ideally we would have an abstract vector that doesn't know about the
> memory buffer, and then N implementations with specific methods to talk to
> the buffer. That should be repeated for each vector type. For example, if
> there is an IntVector extended by NettyIntVector and MnemonicIntVector, and
> NullableIntVector delegates to IntVector, then there should be a
> NettyNullableIntVector that delegates to NettyIntVector, and likewise a
> MnemonicNullableIntVector that delegates to MnemonicIntVector. This may
> sound cumbersome, but by doing that, clients that really care about
> performance can use the specific class in their code to be sure that
> method calls are not dynamic.

Re: Apache Arrow at JupyterCon

Posted by Gonzalo Ortiz Jaureguizar <go...@gmail.com>.

In a library like Arrow, it is also very important to have as few dynamic
method calls as possible on the critical paths (gets/puts). If it is
decided to support other memory systems, it is important to minimize those
calls as much as possible. If there is a single vector class that supports
both systems (by calling an interface, for example), the JVM will try to
optimize the dynamic calls heuristically. If only one of the two
implementations is used in a given JVM (let's say a program only uses the
Netty implementation), then the impact should be negligible. Conversely,
if more than one is used, there are going to be problems.
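
The single-vector design described above might look like this minimal
Java sketch, in which every get/set goes through an interface. All names
(MemoryAccess, HeapIntMemory, DirectIntMemory, DispatchingIntVector) are
hypothetical, and the heap and direct backends stand in for, say, Netty-
and Mnemonic-backed memory. The memory.getInt call inside get is the
dynamic call site: if a JVM only ever loads one backend, the JIT can
typically inline it. How much a second backend costs depends on the JIT;
HotSpot, for instance, can often still inline a call site that has seen
two receiver types, and falls back to a true virtual call once the site
becomes megamorphic (three or more types).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical memory-access SPI; indices count ints, not bytes.
interface MemoryAccess {
    int getInt(int index);

    void setInt(int index, int value);
}

// Stand-in for one backend: values on the Java heap.
final class HeapIntMemory implements MemoryAccess {
    private final int[] values;

    HeapIntMemory(int length) { values = new int[length]; }

    public int getInt(int index) { return values[index]; }

    public void setInt(int index, int value) { values[index] = value; }
}

// Stand-in for another backend: values in off-heap memory.
final class DirectIntMemory implements MemoryAccess {
    private final ByteBuffer buffer;

    DirectIntMemory(int length) {
        buffer = ByteBuffer.allocateDirect(length * Integer.BYTES).order(ByteOrder.LITTLE_ENDIAN);
    }

    public int getInt(int index) { return buffer.getInt(index * Integer.BYTES); }

    public void setInt(int index, int value) { buffer.putInt(index * Integer.BYTES, value); }
}

// One vector class for every backend: each get/set is an interface call.
final class DispatchingIntVector {
    private final MemoryAccess memory;
    private final int length;

    DispatchingIntVector(MemoryAccess memory, int length) {
        this.memory = memory;
        this.length = length;
    }

    int get(int index) { return memory.getInt(index); } // dynamic call site

    void set(int index, int value) { memory.setInt(index, value); }

    // A hot loop whose cost depends on how many MemoryAccess types this site has seen.
    long sum() {
        long total = 0;
        for (int i = 0; i < length; i++) {
            total += get(i);
        }
        return total;
    }
}
```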

Ideally we would have an abstract vector that doesn't know about the
memory buffer, and then N implementations with specific methods to talk to
the buffer. That should be repeated for each vector type. For example, if
there is an IntVector extended by NettyIntVector and MnemonicIntVector, and
NullableIntVector delegates to IntVector, then there should be a
NettyNullableIntVector that delegates to NettyIntVector, and likewise a
MnemonicNullableIntVector that delegates to MnemonicIntVector. This may
sound cumbersome, but by doing that, clients that really care about
performance can use the specific class in their code to be sure that
method calls are not dynamic.
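
A hypothetical Java sketch of the per-backend hierarchy described above.
To stay self-contained it uses HeapIntVector and DirectIntVector as
stand-ins for NettyIntVector and MnemonicIntVector, and it shows only
HeapNullableIntVector; a DirectNullableIntVector would mirror it with a
DirectIntVector field, which is the duplication this design accepts.
Because the concrete vectors are final and the wrapper's field has the
concrete type, the JIT can bind values.get(...) to a single target without
relying on call-site profiling.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.BitSet;

// Backend-agnostic base: knows the length, not how values are stored.
abstract class IntVector {
    private final int length;

    IntVector(int length) { this.length = length; }

    abstract int get(int index);

    abstract void set(int index, int value);

    final int length() { return length; }
}

// Stand-in for NettyIntVector: final, so calls through this type have one target.
final class HeapIntVector extends IntVector {
    private final int[] values;

    HeapIntVector(int length) {
        super(length);
        values = new int[length];
    }

    @Override int get(int index) { return values[index]; }

    @Override void set(int index, int value) { values[index] = value; }
}

// Stand-in for MnemonicIntVector: values in off-heap memory.
final class DirectIntVector extends IntVector {
    private final ByteBuffer buffer;

    DirectIntVector(int length) {
        super(length);
        buffer = ByteBuffer.allocateDirect(length * Integer.BYTES).order(ByteOrder.LITTLE_ENDIAN);
    }

    @Override int get(int index) { return buffer.getInt(index * Integer.BYTES); }

    @Override void set(int index, int value) { buffer.putInt(index * Integer.BYTES, value); }
}

// Per-backend nullable wrapper: the delegate field has the concrete final type.
final class HeapNullableIntVector {
    private final HeapIntVector values;
    private final BitSet valid;

    HeapNullableIntVector(int length) {
        values = new HeapIntVector(length);
        valid = new BitSet(length);
    }

    boolean isNull(int index) { return !valid.get(index); }

    int get(int index) {
        if (isNull(index)) {
            throw new IllegalStateException("value at index " + index + " is null");
        }
        return values.get(index); // single target: HeapIntVector is final
    }

    void set(int index, int value) {
        values.set(index, value);
        valid.set(index);
    }

    void setNull(int index) { valid.clear(index); }
}
```

Performance-sensitive code would hold a HeapNullableIntVector (or
HeapIntVector) reference directly, while generic code can still work
against the abstract IntVector and accept dynamic dispatch.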

2017-09-07 6:11 GMT+02:00 Jacques Nadeau <ja...@apache.org>:

> This is an interesting problem but also pretty complex. Arrow's Java memory
> management model is complex on purpose (see
> https://github.com/apache/arrow/blob/master/java/memory/
> src/main/java/org/apache/arrow/memory/README.md
> for more info). It is designed to reserve and share memory in multiple
> hierarchical domains (with reservations and limits) while providing
> transfer semantics across those domains with minimal contention and
> locking. An opaque (and potentially easy) starting point would be to
> optionally allow AllocationManager to use something other than the
> PooledByteBufAllocatorL and UnsafeDirectLittleEndian for memory allocation.
> This wouldn't expose movement between different memory tiers but that could
> be managed underneath the Arrow system. At the end of the day, the whole
> hierarchy is basically a collection of memory addresses, accounting and
> reference counting.
>
> A phase two could be a proposal which allows movement between memory
> domains and could be generified across systems like Mnemonic as well as
> GPU/device memory domains.
>
>
> On Wed, Sep 6, 2017 at 4:45 PM, Wes McKinney <we...@gmail.com> wrote:
>
> > Thanks Gary, that is helpful context. In light of this, it might be
> > worth writing some kind of a proposal for how to enable the Java
> > vector classes to be backed by some other kind of byte buffers. It
> > might be that an alternative version of portions of the Arrow Java
> > library (i.e. decoupled from Netty) needs to be created.
> >
> > If it cannot be reconciled with the Netty AbstractByteBuf class then
> > this would be useful to know so that Arrow developers can plan
> > accordingly for the future.
> >
> > On Wed, Sep 6, 2017 at 2:15 PM, Gary Wong <qi...@gmail.com> wrote:
> > > ArrowBuf inherits from AbstractByteBuf, which is defined in the Netty
> > > library. It acts more like a memory pool than a pure buffer, which is
> > > why ArrowBuf is not backed by a ByteBuffer for now.
> > >
> > > I once tried to build ArrowBuf on top of Mnemonic's DurableBuffer, but
> > > it looks like it is not easy to decouple the refcount from the other
> > > parts, because the lifecycle of a DurableBuffer can also be managed
> > > automatically by the JVM instead of by refcounting.
> > >
> > > I still want to figure out how to gracefully migrate the backend of
> > > ArrowBuf from Netty to Mnemonic. In addition, DurableBuffer could bring
> > > other benefits to Arrow, e.g. persistence on any kind of memory service
> > > that can make use of SSD, NVMe, memory, NAS, and more. In this way,
> > > Arrow would be able to break through the capacity limit of system
> > > memory, avoid SerDe for storage, and link other durable objects with
> > > ease.
> > >
> > > On Wed, Sep 6, 2017 at 10:40 AM, Wes McKinney <we...@gmail.com>
> > wrote:
> > >
> > >> It should be possible to have an ArrowBuf backed by a
> > >> MappedByteBuffer. Anyone reading is welcome to dig in and write a
> > >> patch for this.
> > >>
> > >> Semantically this is what we have done in C++ -- a memory map inherits
> > >> from arrow::Buffer, so we can slice and dice a memory map as we would
> > >> any other Buffer object:
> > >>
> > >> https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.cc#L501
> > >>
> > >> On Mon, Sep 4, 2017 at 4:05 AM, Gonzalo Ortiz Jaureguizar
> > >> <go...@gmail.com> wrote:
> > >> > This is a very interesting feature. It's very surprising that there is
> > >> > no ByteBuf implementation backed by a MappedByteBuffer. As far as I
> > >> > understand, it should be trivial to implement (maybe not to pool), as
> > >> > ByteBuf is usually backed by a ByteBuffer and MappedByteBuffer extends
> > >> > that. But I didn't find any implementations when I googled for it.
> > >> >
> > >> > 2017-09-03 16:12 GMT+02:00 Wes McKinney <we...@gmail.com>:
> > >> >
> > >> >> I think ideally we would have a Java interface that would support all of:
> > >> >>
> > >> >> - Memory mapped files
> > >> >> - Anonymous shared memory segments (e.g. POSIX shm)
> > >> >> - NVM / Mnemonic
> > >> >>
> > >> >> We already have the ability to do zero-copy reads from buffer-like
> > >> >> objects in C++ and IO interfaces that support zero copy (like memory
> > >> >> mapped files). We can do zero-copy reads from ArrowBuf in Java but we
> > >> >> are missing the interfaces to shared memory sources.
> > >> >>
> > >> >> - Wes
> > >> >>
> > >> >> On Thu, Aug 31, 2017 at 5:09 PM, Gang(Gary) Wang <garyw@apache.org> wrote:
> > >> >> > Hi Wes,
> > >> >> >
> > >> >> > Thank you for the explanation. The use case in
> > >> >> > https://issues.apache.org/jira/browse/ARROW-721 could be directly
> > >> >> > supported by Mnemonic through DurableBuffer and DurableChunk.
> > >> >> > DurableChunk uses Unsafe to expose a plain memory space that Arrow
> > >> >> > can use without performance penalties, which is why most big data
> > >> >> > frameworks take advantage of Unsafe; please refer to
> > >> >> > https://mnemonic.apache.org/docs/domusecases.html for the use cases.
> > >> >> > We could work on this ticket if you think that's exactly what you
> > >> >> > want.
> > >> >> >
> > >> >> > Regarding NVM technologies, that is what Mnemonic was created for. It
> > >> >> > can be used to directly persist generic Java objects and collections
> > >> >> > on NVM with no SerDe. So what kind of basic tools were you referring
> > >> >> > to? Perhaps we can also help identify the gaps in Mnemonic. Thanks!
> > >> >> >
> > >> >> > Very truly yours,
> > >> >> > Gary
> > >> >> >
> > >> >> > On Thu, Aug 31, 2017 at 12:32 PM, Wes McKinney <wesmckinn@gmail.com> wrote:
> > >> >> >
> > >> >> >> hi Gary,
> > >> >> >>
> > >> >> >> The Java libraries are not yet capable of writing or zero-copy reads
> > >> >> >> of Arrow datasets to/from shared memory or memory-mapped files:
> > >> >> >> https://issues.apache.org/jira/browse/ARROW-721. We've developed quite
> > >> >> >> a bit of technology on the C++ side for dealing with shared memory IPC
> > >> >> >> but we need someone to help with that on the Java side.
> > >> >> >>
> > >> >> >> In the context of NVM technologies, it would be nice to be able to
> > >> >> >> persist a dataset to NVM and continue to do analytics on it, while
> > >> >> >> retaining a "handle" so that the dataset can be easily recovered in
> > >> >> >> the event of process failure. We may arrive at new use cases once some
> > >> >> >> of the basic tools exist.
> > >> >> >>
> > >> >> >> - Wes
> > >> >> >>
> > >> >> >> On Wed, Aug 30, 2017 at 6:19 PM, Gang(Gary) Wang <garyw@apache.org> wrote:
> > >> >> >> > Thank you for sharing the videos. We are very interested in how to
> > >> >> >> > support the Arrow data format and collections closely. Could you
> > >> >> >> > please point out which interfaces would allow Mnemonic to act as a
> > >> >> >> > memory provider for users to store and access Arrow-managed
> > >> >> >> > datasets? Thanks!
> > >> >> >> >
> > >> >> >> > Very truly yours,
> > >> >> >> > Gary.
> > >> >> >> >
> > >> >> >> > On Wed, Aug 30, 2017 at 2:11 PM, Ivan Sadikov <ivan.sadikov@gmail.com> wrote:
> > >> >> >> >
> > >> >> >> >> Great presentation! Thank you for sharing.
> > >> >> >> >>
> > >> >> >> >>
> > >> >> >> >> On Thu, 31 Aug 2017 at 8:02 AM, Wes McKinney <wesmckinn@gmail.com> wrote:
> > >> >> >> >>
> > >> >> >> >> > Absolutely. I will do that now.
> > >> >> >> >> >
> > >> >> >> >> > On Wed, Aug 30, 2017 at 3:33 PM, Julian Hyde <jhyde@apache.org> wrote:
> > >> >> >> >> > > Thanks for sharing. Can we tweet those videos as well? I see that
> > >> >> >> >> > > https://twitter.com/apachearrow only tweeted your slides.
> > >> >> >> >> > >
> > >> >> >> >> > >> On Aug 26, 2017, at 1:11 PM, Wes McKinney <wesmckinn@gmail.com> wrote:
> > >> >> >> >> > >>
> > >> >> >> >> > >> hi all,
> > >> >> >> >> > >>
> > >> >> >> >> > >> In case folks here are interested, I gave a keynote this week at
> > >> >> >> >> > >> JupyterCon explaining my motivations for being involved in Apache
> > >> >> >> >> > >> Arrow and how I see it fitting in with the data science ecosystem
> > >> >> >> >> > >> long term:
> > >> >> >> >> > >>
> > >> >> >> >> > >> https://www.youtube.com/watch?v=wdmf1msbtVs
> > >> >> >> >> > >>
> > >> >> >> >> > >> I also gave an interview going a little deeper into some of the
> > >> >> >> >> > >> topics from the talk:
> > >> >> >> >> > >>
> > >> >> >> >> > >> https://www.youtube.com/watch?v=Q7y9l-L8yiU
> > >> >> >> >> > >>
> > >> >> >> >> > >> I believe we have an exciting journey ahead of us, but it's
> > >> >> >> >> > >> certainly going to take a lot of collaboration and community
> > >> >> >> >> > >> development.
> > >> >> >> >> > >>
> > >> >> >> >> > >> - Wes
> > >> >> >> >> > >
> > >> >> >> >> >
> > >> >> >> >>
> > >> >> >>
> > >> >>
> > >>
> >
>
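
The Java interface Wes describes in the quoted messages above (one abstraction over memory-mapped files, anonymous shared memory and NVM / Mnemonic, with zero-copy slicing in the spirit of the C++ memory map that inherits from arrow::Buffer) might look roughly like the following. This is a minimal JDK-only sketch under stated assumptions: `BufferSource` and `MappedFileSource` are hypothetical names, not part of Arrow or Netty, and only the memory-mapped-file case is implemented.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical abstraction (not an Arrow API): anything that can hand out
// zero-copy views over a region of memory.
interface BufferSource extends AutoCloseable {
  long size();

  // Returns a view of [offset, offset + length) without copying the bytes.
  ByteBuffer slice(long offset, int length);

  @Override
  void close() throws IOException;
}

// One possible implementation: a read-only memory-mapped file.
final class MappedFileSource implements BufferSource {
  private final FileChannel channel;
  private final MappedByteBuffer mapped;

  MappedFileSource(Path path) throws IOException {
    this.channel = FileChannel.open(path, StandardOpenOption.READ);
    // A single MappedByteBuffer is limited to Integer.MAX_VALUE bytes;
    // larger files would need several mappings.
    this.mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
  }

  @Override
  public long size() {
    return mapped.capacity();
  }

  @Override
  public ByteBuffer slice(long offset, int length) {
    if (offset < 0 || length < 0 || offset + length > mapped.capacity()) {
      throw new IndexOutOfBoundsException("slice out of range");
    }
    ByteBuffer view = mapped.duplicate(); // shares the mapped memory
    view.position((int) offset).limit((int) offset + length);
    // Arrow data is conventionally little-endian.
    return view.slice().order(ByteOrder.LITTLE_ENDIAN);
  }

  @Override
  public void close() throws IOException {
    // Closing the channel does not unmap the file; the mapping stays valid
    // until the buffer is garbage collected.
    channel.close();
  }
}
```

Because `slice` returns views that share the mapped memory, handing a region to a consumer costs no copy. A shared-memory or Mnemonic-backed source could plug in behind the same interface, though how the JVM would reach such memory is left open here.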

Fwd: Apache Arrow at JupyterCon

Posted by "Gang(Gary) Wang" <ga...@apache.org>.

I am forwarding this discussion thread here for your information; please join
if you are also interested in this topic.


---------- Forwarded message ----------
From: Jacques Nadeau <ja...@apache.org>
Date: Wed, Sep 6, 2017 at 9:11 PM
Subject: Re: Apache Arrow at JupyterCon
To: dev@arrow.apache.org


This is an interesting problem but also pretty complex. Arrow's Java memory
management model is complex on purpose (see
https://github.com/apache/arrow/blob/master/java/memory/src/main/java/org/apache/arrow/memory/README.md
for more info). It is designed to reserve and share memory in multiple
hierarchical domains (with reservations and limits) while providing
transfer semantics across those domains with minimal contention and
locking. An opaque (and potentially easy) starting point would be to
optionally allow AllocationManager to use something other than the
PooledByteBufAllocatorL and UnsafeDirectLittleEndian for memory allocation.
This wouldn't expose movement between different memory tiers but that could
be managed underneath the Arrow system. At the end of the day, the whole
hierarchy is basically a collection of memory addresses, accounting and
reference counting.

A phase two could be a proposal which allows movement between memory
domains and could be generified across systems like Mnemonic as well as
GPU/Device memory domains.
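
Jacques's suggested first step, letting the allocation manager use something other than Netty's pooled allocator, amounts to placing an interface between the allocator and wherever the bytes come from. A minimal sketch of that idea follows; the names `MemoryBackend`, `DirectMemoryBackend` and `PluggableAllocationManager` are hypothetical and do not correspond to Arrow's actual AllocationManager API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical SPI (not an Arrow API): what an AllocationManager-style class
// could delegate to instead of hard-wiring Netty's pooled allocator.
interface MemoryBackend {
  ByteBuffer allocate(int size);

  void free(ByteBuffer buffer);

  long allocatedBytes();
}

// Simple backend: plain off-heap memory from the JDK.
final class DirectMemoryBackend implements MemoryBackend {
  private final AtomicLong allocated = new AtomicLong();

  @Override
  public ByteBuffer allocate(int size) {
    ByteBuffer buf = ByteBuffer.allocateDirect(size).order(ByteOrder.LITTLE_ENDIAN);
    allocated.addAndGet(size);
    return buf;
  }

  @Override
  public void free(ByteBuffer buffer) {
    // Direct buffers are reclaimed by the GC; only the accounting is updated.
    allocated.addAndGet(-buffer.capacity());
  }

  @Override
  public long allocatedBytes() {
    return allocated.get();
  }
}

// Hypothetical stand-in for an allocation manager that is handed its backend
// rather than choosing one itself.
final class PluggableAllocationManager {
  private final MemoryBackend backend;

  PluggableAllocationManager(MemoryBackend backend) {
    this.backend = backend;
  }

  ByteBuffer buffer(int size) {
    return backend.allocate(size);
  }

  void release(ByteBuffer buffer) {
    backend.free(buffer);
  }

  long allocatedBytes() {
    return backend.allocatedBytes();
  }
}
```

With this shape, swapping in another backend (a memory-mapped file, a Mnemonic durable pool) would be a constructor argument rather than a change to callers. Movement between memory tiers, Jacques's phase two, is deliberately outside this sketch.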


Re: Apache Arrow at JupyterCon

Posted by Jacques Nadeau <ja...@apache.org>.

This is an interesting problem but also pretty complex. Arrow's Java memory
management model is complex on purpose (see
https://github.com/apache/arrow/blob/master/java/memory/src/main/java/org/apache/arrow/memory/README.md
for more info). It is designed to reserve and share memory in multiple
hierarchical domains (with reservations and limits) while providing
transfer semantics across those domains with minimal contention and
locking. An opaque (and potentially easy) starting point would be to
optionally allow AllocationManager to use something other than the
PooledByteBufAllocatorL and UnsafeDirectLittleEndian for memory allocation.
This wouldn't expose movement between different memory tiers but that could
be managed underneath the Arrow system. At the end of the day, the whole
hierarchy is basically a collection of memory addresses, accounting and
reference counting.

A phase two could be a proposal which allows movement between memory
domains and could be generified across systems like Mnemonic as well as
GPU/Device memory domains.
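
The "hierarchical domains (with reservations and limits)" described above can be illustrated with a small accounting tree, where a reservation succeeds only if it fits within the node's own limit and within every ancestor's limit. This is a hypothetical, lock-free sketch (`AccountingNode` is an invented name), not Arrow's implementation, and it leaves out the transfer semantics mentioned above.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of hierarchical accounting: each node has a limit,
// and every reservation must also fit in all of its ancestors.
final class AccountingNode {
  private final AccountingNode parent;
  private final long limit;
  private final AtomicLong used = new AtomicLong();

  AccountingNode(AccountingNode parent, long limit) {
    this.parent = parent;
    this.limit = limit;
  }

  AccountingNode newChild(long childLimit) {
    return new AccountingNode(this, childLimit);
  }

  // Tries to reserve `bytes` here and in every ancestor. Returns false and
  // rolls back this level if any level would exceed its limit.
  boolean reserve(long bytes) {
    while (true) {
      long current = used.get();
      if (current + bytes > limit) {
        return false;
      }
      if (used.compareAndSet(current, current + bytes)) {
        break;
      }
    }
    if (parent != null && !parent.reserve(bytes)) {
      used.addAndGet(-bytes); // undo the reservation at this level
      return false;
    }
    return true;
  }

  void release(long bytes) {
    used.addAndGet(-bytes);
    if (parent != null) {
      parent.release(bytes);
    }
  }

  long used() {
    return used.get();
  }
}
```

Compare-and-set on each counter avoids locks, and a reservation refused at any level is undone, so the counters return to their previous values.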


Re: Apache Arrow at JupyterCon

Posted by Wes McKinney <we...@gmail.com>.

Thanks Gary, that is helpful context. In light of this, it might be
worth writing some kind of a proposal for how to enable the Java
vector classes to be backed by some other kind of byte buffers. It
might be that an alternative version of portions of the Arrow Java
library (i.e. decoupled from Netty) might need to be created.

If it cannot be reconciled with the Netty AbstractByteBuf class then
this would be useful to know so that Arrow developers can plan
accordingly for the future.
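
One way to picture vector classes "backed by some other kind of byte buffers" is a vector that reads the Arrow columnar layout straight out of any java.nio.ByteBuffer (heap, direct or memory-mapped) with no Netty types involved. The sketch below is hypothetical (`ByteBufferIntVector` is not Arrow's IntVector) and read-only; it assumes the standard Arrow layout of a validity bitmap with least-significant-bit numbering plus a buffer of little-endian 32-bit values.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical read-only Int32 vector whose buffers can be any ByteBuffer.
// Bit i of the validity bitmap lives in byte i / 8 at bit position i % 8;
// a set bit means the slot is non-null.
final class ByteBufferIntVector {
  private final ByteBuffer validity;
  private final ByteBuffer data;
  private final int valueCount;

  ByteBufferIntVector(ByteBuffer validity, ByteBuffer data, int valueCount) {
    if (validity.limit() < (valueCount + 7) / 8 || data.limit() < valueCount * 4) {
      throw new IllegalArgumentException("buffers too small for valueCount");
    }
    // duplicate() shares the underlying memory; order() affects the view only.
    this.validity = validity.duplicate();
    this.data = data.duplicate().order(ByteOrder.LITTLE_ENDIAN);
    this.valueCount = valueCount;
  }

  int getValueCount() {
    return valueCount;
  }

  boolean isNull(int index) {
    checkIndex(index);
    int bits = validity.get(index >> 3) & 0xFF;
    return (bits & (1 << (index & 7))) == 0;
  }

  int get(int index) {
    if (isNull(index)) {
      throw new IllegalStateException("value at " + index + " is null");
    }
    return data.getInt(index * 4); // absolute read, position is untouched
  }

  private void checkIndex(int index) {
    if (index < 0 || index >= valueCount) {
      throw new IndexOutOfBoundsException(String.valueOf(index));
    }
  }
}
```

Since the vector relies only on absolute `get` calls, the same class would work unchanged over a MappedByteBuffer obtained from FileChannel.map.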

Re: Apache Arrow at JupyterCon

Posted by Gary Wong <qi...@gmail.com>.

ArrowBuf inherits from AbstractByteBuf, which is defined in the Netty
library. It works more like a memory pool than a plain buffer, which is
why ArrowBuf is currently not backed by a ByteBuffer.

I have tried to build ArrowBuf on top of Mnemonic's DurableBuffer, but it
does not look easy to decouple the reference counting from the other
parts, because the lifecycle of a DurableBuffer can also be managed
automatically by the JVM instead of through reference counting.
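As a rough illustration of the refcount-versus-JVM-lifecycle tension described above, here is a minimal sketch (not ArrowBuf or Mnemonic code) in which an explicit reference count releases memory eagerly, while a `java.lang.ref.Cleaner` acts as a fallback if the buffer becomes unreachable first. `RefCountedBuffer` and its methods are hypothetical names, a direct `ByteBuffer` stands in for durable memory, and the release action only increments a counter where a real backend would free or unmap memory.

```java
import java.lang.ref.Cleaner;
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch (not ArrowBuf or DurableBuffer code): explicit reference
// counting, with a JVM-managed Cleaner as a fallback if the count never hits zero.
final class RefCountedBuffer {
    private static final Cleaner CLEANER = Cleaner.create();

    // The release action must not hold a reference to the RefCountedBuffer,
    // otherwise the buffer could never become unreachable.
    private static final class ReleaseAction implements Runnable {
        private final AtomicInteger releaseCounter;

        ReleaseAction(AtomicInteger releaseCounter) {
            this.releaseCounter = releaseCounter;
        }

        @Override
        public void run() {
            // A real backend would free or unmap the underlying memory here.
            releaseCounter.incrementAndGet();
        }
    }

    private final ByteBuffer memory;
    private final AtomicInteger refCount = new AtomicInteger(1);
    private final Cleaner.Cleanable cleanable;

    RefCountedBuffer(int capacity, AtomicInteger releaseCounter) {
        this.memory = ByteBuffer.allocateDirect(capacity);
        this.cleanable = CLEANER.register(this, new ReleaseAction(releaseCounter));
    }

    ByteBuffer memory() {
        return memory;
    }

    void retain() {
        while (true) {
            int current = refCount.get();
            if (current <= 0) {
                throw new IllegalStateException("buffer already released");
            }
            if (refCount.compareAndSet(current, current + 1)) {
                return;
            }
        }
    }

    /** Returns true if this call dropped the last reference and released the memory. */
    boolean release() {
        int remaining = refCount.decrementAndGet();
        if (remaining < 0) {
            throw new IllegalStateException("buffer released too many times");
        }
        if (remaining == 0) {
            cleanable.clean(); // deregisters from the Cleaner and runs ReleaseAction once
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        AtomicInteger releases = new AtomicInteger();
        RefCountedBuffer buf = new RefCountedBuffer(64, releases);
        buf.retain();                       // a second owner
        System.out.println(buf.release());  // false: one owner left
        System.out.println(buf.release());  // true: released eagerly, not by the GC
        System.out.println(releases.get()); // 1
    }
}
```

`Cleaner.Cleanable.clean()` runs its action at most once, so whichever path fires first wins and the other becomes a no-op. This is one possible way to let both lifecycle models coexist, assuming the buffer's owners always balance `retain()` and `release()`.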

I still want to figure out how to gracefully migrate the ArrowBuf backend
from Netty to Mnemonic. In addition, DurableBuffer could bring other
benefits to Arrow, e.g. persistence on any kind of memory service, which
could make use of SSD, NVMe, memory, NAS and more. This way, Arrow would be
able to break through the capacity limits of system memory, avoid SerDe
for storage, link to other durable objects with ease, and so on.

On Wed, Sep 6, 2017 at 10:40 AM, Wes McKinney <we...@gmail.com> wrote:

> It should be possible to have an ArrowBuf backed by a
> MappedByteBuffer. Anyone reading is welcome to dig in and write a
> patch for this.
>
> Semantically this is what we have done in C++ -- a memory map inherits
> from arrow::Buffer, so we can slice and dice a memory map as we would
> any other Buffer object
>
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.cc#L501

Re: Apache Arrow at JupyterCon

Posted by Wes McKinney <we...@gmail.com>.

It should be possible to have an ArrowBuf backed by a
MappedByteBuffer. Anyone reading is welcome to dig in and write a
patch for this.

Semantically this is what we have done in C++ -- a memory map inherits
from arrow::Buffer, so we can slice and dice a memory map as we would
any other Buffer object:

https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.cc#L501
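In plain JDK terms, the idea of slicing a memory map like any other buffer can be sketched with `FileChannel.map` and `ByteBuffer.slice()`: the slice is a view onto the same mapped pages, so nothing is copied. This is only a java.nio illustration of the pattern, assuming a file small enough for a single mapping (under 2 GiB); it is not an `ArrowBuf` implementation, and `MappedSlices` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: zero-copy views over a memory-mapped file using only java.nio.
final class MappedSlices {
    /** Maps the whole file read-only; the mapping stays valid after the channel is closed. */
    static MappedByteBuffer mapReadOnly(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    /** Returns a view of [offset, offset + length) that shares memory with {@code base}. */
    static ByteBuffer slice(ByteBuffer base, int offset, int length) {
        ByteBuffer dup = base.duplicate(); // same memory, independent position and limit
        dup.limit(offset + length);
        dup.position(offset);
        // slice() resets the byte order to big-endian, so set it explicitly.
        return dup.slice().order(ByteOrder.LITTLE_ENDIAN);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("arrow-mmap", ".bin");
        file.toFile().deleteOnExit();
        ByteBuffer out = ByteBuffer.allocate(3 * Long.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        out.putLong(10L).putLong(20L).putLong(30L);
        Files.write(file, out.array());

        MappedByteBuffer mapped = mapReadOnly(file);
        ByteBuffer second = slice(mapped, Long.BYTES, Long.BYTES);
        System.out.println(second.getLong(0)); // 20
    }
}
```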

On Mon, Sep 4, 2017 at 4:05 AM, Gonzalo Ortiz Jaureguizar
<go...@gmail.com> wrote:
> This is a very interesting feature. It's very surprising that there is no
> ByteBuffer implementation backed on a MappedByteBuffer. As far as I
> understand, it should be trivial to implement (maybe not to pool) as
> usually ByteBuf is backed on a ByteBuffer and MappedByteBuffer extends
> that. But I didn't find implementations when I googled for it.

Re: Apache Arrow at JupyterCon

Posted by Gonzalo Ortiz Jaureguizar <go...@gmail.com>.

This is a very interesting feature. It's very surprising that there is no
ByteBuf implementation backed by a MappedByteBuffer. As far as I
understand, it should be trivial to implement (maybe not to pool), as a
ByteBuf is usually backed by a ByteBuffer and MappedByteBuffer extends
that. But I didn't find any implementations when I googled for it.
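The subtyping argument is easy to check with the JDK alone: `MappedByteBuffer` extends `ByteBuffer`, so code written against `ByteBuffer` accepts a mapped file unchanged. The sketch below (hypothetical class name, no Netty involved) runs one reader over heap, direct, and mapped buffers; whether a pooled Netty `ByteBuf` can wrap such a buffer is a separate question it does not answer.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: one ByteBuffer-based reader, three kinds of backing memory.
final class AnyByteBuffer {
    /** Sums the ints in [0, limit) using absolute reads, so the buffer's position is untouched. */
    static long sumInts(ByteBuffer buf) {
        long sum = 0;
        for (int i = 0; i + Integer.BYTES <= buf.limit(); i += Integer.BYTES) {
            sum += buf.getInt(i);
        }
        return sum;
    }

    static MappedByteBuffer mapFile(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer heap = ByteBuffer.allocate(12).putInt(1).putInt(2).putInt(3);
        ByteBuffer direct = ByteBuffer.allocateDirect(12).putInt(1).putInt(2).putInt(3);
        Path file = Files.createTempFile("mapped", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, heap.array());
        MappedByteBuffer mapped = mapFile(file);

        // All three calls compile against the same ByteBuffer parameter type.
        System.out.println(sumInts(heap) + " " + sumInts(direct) + " " + sumInts(mapped)); // 6 6 6
    }
}
```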

2017-09-03 16:12 GMT+02:00 Wes McKinney <we...@gmail.com>:

> I think ideally we would have a Java interface that would support all of:
>
> - Memory mapped files
> - Anonymous shared memory segments (e.g. POSIX shm)
> - NVM / Mnemonic
>
> We already have the ability to do zero-copy reads from buffer-like
> objects in C++ and IO interfaces that support zero copy (like memory
> mapped files). We can do zero-copy reads from ArrowBuf in Java but we
> are missing the interfaces to shared memory sources
>
> - Wes

Re: Apache Arrow at JupyterCon

Posted by Wes McKinney <we...@gmail.com>.

I think ideally we would have a Java interface that would support all of:

- Memory mapped files
- Anonymous shared memory segments (e.g. POSIX shm)
- NVM / Mnemonic

We already have the ability to do zero-copy reads from buffer-like
objects in C++ and IO interfaces that support zero copy (like memory
mapped files). We can do zero-copy reads from ArrowBuf in Java, but we
are missing the interfaces to shared memory sources.
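One possible shape for such an interface, purely as a sketch: `MemorySource`, `MappedFileSource`, and `OffHeapSource` are hypothetical names, not existing Arrow classes. A mapped file covers the first case (on Linux, a file under `/dev/shm` typically behaves like a POSIX shared memory segment), and a direct `ByteBuffer` stands in for an anonymous segment or an NVM/Mnemonic allocation. Each `view` returns a zero-copy slice, assuming every source fits in a single mapping of under 2 GiB.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch of one Java abstraction over shared-memory-like sources.
interface MemorySource extends AutoCloseable {
    long size();

    /** Zero-copy view of [offset, offset + length); implementations must not copy. */
    ByteBuffer view(long offset, int length);

    @Override
    void close() throws IOException;
}

/** Shared slicing logic for any source that is exposed as a single ByteBuffer. */
abstract class BufferBackedSource implements MemorySource {
    private final ByteBuffer memory;

    BufferBackedSource(ByteBuffer memory) {
        this.memory = memory;
    }

    @Override public long size() {
        return memory.capacity();
    }

    @Override public ByteBuffer view(long offset, int length) {
        int start = Math.toIntExact(offset);
        ByteBuffer dup = memory.duplicate(); // same memory, independent position and limit
        dup.limit(start + length);           // throws IllegalArgumentException if out of range
        dup.position(start);
        return dup.slice();
    }
}

/** Memory-mapped file, mapped read-write so that views can also be written through. */
final class MappedFileSource extends BufferBackedSource {
    MappedFileSource(Path file) throws IOException {
        super(map(file));
    }

    private static ByteBuffer map(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, ch.size());
        }
    }

    @Override public void close() { /* the JDK unmaps the region when the buffer is garbage-collected */ }
}

/** Stand-in for an anonymous segment or an NVM allocation: plain off-heap memory. */
final class OffHeapSource extends BufferBackedSource {
    OffHeapSource(int size) {
        super(ByteBuffer.allocateDirect(size));
    }

    @Override public void close() { /* direct buffers are freed by the GC */ }
}

final class MemorySourceDemo {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("source", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, new byte[16]);
        try (MemorySource mapped = new MappedFileSource(file);
             MemorySource offHeap = new OffHeapSource(16)) {
            for (MemorySource src : new MemorySource[] {mapped, offHeap}) {
                src.view(4, 4).putInt(0, 42);                 // write through one view...
                System.out.println(src.view(0, 8).getInt(4)); // ...read through another: 42
            }
        }
    }
}
```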

- Wes

On Thu, Aug 31, 2017 at 5:09 PM, Gang(Gary) Wang <ga...@apache.org> wrote:
> Hi Wes,
>
> Thank you for the explanation. the usage of
> https://issues.apache.org/jira/browse/ARROW-721 could be directly supported
> by Mnemonic through DurableBuffer and DurableChunk, the DurableChunk makes
> use of unsafe to expose a plain memory space for Arrow to use without
> performance penalties. that's why most of the big data frameworks take the
> advantage of unsafe, please refer to
> https://mnemonic.apache.org/docs/domusecases.html for the use cases. we
> could work on this ticket if you think that's exactly what you want.
>
> Regarding the NVM tech., that is what Mnemonic created for. it could be
> used to directly persist Java generic objects and collection on NVM with no
> SerDe. so what kind of basic tools you mentioned? probably,  we can help
> also identify the gaps for Mnemonic as well. Thanks!
>
> Very truly yours,
> Gary

Re: Apache Arrow at JupyterCon

Posted by "Gang(Gary) Wang" <ga...@apache.org>.

Hi Wes,

Thank you for the explanation. The use case in
https://issues.apache.org/jira/browse/ARROW-721 could be supported directly
by Mnemonic through DurableBuffer and DurableChunk. DurableChunk uses
Unsafe to expose a plain memory region for Arrow to use without
performance penalties, which is why most big data frameworks take
advantage of Unsafe; please refer to
https://mnemonic.apache.org/docs/domusecases.html for the use cases. We
could work on this ticket if you think that's what you want.
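For readers unfamiliar with the Unsafe technique mentioned here: `sun.misc.Unsafe` lets Java code allocate raw off-heap memory and read or write it by address, without `ByteBuffer` bounds checks or GC involvement. The sketch below shows only that general pattern; it is not Mnemonic's `DurableChunk`, `RawChunk` is a hypothetical name, and `sun.misc.Unsafe` is an unsupported JDK-internal API whose memory-access methods are deprecated for removal in recent JDKs (expect compiler and possibly runtime warnings).

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical sketch: a fixed-size off-heap region accessed through sun.misc.Unsafe.
final class RawChunk implements AutoCloseable {
    private static final Unsafe UNSAFE = loadUnsafe();

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private final long address;
    private final long size;
    private boolean freed;

    RawChunk(long size) {
        this.size = size;
        this.address = UNSAFE.allocateMemory(size); // not zeroed, not tracked by the GC
    }

    private long checkedAddress(long offset, int width) {
        if (freed) {
            throw new IllegalStateException("chunk already freed");
        }
        // Unsafe performs no bounds checks of its own; this sketch adds them.
        if (offset < 0 || offset > size - width) {
            throw new IndexOutOfBoundsException("offset " + offset + " out of bounds for size " + size);
        }
        return address + offset;
    }

    void putLong(long offset, long value) {
        UNSAFE.putLong(checkedAddress(offset, Long.BYTES), value);
    }

    long getLong(long offset) {
        return UNSAFE.getLong(checkedAddress(offset, Long.BYTES));
    }

    long size() {
        return size;
    }

    @Override
    public void close() {
        if (!freed) {
            UNSAFE.freeMemory(address);
            freed = true;
        }
    }

    public static void main(String[] args) {
        try (RawChunk chunk = new RawChunk(1024)) {
            chunk.putLong(8, 123456789L);
            System.out.println(chunk.getLong(8)); // 123456789
        }
    }
}
```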

Regarding NVM technologies, that is what Mnemonic was created for. It can
be used to persist Java generic objects and collections directly on NVM
with no SerDe. What kind of basic tools did you have in mind? We can
probably also help identify the gaps in Mnemonic. Thanks!

Very truly yours,
Gary

On Thu, Aug 31, 2017 at 12:32 PM, Wes McKinney <we...@gmail.com> wrote:

> hi Gary,
>
> The Java libraries are not yet capable of writing or zero-copy reads
> of Arrow datasets to/from shared memory or memory-mapped files:
> https://issues.apache.org/jira/browse/ARROW-721. We've developed quite
> a bit of technology on the C++ side for dealing with shared memory IPC
> but we need someone to help with that on the Java side.
>
> In the context of NVM technologies, it would be nice to be able to
> persist a dataset to NVM and continue to do analytics on it, while
> retaining a "handle" so that the dataset can be easily recovered in
> the event of process failure. We may arrive at new use cases once some
> of the basic tools exist.
>
> - Wes

Re: Apache Arrow at JupyterCon

Posted by Wes McKinney <we...@gmail.com>.

hi Gary,

The Java libraries are not yet capable of writing Arrow datasets to, or
doing zero-copy reads from, shared memory or memory-mapped files:
https://issues.apache.org/jira/browse/ARROW-721. We've developed quite
a bit of technology on the C++ side for dealing with shared memory IPC,
but we need someone to help with that on the Java side.

In the context of NVM technologies, it would be nice to be able to
persist a dataset to NVM and continue to do analytics on it, while
retaining a "handle" so that the dataset can be easily recovered in
the event of process failure. We may arrive at new use cases once some
of the basic tools exist.
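A rough JDK-only approximation of that workflow uses a memory-mapped file as the persistent medium: write through the mapping, call `MappedByteBuffer.force()` to flush the pages to the storage device, and keep a small handle (path, offset, length) that is enough to re-map the same bytes in a later process. `PersistentColumn`, `DatasetHandle`, `persist`, and `recover` are hypothetical names; a real NVM setup (Mnemonic, or a DAX-mounted filesystem) would replace the ordinary file, and the sketch ignores crash-consistency issues such as partially written data.

```java
import java.io.IOException;
import java.nio.LongBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: persist a column of longs and recover it later from a small handle.
final class PersistentColumn {
    /** Everything needed to find the data again after a restart. */
    static final class DatasetHandle {
        final String path;
        final long offset;
        final int length;

        DatasetHandle(String path, long offset, int length) {
            this.path = path;
            this.offset = offset;
            this.length = length;
        }

        String encode() {
            return path + "|" + offset + "|" + length;
        }

        static DatasetHandle decode(String encoded) {
            // Split on the last two separators so the path itself may contain '|'.
            int b = encoded.lastIndexOf('|');
            int a = encoded.lastIndexOf('|', b - 1);
            return new DatasetHandle(encoded.substring(0, a),
                    Long.parseLong(encoded.substring(a + 1, b)),
                    Integer.parseInt(encoded.substring(b + 1)));
        }
    }

    static DatasetHandle persist(Path file, long[] values) throws IOException {
        int length = Math.multiplyExact(values.length, Long.BYTES);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Mapping past the end of the file grows the file to the requested size.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, length);
            for (long v : values) {
                buf.putLong(v);
            }
            buf.force(); // flush the written pages to the storage device
        }
        return new DatasetHandle(file.toAbsolutePath().toString(), 0, length);
    }

    static LongBuffer recover(DatasetHandle handle) throws IOException {
        try (FileChannel ch = FileChannel.open(Paths.get(handle.path), StandardOpenOption.READ)) {
            // The mapping stays valid after the channel closes; reads hit the mapped pages directly.
            return ch.map(FileChannel.MapMode.READ_ONLY, handle.offset, handle.length).asLongBuffer();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("column", ".bin");
        file.toFile().deleteOnExit();
        String saved = persist(file, new long[] {3, 1, 4, 1, 5}).encode();
        // In a real crash-recovery scenario only `saved` would survive the restart.
        LongBuffer column = recover(DatasetHandle.decode(saved));
        System.out.println(column.get(2)); // 4
    }
}
```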

- Wes

On Wed, Aug 30, 2017 at 6:19 PM, Gang(Gary) Wang <ga...@apache.org> wrote:
> Thank you for sharing the videos. We are very interested in how to support
> Arrow data format and collection very closely, could you please help to
> point out which interfaces to allow Mnemonic act as a memory provider for
> the user to store and access Arrow managed datasets ? Thanks!
>
> Very truly yours,
> Gary.

Re: Apache Arrow at JupyterCon

Posted by "Gang(Gary) Wang" <ga...@apache.org>.

Thank you for sharing the videos. We are very interested in how to support
the Arrow data format and collections very closely. Could you please point
out which interfaces would allow Mnemonic to act as a memory provider for
users to store and access Arrow-managed datasets? Thanks!

Very truly yours,
Gary.


On Wed, Aug 30, 2017 at 2:11 PM, Ivan Sadikov <iv...@gmail.com>
wrote:

> Great presentation! Thank you for sharing.

Re: Apache Arrow at JupyterCon

Posted by Ivan Sadikov <iv...@gmail.com>.

Great presentation! Thank you for sharing.


On Thu, 31 Aug 2017 at 8:02 AM, Wes McKinney <we...@gmail.com> wrote:

> Absolutely. I will do that now

Re: Apache Arrow at JupyterCon

Posted by Wes McKinney <we...@gmail.com>.

Absolutely. I will do that now

On Wed, Aug 30, 2017 at 3:33 PM, Julian Hyde <jh...@apache.org> wrote:
> Thanks for sharing. Can we tweet those videos as well? I see that https://twitter.com/apachearrow <https://twitter.com/apachearrow> only tweeted your slides.

Re: Apache Arrow at JupyterCon

Posted by Julian Hyde <jh...@apache.org>.

Thanks for sharing. Can we tweet those videos as well? I see that https://twitter.com/apachearrow <https://twitter.com/apachearrow> only tweeted your slides.

> On Aug 26, 2017, at 1:11 PM, Wes McKinney <we...@gmail.com> wrote:
> 
> hi all,
> 
> In case folks here are interested, I gave a keynote this week at
> JupyterCon explaining my motivations for being involved in Apache
> Arrow and how I see it fitting in with the data science ecosystem long
> term:
> 
> https://www.youtube.com/watch?v=wdmf1msbtVs
> 
> I also gave an interview going a little deeper into some of the topics
> from the talk:
> 
> https://www.youtube.com/watch?v=Q7y9l-L8yiU
> 
> I believe we have an exciting journey ahead of us, but it's certainly
> going to take a lot of collaboration and community development.
> 
> - Wes