Posted to dev@mahout.apache.org by "Andrew Palumbo (via Google Docs)" <an...@gmail.com> on 2019/11/04 02:13:52 UTC

Proposal for a New Serialization, De-serialization and Memory Scheme for [Spark] distribution of Mahout AbstractVectors and Mahout Abstract Matrices: Direct to Main (off heap), JVM (on heap), GPU and other devices

I've shared an item with you:

Proposal for a New Serialization, De-serialization and  Memory Scheme for  
[Spark] distribution of Mahout AbstractVectors and Mahout Abstract  
Matrices: Direct to Main (off heap), JVM (on heap), GPU and other devices
https://docs.google.com/document/d/18RybVEpjqjDU_cCzwM6dtS3ZZkd1tDvCYlkhqzloS-4/edit?usp=sharing&ts=5dbf8960

It's not an attachment -- it's stored online. To open this item, just click  
the link above.

Re: Proposal for a New Serialization, De-serialization and Memory Scheme for [Spark] distribution of Mahout AbstractVectors and Mahout Abstract Matrices: Direct to Main (off heap), JVM (on heap), GPU and other devices

Posted by Apache Mahout <ma...@gmail.com>.
Yeah, I remember now that Spark uses Unsafe. I haven't looked at how much it
uses Arrow, or whether Arrow is just for PySpark, but that is another option.

Do you agree, then, that this makes sense to look into, Trevor? I think there
is an easy optimization here for one of the next releases, maybe 14.3 or
14.4/15.0. Maybe we can find a GSoC student who would be interested in this.
It's a compact topic in which someone could work across several different
areas and learn a lot in one summer.

I think the most commonly useful interview questions are about native
memory, JNI, or device memory. I feel you can find out a lot from one
question.

On Fri, 8 Nov 2019 08:38:22 -0600, dev wrote:

> I was at a Meetup last night that was talking about how Spark does this
> natively in some cases with sun.misc.Unsafe (which is on its way out of the
> JDK, though still present in Java 11) and how Flink does this with
> DirectByteBuffer (I think?), which has numerous benefits (that's what the
> meetup talk was about).
>
> On Sun, Nov 3, 2019 at 8:19 PM Andrew Palumbo (via Google Docs) <
> andrewpalumbo2222@gmail.com> wrote:
>
> > I've shared an item with you:
> >
> > Proposal for a New Serialization, De-serialization and Memory Scheme for
> > [Spark] distribution of Mahout AbstractVectors and Mahout Abstract
> > Matrices: Direct to Main (off heap), JVM (on heap), GPU and other devices
> >
> > https://docs.google.com/document/d/18RybVEpjqjDU_cCzwM6dtS3ZZkd1tDvCYlkhqzloS-4/edit?usp=sharing&ts=5dbf8960
> >
> > It's not an attachment -- it's stored online. To open this item, just click
> > the link above.

Re: Proposal for a New Serialization, De-serialization and Memory Scheme for [Spark] distribution of Mahout AbstractVectors and Mahout Abstract Matrices: Direct to Main (off heap), JVM (on heap), GPU and other devices

Posted by Apache Mahout <ma...@gmail.com>.
Fix mail app..

Re: Proposal for a New Serialization, De-serialization and Memory Scheme for [Spark] distribution of Mahout AbstractVectors and Mahout Abstract Matrices: Direct to Main (off heap), JVM (on heap), GPU and other devices

Posted by Trevor Grant <tr...@gmail.com>.
I was at a Meetup last night that was talking about how Spark does this
natively in some cases with sun.misc.Unsafe (which is on its way out of the
JDK, though still present in Java 11) and how Flink does this with
DirectByteBuffer (I think?), which has numerous benefits (that's what the
meetup talk was about).
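
For readers unfamiliar with the DirectByteBuffer approach mentioned above,
here is a minimal sketch of keeping a dense vector's values off heap using
only the standard java.nio API. It is not taken from the proposal document,
from Mahout, or from Flink; the class name OffHeapVectorSketch and the
methods toOffHeap and toHeap are hypothetical names chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.DoubleBuffer;
import java.util.Arrays;

// Hypothetical illustration only: not a Mahout, Spark, or Flink class.
public class OffHeapVectorSketch {

    // Copy a dense vector into a newly allocated direct (off-heap) buffer.
    public static ByteBuffer toOffHeap(double[] values) {
        ByteBuffer buf = ByteBuffer.allocateDirect(values.length * Double.BYTES)
                                   .order(ByteOrder.nativeOrder());
        // Writing through the DoubleBuffer view leaves buf's own position at 0.
        buf.asDoubleBuffer().put(values);
        return buf;
    }

    // Copy the doubles stored in the buffer back into an on-heap array.
    public static double[] toHeap(ByteBuffer buf) {
        DoubleBuffer view = buf.asDoubleBuffer();
        double[] out = new double[view.remaining()];
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        double[] v = {1.0, 2.5, -3.0};
        ByteBuffer offHeap = toOffHeap(v);
        System.out.println("direct: " + offHeap.isDirect());
        System.out.println("round trip: " + Arrays.toString(toHeap(offHeap)));
    }
}
```

A direct buffer in native byte order can be handed to JNI code without first
copying the data out of the Java heap, which is presumably part of the appeal
for the native and GPU back ends discussed in this thread.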



On Sun, Nov 3, 2019 at 8:19 PM Andrew Palumbo (via Google Docs) <
andrewpalumbo2222@gmail.com> wrote:

> I've shared an item with you:
>
> Proposal for a New Serialization, De-serialization and  Memory Scheme for
> [Spark] distribution of Mahout AbstractVectors and Mahout Abstract
> Matrices: Direct to Main (off heap), JVM (on heap), GPU and other devices
>
> https://docs.google.com/document/d/18RybVEpjqjDU_cCzwM6dtS3ZZkd1tDvCYlkhqzloS-4/edit?usp=sharing&ts=5dbf8960
>
> It's not an attachment -- it's stored online. To open this item, just click
> the link above.
>