Posted to user@spark.apache.org by Peter Liu <pe...@gmail.com> on 2018/10/18 15:07:20 UTC

Re: Spark In Memory Shuffle / 5403

I would be very interested in the initial question here:

Is there a production-level, configurable implementation of memory-only
shuffle (similar to the MEMORY_ONLY and MEMORY_AND_DISK storage levels), as
mentioned in this ticket:
https://github.com/apache/spark/pull/5403 ?

It would be quite a practical and useful option/feature. Does anyone know
the status of this ticket's implementation?

Thanks!

Peter

On Thu, Oct 18, 2018 at 6:51 AM ☼ R Nair <ra...@gmail.com> wrote:

> Thanks..great info. Will try and let all know.
>
> Best
>
> On Thu, Oct 18, 2018, 3:12 AM onmstester onmstester <on...@zoho.com>
> wrote:
>
>> create the ramdisk:
>> mount tmpfs /mnt/spark -t tmpfs -o size=2G
>>
>> then point spark.local.dir to the ramdisk, which depends on your
>> deployment strategy, for me it was through SparkConf object before passing
>> it to SparkContext:
>> conf.set("spark.local.dir","/mnt/spark")
>>
>> To validate that spark is actually using your ramdisk (by default it uses
>> /tmp), ls the ramdisk after running some jobs and you should see spark
>> directories (with date on directory name) on your ramdisk
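>>
>> A fuller sketch of the above, assuming the SparkConf is built in the driver
>> and the master URL is supplied via spark-submit (note that SPARK_LOCAL_DIRS,
>> or LOCAL_DIRS on YARN, will override spark.local.dir when set by the cluster
>> manager - hence "depends on your deployment strategy"):
>>
>> import org.apache.spark.{SparkConf, SparkContext}
>>
>> // /mnt/spark must already be mounted as tmpfs (see the mount command above)
>> val conf = new SparkConf()
>>   .setAppName("tmpfs-shuffle-example")   // app name is just an example
>>   .set("spark.local.dir", "/mnt/spark")  // scratch space: shuffle and spill files
>> val sc = new SparkContext(conf)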
>>
>>
>> Sent using Zoho Mail <https://www.zoho.com/mail/>
>>
>>
>> ---- On Wed, 17 Oct 2018 18:57:14 +0330 *☼ R Nair
>> <ravishankar.nair@gmail.com <ra...@gmail.com>>* wrote ----
>>
>> What are the steps to configure this? Thanks
>>
>> On Wed, Oct 17, 2018, 9:39 AM onmstester onmstester <
>> onmstester@zoho.com.invalid> wrote:
>>
>>
>> Hi,
>> I failed to config spark for in-memory shuffle so currently just
>> using linux memory mapped directory (tmpfs) as working directory of spark,
>> so everything is fast
>>
>> Sent using Zoho Mail <https://www.zoho.com/mail/>
>>
>>
>>
>>

Re: Spark In Memory Shuffle / 5403

Posted by Peter Liu <pe...@gmail.com>.
Hi Peter,

Thanks for the additional information - this is really helpful (I
definitely got more than I was looking for :-)

Cheers,

Peter


On Fri, Oct 19, 2018 at 12:53 PM Peter Rudenko <pe...@gmail.com>
wrote:

> Hi Peter, we're using a part of Crail - it's core library, called disni (
> https://github.com/zrlio/disni/). We couldn't reproduce results from that
> blog post, any case Crail is more platformic approach (it comes with it's
> own file system), while SparkRdma is a pluggable approach - it's just a
> plugin, that you can enable/disable for a particular workload, you can use
> any hadoop vendor, etc.
>
> The best optimization for shuffle between local jvms could be using
> something like short circuit local read (
> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html)
> to use unix socket for local communication or just directly read a part
> from other's jvm shuffle file. But yes, it's not available in spark out of
> box.
>
> Thanks,
> Peter Rudenko
>
> Fri, 19 Oct 2018 at 16:54, Peter Liu <pe...@gmail.com> wrote:
>
>> Hi Peter,
>>
>> thank you for the reply and detailed information! Would this something
>> comparable with Crail? (
>> http://crail.incubator.apache.org/blog/2017/11/rdmashuffle.html)
>> I was more looking for something simple/quick making the shuffle between
>> the local jvms quicker (like the idea of using local ram disk) for my
>> simple use case.
>>
>> of course, a general and thorough implementation should cover the shuffle
>> between the nodes as major focus. hmm, looks like there is no
>> implementation within spark itself yet.
>>
>> very much appreciated!
>>
>> Peter
>>
>> On Fri, Oct 19, 2018 at 9:38 AM Peter Rudenko <pe...@gmail.com>
>> wrote:
>>
>>> Hey Peter, in SparkRDMA shuffle plugin (
>>> https://github.com/Mellanox/SparkRDMA) we're using mmap of shuffle
>>> file, to do Remote Direct Memory Access. If the shuffle data is bigger then
>>> RAM, Mellanox NIC support On Demand Paging, where OS invalidates
>>> translations which are no longer valid due to either non-present pages or
>>> mapping changes. So if you have an RDMA capable NIC (or you can try on
>>> Azure cloud
>>> https://azure.microsoft.com/en-us/blog/introducing-the-new-hb-and-hc-azure-vm-sizes-for-hpc/
>>>  ), have a try. For network intensive apps you should get better
>>> performance.
>>>
>>> Thanks,
>>> Peter Rudenko
>>>
>>> Thu, 18 Oct 2018 at 18:07, Peter Liu <pe...@gmail.com> wrote:
>>>
>>>> I would be very interested in the initial question here:
>>>>
>>>> is there a production level implementation for memory only shuffle and
>>>> configurable (similar to  MEMORY_ONLY storage level,  MEMORY_OR_DISK
>>>> storage level) as mentioned in this ticket,
>>>> https://github.com/apache/spark/pull/5403 ?
>>>>
>>>> It would be a quite practical and useful option/feature. not sure what
>>>> is the status of this ticket implementation?
>>>>
>>>> Thanks!
>>>>
>>>> Peter
>>>>
>>>> On Thu, Oct 18, 2018 at 6:51 AM ☼ R Nair <ra...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks..great info. Will try and let all know.
>>>>>
>>>>> Best
>>>>>
>>>>> On Thu, Oct 18, 2018, 3:12 AM onmstester onmstester <
>>>>> onmstester@zoho.com> wrote:
>>>>>
>>>>>> create the ramdisk:
>>>>>> mount tmpfs /mnt/spark -t tmpfs -o size=2G
>>>>>>
>>>>>> then point spark.local.dir to the ramdisk, which depends on your
>>>>>> deployment strategy, for me it was through SparkConf object before passing
>>>>>> it to SparkContext:
>>>>>> conf.set("spark.local.dir","/mnt/spark")
>>>>>>
>>>>>> To validate that spark is actually using your ramdisk (by default it
>>>>>> uses /tmp), ls the ramdisk after running some jobs and you should see spark
>>>>>> directories (with date on directory name) on your ramdisk
>>>>>>
>>>>>>
>>>>>> Sent using Zoho Mail <https://www.zoho.com/mail/>
>>>>>>
>>>>>>
>>>>>> ---- On Wed, 17 Oct 2018 18:57:14 +0330 *☼ R Nair
>>>>>> <ravishankar.nair@gmail.com <ra...@gmail.com>>* wrote ----
>>>>>>
>>>>>> What are the steps to configure this? Thanks
>>>>>>
>>>>>> On Wed, Oct 17, 2018, 9:39 AM onmstester onmstester <
>>>>>> onmstester@zoho.com.invalid> wrote:
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>> I failed to config spark for in-memory shuffle so currently just
>>>>>> using linux memory mapped directory (tmpfs) as working directory of spark,
>>>>>> so everything is fast
>>>>>>
>>>>>> Sent using Zoho Mail <https://www.zoho.com/mail/>
>>>>>>
>>>>>>
>>>>>>
>>>>>>

Re: Spark In Memory Shuffle / 5403

Posted by Peter Rudenko <pe...@gmail.com>.
Hi Peter, we're using a part of Crail - its core library, called disni (
https://github.com/zrlio/disni/). We couldn't reproduce the results from that
blog post. In any case, Crail is a more platform-like approach (it comes with
its own file system), while SparkRdma is a pluggable approach - it's just a
plugin that you can enable/disable for a particular workload, and you can use
any Hadoop vendor, etc.
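
Enabling or disabling it for a particular job is essentially a matter of
pointing Spark at the plugin jar and switching the shuffle manager. A minimal
sketch (the jar path is made up, and the shuffle-manager class name is assumed
from the plugin's documentation, so check it against your SparkRDMA version):

import org.apache.spark.SparkConf

// hypothetical location of the SparkRDMA jar on driver and executor nodes
val rdmaJar = "/opt/spark-rdma/spark-rdma-for-spark-2.x.jar"
val conf = new SparkConf()
  .set("spark.driver.extraClassPath", rdmaJar)
  .set("spark.executor.extraClassPath", rdmaJar)
  // assumed class name; dropping this line falls back to Spark's default
  // sort-based shuffle
  .set("spark.shuffle.manager", "org.apache.spark.shuffle.rdma.RdmaShuffleManager")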

The best optimization for shuffle between local JVMs could be to use
something like short-circuit local reads (
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html)
to use a Unix domain socket for local communication, or to directly read a part
of another JVM's shuffle file. But yes, that's not available in Spark out of
the box.
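
For reference, the HDFS feature mentioned above is enabled with two client
settings; a sketch of passing them through Spark's spark.hadoop.* prefix
(short-circuit reads also require the DataNodes to be configured with the same
domain-socket path and the native libhadoop library to be installed):

import org.apache.spark.SparkConf

// HDFS short-circuit read settings, forwarded to the Hadoop client by Spark
val conf = new SparkConf()
  .set("spark.hadoop.dfs.client.read.shortcircuit", "true")
  .set("spark.hadoop.dfs.domain.socket.path", "/var/lib/hadoop-hdfs/dn_socket")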

Thanks,
Peter Rudenko

Fri, 19 Oct 2018 at 16:54, Peter Liu <pe...@gmail.com> wrote:

> Hi Peter,
>
> thank you for the reply and detailed information! Would this something
> comparable with Crail? (
> http://crail.incubator.apache.org/blog/2017/11/rdmashuffle.html)
> I was more looking for something simple/quick making the shuffle between
> the local jvms quicker (like the idea of using local ram disk) for my
> simple use case.
>
> of course, a general and thorough implementation should cover the shuffle
> between the nodes as major focus. hmm, looks like there is no
> implementation within spark itself yet.
>
> very much appreciated!
>
> Peter
>
> On Fri, Oct 19, 2018 at 9:38 AM Peter Rudenko <pe...@gmail.com>
> wrote:
>
>> Hey Peter, in SparkRDMA shuffle plugin (
>> https://github.com/Mellanox/SparkRDMA) we're using mmap of shuffle file,
>> to do Remote Direct Memory Access. If the shuffle data is bigger then RAM,
>> Mellanox NIC support On Demand Paging, where OS invalidates translations
>> which are no longer valid due to either non-present pages or mapping
>> changes. So if you have an RDMA capable NIC (or you can try on Azure cloud
>>
>> https://azure.microsoft.com/en-us/blog/introducing-the-new-hb-and-hc-azure-vm-sizes-for-hpc/
>>  ), have a try. For network intensive apps you should get better
>> performance.
>>
>> Thanks,
>> Peter Rudenko
>>
>> Thu, 18 Oct 2018 at 18:07, Peter Liu <pe...@gmail.com> wrote:
>>
>>> I would be very interested in the initial question here:
>>>
>>> is there a production level implementation for memory only shuffle and
>>> configurable (similar to  MEMORY_ONLY storage level,  MEMORY_OR_DISK
>>> storage level) as mentioned in this ticket,
>>> https://github.com/apache/spark/pull/5403 ?
>>>
>>> It would be a quite practical and useful option/feature. not sure what
>>> is the status of this ticket implementation?
>>>
>>> Thanks!
>>>
>>> Peter
>>>
>>> On Thu, Oct 18, 2018 at 6:51 AM ☼ R Nair <ra...@gmail.com>
>>> wrote:
>>>
>>>> Thanks..great info. Will try and let all know.
>>>>
>>>> Best
>>>>
>>>> On Thu, Oct 18, 2018, 3:12 AM onmstester onmstester <
>>>> onmstester@zoho.com> wrote:
>>>>
>>>>> create the ramdisk:
>>>>> mount tmpfs /mnt/spark -t tmpfs -o size=2G
>>>>>
>>>>> then point spark.local.dir to the ramdisk, which depends on your
>>>>> deployment strategy, for me it was through SparkConf object before passing
>>>>> it to SparkContext:
>>>>> conf.set("spark.local.dir","/mnt/spark")
>>>>>
>>>>> To validate that spark is actually using your ramdisk (by default it
>>>>> uses /tmp), ls the ramdisk after running some jobs and you should see spark
>>>>> directories (with date on directory name) on your ramdisk
>>>>>
>>>>>
>>>>> Sent using Zoho Mail <https://www.zoho.com/mail/>
>>>>>
>>>>>
>>>>> ---- On Wed, 17 Oct 2018 18:57:14 +0330 *☼ R Nair
>>>>> <ravishankar.nair@gmail.com <ra...@gmail.com>>* wrote ----
>>>>>
>>>>> What are the steps to configure this? Thanks
>>>>>
>>>>> On Wed, Oct 17, 2018, 9:39 AM onmstester onmstester <
>>>>> onmstester@zoho.com.invalid> wrote:
>>>>>
>>>>>
>>>>> Hi,
>>>>> I failed to config spark for in-memory shuffle so currently just
>>>>> using linux memory mapped directory (tmpfs) as working directory of spark,
>>>>> so everything is fast
>>>>>
>>>>> Sent using Zoho Mail <https://www.zoho.com/mail/>
>>>>>
>>>>>
>>>>>
>>>>>

Re: Spark In Memory Shuffle / 5403

Posted by Peter Liu <pe...@gmail.com>.
Hi Peter,

Thank you for the reply and the detailed information! Would this be something
comparable to Crail? (
http://crail.incubator.apache.org/blog/2017/11/rdmashuffle.html)
I was more looking for something simple/quick to make the shuffle between
local JVMs faster (like the idea of using a local ram disk) for my simple use
case.

Of course, a general and thorough implementation should cover the shuffle
between nodes as the major focus. Hmm, it looks like there is no such
implementation within Spark itself yet.

Very much appreciated!

Peter

On Fri, Oct 19, 2018 at 9:38 AM Peter Rudenko <pe...@gmail.com>
wrote:

> Hey Peter, in SparkRDMA shuffle plugin (
> https://github.com/Mellanox/SparkRDMA) we're using mmap of shuffle file,
> to do Remote Direct Memory Access. If the shuffle data is bigger then RAM,
> Mellanox NIC support On Demand Paging, where OS invalidates translations
> which are no longer valid due to either non-present pages or mapping
> changes. So if you have an RDMA capable NIC (or you can try on Azure cloud
>
> https://azure.microsoft.com/en-us/blog/introducing-the-new-hb-and-hc-azure-vm-sizes-for-hpc/
>  ), have a try. For network intensive apps you should get better
> performance.
>
> Thanks,
> Peter Rudenko
>
> Thu, 18 Oct 2018 at 18:07, Peter Liu <pe...@gmail.com> wrote:
>
>> I would be very interested in the initial question here:
>>
>> is there a production level implementation for memory only shuffle and
>> configurable (similar to  MEMORY_ONLY storage level,  MEMORY_OR_DISK
>> storage level) as mentioned in this ticket,
>> https://github.com/apache/spark/pull/5403 ?
>>
>> It would be a quite practical and useful option/feature. not sure what is
>> the status of this ticket implementation?
>>
>> Thanks!
>>
>> Peter
>>
>> On Thu, Oct 18, 2018 at 6:51 AM ☼ R Nair <ra...@gmail.com>
>> wrote:
>>
>>> Thanks..great info. Will try and let all know.
>>>
>>> Best
>>>
>>> On Thu, Oct 18, 2018, 3:12 AM onmstester onmstester <on...@zoho.com>
>>> wrote:
>>>
>>>> create the ramdisk:
>>>> mount tmpfs /mnt/spark -t tmpfs -o size=2G
>>>>
>>>> then point spark.local.dir to the ramdisk, which depends on your
>>>> deployment strategy, for me it was through SparkConf object before passing
>>>> it to SparkContext:
>>>> conf.set("spark.local.dir","/mnt/spark")
>>>>
>>>> To validate that spark is actually using your ramdisk (by default it
>>>> uses /tmp), ls the ramdisk after running some jobs and you should see spark
>>>> directories (with date on directory name) on your ramdisk
>>>>
>>>>
>>>> Sent using Zoho Mail <https://www.zoho.com/mail/>
>>>>
>>>>
>>>> ---- On Wed, 17 Oct 2018 18:57:14 +0330 *☼ R Nair
>>>> <ravishankar.nair@gmail.com <ra...@gmail.com>>* wrote ----
>>>>
>>>> What are the steps to configure this? Thanks
>>>>
>>>> On Wed, Oct 17, 2018, 9:39 AM onmstester onmstester <
>>>> onmstester@zoho.com.invalid> wrote:
>>>>
>>>>
>>>> Hi,
>>>> I failed to config spark for in-memory shuffle so currently just
>>>> using linux memory mapped directory (tmpfs) as working directory of spark,
>>>> so everything is fast
>>>>
>>>> Sent using Zoho Mail <https://www.zoho.com/mail/>
>>>>
>>>>
>>>>
>>>>

Re: Spark In Memory Shuffle / 5403

Posted by Peter Rudenko <pe...@gmail.com>.
Hey Peter, in the SparkRDMA shuffle plugin (
https://github.com/Mellanox/SparkRDMA) we mmap the shuffle file to do Remote
Direct Memory Access. If the shuffle data is bigger than RAM, Mellanox NICs
support On-Demand Paging, where the OS invalidates translations which are no
longer valid due to either non-present pages or mapping changes. So if you have
an RDMA-capable NIC (or you can try it on the Azure cloud:
https://azure.microsoft.com/en-us/blog/introducing-the-new-hb-and-hc-azure-vm-sizes-for-hpc/
), give it a try. For network-intensive apps you should get better
performance.
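
To illustrate just the mmap part (not the RDMA registration itself), a shuffle
output file can be mapped read-only like this - a sketch with a made-up file
path, using plain java.nio from Scala:

import java.nio.channels.FileChannel
import java.nio.file.{Paths, StandardOpenOption}

// illustrative path; real shuffle files live under spark.local.dir in
// blockmgr-* directories
val path = Paths.get("/tmp/blockmgr-example/0c/shuffle_0_0_0.data")
val channel = FileChannel.open(path, StandardOpenOption.READ)
// map the whole file; the mapped region can then be registered with the NIC
// so the data is transferred without an extra copy through JVM heap buffers
val mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size())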

Thanks,
Peter Rudenko

Thu, 18 Oct 2018 at 18:07, Peter Liu <pe...@gmail.com> wrote:

> I would be very interested in the initial question here:
>
> is there a production level implementation for memory only shuffle and
> configurable (similar to  MEMORY_ONLY storage level,  MEMORY_OR_DISK
> storage level) as mentioned in this ticket,
> https://github.com/apache/spark/pull/5403 ?
>
> It would be a quite practical and useful option/feature. not sure what is
> the status of this ticket implementation?
>
> Thanks!
>
> Peter
>
> On Thu, Oct 18, 2018 at 6:51 AM ☼ R Nair <ra...@gmail.com>
> wrote:
>
>> Thanks..great info. Will try and let all know.
>>
>> Best
>>
>> On Thu, Oct 18, 2018, 3:12 AM onmstester onmstester <on...@zoho.com>
>> wrote:
>>
>>> create the ramdisk:
>>> mount tmpfs /mnt/spark -t tmpfs -o size=2G
>>>
>>> then point spark.local.dir to the ramdisk, which depends on your
>>> deployment strategy, for me it was through SparkConf object before passing
>>> it to SparkContext:
>>> conf.set("spark.local.dir","/mnt/spark")
>>>
>>> To validate that spark is actually using your ramdisk (by default it
>>> uses /tmp), ls the ramdisk after running some jobs and you should see spark
>>> directories (with date on directory name) on your ramdisk
>>>
>>>
>>> Sent using Zoho Mail <https://www.zoho.com/mail/>
>>>
>>>
>>> ---- On Wed, 17 Oct 2018 18:57:14 +0330 *☼ R Nair
>>> <ravishankar.nair@gmail.com <ra...@gmail.com>>* wrote ----
>>>
>>> What are the steps to configure this? Thanks
>>>
>>> On Wed, Oct 17, 2018, 9:39 AM onmstester onmstester <
>>> onmstester@zoho.com.invalid> wrote:
>>>
>>>
>>> Hi,
>>> I failed to config spark for in-memory shuffle so currently just
>>> using linux memory mapped directory (tmpfs) as working directory of spark,
>>> so everything is fast
>>>
>>> Sent using Zoho Mail <https://www.zoho.com/mail/>
>>>
>>>
>>>
>>>
