Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/01/04 15:18:39 UTC

[jira] [Resolved] (SPARK-12620) Proposal of GPU exploitation for Spark

     [ https://issues.apache.org/jira/browse/SPARK-12620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-12620.
-------------------------------
    Resolution: Duplicate

[~kiszk] well, now you've just opened a duplicate. That's not helpful, since it just splinters the conversation.

Keeping one open would be OK if there were an actionable change here, but you're just describing external, experimental work, which does not sound like something that needs to be in Spark, at least not now.

> Proposal of GPU exploitation for Spark
> --------------------------------------
>
>                 Key: SPARK-12620
>                 URL: https://issues.apache.org/jira/browse/SPARK-12620
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>            Reporter: Kazuaki Ishizaki
>
> I created this new JIRA entry to move the discussion from SPARK-3875.
> Exploiting GPUs can allow us to shorten the execution time of a Spark job and to reduce the number of machines in a cluster. We are working to effectively and easily exploit GPUs on Spark at [http://github.com/kiszk/spark-gpu]. Our project page is [http://kiszk.github.io/spark-gpu/]. A design document is [here|https://docs.google.com/document/d/1bo1hbQ7ikdUA9LYtYh6kU_TwjFK2ebkHsH66QlmbYP8/edit?usp=sharing].
> Our ideas for exploiting GPUs are:
> # adding a new format for a partition in an RDD, which is a column-based structure in an array format, in addition to the current Iterator\[T\] format with Seq\[T\]
> # generating parallelized GPU native code to access data in the new format from a Spark application program by using an optimizer and code generator (this is similar to [Project Tungsten|https://databricks.com/blog/2015/04/28/project-tungsten-bringing-spark-closer-to-bare-metal.html]) and a pre-compiled library
> The motivation for idea 1 is to reduce the overhead of serializing/deserializing partition data when copying it between CPU and GPU. The motivation for idea 2 is to spare application programmers from writing hardware-dependent code. At first, we are working on idea 1 (for idea 2, we need to write [CUDA|https://en.wikipedia.org/wiki/CUDA] code for now).
> This prototype achieved a [3.15x performance improvement|https://github.com/kiszk/spark-gpu/wiki/Benchmark] for the logistic regression example ([SparkGPULR|https://github.com/kiszk/spark-gpu/blob/dev/examples/src/main/scala/org/apache/spark/examples/SparkGPULR.scala]) on a 16-thread IvyBridge box with an NVIDIA K40 GPU card, compared with the same box without a GPU card.
> You can download the pre-built binaries for x86_64 and ppc64le from [here|https://github.com/kiszk/spark-gpu/wiki/Downloads]. You can also run this on Amazon EC2 by following [this procedure|https://github.com/kiszk/spark-gpu/wiki/How-to-run-%28local-or-AWS-EC2%29].
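
Idea 1 in the description above, storing a partition as column-based arrays rather than as a sequence of records, can be illustrated with a minimal sketch. This is not the spark-gpu implementation: it is plain Python rather than Spark code, and the record layout (x, y, label) and the helper name to_columns are hypothetical, chosen only to show why a columnar layout avoids per-record serialization before a copy to a device.

```python
from array import array

# Hypothetical row-oriented partition: one (x, y, label) tuple per record,
# loosely analogous to walking an Iterator[T] one element at a time.
rows = [(1.0, 2.0, 0), (3.0, 4.0, 1), (5.0, 6.0, 0)]


def to_columns(records):
    """Transpose row tuples into one contiguous typed array per field."""
    xs = array("d", (r[0] for r in records))      # 64-bit floats
    ys = array("d", (r[1] for r in records))      # 64-bit floats
    labels = array("i", (r[2] for r in records))  # signed ints
    return xs, ys, labels


xs, ys, labels = to_columns(rows)

# Each column is a single contiguous buffer, so it can be exposed as raw
# bytes in one step instead of serializing every record separately.
raw_xs = xs.tobytes()
```

In this layout, a transfer routine would need one bulk copy per column; with the row layout, each record would first have to be unpacked or serialized individually.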


