Posted to issues@flink.apache.org by "Trevor Grant (JIRA)" <ji...@apache.org> on 2017/02/10 17:30:41 UTC

[jira] [Commented] (FLINK-5782) Support GPU calculations

    [ https://issues.apache.org/jira/browse/FLINK-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861563#comment-15861563 ] 

Trevor Grant commented on FLINK-5782:
-------------------------------------

2) -1 to removing sparse array support.  You don't want to be serializing tf-idf matrices / vectors as dense. I think you would be better served to add sparse vectors to ND4J or provide a converter (rough sketch of such a converter right after this list). 
4) Bigger question: do you want to enable this for streaming or for batch?  The persist method is only necessary for batch (see the caching sketch further down).  Streaming is a whole other can of worms. 
5) To be clear: IF you are considering batch only, and FLINK-1730 is addressed, then this issue is resolved as of Mahout 0.13.0 via MAHOUT-1885 https://issues.apache.org/jira/browse/MAHOUT-1885 (a minimal Samsara snippet is at the end of this comment).
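
For 2), here is a rough sketch of what such a converter could look like (the helper name toNd4j is mine, not an existing API); it also shows why densifying hurts: every zero of a large tf-idf vector gets materialized.

{code}
import breeze.linalg.SparseVector
import org.nd4j.linalg.api.ndarray.INDArray
import org.nd4j.linalg.factory.Nd4j

// Hypothetical converter: Breeze sparse vector -> ND4J row vector.
// ND4J has no sparse format, so every zero is materialized here,
// which is exactly the memory blow-up you want to avoid for tf-idf data.
def toNd4j(sv: SparseVector[Double]): INDArray = {
  val dense = new Array[Double](sv.length)                  // allocates all the zeros
  sv.activeIterator.foreach { case (i, v) => dense(i) = v } // copy over the non-zeros
  Nd4j.create(dense)                                        // copies again into ND4J's buffer
}
{code}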

In general, is batch ML still of interest?  Have you asked on the ND4J list why they moved away from Flink support?
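
To make 4) concrete, here is a minimal sketch of the kind of batch pipeline where the missing cache hurts (the path and parsing are made up, and the persist() call at the end is a hypothetical API, not something Flink has today):

{code}
import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment

// Expensive preprocessing that an iterative ML algorithm needs to reuse many times.
val features = env
  .readTextFile("hdfs:///path/to/train.csv")        // illustrative path
  .map(line => line.split(",").map(_.toDouble))     // stand-in for real feature extraction

// Each action below triggers its own job and recomputes `features` from scratch,
// because the DataSet API has no way to pin the intermediate result.
val numSamples = features.count()
val numDims    = features.first(1).collect().head.length

// What FLINK-1730 would give us (hypothetical):
// val cached = features.persist()
{code}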

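And for 5), the nice part of MAHOUT-1885 is that user code stays plain Samsara. As far as I understand, the ViennaCL / OMP solvers are probed at runtime when those modules are on the classpath, so an in-core example like the one below can get offloaded without any GPU-specific code (the distributed DRM story additionally needs the Flink bindings plus FLINK-1730):

{code}
import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.scalabindings.RLikeOps._

// Plain Samsara in-core code: nothing GPU-specific here.
val a = dense((1.0, 2.0, 3.0), (4.0, 5.0, 6.0))
val ata = a.t %*% a  // with the viennacl module on the classpath this dense
                     // multiply can be routed to the GPU solver at runtime
{code}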


> Support GPU calculations
> ------------------------
>
>                 Key: FLINK-5782
>                 URL: https://issues.apache.org/jira/browse/FLINK-5782
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.3.0
>            Reporter: Kate Eri
>            Priority: Minor
>
> This ticket was initiated as a continuation of the dev discussion thread: [New Flink team member - Kate Eri (Integration with DL4J topic)|http://mail-archives.apache.org/mod_mbox/flink-dev/201702.mbox/browser]  
> Recently we proposed the idea of integrating [Deeplearning4J|https://deeplearning4j.org/index.html] with Apache Flink. 
> It is known that training DL models is a resource-demanding process, so training on CPU can take much longer to converge than on GPU.  
> GPUs could be useful not only for DL training but also for optimizing graph analytics and other typical data manipulations; a nice overview of GPU-related work is given in [Accelerating Spark workloads using GPUs|https://www.oreilly.com/learning/accelerating-spark-workloads-using-gpus].
> So far the community has pointed out the following issues to consider:
> 1) Flink would like to avoid writing yet another home-grown GPU support layer, to reduce the engineering burden. That's why libraries like [ND4J|http://nd4j.org/userguide] should be considered. 
> 2) Currently Flink uses [Breeze|https://github.com/scalanlp/breeze] to optimize linear algebra calculations. ND4J can't be integrated as is, because it still doesn't support [sparse arrays|http://nd4j.org/userguide#faq]. Maybe sparse support could simply be dropped to enable ND4J usage?
> 3) The calculations would have to work both with and without GPUs available. If the system detects that GPUs are available, then ideally it would exploit them. GPU resource management could thus be incorporated into [FLINK-5131|https://issues.apache.org/jira/browse/FLINK-5131] (only a suggestion).
> 4) It was mentioned that since Flink takes care of shipping data around the cluster, it would also handle dumping data out to the GPU for calculation and loading the results back in. In practice, the lack of a persist method for intermediate results makes this troublesome (not because of GPUs, but because any sort of complex algorithm is expected to be able to cache intermediate results).
> That's why [FLINK-1730|https://issues.apache.org/jira/browse/FLINK-1730] must be implemented to solve this problem.  
> 5) It was also recommended to take a look at Apache Mahout, at least to learn from its experience with GPU integration, and to check its ViennaCL modules:
> https://github.com/apache/mahout/tree/master/viennacl-omp
> https://github.com/apache/mahout/tree/master/viennacl 
> 6) Netflix's experience with this question could also be considered: [Distributed Neural Networks with GPUs in the AWS Cloud|http://techblog.netflix.com/search/label/CUDA]   
> This is considered the master ticket for GPU-related tickets.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)