Posted to dev@mahout.apache.org by "Dmitriy Lyubimov (JIRA)" <ji...@apache.org> on 2015/06/08 23:50:01 UTC

[jira] [Commented] (MAHOUT-1570) Adding support for Apache Flink as a backend for the Mahout DSL

    [ https://issues.apache.org/jira/browse/MAHOUT-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577908#comment-14577908 ] 

Dmitriy Lyubimov commented on MAHOUT-1570:
------------------------------------------

I think the code is very nice and streamlines some of the by-now convoluted history of the general algebraic operators we built for Spark.

There are a few comments I have, e.g. AewScalar (i.e. the A + 5 kind of operation) is now deprecated in favor of the more generic AUnaryFunc, which is part of #135.
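To illustrate the distinction, here is a minimal sketch with hypothetical names (not the actual Mahout logical-operator classes): a specialized "elementwise A-plus-scalar" node is just a special case of a generic elementwise unary-function node, since A + 5 is the function x => x + 5.0 applied to every element of A.

sealed trait LogicalOp
case class MatrixRef(name: String) extends LogicalOp

// Specialized form: elementwise "A op scalar" (the AewScalar-style node).
case class ElementwiseScalar(a: LogicalOp, scalar: Double, op: String) extends LogicalOp

// Generic form: an arbitrary unary function applied to every element
// (the AUnaryFunc-style node that subsumes the specialized one).
case class ElementwiseUnaryFunc(a: LogicalOp, f: Double => Double) extends LogicalOp

object AewExample extends App {
  val a = MatrixRef("A")
  // "A + 5" expressed both ways:
  val specialized = ElementwiseScalar(a, 5.0, "+")
  val generic     = ElementwiseUnaryFunc(a, x => x + 5.0)
}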

#135 is going in this week, so it would be nice to make a few adaptations to those changes at the logical level and open a PR. Without a PR it is hard to make suggestions.

If there are any Flink-specific instructions on how to try this out (apart from the unit tests, which I suppose run actual Flink in local mode), it would be very welcome to have them added to the Mahout website. If there's more than one page, we can add a series of menus there.
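For reference, a tiny smoke-test sketch of what "actual Flink in local mode" looks like with the plain Flink batch Scala API; this is only an assumption about how the unit tests might be wired, not taken from the patch, and it assumes a Flink version where DataSet.collect() is available.

import org.apache.flink.api.scala._

object LocalModeSmokeTest {
  def main(args: Array[String]): Unit = {
    // A local execution environment runs the whole job in-process,
    // which is the typical setup for unit tests.
    val env = ExecutionEnvironment.createLocalEnvironment()
    val doubled = env.fromElements(1, 2, 3, 4).map(_ * 2)
    println(doubled.collect())
  }
}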

Thank you for doing this; this is great.

> Adding support for Apache Flink as a backend for the Mahout DSL
> ---------------------------------------------------------------
>
>                 Key: MAHOUT-1570
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1570
>             Project: Mahout
>          Issue Type: Improvement
>            Reporter: Till Rohrmann
>            Assignee: Suneel Marthi
>              Labels: DSL, flink, scala
>
> With the finalized abstraction of the Mahout DSL plans from the backend operations (MAHOUT-1529), it should be possible to integrate further backends for the Mahout DSL. Apache Flink would be a good candidate for such an execution backend. 
> With respect to the implementation, the biggest difference between Spark and Flink at the moment is probably the incremental rollout of plans, which is triggered by Spark's actions and which is not supported by Flink yet. However, the Flink community is working on this issue. For the moment, it should be possible to circumvent this problem by writing intermediate results required by an action to HDFS and reading them back from there.
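A rough sketch of the workaround described in the issue, using the plain Flink batch Scala API (the path and job name are illustrative only): the intermediate result is written out, the plan is executed up to that point, and a fresh plan reads the materialized data back, mimicking the cut that a Spark action would introduce.

import org.apache.flink.api.scala._
import org.apache.flink.core.fs.FileSystem.WriteMode

object MaterializeIntermediate {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val tmpPath = "hdfs:///tmp/mahout-flink/intermediate"  // illustrative path

    // First plan: compute the intermediate result and materialize it.
    env.fromElements(1, 2, 3, 4)
      .map(_ * 2)
      .writeAsText(tmpPath, WriteMode.OVERWRITE)
    env.execute("materialize intermediate result")

    // Second plan: read the materialized result back and continue from there.
    val reloaded = env.readTextFile(tmpPath).map(_.toInt)
    println(reloaded.collect())
  }
}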



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)