Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:14:24 UTC

[jira] [Resolved] (SPARK-19747) Consolidate code in ML aggregators

     [ https://issues.apache.org/jira/browse/SPARK-19747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-19747.
----------------------------------
    Resolution: Incomplete

> Consolidate code in ML aggregators
> ----------------------------------
>
>                 Key: SPARK-19747
>                 URL: https://issues.apache.org/jira/browse/SPARK-19747
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.2.0
>            Reporter: Seth Hendrickson
>            Priority: Minor
>              Labels: bulk-closed
>
> Many algorithms in Spark ML are posed as optimization of a differentiable loss function over a parameter vector. We implement these by having a loss function accumulate the gradient using an Aggregator class whose methods amount to a {{seqOp}} and a {{combOp}}. As a result, nearly every algorithm of this form implements both a cost function class and an aggregator class, which are completely separate from one another but share probably 80% of their code.
> I think it is important to clean things like this up. Done properly, it will make the code much more maintainable, readable, and bug-free. It will also reduce the overhead of future implementations.
> The design is of course open for discussion, but I think we should aim to:
> 1. Have all aggregators share parent classes, so that they only need to implement the {{add}} function. This is really the only difference in the current aggregators.
> 2. Have a single, generic cost function that is parameterized by the aggregator type. This reduces the many places we implement cost functions and greatly reduces the amount of duplicated code.
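
The two goals above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the design adopted in Spark: every name here ({{Instance}}, {{LossAggregator}}, {{GenericLossFunction}}, {{LeastSquaresAggregator}}) is hypothetical, a plain Seq of Seqs stands in for an RDD's partitions, and least squares is used only as a convenient example loss.

```scala
// Hypothetical names throughout; this is a sketch, not Spark's API.

// One weighted training example with dense features.
final case class Instance(label: Double, weight: Double, features: Array[Double])

// Shared parent class: subclasses implement only `add` (the seqOp).
// `merge` (the combOp), `loss`, and `gradient` are written once here.
trait LossAggregator[Agg <: LossAggregator[Agg]] extends Serializable { self: Agg =>
  protected val dim: Int
  protected var weightSum: Double = 0.0
  protected var lossSum: Double = 0.0
  // Lazy so that `dim` is initialized by the subclass before first use.
  protected lazy val gradientSum: Array[Double] = new Array[Double](dim)

  // seqOp: fold one instance into the running sums.
  def add(instance: Instance): Agg

  // combOp: merge another partial aggregate into this one, in place.
  def merge(other: Agg): Agg = {
    require(dim == other.dim, "aggregators must have the same dimension")
    weightSum += other.weightSum
    lossSum += other.lossSum
    var i = 0
    while (i < dim) { gradientSum(i) += other.gradientSum(i); i += 1 }
    this
  }

  // Weighted average loss over everything aggregated so far.
  def loss: Double = {
    require(weightSum > 0.0, "weight sum must be positive")
    lossSum / weightSum
  }

  // Weighted average gradient over everything aggregated so far.
  def gradient: Array[Double] = {
    require(weightSum > 0.0, "weight sum must be positive")
    gradientSum.map(_ / weightSum)
  }
}

// Single generic cost function, parameterized by the aggregator type.
final class GenericLossFunction[Agg <: LossAggregator[Agg]](
    partitions: Seq[Seq[Instance]],        // stand-in for an RDD's partitions
    newAggregator: Array[Double] => Agg) {

  // Returns (loss, gradient) at the given coefficients.
  def calculate(coefficients: Array[Double]): (Double, Array[Double]) = {
    val partials = partitions.map { part =>
      part.foldLeft(newAggregator(coefficients))((agg, inst) => agg.add(inst))
    }
    val total = partials.foldLeft(newAggregator(coefficients))((a, b) => a.merge(b))
    (total.loss, total.gradient)
  }
}

// Example concrete aggregator: squared-error loss for a linear model.
final class LeastSquaresAggregator(coefficients: Array[Double])
    extends LossAggregator[LeastSquaresAggregator] {
  protected val dim: Int = coefficients.length

  def add(instance: Instance): LeastSquaresAggregator = {
    var prediction = 0.0
    var i = 0
    while (i < dim) { prediction += coefficients(i) * instance.features(i); i += 1 }
    val error = prediction - instance.label
    lossSum += 0.5 * instance.weight * error * error
    i = 0
    while (i < dim) {
      gradientSum(i) += instance.weight * error * instance.features(i)
      i += 1
    }
    weightSum += instance.weight
    this
  }
}
```

With this shape, a new algorithm would supply only its own {{add}} and pass a constructor such as {{c => new LeastSquaresAggregator(c)}} to the generic cost function. In a real implementation the two folds would presumably map onto the {{seqOp}} and {{combOp}} arguments of an RDD aggregation.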



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
