Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2016/10/28 10:20:58 UTC

[jira] [Commented] (SPARK-14567) Add instrumentation logs to MLlib training algorithms

    [ https://issues.apache.org/jira/browse/SPARK-14567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15615017#comment-15615017 ] 

Apache Spark commented on SPARK-14567:
--------------------------------------

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/15671

> Add instrumentation logs to MLlib training algorithms
> -----------------------------------------------------
>
>                 Key: SPARK-14567
>                 URL: https://issues.apache.org/jira/browse/SPARK-14567
>             Project: Spark
>          Issue Type: Umbrella
>          Components: ML, MLlib
>            Reporter: Timothy Hunter
>            Assignee: Timothy Hunter
>
> To debug performance issues when training MLlib algorithms,
> it is useful to log some metrics about the training dataset, the training parameters, etc.
> This ticket is an umbrella for adding simple logging messages to the most common MLlib estimators. There should be no performance impact on the current implementation, and the output is simply written to the logs.
> Here are some values that are of interest when debugging training tasks:
> * number of features
> * number of instances
> * number of partitions
> * number of classes
> * input RDD/DF cache level
> * hyper-parameters
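
As an illustration only, the values listed above could be collected into one structured log line per training run. The sketch below is plain Python and does not reflect the code in the linked pull request; the names TrainingMetrics, format_metrics, log_training_metrics and the logger name are all hypothetical. The message is only built when INFO logging is enabled, in keeping with the "no performance impact" goal.

```python
import logging
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("training.instrumentation")


@dataclass
class TrainingMetrics:
    """The values named in the ticket; every field name here is an assumption."""
    num_features: int
    num_instances: int
    num_partitions: int
    num_classes: Optional[int] = None   # None when the task is not classification
    cache_level: str = "NONE"           # storage level of the input, as a string
    hyper_params: Dict[str, Any] = field(default_factory=dict)


def format_metrics(estimator: str, m: TrainingMetrics) -> str:
    """Render the metrics as a single key=value log line."""
    parts = [
        f"numFeatures={m.num_features}",
        f"numInstances={m.num_instances}",
        f"numPartitions={m.num_partitions}",
    ]
    if m.num_classes is not None:
        parts.append(f"numClasses={m.num_classes}")
    parts.append(f"cacheLevel={m.cache_level}")
    # Sort hyper-parameters so the output is stable across runs.
    params = ", ".join(f"{k}={v}" for k, v in sorted(m.hyper_params.items()))
    parts.append(f"params={{{params}}}")
    return f"[{estimator}] " + " ".join(parts)


def log_training_metrics(estimator: str, m: TrainingMetrics) -> None:
    """Log the metrics at INFO level, skipping formatting when INFO is disabled."""
    if logger.isEnabledFor(logging.INFO):
        logger.info(format_metrics(estimator, m))
```

A caller would fill in TrainingMetrics from the training input before fitting and call log_training_metrics once per run, so the cost stays at a single log statement.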


