Posted to issues@spark.apache.org by "Felix Cheung (JIRA)" <ji...@apache.org> on 2016/12/14 21:38:00 UTC
[jira] [Comment Edited] (SPARK-18862) Split SparkR mllib.R into multiple files

    [ https://issues.apache.org/jira/browse/SPARK-18862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749559#comment-15749559 ] 

Felix Cheung edited comment on SPARK-18862 at 12/14/16 9:37 PM:
----------------------------------------------------------------

AFAIK, an R package has the constraint that it has to be a flat structure, so I don't think a subdirectory would work. (Search for "directory" in http://r-pkgs.had.co.nz/r.html)

My preference would be an ml- or ml_ prefix.
I think we should call it ml instead of mllib to match spark.ml.

Also, perhaps it makes sense to group by algorithm in some cases (e.g. random forest, GBT) instead of splitting them into classification and regression, since they are so similar.
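
To make the naming idea concrete, here is a rough sketch of how prefixed, flat file names could be derived and checked. It is only an illustration: the helper names, the "tree" topic (standing in for a by-algorithm group such as random forest and GBT), and the exact file list are hypothetical and not a decided layout.

```python
# Hypothetical sketch of the flat naming scheme discussed above.
# The helper names, the "tree" topic, and the file list are illustrative
# assumptions, not an agreed SparkR layout.

def flat_file_name(topic, prefix="ml", sep="_"):
    """Build a flat file name such as ml_classification.R."""
    return f"{prefix}{sep}{topic}.R"


def is_flat(paths):
    """Return True if every path sits directly under R/ with no subdirectory."""
    return all(
        p.startswith("R/") and "/" not in p[len("R/"):]
        for p in paths
    )


# Topics loosely based on the proposal; "tree" is an assumed by-algorithm group.
topics = ["classification", "regression", "clustering", "tree"]

flat = ["R/" + flat_file_name(t) for t in topics]
nested = ["R/mllib/classification.R"]  # the subdirectory variant from the proposal
```

With these definitions, is_flat(flat) holds while is_flat(nested) does not, which is the distinction the flat-structure constraint draws between the two proposed layouts.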





> Split SparkR mllib.R into multiple files
> ----------------------------------------
>
>                 Key: SPARK-18862
>                 URL: https://issues.apache.org/jira/browse/SPARK-18862
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, SparkR
>            Reporter: Yanbo Liang
>
> SparkR mllib.R is getting bigger as we add more ML wrappers. I'd like to split it into multiple files to make it easier to maintain:
> * mllibClassification.R
> * mllibRegression.R
> * mllibClustering.R
> * mllibFeature.R
> or:
> * mllib/classification.R
> * mllib/regression.R
> * mllib/clustering.R
> * mllib/features.R
> The first option is closer to R convention, and I'm not sure whether R supports the second layout (will check later). Please let me know your preference. I think the start of a new release cycle is a good opportunity to do this, since it will involve fewer conflicts. If this proposal is approved, I can work on it.
> cc [~felixcheung] [~josephkb] [~mengxr] 


