Posted to issues@spark.apache.org by "yuhao yang (JIRA)" <ji...@apache.org> on 2016/09/07 18:09:20 UTC

[jira] [Updated] (SPARK-17094) provide simplified API for ML pipeline

     [ https://issues.apache.org/jira/browse/SPARK-17094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yuhao yang updated SPARK-17094:
-------------------------------
    Description: 
Many machine learning pipeline libraries provide an API for easily assembling transformers.

One example would be:
{code}
val model = new Pipeline("tokenizer", "countvectorizer", "lda").fit(data)  // proposed API
{code}
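
The one-liner above depends on the pipeline deriving each stage's input and output columns from the stage order. A minimal sketch of that wiring step in plain Scala (no Spark dependency) follows. StageSpec, knownStages, wire and the "<stage>_out" column-naming scheme are hypothetical names chosen for illustration; they are not part of the spark.ml API.
{code:scala}
// Hypothetical sketch of automatic column wiring for a name-based pipeline.
// StageSpec, knownStages and the "<stage>_out" naming scheme are assumptions
// made for illustration; they are not part of the spark.ml API.
final case class StageSpec(name: String, inputCol: String, outputCol: String)

// Stage names this sketch accepts (compared case-insensitively).
val knownStages: Set[String] = Set("tokenizer", "countvectorizer", "hashingtf", "lda")

/** Chains the named stages so that each one reads the previous stage's output column. */
def wire(names: Seq[String], firstInput: String): Seq[StageSpec] = {
  val unknown = names.filterNot(n => knownStages.contains(n.toLowerCase))
  require(unknown.isEmpty, s"unknown stage(s): ${unknown.mkString(", ")}")
  // Fold over the names, carrying the column the next stage should read from.
  val (specs, _) = names.foldLeft((Vector.empty[StageSpec], firstInput)) {
    case ((acc, prevOut), name) =>
      val stage = name.toLowerCase
      val out = s"${stage}_out"
      (acc :+ StageSpec(stage, prevOut, out), out)
  }
  specs
}

// Usage: the proposal's one-liner, assuming the raw text lives in a column named "text".
val wired = wire(Seq("tokenizer", "countvectorizer", "lda"), firstInput = "text")
wired.foreach(s => println(s"${s.name}: ${s.inputCol} -> ${s.outputCol}"))
{code}
Each stage reads the column produced by the stage before it, so the user only names the first input column. A real implementation would still have to map each name to an actual spark.ml stage and set the matching parameters (LDA, for instance, reads from featuresCol rather than an inputCol).
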
Overall, the feature would:
1. Allow people (especially beginners) to create an ML application in one simple line of code.
2. Be handy for users, as they would not have to set the input and output columns.
3. Going further, make code unnecessary for building a Spark ML application, since it could be done through configuration:
{code}
"ml.pipeline": "tokenizer", "hashingTF", "lda"
"ml.tokenizer.toLowercase": "false"
...
{code}
Configuring a pipeline this way could be quite efficient for tuning on a cluster.
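
The configuration idea in item 3 amounts to parsing flat keys into an ordered stage list plus per-stage parameters. A minimal sketch follows, assuming the configuration arrives as a Map[String, String] with the stage list stored as a comma-separated value under "ml.pipeline". PipelineConf and parseConf are hypothetical names, not an existing Spark API.
{code:scala}
// Hypothetical sketch: turn flat "ml.*" configuration entries into a stage list
// plus per-stage parameter maps. PipelineConf, parseConf and the
// comma-separated value for "ml.pipeline" are assumptions for illustration.
final case class PipelineConf(stages: Seq[String], params: Map[String, Map[String, String]])

def parseConf(conf: Map[String, String]): PipelineConf = {
  // "ml.pipeline" lists the stages in order, e.g. "tokenizer, hashingTF, lda".
  val stages = conf.get("ml.pipeline").toSeq
    .flatMap(_.split(",").toSeq)
    .map(_.trim)
    .filter(_.nonEmpty)

  // Every other key of the form ml.<stage>.<param> becomes a stage parameter.
  val entries: Seq[(String, String, String)] = conf.toSeq.flatMap { case (key, value) =>
    if (key == "ml.pipeline" || !key.startsWith("ml.")) None
    else {
      key.stripPrefix("ml.").split("\\.", 2) match {
        case Array(stage, param) => Some((stage, param, value))
        case _                   => None
      }
    }
  }

  // Group the parameters by stage name.
  val params = entries.groupBy(_._1).map { case (stage, es) =>
    stage -> es.map { case (_, param, value) => param -> value }.toMap
  }
  PipelineConf(stages, params)
}

// Usage with the two entries from the example configuration above.
val parsed = parseConf(Map(
  "ml.pipeline" -> "tokenizer, hashingTF, lda",
  "ml.tokenizer.toLowercase" -> "false"
))
println(parsed)
{code}
Keeping parameters keyed by stage name means a tuning run could change a single entry, such as ml.tokenizer.toLowercase, without touching any code.
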

Feedback and suggestions are appreciated.



> provide simplified API for ML pipeline
> --------------------------------------
>
>                 Key: SPARK-17094
>                 URL: https://issues.apache.org/jira/browse/SPARK-17094
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: yuhao yang
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
