Posted to issues@spark.apache.org by "Wojciech Szymanski (JIRA)" <ji...@apache.org> on 2016/11/01 23:19:58 UTC

[jira] [Created] (SPARK-18213) Syntactic sugar over Pipeline API

Wojciech Szymanski created SPARK-18213:
------------------------------------------

             Summary: Syntactic sugar over Pipeline API
                 Key: SPARK-18213
                 URL: https://issues.apache.org/jira/browse/SPARK-18213
             Project: Spark
          Issue Type: Improvement
          Components: ML
    Affects Versions: 2.0.1
            Reporter: Wojciech Szymanski
            Priority: Minor


Currently, creating an ML Pipeline relies on the rather verbose setStages method, as shown below:
{code}
    val tokenizer = new RegexTokenizer()
    val stopWordsRemover = new StopWordsRemover()
    val countVectorizer = new CountVectorizer()

    val pipeline = new Pipeline().setStages(Array(tokenizer, stopWordsRemover, countVectorizer))
{code}

What about a bit of syntactic sugar over the Pipeline API?
{code}
    val tokenizer = new RegexTokenizer()
    val stopWordsRemover = new StopWordsRemover()
    val countVectorizer = new CountVectorizer()

    val pipeline = tokenizer + stopWordsRemover + countVectorizer
{code}
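
For illustration, here is a minimal sketch of how such an operator could be provided, assuming an implicit enrichment of PipelineStage. The object name PipelineSyntax and the implicit class names below are hypothetical and are not necessarily the ones used in the linked commit:
{code}
import org.apache.spark.ml.{Pipeline, PipelineStage}

object PipelineSyntax {

  // Hypothetical enrichment: `stage + next` builds a two-stage Pipeline.
  implicit class PipelineStageOps(val stage: PipelineStage) extends AnyVal {
    def +(next: PipelineStage): Pipeline =
      new Pipeline().setStages(Array(stage, next))
  }

  // Hypothetical enrichment for an existing Pipeline: appending a stage
  // keeps the chain flat instead of producing nested pipelines.
  implicit class PipelineOps(val pipeline: Pipeline) extends AnyVal {
    def +(next: PipelineStage): Pipeline =
      new Pipeline().setStages(pipeline.getStages :+ next)
  }
}
{code}

With these implicits in scope, tokenizer + stopWordsRemover + countVectorizer evaluates left to right: the first + yields a two-stage Pipeline, and the second + appends to it, so the result is a flat three-stage Pipeline equivalent to the setStages version above.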

Production code changes in mllib/src/main/scala/org/apache/spark/ml/Pipeline.scala:
https://github.com/apache/spark/commit/181df64bf50081f3af5a84b567b677178c88524f#diff-5226e84dea43423760dc6300ddafb01b

Scala example:
https://github.com/apache/spark/commit/181df64bf50081f3af5a84b567b677178c88524f#diff-798e85dd9107565fabab1126f57e3d6e

Java example:
https://github.com/apache/spark/commit/181df64bf50081f3af5a84b567b677178c88524f#diff-69ac857220f21b5e1684444d80d6dffe

Thanks in advance for your feedback.


