
Posted to reviews@spark.apache.org by keypointt <gi...@git.apache.org> on 2016/02/09 23:38:27 UTC

[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

GitHub user keypointt opened a pull request:

    https://github.com/apache/spark/pull/11142

    [SPARK-13017][Docs] Replace example code in mllib-feature-extraction.md using include_example

    Replace example code in mllib-feature-extraction.md using include_example
    https://issues.apache.org/jira/browse/SPARK-13017

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/keypointt/spark SPARK-13017

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11142.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11142
    
----
commit 1159d02a1ea773680a61de23c8b01acb3d23f1fb
Author: Xin Ren <ia...@126.com>
Date:   2016-02-09T08:21:10Z

    [SPARK-13017] all can compile, and style check passed

commit 19ad929f72a030fb0ce3ae1794c416cc1ce05e7e
Author: Xin Ren <ia...@126.com>
Date:   2016-02-09T22:36:27Z

    [SPARK-13017] remove empty lines

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-183145912
  
    **[Test build #51164 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51164/consoleFull)** for PR 11142 at commit [`6c3122a`](https://github.com/apache/spark/commit/6c3122a91bc637446ff8ba8cfce53e12a1718e58).



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53560850
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/NormalizerExample.scala ---
    @@ -0,0 +1,51 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.mllib
    +
    +import org.apache.spark.SparkConf
    +// $example on$
    +import org.apache.spark.SparkContext
    +import org.apache.spark.mllib.feature.Normalizer
    +import org.apache.spark.mllib.util.MLUtils
    +// $example off$
    +
    +object NormalizerExample {
    +
    +  def main(args: Array[String]) {
    +
    +    val conf = new SparkConf().setAppName("NormalizerExample")
    +    val sc = new SparkContext(conf)
    +
    +    // $example on$
    +    val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
    +
    +    val normalizer1 = new Normalizer()
    +    val normalizer2 = new Normalizer(p = Double.PositiveInfinity)
    +
    +    // Each sample in data1 will be normalized using $L^2$ norm.
    +    val data1 = data.map(x => (x.label, normalizer1.transform(x.features)))
    +
    +    // Each sample in data2 will be normalized using $L^\infty$ norm.
    +    val data2 = data.map(x => (x.label, normalizer2.transform(x.features)))
    +    // $example off$
    +
    --- End diff --
    
    Add output for `data1` and `data2` (for example, print their contents).
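
To make the requested output concrete, here is a minimal sketch in plain Python (no Spark dependency) of what normalizing one sample by its $L^2$ and $L^\infty$ norm produces. The `normalize` helper and the `sample` vector are hypothetical names introduced only for illustration; they are not part of the MLlib API, and the zero-vector handling is an assumption of this sketch.

```python
import math


def normalize(vector, p=2.0):
    """Scale `vector` to unit p-norm; pass math.inf for the max norm."""
    if math.isinf(p):
        norm = max(abs(v) for v in vector)
    else:
        norm = sum(abs(v) ** p for v in vector) ** (1.0 / p)
    # Assumption of this sketch: an all-zero vector is returned unchanged.
    if norm == 0.0:
        return list(vector)
    return [v / norm for v in vector]


sample = [3.0, -4.0, 0.0]            # hypothetical feature vector
data1 = normalize(sample)            # divided by the L^2 norm (5.0)
data2 = normalize(sample, math.inf)  # divided by the L^inf norm (4.0)
print(data1)
print(data2)
```

Printing both results side by side shows how the choice of norm changes the scale of the same sample, which is the kind of output the comment asks the example to display.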



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53560916
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/StandardScalerExample.scala ---
    @@ -0,0 +1,56 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.mllib
    +
    +import org.apache.spark.SparkConf
    +// $example on$
    +import org.apache.spark.SparkContext
    +import org.apache.spark.mllib.feature.{StandardScaler, StandardScalerModel}
    +import org.apache.spark.mllib.linalg.Vectors
    +import org.apache.spark.mllib.util.MLUtils
    +// $example off$
    +
    +object StandardScalerExample {
    +
    +  def main(args: Array[String]) {
    --- End diff --
    
    Use an explicit return type here: `def main(args: Array[String]): Unit = {`



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by keypointt <gi...@git.apache.org>.

Github user keypointt commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-197375803
  
    cc @mengxr 



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53559416
  
    --- Diff: examples/src/main/python/mllib/standard_scaler_example.py ---
    @@ -0,0 +1,52 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from __future__ import print_function
    +
    +from pyspark.mllib.linalg import Vectors
    +# $example on$
    +from pyspark import SparkContext
    +from pyspark.mllib.util import MLUtils
    +from pyspark.mllib.linalg import Vectors
    +from pyspark.mllib.feature import StandardScaler
    +from pyspark.mllib.feature import StandardScalerModel
    +# $example off$
    +
    +if __name__ == "__main__":
    +    sc = SparkContext(appName="StandardScalerExample")  # SparkContext
    +
    +    # $example on$
    +    data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
    +    label = data.map(lambda x: x.label)
    +    features = data.map(lambda x: x.features)
    +
    +    scaler1 = StandardScaler().fit(features)
    +    scaler2 = StandardScaler(withMean=True, withStd=True).fit(features)
    +    # scaler3 is an identical model to scaler2, and will produce identical transformations
    +    scaler3 = StandardScalerModel(scaler2.std, scaler2.mean)
    --- End diff --
    
    Delete this line, since we cannot create a `StandardScalerModel` instance by hand. This is a mistake in the original example.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53559407
  
    --- Diff: examples/src/main/python/mllib/standard_scaler_example.py ---
    @@ -0,0 +1,52 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from __future__ import print_function
    +
    +from pyspark.mllib.linalg import Vectors
    +# $example on$
    +from pyspark import SparkContext
    +from pyspark.mllib.util import MLUtils
    +from pyspark.mllib.linalg import Vectors
    +from pyspark.mllib.feature import StandardScaler
    +from pyspark.mllib.feature import StandardScalerModel
    +# $example off$
    +
    +if __name__ == "__main__":
    +    sc = SparkContext(appName="StandardScalerExample")  # SparkContext
    +
    +    # $example on$
    +    data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
    +    label = data.map(lambda x: x.label)
    +    features = data.map(lambda x: x.features)
    +
    +    scaler1 = StandardScaler().fit(features)
    +    scaler2 = StandardScaler(withMean=True, withStd=True).fit(features)
    +    # scaler3 is an identical model to scaler2, and will produce identical transformations
    +    scaler3 = StandardScalerModel(scaler2.std, scaler2.mean)
    +
    +    # data1 will be unit variance.
    +    data1 = label.zip(scaler1.transform(features))
    +
    +    # Without converting the features into dense vectors, transformation with zero mean will raise
    +    # exception on sparse vector.
    +    # data2 will be unit variance and zero mean.
    +    data2 = label.zip(scaler1.transform(features.map(lambda x: Vectors.dense(x.toArray()))))
    --- End diff --
    
    Change `scaler1` to `scaler2`; the previous example contains a mistake.
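
The difference between the two scalers can be sketched in plain Python (no Spark dependency). This is only an illustration under stated assumptions: `fit`, `transform`, and `rows` are hypothetical names, not the MLlib API, and the sketch assumes the sample standard deviation (dividing by n - 1) and maps constant columns to 0.0. It shows why only the mean-centering scaler needs dense input: subtracting the mean turns zero entries into nonzero ones.

```python
import math


def fit(rows):
    """Compute per-column mean and sample standard deviation (n - 1)."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(dims)]
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in rows) / (n - 1))
        for j in range(dims)
    ]
    return means, stds


def transform(row, means, stds, with_mean=False, with_std=True):
    """Optionally center and scale one row of features."""
    out = []
    for value, mean, std in zip(row, means, stds):
        if with_mean:
            value = value - mean
        if with_std:
            # Assumption: a constant column (std 0) maps to 0.0.
            value = value / std if std != 0.0 else 0.0
        out.append(value)
    return out


rows = [[1.0, 0.0], [3.0, 0.0], [5.0, 4.0]]  # hypothetical data
means, stds = fit(rows)
# Like scaler1: unit variance only, so zeros stay zeros.
scaled = [transform(r, means, stds) for r in rows]
# Like scaler2: zero mean and unit variance, so zeros become nonzero.
centered = [transform(r, means, stds, with_mean=True) for r in rows]
print(scaled)
print(centered)
```

Applying the variance-only transform to densified features, as the quoted line does with `scaler1`, would therefore not yield zero-mean data, which is consistent with the requested change.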



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-184572259
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/11142



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-191921057
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53560735
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/Word2VecExample.scala ---
    @@ -0,0 +1,55 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.mllib
    +
    +import org.apache.spark.SparkConf
    +// $example on$
    +import org.apache.spark.SparkContext
    +import org.apache.spark.mllib.feature.{Word2Vec, Word2VecModel}
    +// $example off$
    +
    +object Word2VecExample {
    +
    +  def main(args: Array[String]) {
    +
    +    val conf = new SparkConf().setAppName("Word2VecExample")
    +    val sc = new SparkContext(conf)
    +
    +    // $example on$
    +    val input = sc.textFile("text8").map(line => line.split(" ").toSeq)
    --- End diff --
    
    Change the path to `"data/mllib/sample_lda_data.txt"`.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-183149328
  
    **[Test build #51164 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51164/consoleFull)** for PR 11142 at commit [`6c3122a`](https://github.com/apache/spark/commit/6c3122a91bc637446ff8ba8cfce53e12a1718e58).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53560874
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/PCAExample.scala ---
    @@ -0,0 +1,74 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.mllib
    +
    +import org.apache.spark.SparkConf
    +// $example on$
    +import org.apache.spark.SparkContext
    +import org.apache.spark.mllib.feature.PCA
    +import org.apache.spark.mllib.linalg.Vectors
    +import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}
    +// $example off$
    +
    +object PCAExample {
    +
    +  def main(args: Array[String]) {
    +
    +    val conf = new SparkConf().setAppName("PCAExample")
    +    val sc = new SparkContext(conf)
    +
    +    // $example on$
    +    val data = sc.textFile("data/mllib/ridge-data/lpsa.data").map { line =>
    +      val parts = line.split(',')
    +      LabeledPoint(parts(0).toDouble, Vectors.dense(parts(1).split(' ').map(_.toDouble)))
    +    }.cache()
    +
    +    val splits = data.randomSplit(Array(0.6, 0.4), seed = 11L)
    +    val training = splits(0).cache()
    +    val test = splits(1)
    +
    +    val pca = new PCA(training.first().features.size/2).fit(data.map(_.features))
    --- End diff --
    
    Add spaces around the operator: change `training.first().features.size/2` to `training.first().features.size / 2`.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53560947
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/TFIDFExample.scala ---
    @@ -0,0 +1,63 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.mllib
    +
    +import org.apache.spark.SparkConf
    --- End diff --
    
    Change the imports block to:
    
    ```scala
    import org.apache.spark.SparkConf
    import org.apache.spark.SparkContext
    // $example on$
    import org.apache.spark.mllib.feature.{HashingTF, IDF}
    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.rdd.RDD
    // $example off$
    ```
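
The reason the placement of the markers matters can be sketched in plain Python. This is a simplified, hypothetical model of marker-based extraction, not the actual `include_example` plugin used by the Spark docs build; `extract_example` and `source` are illustrative names, and details such as indentation trimming or separators between regions are not modeled.

```python
def extract_example(source):
    """Return the lines between `$example on$` and `$example off$` markers."""
    kept = []
    inside = False
    for line in source.splitlines():
        if "$example on$" in line:
            inside = True
            continue
        if "$example off$" in line:
            inside = False
            continue
        if inside:
            kept.append(line)
    return "\n".join(kept)


# The imports block suggested in the comment above.
source = """import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
// $example on$
import org.apache.spark.mllib.feature.{HashingTF, IDF}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD
// $example off$
"""

snippet = extract_example(source)
print(snippet)
```

Under this model, only the three MLlib and RDD imports would appear in the rendered documentation, while the `SparkConf` and `SparkContext` setup imports stay out of the published snippet.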



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-192754426
  
    cc @mengxr 



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53559248
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/mllib/JavaElementwiseProductExample.java ---
    @@ -0,0 +1,58 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.examples.mllib;
    +
    +import org.apache.spark.SparkConf;
    --- End diff --
    
    Change the imports to
    
    ```Java
    // $example on$
    import java.util.Arrays;
    // $example off$
    
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    // $example on$
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.Function;
    import org.apache.spark.mllib.feature.ElementwiseProduct;
    import org.apache.spark.mllib.linalg.Vector;
    import org.apache.spark.mllib.linalg.Vectors;
    // $example off$
    ```



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-183149434
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51164/
    Test PASSed.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by keypointt <gi...@git.apache.org>.

Github user keypointt commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-186761493
  
    Thanks a lot for this huge code review



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53559408
  
    --- Diff: examples/src/main/python/mllib/standard_scaler_example.py ---
    @@ -0,0 +1,52 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from __future__ import print_function
    +
    +from pyspark.mllib.linalg import Vectors
    +# $example on$
    +from pyspark import SparkContext
    +from pyspark.mllib.util import MLUtils
    +from pyspark.mllib.linalg import Vectors
    +from pyspark.mllib.feature import StandardScaler
    +from pyspark.mllib.feature import StandardScalerModel
    +# $example off$
    +
    +if __name__ == "__main__":
    +    sc = SparkContext(appName="StandardScalerExample")  # SparkContext
    +
    +    # $example on$
    +    data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
    +    label = data.map(lambda x: x.label)
    +    features = data.map(lambda x: x.features)
    +
    +    scaler1 = StandardScaler().fit(features)
    +    scaler2 = StandardScaler(withMean=True, withStd=True).fit(features)
    +    # scaler3 is an identical model to scaler2, and will produce identical transformations
    --- End diff --
    
    Delete this line.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-186667715
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51601/
    Test PASSed.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53559446
  
    --- Diff: examples/src/main/python/mllib/tf_idf_example.py ---
    @@ -0,0 +1,53 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from __future__ import print_function
    +
    +from pyspark.mllib.linalg import Vectors
    +# $example on$
    +from pyspark import SparkContext
    +from pyspark.mllib.feature import HashingTF
    +from pyspark.mllib.feature import IDF
    +# $example off$
    +
    +if __name__ == "__main__":
    +    sc = SparkContext(appName="TFIDFExample")  # SparkContext
    +
    +    # $example on$
    +    # Load documents (one per line).
    +    documents = sc.textFile("data/mllib/kmeans_data.txt").map(lambda line: line.split(" "))
    +
    +    hashingTF = HashingTF()
    +    tf = hashingTF.transform(documents)
    +
    +    # While applying HashingTF only needs a single pass to the data,
    +    # applying IDF needs two passes:
    +    # first to compute the IDF vector and second to scale the term frequencies by IDF.
    +    tf.cache()
    +    idf = IDF().fit(tf)
    +    tfidf = idf.transform(tf)
    +
    +    # spark.mllib’s IDF implementation provides an option for ignoring terms
    --- End diff --
    
    The apostrophe in `spark.mllib’s` is a typographic (non-ASCII) one; replace it with a plain ASCII `'`.
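
A quick way to catch characters like this before review is to scan the source text for anything outside the ASCII range. The following is only a minimal sketch in plain Python; the helper name `find_non_ascii` and the sample string are made up for illustration and are not part of the PR or of Spark's style checks.

```python
def find_non_ascii(text):
    """Return (line_number, column, character) for each non-ASCII character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col, ch))
    return hits


# Hypothetical sample: the first line contains a typographic apostrophe (U+2019).
sample = u"# spark.mllib\u2019s IDF implementation\n# plain ASCII line\n"

for line_no, col, ch in find_non_ascii(sample):
    print("line %d, col %d: %r" % (line_no, col, ch))
```

Running the same function over the contents of an example file would list every position that needs a plain ASCII replacement.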



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53560852
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/PCAExample.scala ---
    @@ -0,0 +1,74 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.mllib
    +
    +import org.apache.spark.SparkConf
    +// $example on$
    +import org.apache.spark.SparkContext
    --- End diff --
    
    Move this import outside the `$example on$` / `$example off$` markers.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-186777970
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51612/
    Test PASSed.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53559326
  
    --- Diff: examples/src/main/python/mllib/standard_scaler_example.py ---
    @@ -0,0 +1,52 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from __future__ import print_function
    +
    +from pyspark.mllib.linalg import Vectors
    +# $example on$
    +from pyspark import SparkContext
    +from pyspark.mllib.util import MLUtils
    +from pyspark.mllib.linalg import Vectors
    +from pyspark.mllib.feature import StandardScaler
    +from pyspark.mllib.feature import StandardScalerModel
    +# $example off$
    +
    +if __name__ == "__main__":
    +    sc = SparkContext(appName="StandardScalerExample")  # SparkContext
    +
    +    # $example on$
    +    data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
    +    label = data.map(lambda x: x.label)
    +    features = data.map(lambda x: x.features)
    +
    +    scaler1 = StandardScaler().fit(features)
    +    scaler2 = StandardScaler(withMean=True, withStd=True).fit(features)
    +    # scaler3 is an identical model to scaler2, and will produce identical transformations
    +    scaler3 = StandardScalerModel(scaler2.std, scaler2.mean)
    +
    +    # data1 will be unit variance.
    +    data1 = label.zip(scaler1.transform(features))
    +
    +    # Without converting the features into dense vectors, transformation with zero mean will raise
    +    # exception on sparse vector.
    +    # data2 will be unit variance and zero mean.
    +    data2 = label.zip(scaler1.transform(features.map(lambda x: Vectors.dense(x.toArray()))))
    +
    --- End diff --
    
    Remove the blank line.
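
The comments in the diff above distinguish scaling to unit variance from scaling to unit variance and zero mean, and note that the zero-mean case needs dense vectors. A plain-Python sketch of the arithmetic may make the reason clearer. It is not Spark's implementation: the function names are hypothetical, the corrected sample standard deviation (dividing by n - 1) is an assumption about what the scaler computes, and leaving zero-variance columns unscaled is a simplification.

```python
import math


def fit_standard_scaler(rows):
    """Compute per-column mean and corrected sample standard deviation (n - 1)."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
        for j in range(dims)
    ]
    return means, stds


def transform(row, means, stds, with_mean=False, with_std=True):
    """Optionally subtract the mean, then optionally divide by the std."""
    out = []
    for x, m, s in zip(row, means, stds):
        if with_mean:
            x = x - m
        if with_std and s != 0.0:  # simplification: zero-variance columns stay as-is
            x = x / s
        out.append(x)
    return out


rows = [[0.0, 2.0], [0.0, 4.0], [3.0, 6.0]]
means, stds = fit_standard_scaler(rows)

# Unit variance only: a zero entry stays zero, so sparsity is preserved.
print(transform(rows[0], means, stds))
# Zero mean and unit variance: the zero entry becomes non-zero.
print(transform(rows[0], means, stds, with_mean=True))
```

Subtracting the mean turns stored zeros into non-zero values, which is why the diff converts the features to dense vectors before the zero-mean transformation.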



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-183149432
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-184572136
  
    **[Test build #51351 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51351/consoleFull)** for PR 11142 at commit [`d0c8bc8`](https://github.com/apache/spark/commit/d0c8bc8af73c058bce69d29ff8fbbf16caad080f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by keypointt <gi...@git.apache.org>.

Github user keypointt commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-200966022
  
    @mengxr I've double-checked by building locally with both Scala 2.10 and 2.11, and both builds finished successfully.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53559515
  
    --- Diff: examples/src/main/python/mllib/word_2_vec_example.py ---
    @@ -0,0 +1,40 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    --- End diff --
    
    Change the file name to `word2vec_example.py`. Don't split the `word2vec`.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-187916327
  
    **[Test build #51795 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51795/consoleFull)** for PR 11142 at commit [`6961bdc`](https://github.com/apache/spark/commit/6961bdc81436764b12f3eacff3e54aef97e3dff6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53560996
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/TFIDFExample.scala ---
    @@ -0,0 +1,63 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.mllib
    +
    +import org.apache.spark.SparkConf
    +// $example on$
    +import org.apache.spark.SparkContext
    +import org.apache.spark.mllib.feature.{HashingTF, IDF}
    +import org.apache.spark.mllib.linalg.Vector
    +// $example off$
    +import org.apache.spark.rdd.RDD
    +
    +object TFIDFExample {
    +
    +  def main(args: Array[String]) {
    +
    +    val conf = new SparkConf().setAppName("TFIDFExample")
    +    val sc = new SparkContext(conf)
    +
    +    // $example on$
    +    // Load documents (one per line).
    +    val documents: RDD[Seq[String]] = sc.textFile("data/mllib/kmeans_data.txt")
    +      .map(_.split(" ").toSeq)
    +
    +    val hashingTF = new HashingTF()
    +    val tf: RDD[Vector] = hashingTF.transform(documents)
    +
    +    // While applying HashingTF only needs a single pass to the data,
    +    // applying IDF needs two passes: first to compute the IDF vector
    +    // and second to scale the term frequencies by IDF.
    +    tf.cache()
    +    val idf = new IDF().fit(tf)
    +    val tfidf: RDD[Vector] = idf.transform(tf)
    +
    +    // spark.mllib IDF implementation provides an option for ignoring terms
    +    // which occur in less than a minimum number of documents.
    +    // In such cases, the IDF for these terms is set to 0.
    +    // This feature can be used by passing the minDocFreq value to the IDF constructor.
    +    tf.cache()
    +    val idfIgnore = new IDF(minDocFreq = 2).fit(tf)
    +    val tfidfIgnore: RDD[Vector] = idfIgnore.transform(tf)
    +    // $example off$
    +
    --- End diff --
    
    Add output for `tfidf` and `tfidfIgnore`.
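
To show what printing the two results would reveal, here is a small plain-Python sketch of the two-pass computation described in the diff's comments. It is an illustration under assumptions, not the MLlib code: terms are used directly as keys instead of being hashed, the smoothed formula log((m + 1) / (df + 1)) is assumed from the MLlib guide, and all function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def term_frequencies(doc):
    """Single pass over one document: count each term."""
    return Counter(doc)


def fit_idf(tfs, min_doc_freq=0):
    """Pass over all documents to get document frequencies, then compute IDF.

    Terms appearing in fewer than min_doc_freq documents get an IDF of 0.
    """
    m = len(tfs)
    df = Counter()
    for tf in tfs:
        df.update(tf.keys())
    return {
        term: (math.log((m + 1.0) / (d + 1.0)) if d >= min_doc_freq else 0.0)
        for term, d in df.items()
    }


def transform(tf, idf):
    """Second pass: scale each term frequency by its IDF."""
    return {term: count * idf[term] for term, count in tf.items()}


documents = [["a", "b", "a"], ["a", "c"], ["a", "b"]]
tfs = [term_frequencies(d) for d in documents]

idf = fit_idf(tfs)
idf_ignore = fit_idf(tfs, min_doc_freq=2)

tfidf = [transform(tf, idf) for tf in tfs]
tfidf_ignore = [transform(tf, idf_ignore) for tf in tfs]

print("tfidf:")
for row in tfidf:
    print(row)

print("tfidfIgnore:")
for row in tfidf_ignore:
    print(row)
```

In this toy data the term `c` occurs in only one document, so its weight drops to 0 once `min_doc_freq=2` is applied, while the weights for `a` and `b` are unchanged.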



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by keypointt <gi...@git.apache.org>.

Github user keypointt commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53562173
  
    --- Diff: examples/src/main/python/mllib/tf_idf_example.py ---
    @@ -0,0 +1,53 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from __future__ import print_function
    +
    +from pyspark.mllib.linalg import Vectors
    +# $example on$
    +from pyspark import SparkContext
    +from pyspark.mllib.feature import HashingTF
    +from pyspark.mllib.feature import IDF
    +# $example off$
    +
    +if __name__ == "__main__":
    +    sc = SparkContext(appName="TFIDFExample")  # SparkContext
    +
    +    # $example on$
    +    # Load documents (one per line).
    +    documents = sc.textFile("data/mllib/kmeans_data.txt").map(lambda line: line.split(" "))
    +
    +    hashingTF = HashingTF()
    +    tf = hashingTF.transform(documents)
    +
    +    # While applying HashingTF only needs a single pass to the data,
    +    # applying IDF needs two passes:
    +    # first to compute the IDF vector and second to scale the term frequencies by IDF.
    +    tf.cache()
    +    idf = IDF().fit(tf)
    +    tfidf = idf.transform(tf)
    +
    +    # spark.mllib’s IDF implementation provides an option for ignoring terms
    --- End diff --
    
    :stuck_out_tongue_closed_eyes: 



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53560912
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/NormalizerExample.scala ---
    @@ -0,0 +1,51 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.mllib
    +
    +import org.apache.spark.SparkConf
    +// $example on$
    +import org.apache.spark.SparkContext
    +import org.apache.spark.mllib.feature.Normalizer
    +import org.apache.spark.mllib.util.MLUtils
    +// $example off$
    +
    +object NormalizerExample {
    +
    +  def main(args: Array[String]) {
    --- End diff --
    
    Use `def main(args: Array[String]): Unit = {` instead.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-191826211
  
    @mengxr @keypointt LGTM except for the imports.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-191920514
  
    **[Test build #52403 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52403/consoleFull)** for PR 11142 at commit [`3513e0f`](https://github.com/apache/spark/commit/3513e0f63ed88479052266db5ddc0f22aab175a2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53559180
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/mllib/JavaChiSqSelectorExample.java ---
    @@ -0,0 +1,74 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.examples.mllib;
    +
    +import org.apache.spark.SparkConf;
    +// $example on$
    +import org.apache.spark.api.java.JavaRDD;
    +import org.apache.spark.api.java.JavaSparkContext;
    +import org.apache.spark.api.java.function.Function;
    +import org.apache.spark.mllib.feature.ChiSqSelector;
    +import org.apache.spark.mllib.feature.ChiSqSelectorModel;
    +import org.apache.spark.mllib.linalg.Vectors;
    +import org.apache.spark.mllib.regression.LabeledPoint;
    +import org.apache.spark.mllib.util.MLUtils;
    +// $example off$
    +
    +public class JavaChiSqSelectorExample {
    +  public static void main(String[] args) {
    +
    +    SparkConf conf = new SparkConf().setAppName("JavaChiSqSelectorExample");
    +    JavaSparkContext jsc = new JavaSparkContext(conf);
    +
    +    // $example on$
    +    JavaRDD<LabeledPoint> points = MLUtils.loadLibSVMFile(jsc.sc(),
    +      "data/mllib/sample_libsvm_data.txt").toJavaRDD().cache();
    +
    +    // Discretize data in 16 equal bins since ChiSqSelector requires categorical features
    +    // Although features are doubles, the ChiSqSelector treats each unique value as a category
    +    JavaRDD<LabeledPoint> discretizedData = points.map(
    +      new Function<LabeledPoint, LabeledPoint>() {
    +        @Override
    +        public LabeledPoint call(LabeledPoint lp) {
    +          final double[] discretizedFeatures = new double[lp.features().size()];
    +          for (int i = 0; i < lp.features().size(); ++i) {
    +            discretizedFeatures[i] = Math.floor(lp.features().apply(i) / 16);
    +          }
    +          return new LabeledPoint(lp.label(), Vectors.dense(discretizedFeatures));
    +        }
    +      }
    +    );
    +
    +    // Create ChiSqSelector that will select top 50 of 692 features
    +    ChiSqSelector selector = new ChiSqSelector(50);
    +    // Create ChiSqSelector model (selecting features)
    +    final ChiSqSelectorModel transformer = selector.fit(discretizedData.rdd());
    +    // Filter the top 50 features from each feature vector
    +    JavaRDD<LabeledPoint> filteredData = discretizedData.map(
    +      new Function<LabeledPoint, LabeledPoint>() {
    +        @Override
    +        public LabeledPoint call(LabeledPoint lp) {
    +          return new LabeledPoint(lp.label(), transformer.transform(lp.features()));
    +        }
    +      }
    +    );
    +    // $example off$
    +
    --- End diff --
    
    It's better to add an output of `filteredData` to make this a complete example. Refer to https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/mllib/JavaLDAExample.java#L68
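
The discretization step in the diff above (`Math.floor(x / 16)`) and the kind of output being asked for can be sketched in plain Python. This covers only the binning and the printing, not the chi-squared selection itself. The helper name is hypothetical, and the assumption that feature values lie in the range 0 to 255 (which is what makes a bin width of 16 yield 16 bins) is mine, not stated in the thread.

```python
import math


def discretize(features, bin_width=16):
    """Map each feature value to the index of its bin, as a float category."""
    return [float(math.floor(x / bin_width)) for x in features]


# Hypothetical feature vector with values in the assumed 0..255 range.
features = [0.0, 15.0, 16.0, 255.0]
discretized = discretize(features)

# Printing the result is what turns the snippet into a complete example.
print("original:    %s" % features)
print("discretized: %s" % discretized)
```

Under that range assumption every value falls into one of the categories 0.0 through 15.0, which the selector can then treat as categorical.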



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-191912856
  
    **[Test build #52403 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52403/consoleFull)** for PR 11142 at commit [`3513e0f`](https://github.com/apache/spark/commit/3513e0f63ed88479052266db5ddc0f22aab175a2).



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53560920
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/Word2VecExample.scala ---
    @@ -0,0 +1,55 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.mllib
    +
    +import org.apache.spark.SparkConf
    +// $example on$
    +import org.apache.spark.SparkContext
    +import org.apache.spark.mllib.feature.{Word2Vec, Word2VecModel}
    +// $example off$
    +
    +object Word2VecExample {
    +
    +  def main(args: Array[String]) {
    --- End diff --
    
    Declare the return type explicitly instead of using procedure syntax: `def main(args: Array[String]): Unit = {`



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by keypointt <gi...@git.apache.org>.

Github user keypointt commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-201030201
  
    Thank you!



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by keypointt <gi...@git.apache.org>.

Github user keypointt commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53562373
  
    --- Diff: examples/src/main/python/mllib/word_2_vec_example.py ---
    @@ -0,0 +1,40 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from __future__ import print_function
    +
    +# $example on$
    +from pyspark import SparkContext
    +from pyspark.mllib.feature import Word2Vec
    +# $example off$
    +
    +if __name__ == "__main__":
    +    sc = SparkContext(appName="Word2VecExample")  # SparkContext
    +
    +    # $example on$
    +    inp = sc.textFile("text8_lines").map(lambda row: row.split(" "))
    --- End diff --
    
    Just to double-check: is it enough to change the included Python file name to "word2vec_example.py" in docs/mllib-feature-extraction.md? For now the code is included directly, and "text8_lines" appears nowhere else.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53561004
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/Word2VecExample.scala ---
    @@ -0,0 +1,55 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.mllib
    +
    +import org.apache.spark.SparkConf
    +// $example on$
    +import org.apache.spark.SparkContext
    --- End diff --
    
    Move this import outside the `$example on$` / `$example off$` markers.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-182113041
  
    Can one of the admins verify this patch?



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53559328
  
    --- Diff: examples/src/main/python/mllib/standard_scaler_example.py ---
    @@ -0,0 +1,52 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from __future__ import print_function
    +
    +from pyspark.mllib.linalg import Vectors
    +# $example on$
    +from pyspark import SparkContext
    +from pyspark.mllib.util import MLUtils
    +from pyspark.mllib.linalg import Vectors
    +from pyspark.mllib.feature import StandardScaler
    +from pyspark.mllib.feature import StandardScalerModel
    +# $example off$
    +
    +if __name__ == "__main__":
    +    sc = SparkContext(appName="StandardScalerExample")  # SparkContext
    +
    +    # $example on$
    +    data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
    +    label = data.map(lambda x: x.label)
    +    features = data.map(lambda x: x.features)
    +
    +    scaler1 = StandardScaler().fit(features)
    +    scaler2 = StandardScaler(withMean=True, withStd=True).fit(features)
    +    # scaler3 is an identical model to scaler2, and will produce identical transformations
    +    scaler3 = StandardScalerModel(scaler2.std, scaler2.mean)
    +
    +    # data1 will be unit variance.
    +    data1 = label.zip(scaler1.transform(features))
    +
    +    # Without converting the features into dense vectors, transformation with zero mean will raise
    +    # exception on sparse vector.
    +    # data2 will be unit variance and zero mean.
    +    data2 = label.zip(scaler2.transform(features.map(lambda x: Vectors.dense(x.toArray()))))
    +
    +    # $example off$
    +
    --- End diff --
    
    add outputs for data1 and data2
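
To make concrete what the two scalers in the quoted example are meant to produce, here is a minimal plain-Python sketch for a single feature column, without Spark. The helper name `standardize` and the sample column are hypothetical. The sketch assumes the corrected sample standard deviation (divisor n - 1) and returns 0.0 for a zero-variance feature, which is how the MLlib guide describes `StandardScaler`.

```python
import statistics


def standardize(column, with_mean=False, with_std=True):
    """Scale one feature column; optionally center it first."""
    mean = statistics.mean(column) if with_mean else 0.0
    std = statistics.stdev(column) if with_std else 1.0
    if std == 0.0:
        # Zero-variance feature: emit 0.0 instead of dividing by zero.
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]


column = [2.0, 4.0, 6.0]  # hypothetical values of one feature
unit_variance = standardize(column)  # scaled only
zero_mean_unit_variance = standardize(column, with_mean=True)  # centered, then scaled
print(unit_variance)
print(zero_mean_unit_variance)
```

The two `print` calls play the role of the outputs requested for `data1` and `data2`.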



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-186667567
  
    **[Test build #51601 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51601/consoleFull)** for PR 11142 at commit [`2d57ba0`](https://github.com/apache/spark/commit/2d57ba0f96cdb439be4f1fa6cabc49e02331742a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-186777967
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-192036161
  
    @mengxr LGTM



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53559502
  
    --- Diff: examples/src/main/python/mllib/tf_idf_example.py ---
    @@ -0,0 +1,53 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from __future__ import print_function
    +
    +from pyspark.mllib.linalg import Vectors
    +# $example on$
    +from pyspark import SparkContext
    +from pyspark.mllib.feature import HashingTF
    +from pyspark.mllib.feature import IDF
    +# $example off$
    +
    +if __name__ == "__main__":
    +    sc = SparkContext(appName="TFIDFExample")  # SparkContext
    +
    +    # $example on$
    +    # Load documents (one per line).
    +    documents = sc.textFile("data/mllib/kmeans_data.txt").map(lambda line: line.split(" "))
    +
    +    hashingTF = HashingTF()
    +    tf = hashingTF.transform(documents)
    +
    +    # While applying HashingTF only needs a single pass to the data,
    +    # applying IDF needs two passes:
    +    # first to compute the IDF vector and second to scale the term frequencies by IDF.
    +    tf.cache()
    +    idf = IDF().fit(tf)
    +    tfidf = idf.transform(tf)
    +
    +    # spark.mllib’s IDF implementation provides an option for ignoring terms
    +    # which occur in less than a minimum number of documents.
    +    # In such cases, the IDF for these terms is set to 0.
    +    # This feature can be used by passing the minDocFreq value to the IDF constructor.
    +    tf.cache()
    +    idfIgnore = IDF(minDocFreq=2).fit(tf)
    +    tfidfIgnore = idfIgnore.transform(tf)
    +    # $example off$
    +
    --- End diff --
    
    Add outputs of `tfidf` and `tfidfIgnore`.
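
The effect of `minDocFreq` described in the quoted comments can be sketched in plain Python, without Spark. The helper name `idf_weights` and the toy documents are hypothetical. The sketch assumes the smoothed formula log((m + 1) / (df + 1)) that the MLlib guide gives for IDF, where m is the number of documents and df is the number of documents containing the term.

```python
import math
from collections import Counter


def idf_weights(documents, min_doc_freq=0):
    """Return {term: idf}; terms seen in fewer than min_doc_freq documents get 0.0."""
    m = len(documents)
    # Count each term once per document.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    return {
        term: math.log((m + 1) / (df + 1)) if df >= min_doc_freq else 0.0
        for term, df in doc_freq.items()
    }


docs = [["a", "b"], ["a", "c"], ["a"]]  # hypothetical tokenized documents
idf = idf_weights(docs)
idf_ignore = idf_weights(docs, min_doc_freq=2)
print(idf["b"], idf_ignore["b"])
```

With `min_doc_freq=2`, the term "b" (present in one document) drops to 0.0, which is why the two transformed RDDs in the example are worth printing side by side.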



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53559520
  
    --- Diff: docs/mllib-feature-extraction.md ---
    @@ -192,47 +119,12 @@ Here we assume the extracted file is `text8` and in same directory as you run th
     <div data-lang="scala" markdown="1">
     Refer to the [`Word2Vec` Scala docs](api/scala/index.html#org.apache.spark.mllib.feature.Word2Vec) for details on the API.
     
    -{% highlight scala %}
    -import org.apache.spark._
    -import org.apache.spark.rdd._
    -import org.apache.spark.SparkContext._
    -import org.apache.spark.mllib.feature.{Word2Vec, Word2VecModel}
    -
    -val input = sc.textFile("text8").map(line => line.split(" ").toSeq)
    -
    -val word2vec = new Word2Vec()
    -
    -val model = word2vec.fit(input)
    -
    -val synonyms = model.findSynonyms("china", 40)
    -
    -for((synonym, cosineSimilarity) <- synonyms) {
    -  println(s"$synonym $cosineSimilarity")
    -}
    -
    -// Save and load model
    -model.save(sc, "myModelPath")
    -val sameModel = Word2VecModel.load(sc, "myModelPath")
    -{% endhighlight %}
    +{% include_example scala/org/apache/spark/examples/mllib/Word2VecExample.scala %}
     </div>
     <div data-lang="python" markdown="1">
     Refer to the [`Word2Vec` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.feature.Word2Vec) for more details on the API.
     
    -{% highlight python %}
    -from pyspark import SparkContext
    -from pyspark.mllib.feature import Word2Vec
    -
    -sc = SparkContext(appName='Word2Vec')
    -inp = sc.textFile("text8_lines").map(lambda row: row.split(" "))
    -
    -word2vec = Word2Vec()
    -model = word2vec.fit(inp)
    -
    -synonyms = model.findSynonyms('china', 40)
    -
    -for word, cosine_distance in synonyms:
    -    print("{}: {}".format(word, cosine_distance))
    -{% endhighlight %}
    +{% include_example python/mllib/word_2_vec_example.py %}
    --- End diff --
    
    Update this filename to match the Python file once you have renamed the Python example.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-186777755
  
    **[Test build #51612 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51612/consoleFull)** for PR 11142 at commit [`ff9900b`](https://github.com/apache/spark/commit/ff9900b1e7e93fe9a91c26e072405d2ee7049f35).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53560904
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/StandardScalerExample.scala ---
    @@ -0,0 +1,56 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.mllib
    +
    +import org.apache.spark.SparkConf
    +// $example on$
    +import org.apache.spark.SparkContext
    --- End diff --
    
    Move this import outside the `$example on$` / `$example off$` markers.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53560742
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/Word2VecExample.scala ---
    @@ -0,0 +1,55 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.mllib
    +
    +import org.apache.spark.SparkConf
    +// $example on$
    +import org.apache.spark.SparkContext
    +import org.apache.spark.mllib.feature.{Word2Vec, Word2VecModel}
    +// $example off$
    +
    +object Word2VecExample {
    +
    +  def main(args: Array[String]) {
    +
    +    val conf = new SparkConf().setAppName("Word2VecExample")
    +    val sc = new SparkContext(conf)
    +
    +    // $example on$
    +    val input = sc.textFile("text8").map(line => line.split(" ").toSeq)
    +
    +    val word2vec = new Word2Vec()
    +
    +    val model = word2vec.fit(input)
    +
    +    val synonyms = model.findSynonyms("china", 40)
    --- End diff --
    
    change it to `val synonyms = model.findSynonyms("1", 5)`
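
The quoted example names the second element of each result `cosineSimilarity`. As a rough illustration of a ranking of that kind, here is a plain-Python sketch without Spark. The helper names `cosine_similarity` and `find_synonyms` and the two-dimensional toy vectors are hypothetical, and nothing here claims to mirror how `Word2VecModel` is implemented internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def find_synonyms(vectors, word, num):
    """Return the num words closest to `word`, most similar first."""
    query = vectors[word]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:num]


# Hypothetical embeddings keyed by token.
toy_vectors = {
    "1": [1.0, 0.0],
    "2": [0.9, 0.1],
    "3": [0.0, 1.0],
    "4": [-1.0, 0.0],
}
for synonym, similarity in find_synonyms(toy_vectors, "1", 2):
    print(synonym, similarity)
```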



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53559285
  
    --- Diff: examples/src/main/python/mllib/normalizer_example.py ---
    @@ -0,0 +1,45 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from __future__ import print_function
    +
    +# $example on$
    +from pyspark import SparkContext
    --- End diff --
    
    Move the `SparkContext` import outside the `$example on$` / `$example off$` markers.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53560836
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/ElementwiseProductExample.scala ---
    @@ -0,0 +1,50 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.mllib
    +
    +import org.apache.spark.SparkConf
    +// $example on$
    +import org.apache.spark.SparkContext
    +import org.apache.spark.mllib.feature.ElementwiseProduct
    +import org.apache.spark.mllib.linalg.Vectors
    +// $example off$
    +
    +object ElementwiseProductExample {
    +
    +  def main(args: Array[String]) {
    +
    +    val conf = new SparkConf().setAppName("ElementwiseProductExample")
    +    val sc = new SparkContext(conf)
    +
    +    // $example on$
    +    // Create some vector data; also works for sparse vectors
    +    val data = sc.parallelize(Array(Vectors.dense(1.0, 2.0, 3.0), Vectors.dense(4.0, 5.0, 6.0)))
    +
    +    val transformingVector = Vectors.dense(0.0, 1.0, 2.0)
    +    val transformer = new ElementwiseProduct(transformingVector)
    +
    +    // Batch transform and per-row transform give the same results:
    +    val transformedData = transformer.transform(data)
    +    val transformedData2 = data.map(x => transformer.transform(x))
    +    // $example off$
    +
    --- End diff --
    
    add outputs of `transformedData` and `transformedData2`
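
The quoted example notes that the batch transform and the per-row transform give the same results. A minimal plain-Python sketch of the underlying element-wise (Hadamard) product, without Spark, shows what the printed rows would look like. The helper name `elementwise_product` is hypothetical; the input values are the ones used in the diff.

```python
def elementwise_product(scaling_vec, vector):
    """Multiply two equal-length vectors component by component."""
    if len(scaling_vec) != len(vector):
        raise ValueError("vectors must have the same length")
    return [s * v for s, v in zip(scaling_vec, vector)]


data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
transforming_vector = [0.0, 1.0, 2.0]

transformed = [elementwise_product(transforming_vector, row) for row in data]
for row in transformed:
    print(row)
# [0.0, 2.0, 6.0]
# [0.0, 5.0, 12.0]
```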



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53559542
  
    --- Diff: examples/src/main/python/mllib/word_2_vec_example.py ---
    @@ -0,0 +1,40 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from __future__ import print_function
    +
    +# $example on$
    +from pyspark import SparkContext
    --- End diff --
    
    Move the `SparkContext` import out of the `$example on$` / `$example off$` block.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53560786
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/ChiSqSelectorExample.scala ---
    @@ -0,0 +1,58 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.mllib
    +
    +import org.apache.spark.SparkConf
    +// $example on$
    +import org.apache.spark.SparkContext
    +import org.apache.spark.mllib.feature.ChiSqSelector
    +import org.apache.spark.mllib.linalg.Vectors
    +import org.apache.spark.mllib.regression.LabeledPoint
    +import org.apache.spark.mllib.util.MLUtils
    +// $example off$
    +
    +object ChiSqSelectorExample {
    +
    +  def main(args: Array[String]) {
    +
    +    val conf = new SparkConf().setAppName("ChiSqSelectorExample")
    +    val sc = new SparkContext(conf)
    +
    +    // $example on$
    +    // Load some data in libsvm format
    +    val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
    +    // Discretize data in 16 equal bins since ChiSqSelector requires categorical features
    +    // Even though features are doubles, the ChiSqSelector treats each unique value as a category
    +    val discretizedData = data.map { lp =>
    +      LabeledPoint(lp.label, Vectors.dense(lp.features.toArray.map { x => (x / 16).floor } ) )
    +    }
    +    // Create ChiSqSelector that will select top 50 of 692 features
    +    val selector = new ChiSqSelector(50)
    +    // Create ChiSqSelector model (selecting features)
    +    val transformer = selector.fit(discretizedData)
    +    // Filter the top 50 features from each feature vector
    +    val filteredData = discretizedData.map { lp =>
    +      LabeledPoint(lp.label, transformer.transform(lp.features))
    +    }
    +    // $example off$
    +
    --- End diff --
    
    Add an output of `filteredData`.
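
The discretization step in the quoted diff, `(x / 16).floor`, can be illustrated without Spark. This is a small sketch under one assumption that the diff does not state: that the feature values lie in the range 0 to 255, which is what makes a bin width of 16 yield 16 bins. `discretize` is a hypothetical helper name.

```python
import math


def discretize(features, bin_width=16):
    """Map each value to its bin index, mirroring `(x / 16).floor` from the diff."""
    return [float(math.floor(x / bin_width)) for x in features]


# Assumed value range 0..255; these sample values are illustrative only.
features = [0.0, 15.0, 16.0, 127.0, 255.0]
print(discretize(features))  # [0.0, 0.0, 1.0, 7.0, 15.0]

# Every integer value in the assumed range falls into one of 16 bins.
bins = {discretize([float(v)])[0] for v in range(256)}
print(len(bins))  # 16
```

Because the selector treats each unique value as a category, reducing up to 256 distinct values to 16 keeps the number of categories per feature small.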



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53560914
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/PCAExample.scala ---
    @@ -0,0 +1,74 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.mllib
    +
    +import org.apache.spark.SparkConf
    +// $example on$
    +import org.apache.spark.SparkContext
    +import org.apache.spark.mllib.feature.PCA
    +import org.apache.spark.mllib.linalg.Vectors
    +import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}
    +// $example off$
    +
    +object PCAExample {
    +
    +  def main(args: Array[String]) {
    --- End diff --
    
    `def main(args: Array[String]): Unit = {`



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53559453
  
    --- Diff: examples/src/main/python/mllib/tf_idf_example.py ---
    @@ -0,0 +1,53 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from __future__ import print_function
    +
    +from pyspark.mllib.linalg import Vectors
    +# $example on$
    +from pyspark import SparkContext
    --- End diff --
    
    Move the `SparkContext` import out of the `$example on$` / `$example off$` block.
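
The reason for this request is easier to see with a toy extractor. The sketch below is a simplified illustration and not the actual `include_example` plugin: `extract_example` is a hypothetical function that keeps only the lines between the `$example on$` and `$example off$` markers, and the sample source is a shortened, rearranged version of the file under review.

```python
def extract_example(source):
    """Return only the lines between `$example on$` and `$example off$` markers."""
    kept, inside = [], False
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.endswith("$example on$"):
            inside = True
        elif stripped.endswith("$example off$"):
            inside = False
        elif inside:
            kept.append(line)
    return "\n".join(kept)


# Shortened sample with the SparkContext import already moved outside the markers.
source = """\
from __future__ import print_function

from pyspark import SparkContext
# $example on$
from pyspark.mllib.feature import HashingTF
from pyspark.mllib.feature import IDF
# $example off$

if __name__ == "__main__":
    sc = SparkContext(appName="TFIDFExample")
    # $example on$
    hashingTF = HashingTF()
    # $example off$
"""

snippet = extract_example(source)
print(snippet)
```

With the import placed outside the markers, the extracted snippet contains only the feature-specific imports and code, while the setup boilerplate stays in the runnable file.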



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53559470
  
    --- Diff: examples/src/main/python/mllib/tf_idf_example.py ---
    @@ -0,0 +1,53 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from __future__ import print_function
    +
    +from pyspark.mllib.linalg import Vectors
    +# $example on$
    +from pyspark import SparkContext
    +from pyspark.mllib.feature import HashingTF
    +from pyspark.mllib.feature import IDF
    +# $example off$
    +
    +if __name__ == "__main__":
    +    sc = SparkContext(appName="TFIDFExample")  # SparkContext
    +
    +    # $example on$
    +    # Load documents (one per line).
    +    documents = sc.textFile("data/mllib/kmeans_data.txt").map(lambda line: line.split(" "))
    +
    +    hashingTF = HashingTF()
    +    tf = hashingTF.transform(documents)
    +
    +    # While applying HashingTF only needs a single pass to the data,
    +    # applying IDF needs two passes:
    +    # first to compute the IDF vector and second to scale the term frequencies by IDF.
    --- End diff --
    
    Capitalize the first `f` on this line (`first`).
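
The two passes mentioned in the quoted comment can be sketched in plain Python. This is an illustration only: it uses raw terms instead of hashed indices, the helper names are hypothetical, and the smoothed formula `log((m + 1) / (df + 1))` is assumed to match the one described in the MLlib guide.

```python
import math
from collections import Counter


def term_frequencies(doc):
    """Single pass over one document: count how often each term occurs."""
    return Counter(doc)


def fit_idf(tfs):
    """Pass 1 over the corpus: compute the IDF value of every term."""
    m = len(tfs)
    df = Counter()
    for tf in tfs:
        df.update(tf.keys())  # each document counts once per term
    return {term: math.log((m + 1) / (count + 1)) for term, count in df.items()}


def transform(tfs, idf):
    """Pass 2 over the corpus: scale each term frequency by its IDF."""
    return [{term: freq * idf[term] for term, freq in tf.items()} for tf in tfs]


documents = [["a", "b", "a"], ["a", "c"], ["a", "b"]]
tfs = [term_frequencies(d) for d in documents]
idf = fit_idf(tfs)
tfidf = transform(tfs, idf)
print(idf["a"])       # 0.0, because "a" appears in every document
print(tfidf[1]["c"])  # log(2), because "c" appears in one of three documents
```

The second pass cannot start until the first has seen every document, which is why IDF needs two passes while term hashing needs one.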



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53560755
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/ChiSqSelectorExample.scala ---
    @@ -0,0 +1,58 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.mllib
    +
    +import org.apache.spark.SparkConf
    +// $example on$
    +import org.apache.spark.SparkContext
    --- End diff --
    
    Move the `SparkContext` import out of the `$example on$` / `$example off$` block.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53560761
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/ChiSqSelectorExample.scala ---
    @@ -0,0 +1,58 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.mllib
    +
    +import org.apache.spark.SparkConf
    +// $example on$
    +import org.apache.spark.SparkContext
    +import org.apache.spark.mllib.feature.ChiSqSelector
    +import org.apache.spark.mllib.linalg.Vectors
    +import org.apache.spark.mllib.regression.LabeledPoint
    +import org.apache.spark.mllib.util.MLUtils
    +// $example off$
    +
    +object ChiSqSelectorExample {
    +
    +  def main(args: Array[String]) {
    --- End diff --
    
    `def main(args: Array[String]): Unit = {`



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-183103668
  
    **[Test build #51140 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51140/consoleFull)** for PR 11142 at commit [`19ad929`](https://github.com/apache/spark/commit/19ad929f72a030fb0ce3ae1794c416cc1ce05e7e).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53559223
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/mllib/JavaElementwiseProductExample.java ---
    @@ -0,0 +1,58 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.examples.mllib;
    +
    +import org.apache.spark.SparkConf;
    +// $example on$
    +import java.util.Arrays;
    +import org.apache.spark.api.java.JavaRDD;
    +import org.apache.spark.api.java.JavaSparkContext;
    +import org.apache.spark.api.java.function.Function;
    +import org.apache.spark.mllib.feature.ElementwiseProduct;
    +import org.apache.spark.mllib.linalg.Vector;
    +import org.apache.spark.mllib.linalg.Vectors;
    +// $example off$
    +
    +public class JavaElementwiseProductExample {
    +  public static void main(String[] args) {
    +
    +    SparkConf conf = new SparkConf().setAppName("JavaElementwiseProductExample");
    +    JavaSparkContext jsc = new JavaSparkContext(conf);
    +
    +    // $example on$
    +    // Create some vector data; also works for sparse vectors
    +    JavaRDD<Vector> data = jsc.parallelize(Arrays.asList(
    +      Vectors.dense(1.0, 2.0, 3.0), Vectors.dense(4.0, 5.0, 6.0)));
    +    Vector transformingVector = Vectors.dense(0.0, 1.0, 2.0);
    +    final ElementwiseProduct transformer = new ElementwiseProduct(transformingVector);
    +
    +    // Batch transform and per-row transform give the same results:
    +    JavaRDD<Vector> transformedData = transformer.transform(data);
    +    JavaRDD<Vector> transformedData2 = data.map(
    +      new Function<Vector, Vector>() {
    +        @Override
    +        public Vector call(Vector v) {
    +          return transformer.transform(v);
    +        }
    +      }
    +    );
    +    // $example off$
    +
    --- End diff --
    
    Add outputs of `transformedData` and `transformedData2`



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53560940
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/TFIDFExample.scala ---
    @@ -0,0 +1,63 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.mllib
    +
    +import org.apache.spark.SparkConf
    +// $example on$
    +import org.apache.spark.SparkContext
    --- End diff --
    
    Move this import out of the `$example on$` / `$example off$` block.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-184572262
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51351/
    Test PASSed.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53560896
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/PCAExample.scala ---
    @@ -0,0 +1,74 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.mllib
    +
    +import org.apache.spark.SparkConf
    +// $example on$
    +import org.apache.spark.SparkContext
    +import org.apache.spark.mllib.feature.PCA
    +import org.apache.spark.mllib.linalg.Vectors
    +import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}
    +// $example off$
    +
    +object PCAExample {
    +
    +  def main(args: Array[String]) {
    +
    +    val conf = new SparkConf().setAppName("PCAExample")
    +    val sc = new SparkContext(conf)
    +
    +    // $example on$
    +    val data = sc.textFile("data/mllib/ridge-data/lpsa.data").map { line =>
    +      val parts = line.split(',')
    +      LabeledPoint(parts(0).toDouble, Vectors.dense(parts(1).split(' ').map(_.toDouble)))
    +    }.cache()
    +
    +    val splits = data.randomSplit(Array(0.6, 0.4), seed = 11L)
    +    val training = splits(0).cache()
    +    val test = splits(1)
    +
    +    val pca = new PCA(training.first().features.size/2).fit(data.map(_.features))
    +    val training_pca = training.map(p => p.copy(features = pca.transform(p.features)))
    +    val test_pca = test.map(p => p.copy(features = pca.transform(p.features)))
    +
    +    val numIterations = 100
    +    val model = LinearRegressionWithSGD.train(training, numIterations)
    +    val model_pca = LinearRegressionWithSGD.train(training_pca, numIterations)
    +
    +    val valuesAndPreds = test.map { point =>
    +      val score = model.predict(point.features)
    +      (score, point.label)
    +    }
    +
    +    val valuesAndPreds_pca = test_pca.map { point =>
    +      val score = model_pca.predict(point.features)
    +      (score, point.label)
    +    }
    +
    +    val MSE = valuesAndPreds.map{case(v, p) => math.pow((v - p), 2)}.mean()
    --- End diff --
    
    Change the two lines to:
    
    ```scala
    val MSE = valuesAndPreds.map { case(v, p) => math.pow(v - p, 2) }.mean()
    val MSE_pca = valuesAndPreds_pca.map { case(v, p) => math.pow(v - p, 2) }.mean()
    ```
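
For reference, the quantity both lines compute is the mean of the squared differences between each predicted score and its label. A minimal pure-Python sketch follows, with made-up pairs and a hypothetical helper name, since the actual values depend on the data split and the trained models.

```python
def mean_squared_error(values_and_preds):
    """Mean of (v - p)^2 over all (v, p) pairs."""
    squared = [(v - p) ** 2 for v, p in values_and_preds]
    return sum(squared) / len(squared)


# Illustrative pairs only; these are not results from the PCA example.
values_and_preds = [(1.0, 2.0), (3.0, 5.0)]
mse = mean_squared_error(values_and_preds)
print(mse)  # 2.5, i.e. (1 + 4) / 2
```

Comparing the value obtained on the original features with the one obtained on the PCA-projected features is what the example's two MSE lines are for.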



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53559452
  
    --- Diff: examples/src/main/python/mllib/tf_idf_example.py ---
    @@ -0,0 +1,53 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from __future__ import print_function
    +
    +from pyspark.mllib.linalg import Vectors
    --- End diff --
    
    Remove this line, since no code below uses `Vectors`.
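
Unused imports like this one can be spotted mechanically. The sketch below is a rough illustration using the standard-library `ast` module; `unused_imports` is a hypothetical helper, it is not the style checker the project runs, and it ignores cases such as names used only inside strings or re-exported names.

```python
import ast


def unused_imports(source):
    """Return imported names that are never referenced as a plain name."""
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            if node.module == "__future__":
                continue  # compiler directives, never referenced by name
            for alias in node.names:
                imported.add(alias.asname or alias.name)
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(imported - used)


# Shortened version of the file under review; it is parsed, not executed.
source = """\
from __future__ import print_function

from pyspark.mllib.linalg import Vectors
from pyspark import SparkContext
from pyspark.mllib.feature import HashingTF

sc = SparkContext(appName="TFIDFExample")
hashingTF = HashingTF()
"""

result = unused_imports(source)
print(result)  # ['Vectors']
```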



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-201028658
  
    Merged into master. Thanks!



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53560909
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/ElementwiseProductExample.scala ---
    @@ -0,0 +1,50 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.mllib
    +
    +import org.apache.spark.SparkConf
    +// $example on$
    +import org.apache.spark.SparkContext
    +import org.apache.spark.mllib.feature.ElementwiseProduct
    +import org.apache.spark.mllib.linalg.Vectors
    +// $example off$
    +
    +object ElementwiseProductExample {
    +
    +  def main(args: Array[String]) {
    --- End diff --
    
    Declare the return type explicitly instead of using procedure syntax: `def main(args: Array[String]): Unit = {`



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53560713
  
    --- Diff: examples/src/main/python/mllib/word_2_vec_example.py ---
    @@ -0,0 +1,40 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from __future__ import print_function
    +
    +# $example on$
    +from pyspark import SparkContext
    +from pyspark.mllib.feature import Word2Vec
    +# $example off$
    +
    +if __name__ == "__main__":
    +    sc = SparkContext(appName="Word2VecExample")  # SparkContext
    +
    +    # $example on$
    +    inp = sc.textFile("text8_lines").map(lambda row: row.split(" "))
    --- End diff --
    
    Change the path to `"data/mllib/sample_lda_data.txt"`. The example needs to read a file that ships with the repository; otherwise it cannot be run directly.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-186666614
  
    **[Test build #51601 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51601/consoleFull)** for PR 11142 at commit [`2d57ba0`](https://github.com/apache/spark/commit/2d57ba0f96cdb439be4f1fa6cabc49e02331742a).



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-187905519
  
    **[Test build #51795 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51795/consoleFull)** for PR 11142 at commit [`6961bdc`](https://github.com/apache/spark/commit/6961bdc81436764b12f3eacff3e54aef97e3dff6).



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53559265
  
    --- Diff: examples/src/main/python/mllib/elementwise_product_example.py ---
    @@ -0,0 +1,43 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from __future__ import print_function
    +
    +# $example on$
    +from pyspark import SparkContext
    --- End diff --
    
    Move the `SparkContext` import outside the `$example on$` and `$example off$` markers:
    
    ```python
    from pyspark import SparkContext
    # $example on$
    from pyspark.mllib.linalg import Vectors
    from pyspark.mllib.feature import ElementwiseProduct
    # $example off$
    ```
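
To illustrate why the import placement matters, here is a minimal sketch of how marker-based extraction could work. It is not the actual `include_example` plugin used by the Spark docs build; `extract_example` and `SOURCE` are hypothetical names introduced only for this illustration, and the sketch assumes that only lines between the two markers end up in the rendered documentation.

```python
def extract_example(source: str) -> str:
    """Return only the lines between '$example on$' and '$example off$' markers.

    Hypothetical helper for illustration; the real docs plugin may behave differently.
    """
    kept = []
    inside = False
    for line in source.splitlines():
        if "$example on$" in line:
            inside = True
            continue
        if "$example off$" in line:
            inside = False
            continue
        if inside:
            kept.append(line)
    return "\n".join(kept)


SOURCE = """\
from pyspark import SparkContext
# $example on$
from pyspark.mllib.linalg import Vectors
from pyspark.mllib.feature import ElementwiseProduct
# $example off$
"""

snippet = extract_example(SOURCE)
print(snippet)
```

With the `SparkContext` import placed before the `on` marker, the extracted snippet contains only the two MLlib imports, which is the layout the comment above asks for.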



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-183102624
  
    **[Test build #51140 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51140/consoleFull)** for PR 11142 at commit [`19ad929`](https://github.com/apache/spark/commit/19ad929f72a030fb0ce3ae1794c416cc1ce05e7e).



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53560977
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/TFIDFExample.scala ---
    @@ -0,0 +1,63 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.mllib
    +
    +import org.apache.spark.SparkConf
    +// $example on$
    +import org.apache.spark.SparkContext
    +import org.apache.spark.mllib.feature.{HashingTF, IDF}
    +import org.apache.spark.mllib.linalg.Vector
    +// $example off$
    +import org.apache.spark.rdd.RDD
    +
    +object TFIDFExample {
    +
    +  def main(args: Array[String]) {
    +
    +    val conf = new SparkConf().setAppName("TFIDFExample")
    +    val sc = new SparkContext(conf)
    +
    +    // $example on$
    +    // Load documents (one per line).
    +    val documents: RDD[Seq[String]] = sc.textFile("data/mllib/kmeans_data.txt")
    +      .map(_.split(" ").toSeq)
    +
    +    val hashingTF = new HashingTF()
    +    val tf: RDD[Vector] = hashingTF.transform(documents)
    +
    +    // While applying HashingTF only needs a single pass to the data,
    --- End diff --
    
    Change the comment here to:
    
    ```scala
    // While applying HashingTF only needs a single pass to the data, applying IDF needs two passes:
    // First to compute the IDF vector and second to scale the term frequencies by IDF.
    ```
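
The single-pass versus two-pass distinction in that comment can be sketched in plain Python, without Spark. This is an illustration under stated assumptions, not MLlib's implementation: the hash function (`zlib.crc32`), the hash space size `NUM_FEATURES`, and the toy `documents` are all hypothetical. The IDF formula used, `log((m + 1) / (df + 1))`, is the one given in the MLlib feature extraction guide.

```python
import math
import zlib
from collections import Counter

NUM_FEATURES = 1 << 10  # hypothetical hash space size, chosen for illustration


def bucket(term):
    """Map a term to a feature index (stand-in for the hashing trick)."""
    return zlib.crc32(term.encode("utf-8")) % NUM_FEATURES


def hashing_tf(doc):
    """One pass over a single document: count terms per hashed bucket."""
    return Counter(bucket(t) for t in doc)


documents = [
    ["a", "b", "c"],
    ["a", "b", "d"],
    ["a", "e", "e"],
]

# HashingTF: a single pass over the data.
tf = [hashing_tf(doc) for doc in documents]

# IDF, first pass: count in how many documents each bucket occurs.
m = len(tf)
df = Counter()
for vec in tf:
    df.update(vec.keys())
idf = {j: math.log((m + 1) / (df[j] + 1)) for j in df}

# IDF, second pass: scale each term frequency by its IDF weight.
tfidf = [{j: count * idf[j] for j, count in vec.items()} for vec in tf]
print(tfidf)
```

Because the term-frequency vectors are traversed twice, keeping them in memory between passes avoids recomputing them, which is what the `tf.cache()` call in the example is for.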



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53560780
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/ChiSqSelectorExample.scala ---
    @@ -0,0 +1,58 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.mllib
    +
    +import org.apache.spark.SparkConf
    +// $example on$
    +import org.apache.spark.SparkContext
    +import org.apache.spark.mllib.feature.ChiSqSelector
    +import org.apache.spark.mllib.linalg.Vectors
    +import org.apache.spark.mllib.regression.LabeledPoint
    +import org.apache.spark.mllib.util.MLUtils
    +// $example off$
    +
    +object ChiSqSelectorExample {
    +
    +  def main(args: Array[String]) {
    +
    +    val conf = new SparkConf().setAppName("ChiSqSelectorExample")
    +    val sc = new SparkContext(conf)
    +
    +    // $example on$
    +    // Load some data in libsvm format
    +    val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
    +    // Discretize data in 16 equal bins since ChiSqSelector requires categorical features
    +    // Even though features are doubles, the ChiSqSelector treats each unique value as a category
    +    val discretizedData = data.map { lp =>
    +      LabeledPoint(lp.label, Vectors.dense(lp.features.toArray.map { x => (x / 16).floor } ) )
    --- End diff --
    
    Remove the spaces before each of the last two closing parentheses:
    
    `LabeledPoint(lp.label, Vectors.dense(lp.features.toArray.map { x => (x / 16).floor }))`
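
For readers unfamiliar with the discretization step in this line, the following sketch shows what `(x / 16).floor` does to individual feature values. It assumes, purely for illustration, that features lie in the range 0 to 255; under that assumption the mapping yields 16 equal-width bins. `discretize` and `values` are hypothetical names.

```python
import math


def discretize(x):
    """Map a feature value to its bin index, mirroring (x / 16).floor."""
    return math.floor(x / 16)


# Assumed feature range 0..255; these sample values are made up for illustration.
values = [0, 15, 16, 100, 255]
bins = [discretize(v) for v in values]
print(bins)  # [0, 0, 1, 6, 15]
```

Each distinct bin index is then treated as one category by the selector, as the code comment in the diff notes.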



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53560918
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/TFIDFExample.scala ---
    @@ -0,0 +1,63 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.mllib
    +
    +import org.apache.spark.SparkConf
    +// $example on$
    +import org.apache.spark.SparkContext
    +import org.apache.spark.mllib.feature.{HashingTF, IDF}
    +import org.apache.spark.mllib.linalg.Vector
    +// $example off$
    +import org.apache.spark.rdd.RDD
    +
    +object TFIDFExample {
    +
    +  def main(args: Array[String]) {
    --- End diff --
    
    Declare the return type explicitly instead of using procedure syntax: `def main(args: Array[String]): Unit = {`



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53559284
  
    --- Diff: examples/src/main/python/mllib/normalizer_example.py ---
    @@ -0,0 +1,45 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from __future__ import print_function
    +
    +# $example on$
    +from pyspark import SparkContext
    +from pyspark.mllib.util import MLUtils
    +from pyspark.mllib.feature import Normalizer
    +
    --- End diff --
    
    Remove the blank line.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53560800
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/ElementwiseProductExample.scala ---
    @@ -0,0 +1,50 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.mllib
    +
    +import org.apache.spark.SparkConf
    +// $example on$
    +import org.apache.spark.SparkContext
    --- End diff --
    
    Move the `SparkContext` import outside the `$example on$` and `$example off$` markers.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53560717
  
    --- Diff: examples/src/main/python/mllib/word_2_vec_example.py ---
    @@ -0,0 +1,40 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from __future__ import print_function
    +
    +# $example on$
    +from pyspark import SparkContext
    +from pyspark.mllib.feature import Word2Vec
    +# $example off$
    +
    +if __name__ == "__main__":
    +    sc = SparkContext(appName="Word2VecExample")  # SparkContext
    +
    +    # $example on$
    +    inp = sc.textFile("text8_lines").map(lambda row: row.split(" "))
    +
    +    word2vec = Word2Vec()
    +    model = word2vec.fit(inp)
    +
    +    synonyms = model.findSynonyms('china', 40)
    --- End diff --
    
    Change it to `synonyms = model.findSynonyms('1', 5)`.
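
As background for the two arguments in that call (a query word and the number of results), here is a rough sketch of a synonym lookup that ranks words by cosine similarity. It is an assumption-laden illustration, not Word2Vec's implementation: the `vectors` table is invented, and `cosine` and `find_synonyms` are hypothetical helpers.

```python
import math

# Invented 2-dimensional word vectors, for illustration only.
vectors = {
    "1": [1.0, 0.0],
    "2": [0.9, 0.1],
    "3": [0.0, 1.0],
    "4": [-1.0, 0.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def find_synonyms(word, num):
    """Return the `num` words most similar to `word`, excluding the word itself."""
    query = vectors[word]
    scored = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:num]


synonyms = find_synonyms("1", 2)
for word, similarity in synonyms:
    print(word, similarity)
```

The query word has to occur in the training data, which is why the query changes together with the input file.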



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53559292
  
    --- Diff: examples/src/main/python/mllib/normalizer_example.py ---
    @@ -0,0 +1,45 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from __future__ import print_function
    +
    +# $example on$
    +from pyspark import SparkContext
    +from pyspark.mllib.util import MLUtils
    +from pyspark.mllib.feature import Normalizer
    +
    +# $example off$
    +
    +if __name__ == "__main__":
    +    sc = SparkContext(appName="NormalizerExample")  # SparkContext
    +
    +    # $example on$
    +    data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
    +    labels = data.map(lambda x: x.label)
    +    features = data.map(lambda x: x.features)
    +
    +    normalizer1 = Normalizer()
    +    normalizer2 = Normalizer(p=float("inf"))
    +
    +    # Each sample in data1 will be normalized using $L^2$ norm.
    +    data1 = labels.zip(normalizer1.transform(features))
    +
    +    # Each sample in data2 will be normalized using $L^\infty$ norm.
    +    data2 = labels.zip(normalizer2.transform(features))
    +    # $example off$
    +
    --- End diff --
    
    Add output for `data1` and `data2`.
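
To show what such output would look like for a single sample, here is a plain-Python sketch of normalization under the two norms used in the example. `normalize` and `sample` are hypothetical names, and returning a zero vector unchanged is a choice made for this sketch.

```python
def normalize(vector, p=2.0):
    """Scale a vector to unit p-norm; p=float('inf') uses the max-abs norm."""
    if p == float("inf"):
        norm = max(abs(x) for x in vector)
    else:
        norm = sum(abs(x) ** p for x in vector) ** (1.0 / p)
    if norm == 0.0:
        # Sketch-level choice: leave an all-zero vector as it is.
        return list(vector)
    return [x / norm for x in vector]


sample = [3.0, -4.0, 0.0]  # made-up sample for illustration

data1 = normalize(sample)                # L^2 norm, as with Normalizer()
data2 = normalize(sample, float("inf"))  # L^inf norm, as with p=float("inf")

print(data1)
print(data2)
```

For this sample the L^2 norm is 5 and the L^inf norm is 4, so the two results differ only in their scaling factor.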



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-183103687
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51140/
    Test FAILed.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-183100479
  
    @mengxr Don't worry, I'll review all of the example-code replacement PRs.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-186775641
  
    **[Test build #51612 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51612/consoleFull)** for PR 11142 at commit [`ff9900b`](https://github.com/apache/spark/commit/ff9900b1e7e93fe9a91c26e072405d2ee7049f35).



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53559466
  
    --- Diff: examples/src/main/python/mllib/tf_idf_example.py ---
    @@ -0,0 +1,53 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from __future__ import print_function
    +
    +from pyspark.mllib.linalg import Vectors
    +# $example on$
    +from pyspark import SparkContext
    +from pyspark.mllib.feature import HashingTF
    +from pyspark.mllib.feature import IDF
    +# $example off$
    +
    +if __name__ == "__main__":
    +    sc = SparkContext(appName="TFIDFExample")  # SparkContext
    +
    +    # $example on$
    +    # Load documents (one per line).
    +    documents = sc.textFile("data/mllib/kmeans_data.txt").map(lambda line: line.split(" "))
    +
    +    hashingTF = HashingTF()
    +    tf = hashingTF.transform(documents)
    +
    +    # While applying HashingTF only needs a single pass to the data,
    +    # applying IDF needs two passes:
    --- End diff --
    
    Merge this line into the end of the previous one.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53560991
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/TFIDFExample.scala ---
    @@ -0,0 +1,63 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.mllib
    +
    +import org.apache.spark.SparkConf
    +// $example on$
    +import org.apache.spark.SparkContext
    +import org.apache.spark.mllib.feature.{HashingTF, IDF}
    +import org.apache.spark.mllib.linalg.Vector
    +// $example off$
    +import org.apache.spark.rdd.RDD
    +
    +object TFIDFExample {
    +
    +  def main(args: Array[String]) {
    +
    +    val conf = new SparkConf().setAppName("TFIDFExample")
    +    val sc = new SparkContext(conf)
    +
    +    // $example on$
    +    // Load documents (one per line).
    +    val documents: RDD[Seq[String]] = sc.textFile("data/mllib/kmeans_data.txt")
    +      .map(_.split(" ").toSeq)
    +
    +    val hashingTF = new HashingTF()
    +    val tf: RDD[Vector] = hashingTF.transform(documents)
    +
    +    // While applying HashingTF only needs a single pass to the data,
    +    // applying IDF needs two passes: first to compute the IDF vector
    +    // and second to scale the term frequencies by IDF.
    +    tf.cache()
    +    val idf = new IDF().fit(tf)
    +    val tfidf: RDD[Vector] = idf.transform(tf)
    +
    +    // spark.mllib IDF implementation provides an option for ignoring terms
    +    // which occur in less than a minimum number of documents.
    +    // In such cases, the IDF for these terms is set to 0.
    +    // This feature can be used by passing the minDocFreq value to the IDF constructor.
    +    tf.cache()
    --- End diff --
    
    Remove this line; `tf` is already cached above.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-191921066
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52403/
    Test PASSed.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53559482
  
    --- Diff: examples/src/main/python/mllib/tf_idf_example.py ---
    @@ -0,0 +1,53 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from __future__ import print_function
    +
    +from pyspark.mllib.linalg import Vectors
    +# $example on$
    +from pyspark import SparkContext
    +from pyspark.mllib.feature import HashingTF
    +from pyspark.mllib.feature import IDF
    +# $example off$
    +
    +if __name__ == "__main__":
    +    sc = SparkContext(appName="TFIDFExample")  # SparkContext
    +
    +    # $example on$
    +    # Load documents (one per line).
    +    documents = sc.textFile("data/mllib/kmeans_data.txt").map(lambda line: line.split(" "))
    +
    +    hashingTF = HashingTF()
    +    tf = hashingTF.transform(documents)
    +
    +    # While applying HashingTF only needs a single pass to the data,
    +    # applying IDF needs two passes:
    +    # first to compute the IDF vector and second to scale the term frequencies by IDF.
    +    tf.cache()
    +    idf = IDF().fit(tf)
    +    tfidf = idf.transform(tf)
    +
    +    # spark.mllib’s IDF implementation provides an option for ignoring terms
    +    # which occur in less than a minimum number of documents.
    +    # In such cases, the IDF for these terms is set to 0.
    +    # This feature can be used by passing the minDocFreq value to the IDF constructor.
    +    tf.cache()
    --- End diff --
    
    Remove this line. There is no need to cache `tf` again.
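
For illustration, the two passes described in the code comments can be sketched in plain Python without Spark. This is a minimal sketch, not the spark.mllib implementation: `fit_idf`, `transform`, and `min_doc_freq` are hypothetical names, the documents are made up, and the smoothed formula `log((m + 1) / (df + 1))` is an assumption modelled on the MLlib documentation. The term-frequency list is built once and reused by both passes, which is why a single `cache()` is enough in the reviewed example.

```python
import math
from collections import Counter


def fit_idf(tf_docs, min_doc_freq=0):
    """First pass: count document frequencies and derive IDF weights."""
    num_docs = len(tf_docs)
    doc_freq = Counter()
    for tf in tf_docs:
        doc_freq.update(tf.keys())  # each term counts once per document
    # Terms seen in fewer than min_doc_freq documents get an IDF of 0.
    return {
        term: math.log((num_docs + 1.0) / (df + 1.0)) if df >= min_doc_freq else 0.0
        for term, df in doc_freq.items()
    }


def transform(tf_docs, idf):
    """Second pass: scale each term frequency by its IDF weight."""
    return [{term: count * idf[term] for term, count in tf.items()} for tf in tf_docs]


# Made-up documents, already tokenized (one list of terms per document).
documents = [["a", "b", "a"], ["a", "c"], ["a", "b"]]
tf_docs = [Counter(doc) for doc in documents]  # computed once, reused below

idf = fit_idf(tf_docs)
tfidf = transform(tf_docs, idf)

# Ignore terms that occur in fewer than 2 documents.
idf_min2 = fit_idf(tf_docs, min_doc_freq=2)
tfidf_min2 = transform(tf_docs, idf_min2)

print(tfidf)
print(tfidf_min2)
```

Both the default and the `min_doc_freq` variant read the same `tf_docs`, so nothing needs to be recomputed or cached a second time between them.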



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-184569082
  
    **[Test build #51351 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51351/consoleFull)** for PR 11142 at commit [`d0c8bc8`](https://github.com/apache/spark/commit/d0c8bc8af73c058bce69d29ff8fbbf16caad080f).



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53560990
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/TFIDFExample.scala ---
    @@ -0,0 +1,63 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.mllib
    +
    +import org.apache.spark.SparkConf
    +// $example on$
    +import org.apache.spark.SparkContext
    +import org.apache.spark.mllib.feature.{HashingTF, IDF}
    +import org.apache.spark.mllib.linalg.Vector
    +// $example off$
    +import org.apache.spark.rdd.RDD
    +
    +object TFIDFExample {
    +
    +  def main(args: Array[String]) {
    +
    +    val conf = new SparkConf().setAppName("TFIDFExample")
    +    val sc = new SparkContext(conf)
    +
    +    // $example on$
    +    // Load documents (one per line).
    +    val documents: RDD[Seq[String]] = sc.textFile("data/mllib/kmeans_data.txt")
    +      .map(_.split(" ").toSeq)
    +
    +    val hashingTF = new HashingTF()
    +    val tf: RDD[Vector] = hashingTF.transform(documents)
    +
    +    // While applying HashingTF only needs a single pass to the data,
    +    // applying IDF needs two passes: first to compute the IDF vector
    +    // and second to scale the term frequencies by IDF.
    +    tf.cache()
    +    val idf = new IDF().fit(tf)
    +    val tfidf: RDD[Vector] = idf.transform(tf)
    +
    +    // spark.mllib IDF implementation provides an option for ignoring terms
    --- End diff --
    
    Change the comment to:
    
    ```scala
    // spark.mllib IDF implementation provides an option for ignoring terms which occur in less than
    // a minimum number of documents. In such cases, the IDF for these terms is set to 0.
    // This feature can be used by passing the minDocFreq value to the IDF constructor.
    ```



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-187916783
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53559324
  
    --- Diff: examples/src/main/python/mllib/standard_scaler_example.py ---
    @@ -0,0 +1,52 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from __future__ import print_function
    +
    +from pyspark.mllib.linalg import Vectors
    --- End diff --
    
    Change the imports to:
    
    ```python
    from pyspark import SparkContext
    # $example on$
    from pyspark.mllib.linalg import Vectors
    from pyspark.mllib.feature import StandardScaler, StandardScalerModel
    from pyspark.mllib.util import MLUtils
    # $example off$
    ```



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-187916788
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51795/
    Test PASSed.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r54898640
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/mllib/JavaChiSqSelectorExample.java ---
    @@ -0,0 +1,83 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.examples.mllib;
    +
    +import org.apache.spark.SparkConf;
    --- End diff --
    
    Imports: remember to keep imports that the displayed snippet does not use outside the `$example on$` / `$example off$` markers.
    
    ```java
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    // $example on$
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.Function;
    import org.apache.spark.api.java.function.VoidFunction;
    import org.apache.spark.mllib.feature.ChiSqSelector;
    import org.apache.spark.mllib.feature.ChiSqSelectorModel;
    import org.apache.spark.mllib.linalg.Vectors;
    import org.apache.spark.mllib.regression.LabeledPoint;
    import org.apache.spark.mllib.util.MLUtils;
    // $example off$
    ```
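
To see why the placement of imports matters, here is a rough Python sketch of how marker-based extraction could work. It is only an illustration: `extract_example` is a hypothetical helper, not the plugin the Spark docs build actually uses, and the Java source is a shortened stand-in. Only lines between the markers reach the rendered documentation, so setup-only imports such as `SparkConf` belong outside them.

```python
def extract_example(source):
    """Return only the lines between '$example on$' and '$example off$' markers."""
    kept = []
    inside = False
    for line in source.splitlines():
        if "$example on$" in line:
            inside = True  # start copying after the opening marker
        elif "$example off$" in line:
            inside = False  # stop copying at the closing marker
        elif inside:
            kept.append(line)
    return "\n".join(kept)


# Shortened stand-in for the import section of an example file.
java_source = """\
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
// $example on$
import org.apache.spark.mllib.feature.ChiSqSelector;
import org.apache.spark.mllib.linalg.Vectors;
// $example off$
"""

snippet = extract_example(java_source)
print(snippet)
```

The printed snippet contains only the two MLlib imports; the context setup imports and the marker comments themselves are left out.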



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-183103682
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-183098771
  
    @yinxusen Could you help review this PR? Thanks!



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53560849
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/NormalizerExample.scala ---
    @@ -0,0 +1,51 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.mllib
    +
    +import org.apache.spark.SparkConf
    +// $example on$
    +import org.apache.spark.SparkContext
    --- End diff --
    
    Move this import outside the `$example on$` / `$example off$` block.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53560933
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/StandardScalerExample.scala ---
    @@ -0,0 +1,56 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.mllib
    +
    +import org.apache.spark.SparkConf
    +// $example on$
    +import org.apache.spark.SparkContext
    +import org.apache.spark.mllib.feature.{StandardScaler, StandardScalerModel}
    +import org.apache.spark.mllib.linalg.Vectors
    +import org.apache.spark.mllib.util.MLUtils
    +// $example off$
    +
    +object StandardScalerExample {
    +
    +  def main(args: Array[String]) {
    +
    +    val conf = new SparkConf().setAppName("StandardScalerExample")
    +    val sc = new SparkContext(conf)
    +
    +    // $example on$
    +    val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
    +
    +    val scaler1 = new StandardScaler().fit(data.map(x => x.features))
    +    val scaler2 = new StandardScaler(withMean = true, withStd = true).fit(data.map(x => x.features))
    +    // scaler3 is an identical model to scaler2, and will produce identical transformations
    +    val scaler3 = new StandardScalerModel(scaler2.std, scaler2.mean)
    +
    +    // data1 will be unit variance.
    +    val data1 = data.map(x => (x.label, scaler1.transform(x.features)))
    +
    +    // Without converting the features into dense vectors, transformation with zero mean will raise
    +    // exception on sparse vector.
    +    // data2 will be unit variance and zero mean.
    +    val data2 = data.map(x => (x.label, scaler2.transform(Vectors.dense(x.features.toArray))))
    +    // $example off$
    +
    --- End diff --
    
    Add output for `data1` and `data2`.
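
As a rough picture of what such output would show, the scaling can be sketched in plain Python on a tiny made-up dataset. This is a sketch under stated assumptions, not the MLlib code: `fit` and `transform` are hypothetical helpers, and the use of the corrected sample standard deviation (dividing by n - 1) is an assumption about how the scaler computes unit variance.

```python
import math


def fit(rows):
    """Compute per-column mean and corrected sample standard deviation."""
    n = len(rows)
    cols = list(zip(*rows))
    mean = [sum(c) / n for c in cols]
    std = [
        math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
        for c, m in zip(cols, mean)
    ]
    return mean, std


def transform(row, mean, std, with_mean=False, with_std=True):
    """Optionally center each value, then optionally scale it to unit std."""
    out = []
    for v, m, s in zip(row, mean, std):
        if with_mean:
            v = v - m
        if with_std:
            v = v / s if s != 0.0 else 0.0
        out.append(v)
    return out


# Made-up dense feature rows; not taken from sample_libsvm_data.txt.
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
mean, std = fit(rows)

data1 = [transform(r, mean, std) for r in rows]                  # unit variance only
data2 = [transform(r, mean, std, with_mean=True) for r in rows]  # zero mean, unit variance

print("data1:")
for r in data1:
    print(r)

print("data2:")
for r in data2:
    print(r)
```

Centering turns zeros into non-zero values, which is why the reviewed example converts sparse features to dense vectors before applying the zero-mean scaler.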



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-183098721
  
    ok to test



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11142#issuecomment-186667710
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by yinxusen <gi...@git.apache.org>.

Github user yinxusen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r53559278
  
    --- Diff: examples/src/main/python/mllib/elementwise_product_example.py ---
    @@ -0,0 +1,43 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from __future__ import print_function
    +
    +# $example on$
    +from pyspark import SparkContext
    +from pyspark.mllib.linalg import Vectors
    +from pyspark.mllib.feature import ElementwiseProduct
    +# $example off$
    +
    +if __name__ == "__main__":
    +    sc = SparkContext(appName="ElementwiseProductExample")  # SparkContext
    +
    +    # $example on$
    +    data = sc.textFile("data/mllib/kmeans_data.txt")
    +    parsedData = data.map(lambda x: [float(t) for t in x.split(" ")])
    +
    +    # Create weight vector.
    +    transformingVector = Vectors.dense([0.0, 1.0, 2.0])
    +    transformer = ElementwiseProduct(transformingVector)
    +
    +    # Batch transform
    +    transformedData = transformer.transform(parsedData)
    +    # Single-row transform
    +    transformedData2 = transformer.transform(parsedData.first())
    +    # $example off$
    +
    --- End diff --
    
    Add output for `transformedData` and `transformedData2`.
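
For reference, the batch and single-row transforms amount to multiplying each row by the weight vector element by element. A minimal plain-Python sketch follows; `elementwise_product` is a hypothetical helper and the rows are made up rather than read from `kmeans_data.txt`.

```python
def elementwise_product(weights, row):
    """Multiply each feature by the weight at the same position."""
    return [w * x for w, x in zip(weights, row)]


# Made-up rows standing in for the parsed input data.
parsed_data = [[0.0, 0.0, 0.0], [0.1, 0.1, 0.1], [9.0, 9.0, 9.0]]
weights = [0.0, 1.0, 2.0]

# Batch transform: apply the weights to every row.
transformed_data = [elementwise_product(weights, row) for row in parsed_data]
# Single-row transform: apply the weights to the first row only.
transformed_data2 = elementwise_product(weights, parsed_data[0])

print("transformedData:")
for row in transformed_data:
    print(row)

print("transformedData2:")
print(transformed_data2)
```

Printing both results, as the review asks, makes it easy to confirm that the first column is zeroed out and the third is doubled.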



[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...

Posted by keypointt <gi...@git.apache.org>.

Github user keypointt commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11142#discussion_r54928709
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/mllib/JavaChiSqSelectorExample.java ---
    @@ -0,0 +1,83 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.examples.mllib;
    +
    +import org.apache.spark.SparkConf;
    --- End diff --
    
    Thank you, @yinxusen.

