Posted to reviews@spark.apache.org by BryanCutler <gi...@git.apache.org> on 2016/07/21 21:41:40 UTC

[GitHub] spark pull request #14308: [SPARK-16260][EXAMPLES][ML] Improve ML Example Ou...

GitHub user BryanCutler opened a pull request:

    https://github.com/apache/spark/pull/14308

    [SPARK-16260][EXAMPLES][ML]  Improve ML Example Outputs

    ## What changes were proposed in this pull request?
    Improve example outputs to better reflect the functionality being presented. This mostly consisted of modifying what was printed at the end of each example, such as calling show() with truncate=False, but sometimes required minor tweaks to the example data to get relevant output. Explicitly set parameters when they are used as part of the example. Fixed Java examples that failed to run because they used old-style MLlib Vectors or had a problem with the schema. Synced examples between the different APIs.
    
    ## How was this patch tested?
    Ran each example for Scala, Python, and Java and made sure output was legible on a terminal of width 100.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BryanCutler/spark ml-examples-improve-output-SPARK-16260

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14308.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14308
    
----
commit 7b4496b16517b01c01abd6aebe84b53876265b82
Author: Bryan Cutler <cu...@gmail.com>
Date:   2016-07-19T00:22:12Z

    finished going through about a third of examples

commit 6e4ed29e704e4805fff312b561f8e41919e014eb
Author: Bryan Cutler <cu...@gmail.com>
Date:   2016-07-20T00:23:25Z

    Fixed more examples, about half done now

commit 26718e9da96de142e4bb3078ffdacbf94e4c3d47
Author: Bryan Cutler <cu...@gmail.com>
Date:   2016-07-20T18:19:45Z

    more progress up to NaiveBayes example

commit ff066ce1ad3391c707cd21b4802c5843a70a2da9
Author: Bryan Cutler <cu...@gmail.com>
Date:   2016-07-21T00:26:59Z

    further progress up to PCA example

commit 53a29411c5969d1bc25ace3817cc927213fcb0b7
Author: Bryan Cutler <cu...@gmail.com>
Date:   2016-07-21T04:28:12Z

    continued throught examples up to Tf Idf

commit 38c319945e854939f86b8e3f67ebcb04d0be532f
Author: Bryan Cutler <cu...@gmail.com>
Date:   2016-07-21T20:22:31Z

    finished remaining ml examples

commit a8093bec8fc4090711e6d7b56001a288db03235d
Author: Bryan Cutler <cu...@gmail.com>
Date:   2016-07-21T20:37:22Z

    fixed style checks

commit afe2b2ad3069363de62a6f25cd1e4ac706b9e6b8
Author: Bryan Cutler <cu...@gmail.com>
Date:   2016-07-21T20:57:36Z

    fixed Java import ordering

commit b7384cef97f89730f4f400873c8369775bbe994e
Author: Bryan Cutler <cu...@gmail.com>
Date:   2016-07-21T21:09:41Z

    minor cleanup

commit ae2249a3396f6585c504986234d664dd23f9c401
Author: Bryan Cutler <cu...@gmail.com>
Date:   2016-07-21T21:33:35Z

    made accurracy reporting consistent

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Ou...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14308#discussion_r72836296
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaPolynomialExpansionExample.java ---
    @@ -48,23 +48,19 @@ public static void main(String[] args) {
           .setDegree(3);
     
         List<Row> data = Arrays.asList(
    -      RowFactory.create(Vectors.dense(-2.0, 2.3)),
    +      RowFactory.create(Vectors.dense(2.0, 1.0)),
    --- End diff --
    
    The fractional part makes the output a little ugly, whereas using whole numbers is more readable and still shows the transform.
    
    before
    ```
    [[-2.0,4.0,-8.0,2.3,-4.6,9.2,5.289999999999999,-10.579999999999998,12.166999999999996]]
    [[0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0]]
    [[0.6,0.36,0.216,-1.1,-0.66,-0.396,1.2100000000000002,0.7260000000000001,-1.3310000000000004]]
    ```
    
    after
    ```
    +----------+------------------------------------------+
    |features  |polyFeatures                              |
    +----------+------------------------------------------+
    |[2.0,1.0] |[2.0,4.0,8.0,1.0,2.0,4.0,1.0,2.0,1.0]     |
    |[0.0,0.0] |[0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0]     |
    |[3.0,-1.0]|[3.0,9.0,27.0,-1.0,-3.0,-9.0,1.0,3.0,-1.0]|
    ```
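
For readers who want to see where the `polyFeatures` values come from, here is a minimal sketch in plain Java. It is not Spark's implementation and not part of this PR; the class name `PolyExpansionSketch` and the method `expand` are hypothetical. It only reproduces, for two input features, the term ordering visible in the table above (x, x^2, x^3, y, xy, x^2 y, y^2, x y^2, y^3), which illustrates why whole-number inputs print cleanly while fractional inputs pick up long floating-point tails.

```java
import java.util.Arrays;

// Hypothetical helper, written only to illustrate the output shown above.
class PolyExpansionSketch {

  // Expands (x, y) into every monomial x^i * y^j with 1 <= i + j <= degree,
  // in the order seen in the polyFeatures column of the example output.
  static double[] expand(double x, double y, int degree) {
    // Number of monomials of total degree 1..degree in two variables.
    int size = (degree + 1) * (degree + 2) / 2 - 1;
    double[] out = new double[size];
    int k = 0;
    for (int j = 0; j <= degree; j++) {        // power of y
      for (int i = 0; i + j <= degree; i++) {  // power of x
        if (i == 0 && j == 0) {
          continue;                            // skip the constant term
        }
        out[k++] = Math.pow(x, i) * Math.pow(y, j);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Whole-number input: every product is exactly representable.
    System.out.println(Arrays.toString(expand(2.0, 1.0, 3)));
    // prints [2.0, 4.0, 8.0, 1.0, 2.0, 4.0, 1.0, 2.0, 1.0]

    // Fractional input: some products cannot be represented exactly,
    // so the printed values carry long decimal tails.
    System.out.println(Arrays.toString(expand(-2.0, 2.3, 3)));
  }
}
```

The first call matches the `[2.0,1.0]` row of the "after" table; the second uses the old input from the "before" output.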


[GitHub] spark pull request #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Ou...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14308#discussion_r72778529
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaPolynomialExpansionExample.java ---
    @@ -48,23 +48,19 @@ public static void main(String[] args) {
           .setDegree(3);
     
         List<Row> data = Arrays.asList(
    -      RowFactory.create(Vectors.dense(-2.0, 2.3)),
    +      RowFactory.create(Vectors.dense(2.0, 1.0)),
    --- End diff --
    
    Why this change?


[GitHub] spark issue #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs

Posted by MLnick <gi...@git.apache.org>.

Github user MLnick commented on the issue:

    https://github.com/apache/spark/pull/14308
  
    @BryanCutler yeah if there are some changes that are more bug-fixes to make the examples work, let's separate those out into a new JIRA & PR. That should be a little higher priority for `2.0.1`


[GitHub] spark issue #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14308
  
    **[Test build #63089 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63089/consoleFull)** for PR 14308 at commit [`a556742`](https://github.com/apache/spark/commit/a556742dd38b2722ee7d497e355bc1b9ed974cf4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Ou...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14308#discussion_r72830947
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaIsotonicRegressionExample.java ---
    @@ -50,8 +50,8 @@ public static void main(String[] args) {
         IsotonicRegression ir = new IsotonicRegression();
         IsotonicRegressionModel model = ir.fit(dataset);
     
    -    System.out.println("Boundaries in increasing order: " + model.boundaries());
    -    System.out.println("Predictions associated with the boundaries: " + model.predictions());
    +    System.out.println("Boundaries in increasing order: " + model.boundaries() + "\n");
    --- End diff --
    
    The two arrays that are printed are large and all the output gets clumped together, looking like a huge block of text, so adding some separation makes it a bit more readable.


[GitHub] spark issue #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14308
  
    I think it's fine to remove files that aren't referenced here too.


[GitHub] spark issue #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14308
  
    **[Test build #63087 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63087/consoleFull)** for PR 14308 at commit [`479819d`](https://github.com/apache/spark/commit/479819dbddbe02d099f3b6359b99718e7a71a2df).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/14308
  
    Ok, I removed these data files
    ```
    sample_tree_data.csv
    lr_data.txt
    random.data
    ```
    and added example usage to reference `pagerank_data.txt`


[GitHub] spark pull request #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Ou...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14308#discussion_r72778384
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaMaxAbsScalerExample.java ---
    @@ -34,10 +44,17 @@ public static void main(String[] args) {
           .getOrCreate();
     
         // $example on$
    -    Dataset<Row> dataFrame = spark
    -      .read()
    -      .format("libsvm")
    -      .load("data/mllib/sample_libsvm_data.txt");
    +    List<Row> data = Arrays.asList(
    --- End diff --
    
    Does the data change here? Why change from reading the file?


[GitHub] spark pull request #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Ou...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14308#discussion_r72838595
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaWord2VecExample.java ---
    @@ -55,10 +56,14 @@ public static void main(String[] args) {
           .setOutputCol("result")
           .setVectorSize(3)
           .setMinCount(0);
    +
         Word2VecModel model = word2Vec.fit(documentDF);
         Dataset<Row> result = model.transform(documentDF);
    -    for (Row r : result.select("result").takeAsList(3)) {
    -      System.out.println(r);
    +
    +    for (Row row : result.collectAsList()) {
    +      java.util.List text = row.getList(0);
    --- End diff --
    
    `List` was already imported, so this should be `List text = ...`


[GitHub] spark issue #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14308
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63089/
    Test PASSed.


[GitHub] spark issue #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/14308
  
    Thanks for the review @srowen!  I added some before/after outputs, so hopefully some of the changes make more sense.  I'll fix up the rest after I make another JIRA for the Java errors.


[GitHub] spark issue #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14308
  
    **[Test build #62974 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62974/consoleFull)** for PR 14308 at commit [`bb2fcee`](https://github.com/apache/spark/commit/bb2fceea1c696b04f2113be8c9c5a9ce638493b9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Ou...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14308#discussion_r72857036
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaWord2VecExample.java ---
    @@ -55,10 +56,14 @@ public static void main(String[] args) {
           .setOutputCol("result")
           .setVectorSize(3)
           .setMinCount(0);
    +
         Word2VecModel model = word2Vec.fit(documentDF);
         Dataset<Row> result = model.transform(documentDF);
    -    for (Row r : result.select("result").takeAsList(3)) {
    -      System.out.println(r);
    +
    +    for (Row row : result.collectAsList()) {
    +      java.util.List text = row.getList(0);
    --- End diff --
    
    Yeah, just saying it's also fully qualified here. It could have a generic bound too.
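
To make the "generic bound" suggestion concrete, here is a minimal sketch in plain Java. It assumes the column holds strings, and it uses a hypothetical `RowStandIn` class in place of Spark's `Row` so that the snippet compiles without Spark on the classpath; `GenericBoundSketch` and `joinWords` are likewise made up for illustration. The point is only that a generic accessor lets the caller write `List<String>` with the imported `List` instead of a raw, fully qualified `java.util.List`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a row object, used only to keep the sketch
// self-contained. It is not Spark's Row class.
class RowStandIn {
  private final Object[] values;

  RowStandIn(Object... values) {
    this.values = values;
  }

  // Generic accessor: the caller's declared type fixes T.
  @SuppressWarnings("unchecked")
  <T> List<T> getList(int i) {
    return (List<T>) values[i];
  }
}

class GenericBoundSketch {
  static String joinWords(RowStandIn row) {
    // Imported List with a generic bound, rather than a raw java.util.List.
    List<String> text = row.getList(0);
    return String.join(" ", text);
  }

  public static void main(String[] args) {
    RowStandIn row = new RowStandIn(Arrays.asList("Hi", "I", "heard", "about", "Spark"));
    System.out.println(joinWords(row)); // prints "Hi I heard about Spark"
  }
}
```

With the bound in place, `text` can be passed directly to APIs that expect strings, without casts or raw-type warnings at the call site.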


[GitHub] spark issue #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14308
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63087/
    Test PASSed.


[GitHub] spark issue #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14308
  
    **[Test build #63274 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63274/consoleFull)** for PR 14308 at commit [`b634f9b`](https://github.com/apache/spark/commit/b634f9b8a7fd7f118605800f19266611d8951b33).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14308
  
    **[Test build #63274 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63274/consoleFull)** for PR 14308 at commit [`b634f9b`](https://github.com/apache/spark/commit/b634f9b8a7fd7f118605800f19266611d8951b33).


[GitHub] spark issue #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14308
  
    It's probably OK on the whole, improving or standardizing examples slightly. I left a number of small questions. Some of the changes didn't feel quite worth making, but maybe I'm missing the logic.


[GitHub] spark issue #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/14308
  
    ping @mengxr @jkbradley @MLnick , any of you mind taking a look at this?  There were a few Java examples I fixed up that wouldn't run because of using mllib.linalg.Vectors.  If it would be easier, I could separate those in another PR to get that in asap.  Thanks!


[GitHub] spark issue #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14308
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62974/
    Test PASSed.


[GitHub] spark issue #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14308
  
    There's a lot of change here; I skimmed it and it all looks generally positive, adding some consistency or clarification, or a fix in some cases. Is `sample_libsvm_data.txt` used anymore then? It's low risk to merge because they're example changes. I'm OK with it.


[GitHub] spark issue #14308: [SPARK-16260][EXAMPLES][ML] Improve ML Example Outputs

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14308
  
    **[Test build #62693 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62693/consoleFull)** for PR 14308 at commit [`ae2249a`](https://github.com/apache/spark/commit/ae2249a3396f6585c504986234d664dd23f9c401).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14308
  
    **[Test build #63087 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63087/consoleFull)** for PR 14308 at commit [`479819d`](https://github.com/apache/spark/commit/479819dbddbe02d099f3b6359b99718e7a71a2df).


[GitHub] spark issue #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14308
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14308
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63274/
    Test PASSed.


[GitHub] spark issue #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/14308
  
    Thanks for taking another look @srowen.  `sample_libsvm_data.txt` is still used, but it looks like these are never referenced:
    
    ```
    sample_tree_data.csv
    pagerank_data.txt
    lr_data.txt
    random.data
    ```
    I can't place where `sample_tree_data.csv` might have belonged, `pagerank_data.txt` is obvious (just missing reference in usage), and `lr_data.txt`/`random.data` look like labeled points probably from some older MLlib examples.


[GitHub] spark issue #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/14308
  
    > @BryanCutler yeah if there are some changes that are more bug-fixes to make the examples work, let's separate those out into a new JIRA & PR. That should be a little higher priority for 2.0.1
    
    Sure @MLnick , I realized I should probably do that about half-way into this.  I'll make another JIRA and fix the Java errors there.



[GitHub] spark issue #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14308
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/14308
  
    Attaching a quick audit of the example data files and which examples reference them, taken from this branch:
    [spark_example_data_audit.txt](https://github.com/apache/spark/files/402881/spark_example_data_audit.txt)
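
An audit like this can be approximated with a short program. The following is a minimal sketch, not the method used to produce the attached file: it assumes a data file counts as referenced when its bare file name appears anywhere in a source file, and the class name, method name, and default directories are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class DataFileAuditSketch {

  // Return the names of files under dataDir that no file under sourceDir mentions by name.
  static Set<String> unreferenced(Path dataDir, Path sourceDir) throws IOException {
    Set<String> names;
    try (Stream<Path> data = Files.walk(dataDir)) {
      names = data.filter(Files::isRegularFile)
          .map(p -> p.getFileName().toString())
          .collect(Collectors.toCollection(TreeSet::new));
    }
    List<Path> sources;
    try (Stream<Path> src = Files.walk(sourceDir)) {
      sources = src.filter(Files::isRegularFile).collect(Collectors.toList());
    }
    for (Path source : sources) {
      // ISO-8859-1 maps every byte to a character, so any file can be searched as text.
      String text = new String(Files.readAllBytes(source), StandardCharsets.ISO_8859_1);
      // Drop every data file name that this source file mentions.
      names.removeIf(text::contains);
    }
    return names;
  }

  public static void main(String[] args) throws IOException {
    // Assumed default locations; pass the real directories as arguments.
    Path dataDir = Paths.get(args.length > 0 ? args[0] : "data/mllib");
    Path sourceDir = Paths.get(args.length > 1 ? args[1] : "examples/src/main");
    unreferenced(dataDir, sourceDir).forEach(System.out::println);
  }
}
```

A plain substring match can over-report references (for example, a name that only appears in a comment), so the result is a starting point for manual review rather than a definitive list.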




[GitHub] spark issue #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14308
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Ou...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14308#discussion_r72832453
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaGaussianMixtureExample.java ---
    @@ -54,6 +54,7 @@ public static void main(String[] args) {
     
         // Output the parameters of the mixture model
         for (int i = 0; i < model.getK(); i++) {
    +      System.out.println("Gaussian " + i);
    --- End diff --
    
    Yeah the 2 print statements could be combined.  I was probably just trying not to cram too much together, but I think it would be fine.



[GitHub] spark pull request #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Ou...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14308#discussion_r72835137
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaOneVsRestExample.java ---
    @@ -75,7 +75,7 @@ public static void main(String[] args) {
     
         // compute the classification error on test data.
         double accuracy = evaluator.evaluate(predictions);
    -    System.out.println("Test Error : " + (1 - accuracy));
    +    System.out.println("Test Error = " + (1 - accuracy));
    --- End diff --
    
    Yeah, I was just trying to make things like this consistent with other similar examples.  I think I just saw "=" used more often, but it really doesn't make a difference to me.



[GitHub] spark issue #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14308
  
    Merged to master



[GitHub] spark issue #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14308
  
    **[Test build #63089 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63089/consoleFull)** for PR 14308 at commit [`a556742`](https://github.com/apache/spark/commit/a556742dd38b2722ee7d497e355bc1b9ed974cf4).



[GitHub] spark issue #14308: [SPARK-16260][EXAMPLES][ML] Improve ML Example Outputs

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14308
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #14308: [SPARK-16260][EXAMPLES][ML] Improve ML Example Outputs

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14308
  
    **[Test build #62693 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62693/consoleFull)** for PR 14308 at commit [`ae2249a`](https://github.com/apache/spark/commit/ae2249a3396f6585c504986234d664dd23f9c401).



[GitHub] spark issue #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14308
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #14308: [SPARK-16260][EXAMPLES][ML] Improve ML Example Outputs

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14308
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62693/
    Test PASSed.



[GitHub] spark pull request #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Ou...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14308#discussion_r72834486
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaNGramExample.java ---
    @@ -55,16 +55,12 @@ public static void main(String[] args) {
     
         Dataset<Row> wordDataFrame = spark.createDataFrame(data, schema);
     
    -    NGram ngramTransformer = new NGram().setInputCol("words").setOutputCol("ngrams");
    +    NGram ngramTransformer = new NGram().setN(2).setInputCol("words").setOutputCol("bigrams");
    --- End diff --
    
    I really only think that the param `N` should be set explicitly.  Looking back, changing the column name was not necessary, let me change that back.
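
For readers unfamiliar with what `N` controls, here is a minimal plain-Java sketch of bigram generation. It assumes an n-gram is a window of `n` consecutive tokens joined by single spaces; the class name, method name, and sample tokens are hypothetical, and this is not Spark's `NGram` implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class NGramSketch {

  // Slide a window of n tokens over the input and join each window with single spaces.
  static List<String> ngrams(List<String> tokens, int n) {
    if (n < 1) {
      throw new IllegalArgumentException("n must be at least 1");
    }
    List<String> result = new ArrayList<>();
    // Inputs shorter than n produce no windows, so the result is empty.
    for (int i = 0; i + n <= tokens.size(); i++) {
      result.add(String.join(" ", tokens.subList(i, i + n)));
    }
    return result;
  }

  public static void main(String[] args) {
    // Sample tokens chosen for illustration only.
    List<String> words = Arrays.asList("Hi", "I", "heard", "about", "Spark");
    System.out.println(ngrams(words, 2));
  }
}
```

Passing the window size as an explicit argument mirrors the point of calling `setN(2)` in the example: the reader sees which `n` is in effect without having to know the default.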



[GitHub] spark pull request #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Ou...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/14308



[GitHub] spark pull request #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Ou...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14308#discussion_r72778518
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaOneVsRestExample.java ---
    @@ -75,7 +75,7 @@ public static void main(String[] args) {
     
         // compute the classification error on test data.
         double accuracy = evaluator.evaluate(predictions);
    -    System.out.println("Test Error : " + (1 - accuracy));
    +    System.out.println("Test Error = " + (1 - accuracy));
    --- End diff --
    
    Some of these changes feel kind of trivial, but I guess this is for consistency. But other new System.out.println statements use : not =



[GitHub] spark pull request #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Ou...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14308#discussion_r72778582
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaTokenizerExample.java ---
    @@ -23,17 +23,23 @@
     import java.util.Arrays;
     import java.util.List;
     
    +import scala.collection.mutable.WrappedArray;
    +
     import org.apache.spark.ml.feature.RegexTokenizer;
     import org.apache.spark.ml.feature.Tokenizer;
    +import org.apache.spark.sql.api.java.UDF1;
     import org.apache.spark.sql.Dataset;
     import org.apache.spark.sql.Row;
     import org.apache.spark.sql.RowFactory;
    -import org.apache.spark.sql.types.DataTypes;
    -import org.apache.spark.sql.types.Metadata;
    -import org.apache.spark.sql.types.StructField;
    -import org.apache.spark.sql.types.StructType;
    +import org.apache.spark.sql.types.*;
    --- End diff --
    
    Here imports are collapsed to *; elsewhere a * import is expanded. I would generally not touch these, but the standard is usually to avoid wildcard imports by default.



[GitHub] spark pull request #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Ou...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14308#discussion_r72837199
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaTokenizerExample.java ---
    @@ -23,17 +23,23 @@
     import java.util.Arrays;
     import java.util.List;
     
    +import scala.collection.mutable.WrappedArray;
    +
     import org.apache.spark.ml.feature.RegexTokenizer;
     import org.apache.spark.ml.feature.Tokenizer;
    +import org.apache.spark.sql.api.java.UDF1;
     import org.apache.spark.sql.Dataset;
     import org.apache.spark.sql.Row;
     import org.apache.spark.sql.RowFactory;
    -import org.apache.spark.sql.types.DataTypes;
    -import org.apache.spark.sql.types.Metadata;
    -import org.apache.spark.sql.types.StructField;
    -import org.apache.spark.sql.types.StructType;
    +import org.apache.spark.sql.types.*;
    --- End diff --
    
    I agree that wildcards should be avoided, not sure what happened here.  It might have been an automatic thing from the IDE, I'll revert this.



[GitHub] spark issue #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/14308
  
    This has been updated since fixing the errors in Java @srowen @MLnick .  I know most of these changes are trivial, but will hopefully make some of the examples easier to follow.  Thanks!



[GitHub] spark pull request #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Ou...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14308#discussion_r72778614
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaWord2VecExample.java ---
    @@ -55,10 +56,14 @@ public static void main(String[] args) {
           .setOutputCol("result")
           .setVectorSize(3)
           .setMinCount(0);
    +
         Word2VecModel model = word2Vec.fit(documentDF);
         Dataset<Row> result = model.transform(documentDF);
    -    for (Row r : result.select("result").takeAsList(3)) {
    -      System.out.println(r);
    +
    +    for (Row row : result.collectAsList()) {
    +      java.util.List text = row.getList(0);
    --- End diff --
    
    Import List



[GitHub] spark pull request #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Ou...

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14308#discussion_r72833265
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaMaxAbsScalerExample.java ---
    @@ -34,10 +44,17 @@ public static void main(String[] args) {
           .getOrCreate();
     
         // $example on$
    -    Dataset<Row> dataFrame = spark
    -      .read()
    -      .format("libsvm")
    -      .load("data/mllib/sample_libsvm_data.txt");
    +    List<Row> data = Arrays.asList(
    --- End diff --
    
    The data in the file is fine, but it uses sparse vectors, so the printed result doesn't really show anything.  With a small sample dataset, you can see what the scaler is doing from the output.
    
    before
    ```
    +-----+--------------------+--------------------+
    |label|            features|      scaledFeatures|
    +-----+--------------------+--------------------+
    |  0.0|(692,[127,128,129...|(692,[127,128,129...|
    |  1.0|(692,[158,159,160...|(692,[158,159,160...|
    |  1.0|(692,[124,125,126...|(692,[124,125,126...|
    ```
    after
    ```
    +--------------+----------------+
    |      features|  scaledFeatures|
    +--------------+----------------+
    |[1.0,0.1,-8.0]|[0.25,0.01,-1.0]|
    |[2.0,1.0,-4.0]|  [0.5,0.1,-0.5]|
    |[4.0,10.0,8.0]|   [1.0,1.0,1.0]|
    ```
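
The numbers in the "after" table can be reproduced by hand. Below is a minimal plain-Java sketch of that arithmetic, assuming each feature column is divided by that column's largest absolute value (which is consistent with the values shown). The class and method names are hypothetical, and this is not Spark's `MaxAbsScaler` implementation.

```java
import java.util.Arrays;

final class MaxAbsScalingSketch {

  // Divide every column by its largest absolute value, so each scaled column lies in [-1, 1].
  static double[][] scale(double[][] rows) {
    if (rows.length == 0) {
      return new double[0][];
    }
    int numCols = rows[0].length;
    // First pass: find the largest absolute value in each column.
    double[] maxAbs = new double[numCols];
    for (double[] row : rows) {
      for (int j = 0; j < numCols; j++) {
        maxAbs[j] = Math.max(maxAbs[j], Math.abs(row[j]));
      }
    }
    // Second pass: rescale each entry by its column's maximum.
    double[][] scaled = new double[rows.length][numCols];
    for (int i = 0; i < rows.length; i++) {
      for (int j = 0; j < numCols; j++) {
        // An all-zero column stays zero rather than dividing by zero.
        scaled[i][j] = maxAbs[j] == 0.0 ? 0.0 : rows[i][j] / maxAbs[j];
      }
    }
    return scaled;
  }

  public static void main(String[] args) {
    // The three dense feature rows from the "after" table.
    double[][] features = {
      {1.0, 0.1, -8.0},
      {2.0, 1.0, -4.0},
      {4.0, 10.0, 8.0}
    };
    for (double[] row : scale(features)) {
      System.out.println(Arrays.toString(row));
    }
  }
}
```

The column maxima here are 4.0, 10.0, and 8.0, so the first row scales to roughly 0.25, 0.01, and -1.0, matching the table.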



[GitHub] spark issue #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs

Posted by BryanCutler <gi...@git.apache.org>.

Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/14308
  
    Thanks @srowen!



[GitHub] spark issue #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14308
  
    **[Test build #62974 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62974/consoleFull)** for PR 14308 at commit [`bb2fcee`](https://github.com/apache/spark/commit/bb2fceea1c696b04f2113be8c9c5a9ce638493b9).



[GitHub] spark pull request #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Ou...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14308#discussion_r72778479
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaNGramExample.java ---
    @@ -55,16 +55,12 @@ public static void main(String[] args) {
     
         Dataset<Row> wordDataFrame = spark.createDataFrame(data, schema);
     
    -    NGram ngramTransformer = new NGram().setInputCol("words").setOutputCol("ngrams");
    +    NGram ngramTransformer = new NGram().setN(2).setInputCol("words").setOutputCol("bigrams");
    --- End diff --
    
    I suppose this doesn't hurt, but ngrams was still fairly OK 



[GitHub] spark pull request #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Ou...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14308#discussion_r72778075
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaIsotonicRegressionExample.java ---
    @@ -50,8 +50,8 @@ public static void main(String[] args) {
         IsotonicRegression ir = new IsotonicRegression();
         IsotonicRegressionModel model = ir.fit(dataset);
     
    -    System.out.println("Boundaries in increasing order: " + model.boundaries());
    -    System.out.println("Predictions associated with the boundaries: " + model.predictions());
    +    System.out.println("Boundaries in increasing order: " + model.boundaries() + "\n");
    --- End diff --
    
    No big deal, but why the extra line break?



[GitHub] spark pull request #14308: [SPARK-16421][EXAMPLES][ML] Improve ML Example Ou...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14308#discussion_r72778289
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaGaussianMixtureExample.java ---
    @@ -54,6 +54,7 @@ public static void main(String[] args) {
     
         // Output the parameters of the mixture model
         for (int i = 0; i < model.getK(); i++) {
    +      System.out.println("Gaussian " + i);
    --- End diff --
    
    Why split over two statements?

