Posted to reviews@spark.apache.org by BenFradet <gi...@git.apache.org> on 2015/12/06 19:15:17 UTC

[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...

GitHub user BenFradet opened a pull request:

    https://github.com/apache/spark/pull/10166

    [SPARK-12159] [ML] Add user guide section for IndexToString transformer

    Comments welcome.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BenFradet/spark SPARK-12159

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10166.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10166
    
----
commit 10ba98ab9516e0193167a3f92e1c1aaf58e4602b
Author: BenFradet <be...@gmail.com>
Date:   2015-12-06T18:12:59Z

    documentation for the IndexToString label transformer

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/10166#issuecomment-162734189
  
    Those are my only comments; the examples look good.  Btw, it's OK this time, but in general, I'd recommend doing little cleanups in a separate PR.  Especially when lots of docs are being merged, it's really easy to hit merge conflicts.  Thanks!  I'll watch for updates.


[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10166#issuecomment-163004364
  
    **[Test build #47363 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47363/consoleFull)** for PR 10166 at commit [`9398743`](https://github.com/apache/spark/commit/9398743fdab872a15570ae3856d352713fbd4865).


[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/10166#issuecomment-162795582
  
    @jkbradley Thanks for reviewing, will take those comments into account.


[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/10166#issuecomment-163029963
  
    will do


[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10166#issuecomment-163009688
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47363/
    Test PASSed.


[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/10166#issuecomment-162978022
  
    `dev/lint-python` should catch these issues


[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10166#issuecomment-162335094
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10166#issuecomment-162335095
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47245/
    Test PASSed.


[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10166#issuecomment-162334282
  
    **[Test build #47245 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47245/consoleFull)** for PR 10166 at commit [`10ba98a`](https://github.com/apache/spark/commit/10ba98ab9516e0193167a3f92e1c1aaf58e4602b).


[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10166#issuecomment-163009477
  
    **[Test build #47363 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47363/consoleFull)** for PR 10166 at commit [`9398743`](https://github.com/apache/spark/commit/9398743fdab872a15570ae3856d352713fbd4865).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
       * `public class JavaIndexToStringExample`


[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10166#issuecomment-163009686
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10166#issuecomment-162997921
  
    **[Test build #47361 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47361/consoleFull)** for PR 10166 at commit [`9591007`](https://github.com/apache/spark/commit/9591007c7d223e90233c13d99d9a6d2ccd1d92ef).


[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/10166#issuecomment-163013385
  
    @jkbradley Should I log a jira for completing the user guide on StringIndexer regarding the handling of missing labels @holdenk was talking about?


[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/10166#issuecomment-163019817
  
    If you wouldn't mind, that'd be great, thanks!


[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10166#issuecomment-162970052
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47347/
    Test FAILed.


[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/10166#issuecomment-162617615
  
    cc @jkbradley 


[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10166#issuecomment-163000564
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47361/
    Test FAILed.


[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/10166#issuecomment-162976307
  
    That was a spurious test failure; I asked it to retest


[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the pull request:

    https://github.com/apache/spark/pull/10166#issuecomment-162619563
  
    That is what I was referring to. Handling it in a follow-up JIRA/PR seems OK too. One of the things blocking the original implementation was wanting it to be user-controllable if we allowed people to specify their own maps, so it seemed good for that to also make it through to the docs.


[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/10166#issuecomment-162732590
  
    Thanks for the PR; I'll take a look now!
    @holdenk handleInvalid should be in a separate PR since it's for StringIndexer, but I agree it'd be nice to add to the docs.
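
For readers unfamiliar with the option under discussion: to my understanding, `StringIndexer`'s `handleInvalid` parameter at the time accepted `"error"` (the default, which fails on a label not seen during fitting) and `"skip"` (which drops such rows). The following is a plain-Python sketch of those two behaviours only. It is not Spark code, and the function name `index_with_handle_invalid` is hypothetical.

```python
def index_with_handle_invalid(values, labels, handle_invalid="error"):
    """Map each value to its label index, mimicking the two assumed policies.

    "error": raise on a value that is not among the fitted labels.
    "skip":  silently drop such values from the output.
    """
    if handle_invalid not in ("error", "skip"):
        raise ValueError("handle_invalid must be 'error' or 'skip'")

    lookup = {label: float(i) for i, label in enumerate(labels)}
    indexed = []
    for value in values:
        if value in lookup:
            indexed.append((value, lookup[value]))
        elif handle_invalid == "error":
            raise ValueError(f"Unseen label: {value}")
        # handle_invalid == "skip": the row is left out of the result
    return indexed


fitted_labels = ["a", "c", "b"]   # labels learned from the training data
new_values = ["a", "d", "c"]      # "d" was never seen during fitting

skipped = index_with_handle_invalid(new_values, fitted_labels, "skip")
```

With `"skip"` the row containing `"d"` is dropped, while with `"error"` the same call raises instead of returning a result.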


[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10166#issuecomment-163000551
  
    **[Test build #47361 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47361/consoleFull)** for PR 10166 at commit [`9591007`](https://github.com/apache/spark/commit/9591007c7d223e90233c13d99d9a6d2ccd1d92ef).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
       * `public class JavaIndexToStringExample`


[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10166#discussion_r46906261
  
    --- Diff: docs/ml-features.md ---
    @@ -951,9 +951,157 @@ indexed.show()
     </div>
     </div>
     
    +
    +## IndexToString
    +
    +Symmetrically to `StringIndexer`, `IndexToString` maps a column of label indices
    +back to a column containing the original labels as strings. The common use case
    +is to produce indices from labels with `StringIndexer`, train a model with those
    +indices and retrieve the original labels from the column of predicted indices
    +with `IndexToString`. However, you are free to supply your own labels.
    +
    +**Examples**
    +
    +Building on the `StringIndexer` example, let's assume we have the following
    +DataFrame with columns `id` and `categoryIndex`:
    +
    +~~~~
    + id | categoryIndex
    +----|---------------
    + 0  | 0.0
    + 1  | 2.0
    + 2  | 1.0
    + 3  | 0.0
    + 4  | 0.0
    + 5  | 1.0
    +~~~~
    +
    +Applying `IndexToString` with `categoryIndex` as the input column,
    +`originalCategory` as the output column and the previous `StringIndexer`'s
    +labels as labels, we are able to retrieve our original labels:
    +
    +~~~~
    + id | categoryIndex | originalCategory
    +----|---------------|-----------------
    + 0  | 0.0           | a
    + 1  | 2.0           | b
    + 2  | 1.0           | c
    + 3  | 0.0           | a
    + 4  | 0.0           | a
    + 5  | 1.0           | c
    +~~~~
    +
    +<div class="codetabs">
    +<div data-lang="scala" markdown="1">
    +
    +Refer to the [IndexToString Scala docs](api/scala/index.html#org.apache.spark.ml.feature.IndexToString)
    +for more details on the API.
    +
    +{% highlight scala %}
    +import org.apache.spark.ml.feature.{IndexToString, StringIndexer}
    +
    +val df = sqlContext.createDataFrame(Seq(
    +  (0, "a"),
    +  (1, "b"),
    +  (2, "c"),
    +  (3, "a"),
    +  (4, "a"),
    +  (5, "c")
    +)).toDF("id", "category")
    +
    +val indexer = new StringIndexer()
    +  .setInputCol("category")
    +  .setOutputCol("categoryIndex")
    +  .fit(df)
    +val indexed = indexer.transform(df)
    +
    +val converter = new IndexToString()
    +  .setInputCol("categoryIndex")
    +  .setOutputCol("originalCategory")
    +  .setLabels(indexer.labels)
    --- End diff --
    
    You probably don't need to specify labels; they should be pulled from column metadata.
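
The round trip described in the quoted docs, and the point about labels travelling with the column, can be sketched in plain Python. This is only an illustration of the idea, not the Spark API: the dictionary-based "column" and the functions `fit_labels`, `string_to_index` and `index_to_string` are hypothetical, and the alphabetical tie-break between equally frequent labels is an assumption.

```python
from collections import Counter


def fit_labels(values):
    """Order distinct labels by descending frequency (ties broken alphabetically)."""
    counts = Counter(values)
    return sorted(counts, key=lambda label: (-counts[label], label))


def string_to_index(values, labels):
    """Return an indexed 'column' that carries its labels as metadata."""
    lookup = {label: float(i) for i, label in enumerate(labels)}
    return {
        "values": [lookup[v] for v in values],
        "metadata": {"labels": labels},
    }


def index_to_string(column, labels=None):
    """Map indices back to labels; fall back to the column metadata if none are given."""
    if labels is None:
        labels = column["metadata"]["labels"]
    return [labels[int(i)] for i in column["values"]]


categories = ["a", "b", "c", "a", "a", "c"]
labels = fit_labels(categories)
indexed = string_to_index(categories, labels)
restored = index_to_string(indexed)  # no explicit labels needed
```

Here the most frequent label `a` gets index 0.0, `c` gets 1.0 and `b` gets 2.0, which matches the tables in the diff, and `restored` equals the original `categories` without the labels being passed explicitly.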


[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/10166#issuecomment-163012760
  
    Merging with master and branch-1.6
    
    Thanks for the PR!


[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/10166


[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the pull request:

    https://github.com/apache/spark/pull/10166#issuecomment-162386176
  
    It might be useful to also document the different ways "missing" labels can be handled - what are your thoughts?


[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10166#issuecomment-162970048
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10166#discussion_r46906251
  
    --- Diff: docs/ml-features.md ---
    @@ -951,9 +951,157 @@ indexed.show()
     </div>
     </div>
     
    +
    +## IndexToString
    +
    +Symmetrically to `StringIndexer`, `IndexToString` maps a column of label indices
    +back to a column containing the original labels as strings. The common use case
    +is to produce indices from labels with `StringIndexer`, train a model with those
    +indices and retrieve the original labels from the column of predicted indices
    +with `IndexToString`. However, you are free to supply your own labels.
    +
    +**Examples**
    +
    +Building on the `StringIndexer` example, let's assume we have the following
    +DataFrame with columns `id` and `categoryIndex`:
    +
    +~~~~
    + id | categoryIndex
    +----|---------------
    + 0  | 0.0
    + 1  | 2.0
    + 2  | 1.0
    + 3  | 0.0
    + 4  | 0.0
    + 5  | 1.0
    +~~~~
    +
    +Applying `IndexToString` with `categoryIndex` as the input column,
    +`originalCategory` as the output column and the previous `StringIndexer`'s
    +labels as labels, we are able to retrieve our original labels:
    +
    +~~~~
    + id | categoryIndex | originalCategory
    +----|---------------|-----------------
    + 0  | 0.0           | a
    + 1  | 2.0           | b
    + 2  | 1.0           | c
    + 3  | 0.0           | a
    + 4  | 0.0           | a
    + 5  | 1.0           | c
    +~~~~
    +
    +<div class="codetabs">
    +<div data-lang="scala" markdown="1">
    +
    +Refer to the [IndexToString Scala docs](api/scala/index.html#org.apache.spark.ml.feature.IndexToString)
    +for more details on the API.
    +
    +{% highlight scala %}
    +import org.apache.spark.ml.feature.{IndexToString, StringIndexer}
    --- End diff --
    
    Would you mind moving these to examples/ and pulling the code snippets into here using the include_example functionality?  You can find examples of include_example in this .md file.  This makes the examples easier to test & maintain.
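
As background on the request above: example files under examples/ mark the region to show in the guide with `$example on$` and `$example off$` comment markers, and the docs build pulls in only the marked lines. The real mechanism is a plugin in the Spark docs build, so the following plain-Python function is only a rough sketch of the idea. The name `extract_example` and the way separate marked regions are joined are assumptions.

```python
def extract_example(source, on_marker="$example on$", off_marker="$example off$"):
    """Keep only the lines that sit between an 'on' marker and the next 'off' marker."""
    kept = []
    inside = False
    for line in source.splitlines():
        if on_marker in line:
            inside = True
        elif off_marker in line:
            inside = False
        elif inside:
            kept.append(line)
    return "\n".join(kept)


# The example file is held as a string here; it is not executed.
example_file = """\
from pyspark import SparkContext
# $example on$
from pyspark.ml.feature import IndexToString, StringIndexer
# $example off$

sc = SparkContext(appName="IndexToStringExample")
# $example on$
converter = IndexToString(inputCol="categoryIndex", outputCol="originalCategory")
# $example off$
sc.stop()
"""

snippet = extract_example(example_file)
```

The setup and teardown lines stay in the runnable example file but are left out of `snippet`, which is what keeps the guide short while the full example remains testable.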


[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/10166#issuecomment-162992917
  
    LGTM except for the Python style issue


[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10166#issuecomment-162335063
  
    **[Test build #47245 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47245/consoleFull)** for PR 10166 at commit [`10ba98a`](https://github.com/apache/spark/commit/10ba98ab9516e0193167a3f92e1c1aaf58e4602b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...

Posted by BenFradet <gi...@git.apache.org>.
Github user BenFradet commented on the pull request:

    https://github.com/apache/spark/pull/10166#issuecomment-162566145
  
    Hey @holdenk, thanks for reviewing.
    
    Do you mean the StringIndexer#setHandleInvalid method? If so, yes, that'd be a good addition.
    
    However, I'm not sure whether I should include it in this JIRA/PR or create another one; input welcome.


[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10166#issuecomment-162976755
  
    **[Test build #2184 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2184/consoleFull)** for PR 10166 at commit [`cb33653`](https://github.com/apache/spark/commit/cb33653ec6f2bde4d5d7196d6eb3e8607d76fdb2).


[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10166#issuecomment-163000560
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-12159] [ML] Add user guide section for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10166#issuecomment-162977371
  
    **[Test build #2184 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2184/consoleFull)** for PR 10166 at commit [`cb33653`](https://github.com/apache/spark/commit/cb33653ec6f2bde4d5d7196d6eb3e8607d76fdb2).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
       * `public class JavaIndexToStringExample`

