Posted to reviews@spark.apache.org by rekhajoshm <gi...@git.apache.org> on 2015/11/03 19:33:48 UTC

[GitHub] spark pull request: [Spark-11478] [ML] ML StringIndexer return inc...

GitHub user rekhajoshm opened a pull request:

    https://github.com/apache/spark/pull/9440

    [Spark-11478] [ML] ML StringIndexer return inconsistent schema

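    ```
    import org.apache.spark.ml.feature.StringIndexer

    // Six rows in two partitions; "label" is the string column to index.
    val data = sc.parallelize(Seq((0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "a"), (5, "c")), 2)
    val df = sqlContext.createDataFrame(data).toDF("id", "label")
    // Fitting returns the model that is used for both calls below.
    val indexer = new StringIndexer()
      .setInputCol("label")
      .setOutputCol("labelIndex")
      .fit(df)
    val transformed = indexer.transform(df)

    // Schema of the transformed DataFrame, then the schema reported by transformSchema.
    println(transformed.schema.toString())
    println(indexer.transformSchema(df.schema))
    ```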
    Verified that the two printed schemas differ in the nullable flag of labelIndex:
    StructType(StructField(id,IntegerType,false), StructField(label,StringType,true), StructField(labelIndex,DoubleType,true))
    
    StructType(StructField(id,IntegerType,false), StructField(label,StringType,true), StructField(labelIndex,DoubleType,false))
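
    One way to make the difference explicit, rather than comparing the printed strings by eye, is to diff the nullable flags field by field. This is a minimal sketch and not part of the patch: nullableMismatches is a hypothetical helper, and the two schemas are rebuilt by hand from the output above, so it assumes only org.apache.spark.sql.types on the classpath (for example in spark-shell).

    ```scala
    import org.apache.spark.sql.types._

    // Hypothetical helper (not a Spark API): names of the top-level fields
    // whose nullable flag differs between two schemas, matched by field name.
    def nullableMismatches(a: StructType, b: StructType): Seq[String] =
      a.fields.toSeq.flatMap { fa =>
        b.fields.find(_.name == fa.name)
          .filter(_.nullable != fa.nullable)
          .map(_ => fa.name)
      }

    // The two schemas printed above, rebuilt by hand for illustration.
    val fromTransform = StructType(Seq(
      StructField("id", IntegerType, nullable = false),
      StructField("label", StringType, nullable = true),
      StructField("labelIndex", DoubleType, nullable = true)))

    val fromTransformSchema = StructType(Seq(
      StructField("id", IntegerType, nullable = false),
      StructField("label", StringType, nullable = true),
      StructField("labelIndex", DoubleType, nullable = false)))

    // Prints the names of the mismatching fields, here "labelIndex".
    println(nullableMismatches(fromTransform, fromTransformSchema).mkString(", "))
    ```

    With the real objects, the same helper could be called as nullableMismatches(transformed.schema, indexer.transformSchema(df.schema)).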

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rekhajoshm/spark SPARK-11478

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9440.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9440
    
----
commit e3677c9fa9697e0d34f9df52442085a6a481c9e9
Author: Rekha Joshi <re...@gmail.com>
Date:   2015-05-05T23:10:08Z

    Merge pull request #1 from apache/master
    
    Pulling functionality from apache spark

commit 106fd8eee8f6a6f7c67cfc64f57c1161f76d8f75
Author: Rekha Joshi <re...@gmail.com>
Date:   2015-05-08T21:49:09Z

    Merge pull request #2 from apache/master
    
    pull latest from apache spark

commit 0be142d6becba7c09c6eba0b8ea1efe83d649e8c
Author: Rekha Joshi <re...@gmail.com>
Date:   2015-06-22T00:08:08Z

    Merge pull request #3 from apache/master
    
    Pulling functionality from apache spark

commit 6c6ee12fd733e3f9902e10faf92ccb78211245e3
Author: Rekha Joshi <re...@gmail.com>
Date:   2015-09-17T01:03:09Z

    Merge pull request #4 from apache/master
    
    Pulling functionality from apache spark

commit eae53fb16dccdb4eb072466cae2429083461e406
Author: Joshi <re...@gmail.com>
Date:   2015-11-03T18:34:12Z

    fix for ML StringIndexer inconsistent schema

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [Spark-11478] [ML] ML StringIndexer return inc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9440#issuecomment-209173348
  
    [Test build #2781 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2781/console) for PR 9440 at commit [`eae53fb`](https://github.com/apache/spark/commit/eae53fb16dccdb4eb072466cae2429083461e406).
     * This patch **fails R style tests**.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.


[GitHub] spark pull request: [Spark-11478] [ML] ML StringIndexer return inc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9440#issuecomment-153448397
  
     Merged build triggered.


[GitHub] spark pull request: [Spark-11478] [ML] ML StringIndexer return inc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9440#issuecomment-153449189
  
    **[Test build #44942 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44942/consoleFull)** for PR 9440 at commit [`eae53fb`](https://github.com/apache/spark/commit/eae53fb16dccdb4eb072466cae2429083461e406).


[GitHub] spark pull request: [Spark-11478] [ML] ML StringIndexer return inc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9440#issuecomment-153458079
  
    **[Test build #44942 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44942/consoleFull)** for PR 9440 at commit [`eae53fb`](https://github.com/apache/spark/commit/eae53fb16dccdb4eb072466cae2429083461e406).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [Spark-11478] [ML] ML StringIndexer return inc...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/9440#issuecomment-153512622
  
    cc @yanboliang 


[GitHub] spark pull request: [Spark-11478] [ML] ML StringIndexer return inc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9440#issuecomment-153458137
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [Spark-11478] [ML] ML StringIndexer return inc...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the pull request:

    https://github.com/apache/spark/pull/9440#issuecomment-153629098
  
    I think the patch is not appropriate because it does not pass the regression tests and will produce errors in other test cases. I have found the cause of this bug but have not figured out a way to resolve it. Please see my comments at [SPARK-11478](https://issues.apache.org/jira/browse/SPARK-11478). I think disabling the `nullable` check is a workaround; I look forward to others' opinions. @rekhajoshm @mengxr
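
    To illustrate what skipping the nullable check could look like, here is a rough sketch, assuming the comparison only needs to cover top-level fields. relaxNullable and equalsIgnoringNullable are hypothetical helpers, not Spark APIs and not the approach proposed in the JIRA; the schemas are rebuilt by hand and carry no ML attribute metadata, which a real comparison would also have to account for.

    ```scala
    import org.apache.spark.sql.types._

    // Hypothetical helper: copy of a schema with every top-level field marked nullable.
    def relaxNullable(schema: StructType): StructType =
      StructType(schema.fields.map(_.copy(nullable = true)))

    // Hypothetical helper: schema equality that ignores top-level nullable flags.
    def equalsIgnoringNullable(a: StructType, b: StructType): Boolean =
      relaxNullable(a) == relaxNullable(b)

    // Hand-built schemas that differ only in the nullable flag of labelIndex.
    val actual = StructType(Seq(
      StructField("id", IntegerType, nullable = false),
      StructField("label", StringType, nullable = true),
      StructField("labelIndex", DoubleType, nullable = true)))

    val expected = StructType(Seq(
      StructField("id", IntegerType, nullable = false),
      StructField("label", StringType, nullable = true),
      StructField("labelIndex", DoubleType, nullable = false)))

    println(actual == expected)                        // strict comparison
    println(equalsIgnoringNullable(actual, expected))  // nullable flags ignored
    ```

    This only hides the mismatch from a check; it does not make transform and transformSchema agree, which is why it is a workaround rather than a fix.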


[GitHub] spark pull request: [Spark-11478] [ML] ML StringIndexer return inc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9440#issuecomment-209172874
  
    [Test build #2781 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2781/consoleFull) for PR 9440 at commit [`eae53fb`](https://github.com/apache/spark/commit/eae53fb16dccdb4eb072466cae2429083461e406).


[GitHub] spark pull request: [Spark-11478] [ML] ML StringIndexer return inc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9440#issuecomment-153458142
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44942/


[GitHub] spark pull request: [Spark-11478] [ML] ML StringIndexer return inc...

Posted by rekhajoshm <gi...@git.apache.org>.
Github user rekhajoshm commented on the pull request:

    https://github.com/apache/spark/pull/9440#issuecomment-153831974
  
    Thanks @yanboliang for your comments. My findings were similar to yours: nullable is the cause, driven by attr.toStructField(). This was a quick look/pull of a few seconds. Agree it needs more conversation. @mengxr
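
    For anyone who wants to check this, a small sketch that inspects the field produced by an ML attribute. It assumes spark-mllib is on the classpath (for example in spark-shell), and the attribute name "labelIndex" is chosen here only to match the example in the PR description. The printed nullable value should be verified against the Spark version in use.

    ```scala
    import org.apache.spark.ml.attribute.NominalAttribute

    // Build a nominal attribute and convert it to a StructField,
    // the same conversion the comment above refers to.
    val field = NominalAttribute.defaultAttr
      .withName("labelIndex")
      .toStructField()

    // Show the nullable flag that the conversion sets on the field.
    println(s"${field.name}: ${field.dataType}, nullable = ${field.nullable}")
    ```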


[GitHub] spark issue #9440: [Spark-11478] [ML] ML StringIndexer return inconsistent s...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/9440
  
    Is this still active?  If not, can you please close this issue pending discussion on the JIRA?  Thanks!


[GitHub] spark pull request: [Spark-11478] [ML] ML StringIndexer return inc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9440#issuecomment-153448422
  
    Merged build started.


[GitHub] spark pull request #9440: [Spark-11478] [ML] ML StringIndexer return inconsi...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/9440

