Posted to reviews@spark.apache.org by 0x0FFF <gi...@git.apache.org> on 2015/09/02 16:36:54 UTC

[GitHub] spark pull request: [SPARK-10417][SQL] Iterating through Column re...

GitHub user 0x0FFF opened a pull request:

    https://github.com/apache/spark/pull/8574

    [SPARK-10417][SQL] Iterating through Column results in infinite loop

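    The `pyspark.sql.column.Column` object has a `__getitem__` method, which makes it iterable in Python. With no `__iter__` defined, Python falls back to calling `__getitem__(0)`, `__getitem__(1)`, and so on until `IndexError` is raised. Because `Column.__getitem__` returns a new `Column` for any key, that never happens, and the loop never terminates. `__getitem__` actually exists to handle columns that hold a list or dict, so that you can access a given element of them in the DataFrame API. Iterability is just a side effect, and it can confuse people getting familiar with Spark DataFrames (as you might iterate this way over a pandas DataFrame, for instance).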
    
    Issue reproduction:
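    ```
    # In the PySpark shell, where `sc` and `sqlContext` are predefined
    df = sqlContext.jsonRDD(sc.parallelize(['{"name": "El Magnifico"}']))
    for i in df["name"]:  # loops forever before this fix
        print(i)
    ```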
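
The mechanism behind the infinite loop can be reproduced without Spark. This is a minimal sketch that uses a hypothetical `FakeColumn` stand-in (not the real `pyspark.sql.Column`) whose `__getitem__`, assumed to mirror Column's behavior, returns a new expression for any key and never raises `IndexError`. `itertools.islice` caps the iteration so the demo terminates.

```python
import itertools


class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column (not the real class)."""

    def __init__(self, name):
        self.name = name

    def __getitem__(self, key):
        # Any key yields a new "expression"; IndexError is never raised,
        # so Python's sequence-iteration fallback has no stopping point.
        return FakeColumn("%s[%r]" % (self.name, key))


def take_first(col, n):
    # iter(col) falls back to __getitem__(0), __getitem__(1), ...
    # islice stops after n items; a plain for loop would never end.
    return [item.name for item in itertools.islice(col, n)]


print(take_first(FakeColumn("name"), 3))  # ['name[0]', 'name[1]', 'name[2]']
```
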

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/0x0FFF/spark SPARK-10417

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8574.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #8574
    
----
commit ea2e9d4e5e1abf7c4913ad33cb89424f444b80b7
Author: 0x0FFF <pr...@gmail.com>
Date:   2015-09-02T14:27:04Z

    [SPARK-10417][SQL] Iterating through Column results in infinite loop

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10417][SQL] Iterating through Column re...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8574#issuecomment-137175422
  
      [Test build #1714 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1714/consoleFull) for   PR 8574 at commit [`f041635`](https://github.com/apache/spark/commit/f041635e2478139779561d47ea6bbacbb68670cf).




[GitHub] spark pull request: [SPARK-10417][SQL] Iterating through Column re...

Posted by 0x0FFF <gi...@git.apache.org>.
Github user 0x0FFF commented on the pull request:

    https://github.com/apache/spark/pull/8574#issuecomment-137128350
  
    @cloud-fan, I addressed your comments in the last commit.




[GitHub] spark pull request: [SPARK-10417][SQL] Iterating through Column re...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/8574




[GitHub] spark pull request: [SPARK-10417][SQL] Iterating through Column re...

Posted by 0x0FFF <gi...@git.apache.org>.
Github user 0x0FFF commented on the pull request:

    https://github.com/apache/spark/pull/8574#issuecomment-137165930
  
    Jenkins, retest this please




[GitHub] spark pull request: [SPARK-10417][SQL] Iterating through Column re...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8574#issuecomment-137113923
  
      [Test build #1712 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1712/consoleFull) for   PR 8574 at commit [`ea2e9d4`](https://github.com/apache/spark/commit/ea2e9d4e5e1abf7c4913ad33cb89424f444b80b7).




[GitHub] spark pull request: [SPARK-10417][SQL] Iterating through Column re...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8574#issuecomment-137105686
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-10417][SQL] Iterating through Column re...

Posted by 0x0FFF <gi...@git.apache.org>.
Github user 0x0FFF commented on the pull request:

    https://github.com/apache/spark/pull/8574#issuecomment-137142934
  
    It looks like it isn't being retested after the last commit: Jenkins failed to update the status, and the dashboard shows the build as still running. Am I right?




[GitHub] spark pull request: [SPARK-10417][SQL] Iterating through Column re...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8574#discussion_r38542327
  
    --- Diff: python/pyspark/sql/column.py ---
    @@ -226,6 +226,10 @@ def __getattr__(self, item):
                 raise AttributeError(item)
             return self.getField(item)
     
    +    def __iter__(self):
    +        raise TypeError('%s.%s object is not iterable' % (self.__module__,
    --- End diff --
    
    I think "Column is not iterable" is a good enough error message.
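
The approach in this diff, using the shorter message suggested above, can be sketched as follows. It again uses a hypothetical `FakeColumn` stand-in rather than the real `pyspark.sql.Column`, and it illustrates the technique rather than reproducing the merged patch: defining `__iter__` takes precedence over the `__getitem__` fallback, so iteration fails immediately with `TypeError` while element access keeps working.

```python
class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column (not the real class)."""

    def __init__(self, name):
        self.name = name

    def __getitem__(self, key):
        # Element access stays available for list/dict columns.
        return FakeColumn("%s[%r]" % (self.name, key))

    def __iter__(self):
        # iter() uses __iter__ when it is defined, so the endless
        # __getitem__ fallback is never reached; fail fast instead.
        raise TypeError("Column is not iterable")


try:
    for x in FakeColumn("key"):
        break
except TypeError as e:
    print(e)  # Column is not iterable
```
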




[GitHub] spark pull request: [SPARK-10417][SQL] Iterating through Column re...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/8574#issuecomment-137113194
  
    I don't know much about Python myself, but that sounds convincing. CC @davies




[GitHub] spark pull request: [SPARK-10417][SQL] Iterating through Column re...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8574#issuecomment-137125402
  
      [Test build #1712 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1712/console) for   PR 8574 at commit [`ea2e9d4`](https://github.com/apache/spark/commit/ea2e9d4e5e1abf7c4913ad33cb89424f444b80b7).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public class JavaTrainValidationSplitExample `
      * `class KMeans @Since("1.5.0") (`
      * `class DCT(JavaTransformer, HasInputCol, HasOutputCol):`
      * `class SQLTransformer(JavaTransformer):`
      * `class StopWordsRemover(JavaTransformer, HasInputCol, HasOutputCol):`
      * `case class LimitNode(limit: Int, child: LocalNode) extends UnaryLocalNode `
      * `case class UnionNode(children: Seq[LocalNode]) extends LocalNode `





[GitHub] spark pull request: [SPARK-10417][SQL] Iterating through Column re...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/8574#issuecomment-137237249
  
    Merged into master, thanks!




[GitHub] spark pull request: [SPARK-10417][SQL] Iterating through Column re...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/8574#issuecomment-137175156
  
    LGTM




[GitHub] spark pull request: [SPARK-10417][SQL] Iterating through Column re...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8574#discussion_r38542172
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -1066,6 +1066,16 @@ def test_with_column_with_existing_name(self):
             keys = self.df.withColumn("key", self.df.key).select("key").collect()
             self.assertEqual([r.key for r in keys], list(range(100)))
     
    +    # regression test for SPARK-10417
    +    def test_column_iterator(self):
    +        # Catch exception raised during improper construction
    +        try:
    +            for x in self.df.key:
    +                break
    +            self.assertEqual(0, 1)
    +        except TypeError:
    +            self.assertEqual(1, 1)
    --- End diff --
    
    You can use `assertRaises` to test the exception case.
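
A minimal sketch of the same regression test written with `assertRaises`, assuming a hypothetical `FakeColumn` stand-in (with an `__iter__` that raises `TypeError`) in place of `self.df.key`, so that it runs without a SparkContext. Both the callable form and the context-manager form of `assertRaises` are shown.

```python
import unittest


class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column (not the real class)."""

    def __getitem__(self, key):
        return FakeColumn()

    def __iter__(self):
        raise TypeError("Column is not iterable")


class ColumnIteratorTest(unittest.TestCase):
    # regression test for SPARK-10417, written with assertRaises
    def test_column_iterator(self):
        def iterate():
            for x in FakeColumn():
                break

        # Fails the test if calling iterate() does not raise TypeError.
        self.assertRaises(TypeError, iterate)

    def test_column_iterator_context_manager(self):
        # Equivalent context-manager form of assertRaises.
        with self.assertRaises(TypeError):
            iter(FakeColumn())
```

It can be run with `python -m unittest` pointed at the file. Compared with the `assertEqual(0, 1)` / `assertEqual(1, 1)` pattern in the diff, `assertRaises` states the expectation directly and, if no `TypeError` is raised, fails with a message naming the missing exception instead of `0 != 1`.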




[GitHub] spark pull request: [SPARK-10417][SQL] Iterating through Column re...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8574#issuecomment-137182108
  
      [Test build #1714 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1714/console) for   PR 8574 at commit [`f041635`](https://github.com/apache/spark/commit/f041635e2478139779561d47ea6bbacbb68670cf).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.

