You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by zsxwing <gi...@git.apache.org> on 2016/06/14 01:01:11 UTC

[GitHub] spark pull request #13655: [SPARK-15935][PySpark]Enable test for sql/streami...

GitHub user zsxwing opened a pull request:

    https://github.com/apache/spark/pull/13655

    [SPARK-15935][PySpark]Enable test for sql/streaming.py and fix these tests

    ## What changes were proposed in this pull request?
    
    This PR just enables tests for sql/streaming.py and also fixes the failures.
    
    ## How was this patch tested?
    
    Existing unit tests.
    
    
    (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zsxwing/spark python-streaming-test

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13655.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13655
    
----
commit 0b15d88b0cde790e9aa58eb34aa22c57642b8886
Author: Shixiong Zhu <sh...@databricks.com>
Date:   2016-06-14T01:00:13Z

    Enable test for sql/streaming.py and fix these tests

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #13655: [SPARK-15935][PySpark]Enable test for sql/streami...

Posted by brkyvz <gi...@git.apache.org>.

Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13655#discussion_r66895718
  
    --- Diff: python/pyspark/sql/streaming.py ---
    @@ -106,28 +114,37 @@ def active(self):
             """Returns a list of active queries associated with this SQLContext
     
             >>> cq = df.write.format('memory').queryName('this_query').startStream()
    -        >>> cqm = sqlContext.streams
    +        >>> cqm = spark.streams
             >>> # get the list of active continuous queries
             >>> [q.name for q in cqm.active]
             [u'this_query']
             >>> cq.stop()
             """
             return [ContinuousQuery(jcq) for jcq in self._jcqm.active()]
     
    +    @ignore_unicode_prefix
         @since(2.0)
    -    def get(self, name):
    +    def get(self, id):
             """Returns an active query from this SQLContext or throws exception if an active query
             with this name doesn't exist.
     
    -        >>> df.write.format('memory').queryName('this_query').startStream()
    -        >>> cq = sqlContext.streams.get('this_query')
    +        >>> cq = df.write.format('memory').queryName('this_query').startStream()
    +        >>> cq.name
    +        u'this_query'
    +        >>> cq = spark.streams.get(cq.id)
             >>> cq.isActive
             True
             >>> cq.stop()
             """
    -        if type(name) != str or len(name.strip()) == 0:
    -            raise ValueError("The name for the query must be a non-empty string. Got: %s" % name)
    -        return ContinuousQuery(self._jcqm.get(name))
    +        import sys
    +        if sys.version >= '3':
    --- End diff --
    
    can we define this in the top of the file? Maybe it can be re-used?
    
    ```python
    import sys
    if sys.version >= '3':
      intlike = int
    else:
      intlike = (int, long)
    
    ...
    
    if not isinstance(id, intlike):
      raise Error(...)
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #13655: [SPARK-15935][PySpark]Enable test for sql/streami...

Posted by brkyvz <gi...@git.apache.org>.

Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13655#discussion_r67002254
  
    --- Diff: python/pyspark/sql/streaming.py ---
    @@ -106,28 +120,34 @@ def active(self):
             """Returns a list of active queries associated with this SQLContext
     
             >>> cq = df.write.format('memory').queryName('this_query').startStream()
    -        >>> cqm = sqlContext.streams
    +        >>> cqm = spark.streams
             >>> # get the list of active continuous queries
             >>> [q.name for q in cqm.active]
             [u'this_query']
             >>> cq.stop()
             """
             return [ContinuousQuery(jcq) for jcq in self._jcqm.active()]
     
    +    @ignore_unicode_prefix
         @since(2.0)
    -    def get(self, name):
    +    def get(self, id):
             """Returns an active query from this SQLContext or throws exception if an active query
             with this name doesn't exist.
     
    -        >>> df.write.format('memory').queryName('this_query').startStream()
    -        >>> cq = sqlContext.streams.get('this_query')
    +        >>> cq = df.write.format('memory').queryName('this_query').startStream()
    +        >>> cq.name
    +        u'this_query'
    +        >>> cq = spark.streams.get(cq.id)
    +        >>> cq.isActive
    +        True
    +        >>> cq = sqlContext.streams.get(cq.id)
             >>> cq.isActive
             True
             >>> cq.stop()
             """
    -        if type(name) != str or len(name.strip()) == 0:
    -            raise ValueError("The name for the query must be a non-empty string. Got: %s" % name)
    -        return ContinuousQuery(self._jcqm.get(name))
    +        if not isinstance(id, intlike):
    +            raise ValueError("The id for the query must be an integer. Got: %d" % id)
    --- End diff --
    
    just noticed this. I would use a `%s`, otherwise you're trying to format something that's not a number as a number. But we entered this field because it was likely not a number. Or does it work?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #13655: [SPARK-15935][PySpark]Enable test for sql/streaming.py a...

Posted by tdas <gi...@git.apache.org>.

Github user tdas commented on the issue:

    https://github.com/apache/spark/pull/13655
  
    Merging this to master and 2.0


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #13655: [SPARK-15935][PySpark]Enable test for sql/streaming.py a...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13655
  
    **[Test build #60457 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60457/consoleFull)** for PR 13655 at commit [`0b15d88`](https://github.com/apache/spark/commit/0b15d88b0cde790e9aa58eb34aa22c57642b8886).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #13655: [SPARK-15935][PySpark]Enable test for sql/streaming.py a...

Posted by zsxwing <gi...@git.apache.org>.

Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/13655
  
    /cc @tdas @brkyvz 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #13655: [SPARK-15935][PySpark]Enable test for sql/streami...

Posted by brkyvz <gi...@git.apache.org>.

Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13655#discussion_r66895475
  
    --- Diff: dev/sparktestsupport/modules.py ---
    @@ -337,6 +337,7 @@ def __hash__(self):
             "pyspark.sql.group",
             "pyspark.sql.functions",
             "pyspark.sql.readwriter",
    +        "pyspark.sql.streaming",
    --- End diff --
    
    this is pretty significant. I had no idea this file existed. I wonder if we could possibly add a check to the linter to catch new Python files appearing but not being added to this file? It's not a blocker message but an advisory one.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #13655: [SPARK-15935][PySpark]Enable test for sql/streaming.py a...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13655
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #13655: [SPARK-15935][PySpark]Enable test for sql/streami...

Posted by tdas <gi...@git.apache.org>.

Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13655#discussion_r66897860
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -549,6 +549,15 @@ def read(self):
             """
             return DataFrameReader(self._wrapped)
     
    +    @property
    +    @since(2.0)
    +    def streams(self):
    +        """Returns a :class:`ContinuousQueryManager` that allows managing all the
    +        :class:`ContinuousQuery` ContinuousQueries active on `this` context.
    --- End diff --
    
    add experimental tag.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #13655: [SPARK-15935][PySpark]Enable test for sql/streaming.py a...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13655
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #13655: [SPARK-15935][PySpark]Enable test for sql/streaming.py a...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13655
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60465/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #13655: [SPARK-15935][PySpark]Enable test for sql/streaming.py a...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13655
  
    **[Test build #60465 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60465/consoleFull)** for PR 13655 at commit [`88d5b93`](https://github.com/apache/spark/commit/88d5b93e9b0ea2ebbf5ae62a7c9b32aa6357a647).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #13655: [SPARK-15935][PySpark]Enable test for sql/streaming.py a...

Posted by brkyvz <gi...@git.apache.org>.

Github user brkyvz commented on the issue:

    https://github.com/apache/spark/pull/13655
  
    a minor nit, otherwise LGTM!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #13655: [SPARK-15935][PySpark]Enable test for sql/streaming.py a...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13655
  
    **[Test build #60465 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60465/consoleFull)** for PR 13655 at commit [`88d5b93`](https://github.com/apache/spark/commit/88d5b93e9b0ea2ebbf5ae62a7c9b32aa6357a647).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #13655: [SPARK-15935][PySpark]Enable test for sql/streaming.py a...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13655
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60457/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #13655: [SPARK-15935][PySpark]Enable test for sql/streami...

Posted by zsxwing <gi...@git.apache.org>.

Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13655#discussion_r67014140
  
    --- Diff: python/pyspark/sql/streaming.py ---
    @@ -106,28 +120,34 @@ def active(self):
             """Returns a list of active queries associated with this SQLContext
     
             >>> cq = df.write.format('memory').queryName('this_query').startStream()
    -        >>> cqm = sqlContext.streams
    +        >>> cqm = spark.streams
             >>> # get the list of active continuous queries
             >>> [q.name for q in cqm.active]
             [u'this_query']
             >>> cq.stop()
             """
             return [ContinuousQuery(jcq) for jcq in self._jcqm.active()]
     
    +    @ignore_unicode_prefix
         @since(2.0)
    -    def get(self, name):
    +    def get(self, id):
             """Returns an active query from this SQLContext or throws exception if an active query
             with this name doesn't exist.
     
    -        >>> df.write.format('memory').queryName('this_query').startStream()
    -        >>> cq = sqlContext.streams.get('this_query')
    +        >>> cq = df.write.format('memory').queryName('this_query').startStream()
    +        >>> cq.name
    +        u'this_query'
    +        >>> cq = spark.streams.get(cq.id)
    +        >>> cq.isActive
    +        True
    +        >>> cq = sqlContext.streams.get(cq.id)
             >>> cq.isActive
             True
             >>> cq.stop()
             """
    -        if type(name) != str or len(name.strip()) == 0:
    -            raise ValueError("The name for the query must be a non-empty string. Got: %s" % name)
    -        return ContinuousQuery(self._jcqm.get(name))
    +        if not isinstance(id, intlike):
    +            raise ValueError("The id for the query must be an integer. Got: %d" % id)
    --- End diff --
    
    Good catch. I will submit a separate PR to fix it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #13655: [SPARK-15935][PySpark]Enable test for sql/streami...

Posted by tdas <gi...@git.apache.org>.

Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13655#discussion_r66898042
  
    --- Diff: python/pyspark/sql/streaming.py ---
    @@ -209,27 +226,27 @@ def _test():
         import doctest
         import os
         import tempfile
    -    from pyspark.context import SparkContext
    -    from pyspark.sql import Row, SQLContext, HiveContext
    -    import pyspark.sql.readwriter
    +    from pyspark.sql import Row, SparkSession, HiveContext
    +    import pyspark.sql.streaming
     
         os.chdir(os.environ["SPARK_HOME"])
     
    -    globs = pyspark.sql.readwriter.__dict__.copy()
    -    sc = SparkContext('local[4]', 'PythonTest')
    +    globs = pyspark.sql.streaming.__dict__.copy()
    +    try:
    +        spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    +    except py4j.protocol.Py4JError:
    +        spark = SparkSession(sc)
     
         globs['tempfile'] = tempfile
         globs['os'] = os
    -    globs['sc'] = sc
    -    globs['sqlContext'] = SQLContext(sc)
    -    globs['hiveContext'] = HiveContext._createForTesting(sc)
    +    globs['spark'] = spark
         globs['df'] = \
    -        globs['sqlContext'].read.format('text').stream('python/test_support/sql/streaming')
    +        globs['spark'].read.format('text').stream('python/test_support/sql/streaming')
    --- End diff --
    
    there should be some tests for sqlcontext as well, to make sure something fails if it gets removed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #13655: [SPARK-15935][PySpark]Enable test for sql/streaming.py a...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13655
  
    **[Test build #60457 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60457/consoleFull)** for PR 13655 at commit [`0b15d88`](https://github.com/apache/spark/commit/0b15d88b0cde790e9aa58eb34aa22c57642b8886).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #13655: [SPARK-15935][PySpark]Enable test for sql/streami...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/13655


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org