You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by HyukjinKwon <gi...@git.apache.org> on 2017/11/06 09:39:41 UTC

[GitHub] spark pull request #19665: [SPARK-22376][TESTS] Makes dev/run-tests.py scrip...

GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/19665

    [SPARK-22376][TESTS] Makes dev/run-tests.py script compatible with Python 3

    ## What changes were proposed in this pull request?
    
    This PR proposes to fix `dev/run-tests.py` script to support Python 3.
    
    Here are some backgrounds. Up to my knowledge, 
    
    In Python 2,
    - `unicode` is NOT `str` in Python 2 (`type("foo") != type(u"foo")`).
    - `str` has an alias, `bytes` in Python 2 (`type("foo") == type(b"foo")`).
    
    In Python 3,
    - `unicode` was (roughly) replaced by `str` in Python 3 (`type("foo") == type(u"foo")`).
    - `str` is NOT `bytes` in Python 3 (`type("foo") != type(b"foo")`).
    
    
    So, this PR fixes:
    
      1. Use `b''` instead of `''` so that both `str` in Python 2 and `bytes` in Python 3 can be hanlded. `sbt_proc.stdout.readline()` returns `str` (which has an alias, `bytes`) in Python 2 and `bytes` in Python 3
    
      2. Similarily, use `b''` instead of `''` so that both `str` in Python 2 and `bytes` in Python 3 can be hanlded. `re.compile` with `str` pattern does not seem supporting to match `bytes` in Python 3:
    
    Actually, this change is recommended up to my knowledge - https://docs.python.org/3/howto/pyporting.html#text-versus-binary-data:
    
    > Mark all binary literals with a b prefix, textual literals with a u prefix
    
    ## How was this patch tested?
    
    I manually tested this via Python 3 with few additional changes to reduce the elapsed time.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-22376

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19665.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19665
    
----
commit 2c3d6f14a7f2884f5194475cf4f47bcdcb05b48e
Author: hyukjinkwon <gu...@gmail.com>
Date:   2017-11-06T09:13:53Z

    Makes dev/run-tests.py script compatible with Python 3

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19665: [SPARK-22376][TESTS] Makes dev/run-tests.py scrip...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19665#discussion_r149030792
  
    --- Diff: dev/run-tests.py ---
    @@ -276,9 +276,9 @@ def exec_sbt(sbt_args=()):
     
         sbt_cmd = [os.path.join(SPARK_HOME, "build", "sbt")] + sbt_args
     
    -    sbt_output_filter = re.compile("^.*[info].*Resolving" + "|" +
    -                                   "^.*[warn].*Merging" + "|" +
    -                                   "^.*[info].*Including")
    +    sbt_output_filter = re.compile(b"^.*[info].*Resolving" + b"|" +
    +                                   b"^.*[warn].*Merging" + b"|" +
    +                                   b"^.*[info].*Including")
    --- End diff --
    
    In Python 3:
    
    ```python
    import re
    print(re.compile(b"a").match("a"))
    print(re.compile("a").match("a"))
    print(re.compile("a").match(b"a"))
    print(re.compile(b"a").match(b"a"))
    ```
    
    ```python
    >>> import re
    >>> print(re.compile(b"a").match("a"))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: cannot use a bytes pattern on a string-like object
    >>> print(re.compile("a").match("a"))
    <_sre.SRE_Match object; span=(0, 1), match='a'>
    >>> print(re.compile("a").match(b"a"))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: cannot use a string pattern on a bytes-like object
    >>> print(re.compile(b"a").match(b"a"))
    <_sre.SRE_Match object; span=(0, 1), match=b'a'>
    ```



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19665: [SPARK-22376][TESTS] Makes dev/run-tests.py script compa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19665
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19665: [SPARK-22376][TESTS] Makes dev/run-tests.py script compa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19665
  
    **[Test build #83479 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83479/testReport)** for PR 19665 at commit [`2c3d6f1`](https://github.com/apache/spark/commit/2c3d6f14a7f2884f5194475cf4f47bcdcb05b48e).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19665: [SPARK-22376][TESTS] Makes dev/run-tests.py script compa...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19665
  
    Merged to master.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19665: [SPARK-22376][TESTS] Makes dev/run-tests.py script compa...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19665
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83479/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19665: [SPARK-22376][TESTS] Makes dev/run-tests.py scrip...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19665#discussion_r149030699
  
    --- Diff: dev/run-tests.py ---
    @@ -289,7 +289,7 @@ def exec_sbt(sbt_args=()):
                                     stdin=echo_proc.stdout,
                                     stdout=subprocess.PIPE)
         echo_proc.wait()
    -    for line in iter(sbt_proc.stdout.readline, ''):
    +    for line in iter(sbt_proc.stdout.readline, b''):
    --- End diff --
    
    
    This previous code causes an infinite loop in Python 3 because `''` is `str`; however, `sbt_proc.stdout.readline()` returns `b''`, `bytes` at the end:
    
    This can be tested as below:
    
    ```python
    import subprocess
    sbt_proc = subprocess.Popen(["ls"], stdout=subprocess.PIPE)
    print(type(sbt_proc.stdout.readline()))
    ```
    
    In Python 2:
    
    ```
    >>> import subprocess
    >>> sbt_proc = subprocess.Popen(["ls"], stdout=subprocess.PIPE)
    >>> print(type(sbt_proc.stdout.readline()))
    <type 'str'>
    ```
    
    In Python 3:
    
    ```
    >>> import subprocess
    >>> sbt_proc = subprocess.Popen(["ls"], stdout=subprocess.PIPE)
    >>> print(type(sbt_proc.stdout.readline()))
    <class 'bytes'>
    ```
    
    however,
    
    In Python 2:
    
    ```python
    >>> b'' == ''
    True
    >>> print(type(b''), type(''))
    (<type 'str'>, <type 'str'>)
    ```
    
    In Python 3:
    
    ```python
    >>> b'' == ''
    False
    >>> print(type(b''), type(''))
    <class 'bytes'> <class 'str'>
    ```
    
    The infinite loop can be tested as below, in Python 3:
    
    ```python
    import subprocess
    
    sbt_proc = subprocess.Popen(["ls"], stdout=subprocess.PIPE)
    for line in iter(sbt_proc.stdout.readline, ''):
        print(line)
    ```
    
    In Python 2, the codes above does not cause the infinite loop. This is also fine if we use `b''` for the sentinel, because `bytes` is an alias for `str` in Python 2.
    
    ```python
    import subprocess
    
    sbt_proc = subprocess.Popen(["ls"], stdout=subprocess.PIPE)
    for line in iter(sbt_proc.stdout.readline, b''):
        print(line)
    ```



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19665: [SPARK-22376][TESTS] Makes dev/run-tests.py script compa...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19665
  
    **[Test build #83479 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83479/testReport)** for PR 19665 at commit [`2c3d6f1`](https://github.com/apache/spark/commit/2c3d6f14a7f2884f5194475cf4f47bcdcb05b48e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19665: [SPARK-22376][TESTS] Makes dev/run-tests.py scrip...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19665


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19665: [SPARK-22376][TESTS] Makes dev/run-tests.py script compa...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19665
  
    Thank you @srowen for reviewing this.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org