You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by HyukjinKwon <gi...@git.apache.org> on 2017/11/06 09:39:41 UTC
[GitHub] spark pull request #19665: [SPARK-22376][TESTS] Makes dev/run-tests.py scrip...
GitHub user HyukjinKwon opened a pull request:
https://github.com/apache/spark/pull/19665
[SPARK-22376][TESTS] Makes dev/run-tests.py script compatible with Python 3
## What changes were proposed in this pull request?
This PR proposes to fix `dev/run-tests.py` script to support Python 3.
Here are some backgrounds. Up to my knowledge,
In Python 2,
- `unicode` is NOT `str` in Python 2 (`type("foo") != type(u"foo")`).
- `str` has an alias, `bytes` in Python 2 (`type("foo") == type(b"foo")`).
In Python 3,
- `unicode` was (roughly) replaced by `str` in Python 3 (`type("foo") == type(u"foo")`).
- `str` is NOT `bytes` in Python 3 (`type("foo") != type(b"foo")`).
So, this PR fixes:
1. Use `b''` instead of `''` so that both `str` in Python 2 and `bytes` in Python 3 can be hanlded. `sbt_proc.stdout.readline()` returns `str` (which has an alias, `bytes`) in Python 2 and `bytes` in Python 3
2. Similarily, use `b''` instead of `''` so that both `str` in Python 2 and `bytes` in Python 3 can be hanlded. `re.compile` with `str` pattern does not seem supporting to match `bytes` in Python 3:
Actually, this change is recommended up to my knowledge - https://docs.python.org/3/howto/pyporting.html#text-versus-binary-data:
> Mark all binary literals with a b prefix, textual literals with a u prefix
## How was this patch tested?
I manually tested this via Python 3 with few additional changes to reduce the elapsed time.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HyukjinKwon/spark SPARK-22376
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19665.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19665
----
commit 2c3d6f14a7f2884f5194475cf4f47bcdcb05b48e
Author: hyukjinkwon <gu...@gmail.com>
Date: 2017-11-06T09:13:53Z
Makes dev/run-tests.py script compatible with Python 3
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19665: [SPARK-22376][TESTS] Makes dev/run-tests.py scrip...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/19665#discussion_r149030792
--- Diff: dev/run-tests.py ---
@@ -276,9 +276,9 @@ def exec_sbt(sbt_args=()):
sbt_cmd = [os.path.join(SPARK_HOME, "build", "sbt")] + sbt_args
- sbt_output_filter = re.compile("^.*[info].*Resolving" + "|" +
- "^.*[warn].*Merging" + "|" +
- "^.*[info].*Including")
+ sbt_output_filter = re.compile(b"^.*[info].*Resolving" + b"|" +
+ b"^.*[warn].*Merging" + b"|" +
+ b"^.*[info].*Including")
--- End diff --
In Python 3:
```python
import re
print(re.compile(b"a").match("a"))
print(re.compile("a").match("a"))
print(re.compile("a").match(b"a"))
print(re.compile(b"a").match(b"a"))
```
```python
>>> import re
>>> print(re.compile(b"a").match("a"))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: cannot use a bytes pattern on a string-like object
>>> print(re.compile("a").match("a"))
<_sre.SRE_Match object; span=(0, 1), match='a'>
>>> print(re.compile("a").match(b"a"))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: cannot use a string pattern on a bytes-like object
>>> print(re.compile(b"a").match(b"a"))
<_sre.SRE_Match object; span=(0, 1), match=b'a'>
```
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19665: [SPARK-22376][TESTS] Makes dev/run-tests.py script compa...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19665
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19665: [SPARK-22376][TESTS] Makes dev/run-tests.py script compa...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19665
**[Test build #83479 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83479/testReport)** for PR 19665 at commit [`2c3d6f1`](https://github.com/apache/spark/commit/2c3d6f14a7f2884f5194475cf4f47bcdcb05b48e).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19665: [SPARK-22376][TESTS] Makes dev/run-tests.py script compa...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/19665
Merged to master.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19665: [SPARK-22376][TESTS] Makes dev/run-tests.py script compa...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19665
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83479/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19665: [SPARK-22376][TESTS] Makes dev/run-tests.py scrip...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/19665#discussion_r149030699
--- Diff: dev/run-tests.py ---
@@ -289,7 +289,7 @@ def exec_sbt(sbt_args=()):
stdin=echo_proc.stdout,
stdout=subprocess.PIPE)
echo_proc.wait()
- for line in iter(sbt_proc.stdout.readline, ''):
+ for line in iter(sbt_proc.stdout.readline, b''):
--- End diff --
This previous code causes an infinite loop in Python 3 because `''` is `str`; however, `sbt_proc.stdout.readline()` returns `b''`, `bytes` at the end:
This can be tested as below:
```python
import subprocess
sbt_proc = subprocess.Popen(["ls"], stdout=subprocess.PIPE)
print(type(sbt_proc.stdout.readline()))
```
In Python 2:
```
>>> import subprocess
>>> sbt_proc = subprocess.Popen(["ls"], stdout=subprocess.PIPE)
>>> print(type(sbt_proc.stdout.readline()))
<type 'str'>
```
In Python 3:
```
>>> import subprocess
>>> sbt_proc = subprocess.Popen(["ls"], stdout=subprocess.PIPE)
>>> print(type(sbt_proc.stdout.readline()))
<class 'bytes'>
```
however,
In Python 2:
```python
>>> b'' == ''
True
>>> print(type(b''), type(''))
(<type 'str'>, <type 'str'>)
```
In Python 3:
```python
>>> b'' == ''
False
>>> print(type(b''), type(''))
<class 'bytes'> <class 'str'>
```
The infinite loop can be tested as below, in Python 3:
```python
import subprocess
sbt_proc = subprocess.Popen(["ls"], stdout=subprocess.PIPE)
for line in iter(sbt_proc.stdout.readline, ''):
print(line)
```
In Python 2, the codes above does not cause the infinite loop. This is also fine if we use `b''` for the sentinel, because `bytes` is an alias for `str` in Python 2.
```python
import subprocess
sbt_proc = subprocess.Popen(["ls"], stdout=subprocess.PIPE)
for line in iter(sbt_proc.stdout.readline, b''):
print(line)
```
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19665: [SPARK-22376][TESTS] Makes dev/run-tests.py script compa...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19665
**[Test build #83479 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83479/testReport)** for PR 19665 at commit [`2c3d6f1`](https://github.com/apache/spark/commit/2c3d6f14a7f2884f5194475cf4f47bcdcb05b48e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19665: [SPARK-22376][TESTS] Makes dev/run-tests.py scrip...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/19665
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19665: [SPARK-22376][TESTS] Makes dev/run-tests.py script compa...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/19665
Thank you @srowen for reviewing this.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org