You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by squito <gi...@git.apache.org> on 2018/08/27 18:17:35 UTC
[GitHub] spark pull request #22247: [SPARK-25253][PYSPARK] Refactor local connection ...
GitHub user squito opened a pull request:
https://github.com/apache/spark/pull/22247
[SPARK-25253][PYSPARK] Refactor local connection & auth code
This eliminates some duplication in the code to connect to a server on localhost to talk directly to the jvm. Also it gives consistent ipv6 and error handling. Two other incidental changes, that shouldn't matter:
1) python barrier tasks perform authentication immediately (rather than waiting for the BARRIER_FUNCTION indicator)
2) for `rdd._load_from_socket`, the timeout is only increased after authentication.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/squito/spark py_connection_refactor
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22247.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22247
----
commit c232ec63f80eea05d3756feb22e53aa5a1e67d93
Author: Imran Rashid <ir...@...>
Date: 2018-08-27T17:07:44Z
[SPARK-25253][PYSPARK] Refactor local connection & auth code
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22247: [SPARK-25253][PYSPARK] Refactor local connection & auth ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22247
**[Test build #95380 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95380/testReport)** for PR 22247 at commit [`65ed777`](https://github.com/apache/spark/commit/65ed777d11f1047bdb371259dc44969dad6ca0ac).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22247: [SPARK-25253][PYSPARK] Refactor local connection ...
Posted by squito <gi...@git.apache.org>.
Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/22247#discussion_r213398017
--- Diff: python/pyspark/taskcontext.py ---
@@ -108,38 +108,12 @@ def _load_from_socket(port, auth_secret):
"""
Load data from a given socket, this is a blocking method thus only return when the socket
connection has been closed.
-
- This is copied from context.py, while modified the message protocol.
"""
- sock = None
- # Support for both IPv4 and IPv6.
- # On most of IPv6-ready systems, IPv6 will take precedence.
- for res in socket.getaddrinfo("localhost", port, socket.AF_UNSPEC, socket.SOCK_STREAM):
- af, socktype, proto, canonname, sa = res
- sock = socket.socket(af, socktype, proto)
- try:
- # Do not allow timeout for socket reading operation.
- sock.settimeout(None)
- sock.connect(sa)
- except socket.error:
- sock.close()
- sock = None
- continue
- break
- if not sock:
- raise Exception("could not open socket")
-
- # We don't really need a socket file here, it's just for convenience that we can reuse the
- # do_server_auth() function and data serialization methods.
- sockfile = sock.makefile("rwb", 65536)
-
+ (sockfile, sock) = local_connect_and_auth(port, auth_secret)
--- End diff --
good catch, thanks! updated
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22247: [SPARK-25253][PYSPARK] Refactor local connection ...
Posted by squito <gi...@git.apache.org>.
Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/22247#discussion_r213324531
--- Diff: python/pyspark/java_gateway.py ---
@@ -147,6 +147,39 @@ def do_server_auth(conn, auth_secret):
raise Exception("Unexpected reply from iterator server.")
+def local_connect_and_auth(sock_info):
--- End diff --
done, thanks -- I used the varargs version `*sock_info` for the last one, i forgot about that in python.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22247: [SPARK-25253][PYSPARK] Refactor local connection & auth ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22247
**[Test build #95365 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95365/testReport)** for PR 22247 at commit [`b0c4483`](https://github.com/apache/spark/commit/b0c4483e9506d8af6f12b6d41848ece52ffe9bc4).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22247: [SPARK-25253][PYSPARK] Refactor local connection & auth ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22247
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22247: [SPARK-25253][PYSPARK] Refactor local connection ...
Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/22247#discussion_r213400676
--- Diff: python/pyspark/java_gateway.py ---
@@ -147,6 +147,39 @@ def do_server_auth(conn, auth_secret):
raise Exception("Unexpected reply from iterator server.")
+def local_connect_and_auth(port, auth_secret):
+ """
+ Connect to local host, authenticate with it, and return a (sockfile,sock) for that connection.
+ Handles IPV4 & IPV6, does some error handling.
+ :param port
+ :param auth_secret
+ :return: a tuple with (sockfile, sock)
+ """
+ sock = None
+ errors = []
+ # Support for both IPv4 and IPv6.
+ # On most of IPv6-ready systems, IPv6 will take precedence.
+ for res in socket.getaddrinfo("127.0.0.1", port, socket.AF_UNSPEC, socket.SOCK_STREAM):
+ af, socktype, proto, _, sa = res
+ sock = socket.socket(af, socktype, proto)
+ try:
+ sock.settimeout(15)
+ sock.connect(sa)
+ except socket.error as e:
+ emsg = _exception_message(e)
+ errors.append("tried to connect to %s, but an error occured: %s" % (sa, emsg))
+ sock.close()
+ sock = None
+ continue
+ break
--- End diff --
Slight shorter (and more "python-compliant"?):
- move the socket initialization (and the return) inside the try
- get rid of the continue
- use an else instead of the condition below
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22247: [SPARK-25253][PYSPARK] Refactor local connection & auth ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22247
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22247: [SPARK-25253][PYSPARK] Refactor local connection & auth ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22247
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2582/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22247: [SPARK-25253][PYSPARK] Refactor local connection & auth ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22247
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95303/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22247: [SPARK-25253][PYSPARK] Refactor local connection & auth ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22247
**[Test build #95380 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95380/testReport)** for PR 22247 at commit [`65ed777`](https://github.com/apache/spark/commit/65ed777d11f1047bdb371259dc44969dad6ca0ac).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22247: [SPARK-25253][PYSPARK] Refactor local connection & auth ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22247
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22247: [SPARK-25253][PYSPARK] Refactor local connection & auth ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22247
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95354/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22247: [SPARK-25253][PYSPARK] Refactor local connection & auth ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22247
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22247: [SPARK-25253][PYSPARK] Refactor local connection & auth ...
Posted by squito <gi...@git.apache.org>.
Github user squito commented on the issue:
https://github.com/apache/spark/pull/22247
@jiangxb1987 @HyukjinKwon @mengxr
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22247: [SPARK-25253][PYSPARK] Refactor local connection & auth ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22247
**[Test build #95354 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95354/testReport)** for PR 22247 at commit [`d07d21d`](https://github.com/apache/spark/commit/d07d21d83516492155207c06dfbde3a171412a68).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22247: [SPARK-25253][PYSPARK] Refactor local connection & auth ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22247
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22247: [SPARK-25253][PYSPARK] Refactor local connection ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22247#discussion_r213181568
--- Diff: python/pyspark/worker.py ---
@@ -364,8 +364,5 @@ def process():
# Read information about how to connect back to the JVM from the environment.
java_port = int(os.environ["PYTHON_WORKER_FACTORY_PORT"])
auth_secret = os.environ["PYTHON_WORKER_FACTORY_SECRET"]
- sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
- sock.connect(("127.0.0.1", java_port))
- sock_file = sock.makefile("rwb", 65536)
--- End diff --
I quickly tested and seems working fine. Please ignore this comment.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22247: [SPARK-25253][PYSPARK] Refactor local connection & auth ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22247
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22247: [SPARK-25253][PYSPARK] Refactor local connection ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22247#discussion_r213174542
--- Diff: python/pyspark/java_gateway.py ---
@@ -147,6 +147,39 @@ def do_server_auth(conn, auth_secret):
raise Exception("Unexpected reply from iterator server.")
+def local_connect_and_auth(sock_info):
+ """
+ Connect to local host, authenticate with it, and return a (sockfile,sock) for that connection.
+ Handles IPV4 & IPV6, does some error handling.
+ :param sock_info: a tuple of (port, auth_secret) for connecting
+ :return: a tuple with (sockfile, sock)
+ """
+ port, auth_secret = sock_info
+ sock = None
+ errors = []
+ # Support for both IPv4 and IPv6.
+ # On most of IPv6-ready systems, IPv6 will take precedence.
+ for res in socket.getaddrinfo("127.0.0.1", port, socket.AF_UNSPEC, socket.SOCK_STREAM):
+ af, socktype, proto, canonname, sa = res
--- End diff --
nit: `af, socktype, proto, canonname, sa = res` -> `af, socktype, proto, _, sa = res`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22247: [SPARK-25253][PYSPARK] Refactor local connection & auth ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22247
**[Test build #95303 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95303/testReport)** for PR 22247 at commit [`c232ec6`](https://github.com/apache/spark/commit/c232ec63f80eea05d3756feb22e53aa5a1e67d93).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22247: [SPARK-25253][PYSPARK] Refactor local connection & auth ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22247
**[Test build #95354 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95354/testReport)** for PR 22247 at commit [`d07d21d`](https://github.com/apache/spark/commit/d07d21d83516492155207c06dfbde3a171412a68).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22247: [SPARK-25253][PYSPARK] Refactor local connection ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22247#discussion_r213173845
--- Diff: python/pyspark/java_gateway.py ---
@@ -147,6 +147,39 @@ def do_server_auth(conn, auth_secret):
raise Exception("Unexpected reply from iterator server.")
+def local_connect_and_auth(sock_info):
--- End diff --
@squito, not a big deal but how about `local_connect_and_auth (port, auth_secret)` and ..
```python
(sockfile, sock) = local_connect_and_auth(port, auth_secret)
```
```python
(sock_file, _) = local_connect_and_auth(java_port, auth_secret)
```
```python
port, auth_secret = sock_info
(sockfile, sock) = local_connect_and_auth(port, auth_secret)
```
or
```python
(sockfile, sock) = local_connect_and_auth(*sock_info)
```
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22247: [SPARK-25253][PYSPARK] Refactor local connection & auth ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22247
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2621/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22247: [SPARK-25253][PYSPARK] Refactor local connection & auth ...
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the issue:
https://github.com/apache/spark/pull/22247
@squito Thanks for the refactor!
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22247: [SPARK-25253][PYSPARK] Refactor local connection & auth ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22247
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95365/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22247: [SPARK-25253][PYSPARK] Refactor local connection & auth ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22247
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22247: [SPARK-25253][PYSPARK] Refactor local connection ...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/22247
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22247: [SPARK-25253][PYSPARK] Refactor local connection & auth ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22247
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2642/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22247: [SPARK-25253][PYSPARK] Refactor local connection & auth ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22247
**[Test build #95365 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95365/testReport)** for PR 22247 at commit [`b0c4483`](https://github.com/apache/spark/commit/b0c4483e9506d8af6f12b6d41848ece52ffe9bc4).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22247: [SPARK-25253][PYSPARK] Refactor local connection & auth ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22247
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2631/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22247: [SPARK-25253][PYSPARK] Refactor local connection & auth ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22247
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95380/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22247: [SPARK-25253][PYSPARK] Refactor local connection & auth ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22247
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22247: [SPARK-25253][PYSPARK] Refactor local connection ...
Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22247#discussion_r213378931
--- Diff: python/pyspark/taskcontext.py ---
@@ -108,38 +108,12 @@ def _load_from_socket(port, auth_secret):
"""
Load data from a given socket, this is a blocking method thus only return when the socket
connection has been closed.
-
- This is copied from context.py, while modified the message protocol.
"""
- sock = None
- # Support for both IPv4 and IPv6.
- # On most of IPv6-ready systems, IPv6 will take precedence.
- for res in socket.getaddrinfo("localhost", port, socket.AF_UNSPEC, socket.SOCK_STREAM):
- af, socktype, proto, canonname, sa = res
- sock = socket.socket(af, socktype, proto)
- try:
- # Do not allow timeout for socket reading operation.
- sock.settimeout(None)
- sock.connect(sa)
- except socket.error:
- sock.close()
- sock = None
- continue
- break
- if not sock:
- raise Exception("could not open socket")
-
- # We don't really need a socket file here, it's just for convenience that we can reuse the
- # do_server_auth() function and data serialization methods.
- sockfile = sock.makefile("rwb", 65536)
-
+ (sockfile, sock) = local_connect_and_auth(port, auth_secret)
--- End diff --
We must set sock timeout to `None` to allow `barrier()` call blocking forever.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22247: [SPARK-25253][PYSPARK] Refactor local connection ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22247#discussion_r213175296
--- Diff: python/pyspark/worker.py ---
@@ -364,8 +364,5 @@ def process():
# Read information about how to connect back to the JVM from the environment.
java_port = int(os.environ["PYTHON_WORKER_FACTORY_PORT"])
auth_secret = os.environ["PYTHON_WORKER_FACTORY_SECRET"]
- sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
- sock.connect(("127.0.0.1", java_port))
- sock_file = sock.makefile("rwb", 65536)
--- End diff --
@vanzin, BTW, did you test this on Windows too?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22247: [SPARK-25253][PYSPARK] Refactor local connection & auth ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22247
**[Test build #95303 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95303/testReport)** for PR 22247 at commit [`c232ec6`](https://github.com/apache/spark/commit/c232ec63f80eea05d3756feb22e53aa5a1e67d93).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22247: [SPARK-25253][PYSPARK] Refactor local connection & auth ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22247
Merged to master.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22247: [SPARK-25253][PYSPARK] Refactor local connection ...
Posted by squito <gi...@git.apache.org>.
Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/22247#discussion_r213469645
--- Diff: python/pyspark/java_gateway.py ---
@@ -147,6 +147,39 @@ def do_server_auth(conn, auth_secret):
raise Exception("Unexpected reply from iterator server.")
+def local_connect_and_auth(port, auth_secret):
+ """
+ Connect to local host, authenticate with it, and return a (sockfile,sock) for that connection.
+ Handles IPV4 & IPV6, does some error handling.
+ :param port
+ :param auth_secret
+ :return: a tuple with (sockfile, sock)
+ """
+ sock = None
+ errors = []
+ # Support for both IPv4 and IPv6.
+ # On most of IPv6-ready systems, IPv6 will take precedence.
+ for res in socket.getaddrinfo("127.0.0.1", port, socket.AF_UNSPEC, socket.SOCK_STREAM):
+ af, socktype, proto, _, sa = res
+ sock = socket.socket(af, socktype, proto)
+ try:
+ sock.settimeout(15)
+ sock.connect(sa)
+ except socket.error as e:
+ emsg = _exception_message(e)
+ errors.append("tried to connect to %s, but an error occured: %s" % (sa, emsg))
+ sock.close()
+ sock = None
+ continue
+ break
--- End diff --
done
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org