Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/12/14 23:15:30 UTC
[GitHub] [spark] PerilousApricot commented on pull request #31774: [SPARK-34659] Fix that Web UI always correctly get appId
PerilousApricot commented on pull request #31774:
URL: https://github.com/apache/spark/pull/31774#issuecomment-994128564
@gengliangwang -- I actually have a very simple reproducer that uses nginx as a reverse proxy rather than jupyterhub (to eliminate that failure mode). The following script sets up the proxy; note that it proxies `/user/PerilousApricot/proxy/4040/` to the root of the Spark web UI.
**proxy-fail.sh**
```bash
#!/bin/bash
cat << \EOT > nginx.conf
user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log notice;
pid /var/run/nginx.pid;

events {
    worker_connections 1024;
}

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    log_format main '[$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_x_forwarded_for"';

    access_log /dev/stdout main;

    server {
        listen 5050;
        server_name localhost;

        location /user/PerilousApricot/proxy/4040/ {
            error_log /dev/stderr debug;
            proxy_pass http://localhost:4040/;
            proxy_pass_header Content-Type;
        }
    }
}
EOT
docker run -it --rm=true --name spark-31174-proxy --network=host -v $(pwd)/nginx.conf:/etc/nginx/nginx.conf:ro nginx
```
Run that proxy in one terminal, then run pyspark:
```bash
SPARK_PUBLIC_DNS=localhost:5050/user/PerilousApricot/proxy/4040/jobs/ pyspark --conf spark.ui.reverseProxyUrl=http://localhost:5050/user/PerilousApricot/proxy/4040/ --conf spark.driver.extraJavaOptions="-Dlog4j.debug=true" --conf spark.ui.proxyBase=/user/PerilousApricot/proxy/4040/ --conf spark.app.name=proxyApp
```
Open `http://localhost:5050/user/PerilousApricot/proxy/4040/executors/` in a browser with the developer tools open to watch the traffic. You will see a number of successful requests for static resources such as:
```
http://localhost:5050/user/PerilousApricot/proxy/4040//static/webui.css
http://localhost:5050/user/PerilousApricot/proxy/4040//static/webui.js
```
Notice, however, that there is one failed request (the reason for this PR):
```
http://localhost:5050/user/PerilousApricot/proxy/4040/api/v1/applications/4040/allexecutors
```
If you run curl against that URL manually, you can see that the request fails both through the reverse proxy and against the web UI itself:
```
curl -v -o /dev/null http://localhost:5050/user/PerilousApricot/proxy/4040/api/v1/applications/4040/allexecutors
curl -v -o /dev/null http://localhost:4040/api/v1/applications/4040/allexecutors
```
But if you copy the appId from the Spark console (in my case: `Spark context available as 'sc' (master = local[*], app id = local-1639522961946).`), the following two requests succeed:
```
curl http://localhost:5050/user/PerilousApricot/proxy/4040/api/v1/applications/local-1639522961946
curl -v -o /dev/null http://localhost:4040/api/v1/applications/local-1639522961946
```
To confirm the issue, let's restart the proxy and pyspark, but instead of proxying `/user/PerilousApricot/proxy/4040/`, let's proxy `/user/PerilousApricot/yxorp/4040/` (note that no path element named "proxy" appears in the proxied URL). First execute
**proxy-win.sh**
```bash
#!/bin/bash
cat << \EOT > nginx.conf
user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log notice;
pid /var/run/nginx.pid;

events {
    worker_connections 1024;
}

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    log_format main '[$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_x_forwarded_for"';

    access_log /dev/stdout main;

    server {
        listen 5050;
        server_name localhost;

        location /user/PerilousApricot/yxorp/4040/ {
            #error_log /dev/stderr debug;
            proxy_pass http://localhost:4040/;
            #proxy_redirect off;
            proxy_pass_header Content-Type;
            #rewrite /user/PerilousApricot/yxorp/4040(/.*|$) $1 break;
        }
    }
}
EOT
docker run -it --rm=true --name spark-31174-proxy --network=host -v $(pwd)/nginx.conf:/etc/nginx/nginx.conf:ro nginx
```
and then run in a different terminal
```bash
SPARK_PUBLIC_DNS=localhost:5050/user/PerilousApricot/yxorp/4040/jobs/ pyspark --conf spark.ui.reverseProxyUrl=http://localhost:5050/user/PerilousApricot/yxorp/4040/ --conf spark.driver.extraJavaOptions="-Dlog4j.debug=true" --conf spark.ui.proxyBase=/user/PerilousApricot/yxorp/4040/ --conf spark.app.name=proxyApp
```
Open `http://localhost:5050/user/PerilousApricot/yxorp/4040//executors/` and you can see that the page renders properly. Looking at the developer console, you will see that instead of attempting to open
```
http://localhost:5050/user/PerilousApricot/proxy/4040/api/v1/applications/4040/allexecutors
```
this version requests the status of the executors from
```
http://localhost:5050/user/PerilousApricot/yxorp/4040//api/v1/applications/local-1639523380430/allexecutors
```
I hope this is enough to show that @ornew did the right analysis: the fault isn't with jupyterhub, it is simply that the logic that looks up the appId chokes when a path element named "proxy" appears in the URL.
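To make the failure concrete, here is a small shell sketch of the kind of path parsing that produces this behavior. This is hedged: Spark's real lookup lives in the web UI's JavaScript, and `guess_app_id` is a hypothetical stand-in, but the shape of the bug is the same -- the segment after "proxy" is taken to be the appId, so this reproducer's URL yields the port, `4040`.

```bash
#!/bin/bash
# Sketch only -- NOT Spark's actual code (which lives in the web UI's
# JavaScript). It mimics the suspected lookup: split the page URL on '/'
# and treat the element after the first "proxy" segment as the appId.
guess_app_id() {
  IFS='/' read -ra parts <<< "$1"
  local i
  for i in "${!parts[@]}"; do
    if [[ "${parts[$i]}" == "proxy" ]]; then
      # Whatever follows "proxy" is returned, even when it is a port number.
      echo "${parts[$((i + 1))]}"
      return
    fi
  done
  echo "no-proxy-segment"   # the real UI would fall back to a REST lookup
}

guess_app_id "http://localhost:5050/user/PerilousApricot/proxy/4040/executors/"
# prints: 4040 -- the port from the reproducer's path, not the real appId
```

With the `yxorp` path from the second reproducer, no "proxy" segment is found, the fallback path is taken, and the UI resolves the correct `local-...` appId instead.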
Can you please re-examine this?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org