Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/03/08 03:23:52 UTC

[GitHub] [spark] ornew opened a new pull request #31774: [SPARK-34659] Fix that Web UI always correctly get appId

ornew opened a new pull request #31774:
URL: https://github.com/apache/spark/pull/31774


   The Web UI does not correctly get the appId when the URL contains `proxy` or `history`.
   
   In my case, it happens on `https://jupyterhub.hosted.us/my-name/proxy/4040/executors/`.
   The web developer console shows `jquery-3.4.1.min.js:2 GET https://jupyterhub.hosted.us/user/my-name/proxy/4040/api/v1/applications/4040/allexecutors 404`, and the page renders blank.
   
   There is a related issue in JupyterHub: https://github.com/jupyterhub/jupyter-server-proxy/issues/57
   
   https://github.com/apache/spark/blob/2526fdea481b1777b2c4a2242254b72b5c49d820/core/src/main/resources/org/apache/spark/ui/static/utils.js#L93-L105
   
   The appId should not be derived from `document.baseURI`.
   An extra request will occur, but the performance impact should be small.
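   For illustration, here is a minimal reconstruction of the pre-fix parsing logic (the standalone function name is hypothetical; the real code lives in `getStandAloneAppId` in `utils.js`):

   ```javascript
   // Pre-fix behavior (simplified): the appId is assumed to be the path
   // segment that follows "proxy", which breaks when the reverse-proxy
   // path itself contains a "proxy" segment.
   function appIdFromBaseURI(baseURI) {
     var words = baseURI.split('/');
     var ind = words.indexOf('proxy');
     if (ind > 0) {
       return words[ind + 1];
     }
     return null; // the real code falls back to the REST API here
   }

   // Behind jupyter-server-proxy, the segment after "proxy" is the port,
   // not the appId, so the subsequent API call 404s:
   console.log(appIdFromBaseURI(
     'https://jupyterhub.hosted.us/user/my-name/proxy/4040/executors/'));
   // -> "4040" (the port, not an application id)
   ```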
   
   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   This change always gets the appId via the REST API.
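   A rough sketch of the resulting order of operations (the helper name is hypothetical, and this is an assumed simplification of the patch, not the exact code): prefer the id reported by `/api/v1/applications`, and only fall back to URL parsing when the API returns no applications.

   ```javascript
   // Sketch of the proposed lookup order: take the id from the
   // /api/v1/applications response when available, and only fall back to
   // parsing the "proxy"/"history" path segment when the list is empty.
   function resolveAppId(apiResponse, baseURI) {
     if (apiResponse && apiResponse.length > 0) {
       return apiResponse[0].id; // authoritative id from the REST API
     }
     // Fallback: legacy parsing of the URL path.
     var words = baseURI.split('/');
     var keys = ['proxy', 'history'];
     for (var i = 0; i < keys.length; i++) {
       var ind = words.indexOf(keys[i]);
       if (ind > 0) {
         return words[ind + 1];
       }
     }
     return null;
   }

   console.log(resolveAppId([{id: 'local-1639522961946'}], ''));
   // -> "local-1639522961946", regardless of what the URL looks like
   ```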
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   The UI does not render correctly in some environments, for example behind JupyterHub.
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   No, this is a bug fix.
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   
   I verified that it works correctly in my browser.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gengliangwang commented on pull request #31774: [SPARK-34659] Fix that Web UI always correctly get appId

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on pull request #31774:
URL: https://github.com/apache/spark/pull/31774#issuecomment-1070843799


   @PerilousApricot I will take a close look before the 3.3 release.




[GitHub] [spark] dongjoon-hyun commented on pull request #31774: [SPARK-34659] Fix that Web UI always correctly get appId

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #31774:
URL: https://github.com/apache/spark/pull/31774#issuecomment-796462050


   Gentle ping, @ornew .




[GitHub] [spark] PerilousApricot commented on pull request #31774: [SPARK-34659] Fix that Web UI always correctly get appId

Posted by GitBox <gi...@apache.org>.
PerilousApricot commented on pull request #31774:
URL: https://github.com/apache/spark/pull/31774#issuecomment-1004938591


   Hello @gengliangwang, and Happy New Year! I'm back from vacation and was wondering if you had further thoughts on this issue. Were you able to reproduce the bug?




[GitHub] [spark] PerilousApricot edited a comment on pull request #31774: [SPARK-34659] Fix that Web UI always correctly get appId

Posted by GitBox <gi...@apache.org>.
PerilousApricot edited a comment on pull request #31774:
URL: https://github.com/apache/spark/pull/31774#issuecomment-994128564


   @gengliangwang -- I actually have a very simple reproducer using nginx as a reverse proxy instead of jupyterhub (to eliminate that failure mode). The following script sets up the proxy; note that it redirects `/user/PerilousApricot/proxy/4040/` to the root of the Spark web UI.
   
   **proxy-fail.sh**
   ```bash
   #!/bin/bash
   
   
   cat << \EOT > nginx.conf
   user  nginx;
   worker_processes  auto;
   error_log  /var/log/nginx/error.log notice;
   pid        /var/run/nginx.pid;
   events {
       worker_connections  1024;
   }
   http {
       include       /etc/nginx/mime.types;
       default_type  application/octet-stream;
       log_format  main  '[$time_local] "$request" '
                         '$status $body_bytes_sent "$http_referer" '
                         '"$http_x_forwarded_for"';
       access_log  /dev/stdout  main;
       server {
           listen       5050;
           server_name  localhost;
           location /user/PerilousApricot/proxy/4040/ {
               error_log  /dev/stderr debug;
               proxy_pass http://localhost:4040/;
               proxy_pass_header Content-Type;
           }        
       }
   }
   
   EOT
   
   docker run -it --rm=true --name spark-31174-proxy --network=host -v $(pwd)/nginx.conf:/etc/nginx/nginx.conf:ro nginx
   ```
   
   Run that proxy in one terminal, then run pyspark:
   ```
   SPARK_PUBLIC_DNS=localhost:5050/user/PerilousApricot/proxy/4040/jobs/ pyspark --conf spark.ui.reverseProxyUrl=http://localhost:5000/user/PerilousApricot/proxy/4040/ --conf spark.driver.extraJavaOptions="-Dlog4j.debug=true" --conf spark.ui.proxyBase=/user/PerilousApricot/proxy/4040/ --conf spark.app.name=proxyApp
   ```
   
   
   Open `http://localhost:5050/user/PerilousApricot/proxy/4040/executors/` in a browser with "developer mode" enabled to watch the traffic come by. You will see a number of successful requests to various resources like:
   
   ```
    http://localhost:5050/user/PerilousApricot/proxy/4040//static/webui.css
    http://localhost:5050/user/PerilousApricot/proxy/4040//static/webui.js
    ```
   
   Notice, however, that there is a failed request (the reason for this PR):
   ```
   http://localhost:5050/user/PerilousApricot/proxy/4040/api/v1/applications/4040/allexecutors
   ```
   
   If you run curl manually, you can see that the request fails both at the reverse proxy and at the actual web UI itself:
   ```
   curl -v -o /dev/null http://localhost:5050/user/PerilousApricot/proxy/4040/api/v1/applications/4040/allexecutors
   curl -v -o /dev/null http://localhost:4040/api/v1/applications/4040/allexecutors
   ```
   
   But if you copy-paste the appId from the spark console (in my case I have: `Spark context available as 'sc' (master = local[*], app id = local-1639522961946).`), the following two requests succeed:
   ```
   curl http://localhost:5050/user/PerilousApricot/proxy/4040/api/v1/applications/local-1639522961946
   curl -v -o /dev/null http://localhost:4040/api/v1/applications/local-1639522961946
   ```
   
   To confirm the issue, let's restart the proxy and pyspark, but proxy `/user/PerilousApricot/yxorp/4040/` instead of `/user/PerilousApricot/proxy/4040/` (note that there is no "proxy" in the proxied URL). First execute
   **proxy-win.sh**
   ```bash
   #!/bin/bash
   
   
   cat << \EOT > nginx.conf
   user  nginx;
   worker_processes  auto;
   error_log  /var/log/nginx/error.log notice;
   pid        /var/run/nginx.pid;
   events {
       worker_connections  1024;
   }
   http {
       include       /etc/nginx/mime.types;
       default_type  application/octet-stream;
       log_format  main  '[$time_local] "$request" '
                         '$status $body_bytes_sent "$http_referer" '
                         '"$http_x_forwarded_for"';
       access_log  /dev/stdout  main;
       server {
           listen       5050;
           server_name  localhost;
           location /user/PerilousApricot/yxorp/4040/ {
               #error_log  /dev/stderr debug;
               proxy_pass http://localhost:4040/;
               #proxy_redirect     off;
               proxy_pass_header Content-Type;
               #rewrite /user/PerilousApricot/yxorp/4040(/.*|$) $1  break;
           }        
       }
   }
   
   EOT
   
   docker run -it --rm=true --name spark-31174-proxy --network=host -v $(pwd)/nginx.conf:/etc/nginx/nginx.conf:ro nginx
   ```
   
   and then run in a different terminal
   ```
   SPARK_PUBLIC_DNS=localhost:5050/user/PerilousApricot/yxorp/4040/jobs/ pyspark --conf spark.ui.reverseProxyUrl=http://localhost:5000/user/PerilousApricot/yxorp//4040/ --conf spark.driver.extraJavaOptions="-Dlog4j.debug=true" --conf spark.ui.proxyBase=/user/PerilousApricot/yxorp/4040/ --conf spark.app.name=proxyApp
   ```
   
   Open `http://localhost:5050/user/PerilousApricot/yxorp/4040//executors/` and you can see that the page renders properly. Looking at the developer console, you will see that instead of attempting to open
   
   ```
   http://localhost:5050/user/PerilousApricot/proxy/4040/api/v1/applications/4040/allexecutors
   ```
   
   this version requests the status of the executors from
   ```
   http://localhost:5050/user/PerilousApricot/yxorp/4040//api/v1/applications/local-1639523380430/allexecutors
   ```
   
   I hope this is enough to show that @ornew did the right analysis -- the fault isn't with jupyterhub; it is simply that the logic that tries to look up the appId chokes when there is a path element named "proxy" in the URL.
   
   Can you please re-examine this?
   
   EDIT: I tested with Spark 3.2.0.




[GitHub] [spark] AmplabJenkins commented on pull request #31774: [SPARK-34659] Fix that Web UI always correctly get appId

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31774:
URL: https://github.com/apache/spark/pull/31774#issuecomment-792447646


   Can one of the admins verify this patch?





[GitHub] [spark] gengliangwang commented on pull request #31774: [SPARK-34659] Fix that Web UI always correctly get appId

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on pull request #31774:
URL: https://github.com/apache/spark/pull/31774#issuecomment-794166593


   @ornew could you show the reproduction steps on Spark? It seems that JupyterHub is using a different Spark UI URL from Spark's.
   > Also, the code change breaks the logic. Spark can get the app id without accessing the REST API if the URL contains proxy or history.




[GitHub] [spark] github-actions[bot] closed pull request #31774: [SPARK-34659] Fix that Web UI always correctly get appId

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #31774:
URL: https://github.com/apache/spark/pull/31774


   




[GitHub] [spark] PerilousApricot edited a comment on pull request #31774: [SPARK-34659] Fix that Web UI always correctly get appId

Posted by GitBox <gi...@apache.org>.
PerilousApricot edited a comment on pull request #31774:
URL: https://github.com/apache/spark/pull/31774#issuecomment-994816925


   Hi @gengliangwang, this is running in client mode. The use-case is running Spark within a Jupyter notebook.
   
   Thanks for the pointers, but the point of the PR is that there is a bug in how the reverse proxying is handled. As the reproducer shows, I am using the config options mentioned in #13950 and #29820.




[GitHub] [spark] PerilousApricot commented on pull request #31774: [SPARK-34659] Fix that Web UI always correctly get appId

Posted by GitBox <gi...@apache.org>.
PerilousApricot commented on pull request #31774:
URL: https://github.com/apache/spark/pull/31774#issuecomment-1071891309


   @gengliangwang Thank you very much! This would be a huge relief for our use-case




[GitHub] [spark] gengliangwang commented on pull request #31774: [SPARK-34659] Fix that Web UI always correctly get appId

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on pull request #31774:
URL: https://github.com/apache/spark/pull/31774#issuecomment-802997989


   @ornew I mean, if we can't reproduce the issue on a Spark cluster, then it is not an issue in Spark itself.
   Spark does support running behind a reverse proxy; see https://github.com/apache/spark/pull/29820 for details.
   
   






[GitHub] [spark] PerilousApricot commented on pull request #31774: [SPARK-34659] Fix that Web UI always correctly get appId

Posted by GitBox <gi...@apache.org>.
PerilousApricot commented on pull request #31774:
URL: https://github.com/apache/spark/pull/31774#issuecomment-994820901


   In the current master, when you reverse proxy to
   ```
   http://localhost:5050/user/PerilousApricot/proxy/4040/
   ```
   then Spark UI tries to do an API call to
   ```
   http://localhost:5050/user/PerilousApricot/proxy/4040/api/v1/applications/4040/allexecutors
   ```
   to retrieve the executor status, but this is incorrect; it should be
   ```
   http://localhost:5050/user/PerilousApricot/proxy/4040/api/v1/applications/local-1639523380430/allexecutors
   ```
   (where `local-1639523380430` is the appId of the SparkContext).
   
   The problem is that Spark itself mishandles the appId. You said in https://github.com/apache/spark/pull/31774#issuecomment-802997989 that the problem was not reproducible on a Spark cluster; I hope the reproducer in the comment above is enough to demonstrate the issue. Please let me know if I can clarify it further.
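   To make the contrast concrete, here is a hypothetical helper (the function name and signature are illustrative, not from the Spark source) showing that the executors endpoint must be built from the real appId, never from a path segment of the proxied URL:
   
   ```javascript
   // The correct URL is uiRoot + /api/v1/applications/<appId>/allexecutors.
   // Using a path segment such as "4040" in place of <appId> yields a 404.
   function allExecutorsUrl(uiRoot, appId) {
     return uiRoot + '/api/v1/applications/' + appId + '/allexecutors';
   }

   console.log(allExecutorsUrl(
     'http://localhost:5050/user/PerilousApricot/proxy/4040',
     'local-1639523380430'));
   // -> .../api/v1/applications/local-1639523380430/allexecutors
   ```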
   




[GitHub] [spark] PerilousApricot commented on pull request #31774: [SPARK-34659] Fix that Web UI always correctly get appId

Posted by GitBox <gi...@apache.org>.
PerilousApricot commented on pull request #31774:
URL: https://github.com/apache/spark/pull/31774#issuecomment-1069717865


   Hello @gengliangwang, checking in on this issue. I gave a reproducer above that clearly shows the problem. Have you had a chance to take a look?




[GitHub] [spark] gengliangwang commented on a change in pull request #31774: [SPARK-34659] Fix that Web UI always correctly get appId

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on a change in pull request #31774:
URL: https://github.com/apache/spark/pull/31774#discussion_r597861796



##########
File path: core/src/main/resources/org/apache/spark/ui/static/utils.js
##########
@@ -90,26 +90,27 @@ function formatLogsCells(execLogs, type) {
 }
 
 function getStandAloneAppId(cb) {
-  var words = document.baseURI.split('/');
-  var ind = words.indexOf("proxy");
-  if (ind > 0) {
-    var appId = words[ind + 1];
-    cb(appId);
-    return;
-  }
-  ind = words.indexOf("history");
-  if (ind > 0) {
-    var appId = words[ind + 1];
-    cb(appId);
-    return;
-  }
   // Looks like Web UI is running in standalone mode
   // Let's get application-id using REST End Point
   $.getJSON(uiRoot + "/api/v1/applications", function(response, status, jqXHR) {
     if (response && response.length > 0) {
       var appId = response[0].id;
       cb(appId);
       return;
+    } else {
+      var words = document.baseURI.split('/');

Review comment:
       BTW, have you tested the code changes for Spark UI behind proxy and Spark UI of History server?






[GitHub] [spark] ornew commented on pull request #31774: [SPARK-34659] Fix that Web UI always correctly get appId

Posted by GitBox <gi...@apache.org>.
ornew commented on pull request #31774:
URL: https://github.com/apache/spark/pull/31774#issuecomment-799982981


   @dongjoon-hyun @gengliangwang Thank you for your reply.
   
   > @ornew could you show reproduce steps on Spark? It seems that the jupyterhub is using a different Spark UI URL from Spark.
   
   It's easy to reproduce: run JupyterHub with [jupyter-server-proxy](https://github.com/jupyterhub/jupyter-server-proxy), then start PySpark.
   
   I run JupyterHub on Kubernetes to provide a sandbox for a large number of users. Accessing the Spark UI requires some kind of proxy, and jupyter-server-proxy lets users reach ports in their sandbox environment without extra setup.
   
   ```python
   from pyspark import *
   from pyspark.sql import *
   
   spark = SparkSession.builder.getOrCreate()
   ```
   
   <img width="651" alt="Screenshot 2021-03-16 14 58 57" src="https://user-images.githubusercontent.com/19766770/111263381-21b30300-8669-11eb-92b2-0008be233ea7.png">
   
   When accessing the Spark UI by port, the Jupyter Server Proxy path contains `proxy`, which causes incorrect parsing.
   
   <img width="637" alt="Screenshot 2021-03-16 15 01 56" src="https://user-images.githubusercontent.com/19766770/111263218-d8fb4a00-8668-11eb-9d63-2a427db49b32.png">
   
   




[GitHub] [spark] github-actions[bot] closed pull request #31774: [SPARK-34659] Fix that Web UI always correctly get appId

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #31774:
URL: https://github.com/apache/spark/pull/31774


   




[GitHub] [spark] ornew commented on pull request #31774: [SPARK-34659] Fix that Web UI always correctly get appId

Posted by GitBox <gi...@apache.org>.
ornew commented on pull request #31774:
URL: https://github.com/apache/spark/pull/31774#issuecomment-800000156


   @gengliangwang I think this is a Spark issue. 
   
   This happens whenever the path contains `proxy` or `history`, even without JupyterHub. There are many use cases where the path does not include the appId, such as standalone or Kubernetes deployments, and many setups access the UI through a proxy. Would you please reconsider the logic for getting the appId, since it currently depends on the environment and the URL?
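   For what it's worth, the lookup order this PR proposes can be sketched as a pure function: prefer the `/api/v1/applications` response and fall back to URL parsing only when that response is empty. This is an illustrative sketch, not Spark's actual code; `pickAppId` is a hypothetical name:

   ```javascript
   // Sketch of a REST-first appId lookup: trust /api/v1/applications when it
   // answers, and fall back to the legacy proxy/history URL parsing otherwise.
   // pickAppId is a hypothetical name, not Spark's actual code.
   function pickAppId(applications, baseURI) {
     if (applications && applications.length > 0) {
       return applications[0].id;          // authoritative answer from the API
     }
     var words = baseURI.split('/');       // legacy fallback for proxied UIs
     var ind = words.indexOf("proxy");
     if (ind < 0) {
       ind = words.indexOf("history");
     }
     return ind > 0 ? words[ind + 1] : null;
   }

   // With a non-empty API response, the URL never matters, so a
   // jupyter-server-proxy path segment named "proxy" can no longer mislead it:
   pickAppId([{id: "local-1639522961946"}],
             "https://jupyterhub.hosted.us/user/me/proxy/4040/executors/");
   // returns "local-1639522961946"
   ```

   The extra REST request adds a small cost, but it removes the dependency on how the UI happens to be proxied.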




[GitHub] [spark] gengliangwang commented on pull request #31774: [SPARK-34659] Fix that Web UI always correctly get appId

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on pull request #31774:
URL: https://github.com/apache/spark/pull/31774#issuecomment-799988472


   @ornew Is it possible to fix it in JupyterHub? The issue is not in Spark itself.




[GitHub] [spark] dongjoon-hyun commented on pull request #31774: [SPARK-34659] Fix that Web UI always correctly get appId

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #31774:
URL: https://github.com/apache/spark/pull/31774#issuecomment-793459875


   Thank you for making a PR, @ornew .
   
   cc @gengliangwang 




[GitHub] [spark] gengliangwang commented on pull request #31774: [SPARK-34659] Fix that Web UI always correctly get appId

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on pull request #31774:
URL: https://github.com/apache/spark/pull/31774#issuecomment-994561738


   @PerilousApricot  Are you running Spark as a cluster? If yes, Spark supports a reverse proxy; see the following PRs for details:
   https://github.com/apache/spark/pull/13950/
   https://github.com/apache/spark/pull/29820
   




[GitHub] [spark] github-actions[bot] commented on pull request #31774: [SPARK-34659] Fix that Web UI always correctly get appId

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #31774:
URL: https://github.com/apache/spark/pull/31774#issuecomment-869246328


   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!




[GitHub] [spark] PerilousApricot commented on pull request #31774: [SPARK-34659] Fix that Web UI always correctly get appId

Posted by GitBox <gi...@apache.org>.
PerilousApricot commented on pull request #31774:
URL: https://github.com/apache/spark/pull/31774#issuecomment-994816925


   Hi @gengliangwang yes, I am.
   
   Thanks for the pointers, but the point of the PR is that there is a bug in how the reverse proxying is handled. If you see the reproducer, I am using the config options mentioned in #13950 and #29820.




[GitHub] [spark] PerilousApricot edited a comment on pull request #31774: [SPARK-34659] Fix that Web UI always correctly get appId

Posted by GitBox <gi...@apache.org>.
PerilousApricot edited a comment on pull request #31774:
URL: https://github.com/apache/spark/pull/31774#issuecomment-994128564


   @gengliangwang -- I actually have a very simple reproducer that uses nginx as a reverse proxy rather than JupyterHub (to rule out that failure mode). The following script sets up the proxy; note that it proxies `/user/PerilousApricot/proxy/4040/` to the root of the Spark web UI (the URL matches what JupyterHub would use, but this is a plain reverse proxy with no JupyterHub involved).
   
   **proxy-fail.sh**
   ```bash
   #!/bin/bash
   
   
   cat << \EOT > nginx.conf
   user  nginx;
   worker_processes  auto;
   error_log  /var/log/nginx/error.log notice;
   pid        /var/run/nginx.pid;
   events {
       worker_connections  1024;
   }
   http {
       include       /etc/nginx/mime.types;
       default_type  application/octet-stream;
       log_format  main  '[$time_local] "$request" '
                         '$status $body_bytes_sent "$http_referer" '
                         '"$http_x_forwarded_for"';
       access_log  /dev/stdout  main;
       server {
           listen       5050;
           server_name  localhost;
           location /user/PerilousApricot/proxy/4040/ {
               error_log  /dev/stderr debug;
               proxy_pass http://localhost:4040/;
               proxy_pass_header Content-Type;
           }        
       }
   }
   
   EOT
   
   docker run -it --rm=true --name spark-31174-proxy --network=host -v $(pwd)/nginx.conf:/etc/nginx/nginx.conf:ro nginx
   ```
   
   Run that proxy in one terminal, then run pyspark:
   ```
   SPARK_PUBLIC_DNS=localhost:5050/user/PerilousApricot/proxy/4040/jobs/ pyspark --conf spark.ui.reverseProxyUrl=http://localhost:5000/user/PerilousApricot/proxy/4040/ --conf spark.driver.extraJavaOptions="-Dlog4j.debug=true" --conf spark.ui.proxyBase=/user/PerilousApricot/proxy/4040/ --conf spark.app.name=proxyApp
   ```
   
   
   Open `http://localhost:5050/user/PerilousApricot/proxy/4040/executors/` in a browser with "developer mode" enabled to watch the traffic come by. You will see a number of successful requests to various resources like:
   
   ```
    http://localhost:5050/user/PerilousApricot/proxy/4040//static/webui.css
    http://localhost:5050/user/PerilousApricot/proxy/4040//static/webui.js
    ```
   
   Notice, however, that there is a failed request (the reason for this PR):
   ```
   http://localhost:5050/user/PerilousApricot/proxy/4040/api/v1/applications/4040/allexecutors
   ```
   
   If you run curl manually, you can see that the request fails both at the reverse proxy and at the web UI itself:
   ```
   curl -v -o /dev/null http://localhost:5050/user/PerilousApricot/proxy/4040/api/v1/applications/4040/allexecutors
   curl -v -o /dev/null http://localhost:4040/api/v1/applications/4040/allexecutors
   ```
   
   But if you copy-paste the appId from the Spark console (in my case: `Spark context available as 'sc' (master = local[*], app id = local-1639522961946).`), the following two requests succeed:
   ```
   curl http://localhost:5050/user/PerilousApricot/proxy/4040/api/v1/applications/local-1639522961946
   curl -v -o /dev/null http://localhost:4040/api/v1/applications/local-1639522961946
   ```
   
   To confirm the issue, let's restart the proxy and pyspark, but proxy `/user/PerilousApricot/yxorp/4040/` instead of `/user/PerilousApricot/proxy/4040/` (note that there is no "proxy" segment in the proxied URL). First execute
   **proxy-win.sh**
   ```bash
   #!/bin/bash
   
   
   cat << \EOT > nginx.conf
   user  nginx;
   worker_processes  auto;
   error_log  /var/log/nginx/error.log notice;
   pid        /var/run/nginx.pid;
   events {
       worker_connections  1024;
   }
   http {
       include       /etc/nginx/mime.types;
       default_type  application/octet-stream;
       log_format  main  '[$time_local] "$request" '
                         '$status $body_bytes_sent "$http_referer" '
                         '"$http_x_forwarded_for"';
       access_log  /dev/stdout  main;
       server {
           listen       5050;
           server_name  localhost;
           location /user/PerilousApricot/yxorp/4040/ {
               #error_log  /dev/stderr debug;
               proxy_pass http://localhost:4040/;
               #proxy_redirect     off;
               proxy_pass_header Content-Type;
               #rewrite /user/PerilousApricot/yxorp/4040(/.*|$) $1  break;
           }        
       }
   }
   
   EOT
   
   docker run -it --rm=true --name spark-31174-proxy --network=host -v $(pwd)/nginx.conf:/etc/nginx/nginx.conf:ro nginx
   ```
   
   and then run in a different terminal
   ```
   SPARK_PUBLIC_DNS=localhost:5050/user/PerilousApricot/yxorp/4040/jobs/ pyspark --conf spark.ui.reverseProxyUrl=http://localhost:5000/user/PerilousApricot/yxorp//4040/ --conf spark.driver.extraJavaOptions="-Dlog4j.debug=true" --conf spark.ui.proxyBase=/user/PerilousApricot/yxorp/4040/ --conf spark.app.name=proxyApp
   ```
   
   Open `http://localhost:5050/user/PerilousApricot/yxorp/4040//executors/` and you can see that the page renders properly. Looking at the developer console, you will see that instead of attempting to open
   
   ```
   http://localhost:5050/user/PerilousApricot/proxy/4040/api/v1/applications/4040/allexecutors
   ```
   
   this version requests the status of the executors from
   ```
   http://localhost:5050/user/PerilousApricot/yxorp/4040//api/v1/applications/local-1639523380430/allexecutors
   ```
   
   I hope this is enough to show that @ornew's analysis was right: the fault isn't with JupyterHub, it is simply that the logic that looks up the appId chokes when the URL contains a path element named "proxy".
   
   Can you please re-examine this?
   
   EDIT: I tested with Spark 3.2.0

