Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/03/07 08:31:21 UTC

[GitHub] [spark] kokes opened a new pull request #31770: [SPARK-34606][DOCS] redirects for moved PySpark docs

kokes opened a new pull request #31770:
URL: https://github.com/apache/spark/pull/31770


   As noted in JIRA, a few links have been broken since the docs were migrated to a new format. This PR aims to fix that.
   
   Some assorted notes:
   - I'm not sure whether running logic in the docs' `conf.py` is the right thing to do, but I couldn't find any other Python executed during the docs build where this could live (and I didn't want to add a new Makefile target).
   - I could have just placed the five simple HTML files in the `redirects` directory and called it a day?
   - I could have used a Sphinx redirects library - I found two, but one was not maintained and the other didn't seem worth the hassle for five pages - or can we expect this trend of moving stuff around to continue?
   - The redirect timeout is set to five seconds, so that a message is displayed. I saw some guidance that recommended zero seconds... I'd rather show a message, but you might be of a different opinion.
   - I only redirected the API pages; I didn't redirect the module code pages (see the PR's comments), as I found them less user-facing... was that the right call?
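
For illustration, the `conf.py` approach described in the notes above could look roughly like the sketch below. It is not the code from this PR: `MOVED_PAGES`, `REDIRECT_TEMPLATE`, and `write_redirects` are hypothetical names, and the target paths are assumptions about where the new section homepages live. Only the five old page names, the `redirects` directory, and the five-second delay come from the description.

```python
import os

# Hypothetical mapping from old page names to new locations; the real
# targets depend on the layout of the new docs.
MOVED_PAGES = {
    'pyspark.html': 'reference/index.html',
    'pyspark.ml.html': 'reference/pyspark.ml.html',
    'pyspark.mllib.html': 'reference/pyspark.mllib.html',
    'pyspark.sql.html': 'reference/pyspark.sql.html',
    'pyspark.streaming.html': 'reference/pyspark.streaming.html',
}

# Static stub: a <meta> refresh plus a fallback link with the same URL.
REDIRECT_TEMPLATE = """<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <meta http-equiv="refresh" content="{delay}; url={target}">
    <title>Page moved</title>
  </head>
  <body>
    <p>This page has moved to <a href="{target}">{target}</a>.</p>
  </body>
</html>
"""


def write_redirects(redirects_dir, delay=5):
    """Write one redirect stub per moved page and return the written paths."""
    os.makedirs(redirects_dir, exist_ok=True)
    written = []
    for old_name, target in MOVED_PAGES.items():
        path = os.path.join(redirects_dir, old_name)
        with open(path, 'w', encoding='utf-8') as f:
            f.write(REDIRECT_TEMPLATE.format(delay=delay, target=target))
        written.append(path)
    return written
```

Calling `write_redirects('redirects')` before the build and listing that directory in `html_extra_path` would let Sphinx copy the stubs to the root of the built docs. Passing `delay=0` gives the immediate redirect that some guidance recommends.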
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, old links (indexed by Google etc.) will start working again.
   
   ### How was this patch tested?
   Built the docs locally and tried accessing formerly dead links.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31770: [SPARK-34606][DOCS] Redirects for moved PySpark docs

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31770:
URL: https://github.com/apache/spark/pull/31770#issuecomment-792678188


   **[Test build #135866 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135866/testReport)** for PR 31770 at commit [`51dd4d9`](https://github.com/apache/spark/commit/51dd4d97243f3f67860dc63a4348576c8afc7cb5).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] HyukjinKwon commented on pull request #31770: [SPARK-34606][DOCS] Redirects for moved PySpark docs

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #31770:
URL: https://github.com/apache/spark/pull/31770#issuecomment-841971957


   Okay, I took a deeper look. To fix the issue completely, I think we need a custom 404 page, but that seems to require changes on the server side, and apparently Apache Flink faces the same issue (https://issues.apache.org/jira/browse/INFRA-19845, https://issues.apache.org/jira/browse/FLINK-12650).
   
   Let's drop this fix for now, and I will track the upstream changes there.




[GitHub] [spark] SparkQA commented on pull request #31770: [SPARK-34606][DOCS] Redirects for moved PySpark docs

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31770:
URL: https://github.com/apache/spark/pull/31770#issuecomment-792254083


   **[Test build #135841 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135841/testReport)** for PR 31770 at commit [`a590067`](https://github.com/apache/spark/commit/a5900672bb2911a6a981a80474a67d357f1d105d).




[GitHub] [spark] github-actions[bot] commented on pull request #31770: [SPARK-34606][DOCS] Redirects for moved PySpark docs

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #31770:
URL: https://github.com/apache/spark/pull/31770#issuecomment-905957693


   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31770: [SPARK-34606][DOCS] Redirects for moved PySpark docs

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31770:
URL: https://github.com/apache/spark/pull/31770#issuecomment-792266908


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40423/
   




[GitHub] [spark] github-actions[bot] closed pull request #31770: [SPARK-34606][DOCS] Redirects for moved PySpark docs

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #31770:
URL: https://github.com/apache/spark/pull/31770


   




[GitHub] [spark] SparkQA commented on pull request #31770: [SPARK-34606][DOCS] Redirects for moved PySpark docs

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31770:
URL: https://github.com/apache/spark/pull/31770#issuecomment-792257555


   **[Test build #135841 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135841/testReport)** for PR 31770 at commit [`a590067`](https://github.com/apache/spark/commit/a5900672bb2911a6a981a80474a67d357f1d105d).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA removed a comment on pull request #31770: [SPARK-34606][DOCS] Redirects for moved PySpark docs

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31770:
URL: https://github.com/apache/spark/pull/31770#issuecomment-792662958


   **[Test build #135866 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135866/testReport)** for PR 31770 at commit [`51dd4d9`](https://github.com/apache/spark/commit/51dd4d97243f3f67860dc63a4348576c8afc7cb5).




[GitHub] [spark] HyukjinKwon commented on pull request #31770: [SPARK-34606][DOCS] redirects for moved PySpark docs

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #31770:
URL: https://github.com/apache/spark/pull/31770#issuecomment-792241110


   Thanks @kokes. Will take a look tomorrow (KST)




[GitHub] [spark] AmplabJenkins commented on pull request #31770: [SPARK-34606][DOCS] Redirects for moved PySpark docs

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31770:
URL: https://github.com/apache/spark/pull/31770#issuecomment-792266908


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40423/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31770: [SPARK-34606][DOCS] Redirects for moved PySpark docs

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31770:
URL: https://github.com/apache/spark/pull/31770#issuecomment-792681546


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135866/
   






[GitHub] [spark] SparkQA removed a comment on pull request #31770: [SPARK-34606][DOCS] Redirects for moved PySpark docs

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31770:
URL: https://github.com/apache/spark/pull/31770#issuecomment-792254083


   **[Test build #135841 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135841/testReport)** for PR 31770 at commit [`a590067`](https://github.com/apache/spark/commit/a5900672bb2911a6a981a80474a67d357f1d105d).




[GitHub] [spark] HyukjinKwon commented on pull request #31770: [SPARK-34606][DOCS] redirects for moved PySpark docs

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #31770:
URL: https://github.com/apache/spark/pull/31770#issuecomment-792245915


   ok to test




[GitHub] [spark] SparkQA commented on pull request #31770: [SPARK-34606][DOCS] Redirects for moved PySpark docs

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31770:
URL: https://github.com/apache/spark/pull/31770#issuecomment-792662958


   **[Test build #135866 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135866/testReport)** for PR 31770 at commit [`51dd4d9`](https://github.com/apache/spark/commit/51dd4d97243f3f67860dc63a4348576c8afc7cb5).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31770: [SPARK-34606][DOCS] Redirects for moved PySpark docs

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31770:
URL: https://github.com/apache/spark/pull/31770#issuecomment-792714248


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40449/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31770: [SPARK-34606][DOCS] Redirects for moved PySpark docs

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31770:
URL: https://github.com/apache/spark/pull/31770#issuecomment-792681546


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135866/
   






[GitHub] [spark] AmplabJenkins commented on pull request #31770: [SPARK-34606][DOCS] redirects for moved PySpark docs

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31770:
URL: https://github.com/apache/spark/pull/31770#issuecomment-792242647


   Can one of the admins verify this patch?




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31770: [SPARK-34606][DOCS] Redirects for moved PySpark docs

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31770:
URL: https://github.com/apache/spark/pull/31770#issuecomment-792242647








[GitHub] [spark] kokes commented on a change in pull request #31770: [SPARK-34606][DOCS] Redirects for moved PySpark docs

Posted by GitBox <gi...@apache.org>.
kokes commented on a change in pull request #31770:
URL: https://github.com/apache/spark/pull/31770#discussion_r589275886



##########
File path: python/docs/source/conf.py
##########
@@ -177,7 +177,24 @@
 # Add any extra paths that contain custom files (such as robots.txt or
 # .htaccess) here, relative to this directory. These files are copied
 # directly to the root of the documentation.
-#html_extra_path = []
+redirects_dir = 'redirects'
+html_extra_path = [redirects_dir]
+
+os.makedirs(redirects_dir, exist_ok=True)
+for moved_page in ['', '.ml', '.mllib', '.sql', '.streaming']:
+    moved_file = f'pyspark{moved_page}.html'

Review comment:
       Oh, I didn't know we'd want to redirect anchor links as well.
   
   In that case we'll have to dive into JavaScript, because the `<meta>` redirect cannot touch anchor values and convert them into paths (and the linked extension cannot do so either, as far as I can tell).
   
   I'll submit a patch shortly.






[GitHub] [spark] HyukjinKwon commented on a change in pull request #31770: [SPARK-34606][DOCS] Redirects for moved PySpark docs

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #31770:
URL: https://github.com/apache/spark/pull/31770#discussion_r589113277



##########
File path: python/docs/source/conf.py
##########
@@ -177,7 +177,24 @@
 # Add any extra paths that contain custom files (such as robots.txt or
 # .htaccess) here, relative to this directory. These files are copied
 # directly to the root of the documentation.
-#html_extra_path = []
+redirects_dir = 'redirects'
+html_extra_path = [redirects_dir]
+
+os.makedirs(redirects_dir, exist_ok=True)
+for moved_page in ['', '.ml', '.mllib', '.sql', '.streaming']:
+    moved_file = f'pyspark{moved_page}.html'

Review comment:
       I think simply mapping it wouldn't work. For example,
   
   new URL:
   
   ```
   reference/api/pyspark.sql.functions.from_json.html?highlight=from_json#pyspark.sql.functions.from_json
   ```
   
   old URL:
   
   ```
   pyspark.sql.html?highlight=from_json#pyspark.sql.functions.from_json
   ```
   
   Can we leverage an existing extension such as sphinx-reredirects (https://gitlab.com/documatt/sphinx-reredirects, https://pypi.org/project/sphinx-reredirects/) and just redirect to the root page using wildcards?
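
As a rough idea of what the extension route might look like, here is a `conf.py` fragment for sphinx-reredirects. It assumes the package is installed and that, as its documentation describes, the keys of `redirects` are docnames without the `.html` suffix. The target paths are hypothetical placeholders for the new section homepages, not values taken from this PR.

```python
# Fragment for a Sphinx conf.py; assumes sphinx-reredirects is installed.
extensions = [
    # ... the project's existing extensions would stay here ...
    'sphinx_reredirects',
]

# Old docname -> new location. All targets below are assumed paths.
redirects = {
    'pyspark': 'reference/index.html',
    'pyspark.ml': 'reference/pyspark.ml.html',
    'pyspark.mllib': 'reference/pyspark.mllib.html',
    'pyspark.sql': 'reference/pyspark.sql.html',
    'pyspark.streaming': 'reference/pyspark.streaming.html',
}
```

A mapping like this only covers whole pages. It would send `pyspark.sql.html#pyspark.sql.functions.from_json` to the section page rather than to the page for `from_json`, which is the limitation raised in this review comment.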
   






[GitHub] [spark] kokes edited a comment on pull request #31770: [SPARK-34606][DOCS] Redirects for moved PySpark docs

Posted by GitBox <gi...@apache.org>.
kokes edited a comment on pull request #31770:
URL: https://github.com/apache/spark/pull/31770#issuecomment-792625344


   Given that we should redirect anchor links as well (thanks, @HyukjinKwon), we'll need to do this a bit differently. We need to check if there's an anchor value in the current URL and if so, change both the <meta> redirect and the fallback link in the HTML body itself.
   
   The implementation will then behave like so:
   - pyspark*.html will redirect to the new section homepages
   - pyspark*.html#some_function will redirect to the new page at reference/api/some_function.html
   - if the user doesn't have JavaScript (incl. some bots), pyspark*.html#some_function will redirect to the new section homepage
   - if the user doesn't have redirects enabled (rare), they can click the link, which contains the same URL
   
   I tested this locally (`python3 -m http.server` in build/html) and it works - both for automatic redirects and clicking the links
   
   - http://localhost:8000/pyspark.sql.html?highlight=from_json#pyspark.sql.functions.from_json
   - http://localhost:8000/pyspark.sql.html#pyspark.sql.DataFrameStatFunctions.crosstab
   - http://localhost:8000/pyspark.sql.html#pyspark.sql.streaming.StreamingQuery.exception
   - http://localhost:8000/pyspark.sql.html#pyspark.sql.streaming.StreamingQuery.id
   - http://localhost:8000/pyspark.ml.html#pyspark.ml.feature.Binarizer
   
   **BUT**, I found some modules where the anchor links didn't result in new HTML pages - why do some methods have their own pages and some don't?
   
   - http://localhost:8000/pyspark.ml.html#pyspark.ml.feature.Binarizer.inputCols
   - http://localhost:8000/pyspark.streaming.html#pyspark.streaming.kinesis.KinesisUtils (here the class doesn't have its own doc page, but its methods do...)
   - http://localhost:8000/pyspark.streaming.html#pyspark.streaming.kinesis.InitialPositionInStream
   
   Last but not least: I don't sanitise the anchor value in any way and use it as is. I can't think of any injection that could happen there, since it's a relative link to a reference page, but feel free to suggest a regexp check that the hash contains only `[a-zA-Z0-9_.-]` or something similar.
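
The mapping described in this comment can be modelled outside the browser. The real redirect would have to run as JavaScript on the stub page, so the Python below is only a sketch of the URL logic, with a conservative check on the anchor. `SECTION_HOMEPAGES`, `SAFE_ANCHOR`, and `redirect_target` are hypothetical names, the homepage paths are assumptions, and the `reference/api/` prefix follows the new-URL example quoted earlier in the review.

```python
import re
from urllib.parse import urlsplit

# Hypothetical section homepages for the five old pages.
SECTION_HOMEPAGES = {
    'pyspark.html': 'reference/index.html',
    'pyspark.ml.html': 'reference/pyspark.ml.html',
    'pyspark.mllib.html': 'reference/pyspark.mllib.html',
    'pyspark.sql.html': 'reference/pyspark.sql.html',
    'pyspark.streaming.html': 'reference/pyspark.streaming.html',
}

# Accept only letters, digits, underscore, dot and hyphen in the anchor.
SAFE_ANCHOR = re.compile(r'[A-Za-z0-9_.-]+')


def redirect_target(old_url):
    """Return the relative redirect target for an old docs URL, or None."""
    parts = urlsplit(old_url)
    old_page = parts.path.rsplit('/', 1)[-1]
    homepage = SECTION_HOMEPAGES.get(old_page)
    if homepage is None:
        return None  # not one of the moved pages
    anchor = parts.fragment
    if anchor and SAFE_ANCHOR.fullmatch(anchor):
        # The anchor names the object, which becomes the new page name.
        return f'reference/api/{anchor}.html#{anchor}'
    # No anchor, or an anchor with unexpected characters: section homepage.
    return homepage
```

The sketch drops the query string and does not check that the target page exists, so anchors such as `pyspark.ml.feature.Binarizer.inputCols` would still produce a dead link and need separate handling.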




[GitHub] [spark] AmplabJenkins commented on pull request #31770: [SPARK-34606][DOCS] Redirects for moved PySpark docs

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31770:
URL: https://github.com/apache/spark/pull/31770#issuecomment-792258778


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135841/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31770: [SPARK-34606][DOCS] Redirects for moved PySpark docs

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31770:
URL: https://github.com/apache/spark/pull/31770#issuecomment-792714248


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40449/
   

