Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/03/08 09:44:05 UTC

[GitHub] [spark] kokes edited a comment on pull request #31770: [SPARK-34606][DOCS] Redirects for moved PySpark docs

kokes edited a comment on pull request #31770:
URL: https://github.com/apache/spark/pull/31770#issuecomment-792625344


   Given that we should redirect anchor links as well (thanks, @HyukjinKwon), we'll need to do this a bit differently. We need to check if there's an anchor value in the current URL and if so, change both the <meta> redirect and the fallback link in the HTML body itself.
   
   The implementation will then behave like so:
   - pyspark.*.html will redirect to new section homepages
   - pyspark*.html#some_function will redirect to the new page of api/reference/some_function.html
   - if the user doesn't have JavaScript enabled (including some bots), pyspark*.html#some_function will redirect to the new section homepage
   - if the user doesn't have redirects enabled (rare), they can click the link, which contains the same URL
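
The behaviour listed above could be sketched roughly as follows. This is only an illustration, not the code in the PR: the function names, the `redirect-link` id, and the example section homepage are assumptions, and the `api/reference/` prefix is simply the one mentioned in the list.

```javascript
// Hypothetical helper: picks the redirect target for an old PySpark doc page.
// `hash` is the value of window.location.hash (may be empty),
// `sectionHome` is the new section homepage for this old page,
// `apiDir` is the directory holding the per-object reference pages.
function redirectTarget(hash, sectionHome, apiDir) {
  // location.hash includes the leading "#"; strip it before use.
  const anchor = hash.startsWith("#") ? hash.slice(1) : hash;
  // No anchor: send the visitor to the new section homepage.
  if (anchor === "") {
    return sectionHome;
  }
  // Anchor present: point at the reference page named after the anchor.
  return apiDir + anchor + ".html";
}

// Hypothetical helper: writes the target into both the <meta> refresh tag
// and the fallback link, so the link and the automatic redirect agree.
function applyRedirect(doc, target) {
  const meta = doc.querySelector('meta[http-equiv="refresh"]');
  if (meta) {
    meta.setAttribute("content", "0; url=" + target);
  }
  const link = doc.querySelector("a#redirect-link"); // assumed id
  if (link) {
    link.setAttribute("href", target);
  }
}
```

In a page this might be called as `applyRedirect(document, redirectTarget(window.location.hash, sectionHome, "api/reference/"))`. Without JavaScript nothing runs, so the static `<meta>` tag and link keep pointing at the section homepage, which matches the third bullet.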
   
   I tested this locally (`python3 -m http.server` in build/html) and it works, both for automatic redirects and for clicking the links:
   
   - http://localhost:8000/pyspark.sql.html?highlight=from_json#pyspark.sql.functions.from_json
   - http://localhost:8000/pyspark.sql.html#pyspark.sql.DataFrameStatFunctions.crosstab
   - http://localhost:8000/pyspark.sql.html#pyspark.sql.streaming.StreamingQuery.exception
   - http://localhost:8000/pyspark.sql.html#pyspark.sql.streaming.StreamingQuery.id
   - http://localhost:8000/pyspark.ml.html#pyspark.ml.feature.Binarizer
   
   **BUT**, I found some modules where the anchor links didn't result in new HTML pages - why do some methods have their own pages and some don't?
   
   - http://localhost:8000/pyspark.ml.html#pyspark.ml.feature.Binarizer.inputCols
   - http://localhost:8000/pyspark.streaming.html#pyspark.streaming.kinesis.KinesisUtils (here the class doesn't have its own doc page, but its methods do...)
   - http://localhost:8000/pyspark.streaming.html#pyspark.streaming.kinesis.InitialPositionInStream
   
   Last but not least: I don't sanitise the anchor value in any way and use it as is. I can't think of any injection that could happen there, since it is only used to build a relative link to a reference page, but feel free to suggest a regexp check that the hash contains only `[a-zA-Z0-9_.-]` or something similar.
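
If such a check is wanted, a minimal sketch could look like this. The names `SAFE_ANCHOR` and `safeAnchor` are made up for illustration and are not part of the PR; the hyphen is placed last in the character class so that it is read literally and not as a range.

```javascript
// Allow only ASCII letters, digits, underscore, dot and hyphen.
const SAFE_ANCHOR = /^[A-Za-z0-9_.-]+$/;

// Hypothetical helper: returns the anchor without its leading "#"
// if it passes the whitelist, or null if it is empty or contains
// any other character (slashes, colons, quotes, ...).
function safeAnchor(hash) {
  const anchor = hash.startsWith("#") ? hash.slice(1) : hash;
  return SAFE_ANCHOR.test(anchor) ? anchor : null;
}
```

A `null` result would then be treated like a missing anchor, i.e. the page falls back to the section homepage.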

