Posted to issues@spark.apache.org by "Ondrej Kokes (Jira)" <ji...@apache.org> on 2021/03/03 09:23:00 UTC

[jira] [Updated] (SPARK-34606) New PySpark documentation has different URLs

     [ https://issues.apache.org/jira/browse/SPARK-34606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ondrej Kokes updated SPARK-34606:
---------------------------------
    Description: 
The new documentation site moved some subsites to different URLs, notably the PySpark API reference ([see here|https://spark.apache.org/docs/latest/api/python/pyspark.sql.html]). (Note the new `/reference/` bit in the new URL.)

It's the first hit when you google "pyspark sql functions"; you'll also land there if you search for individual functions or modules (e.g. "pyspark streaming").

I looked through various JIRA tickets and pull requests, but couldn't find any mention of this. Even the pull request introducing the new documentation site says that the only user-visible change is the design, not the location of the pages.

Possible resolutions:
* let search engines refresh their links and live with dead links in various places (Stack Overflow, emails, bookmarks, ...)
* identify the missing pages and provide 301 redirects for them (they could be found in logs or Google Analytics, or maybe we can list all assets generated before/now and diff them)
* change the Sphinx configuration so that it generates the same links as before
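
To illustrate the second option, here is a minimal sketch of the "list all assets generated before/now and diff them" idea in Python. It assumes we already have two lists of relative page paths (the ones below are made up for illustration, not real listings of the site), and it pairs each vanished page with a new page that has the same file name. The function name `build_redirects` is hypothetical; pages with no match or more than one match are reported separately for manual review.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def build_redirects(old_paths, new_paths):
    """Pair each old page that no longer exists with a new page of the same file name.

    Returns (redirects, unresolved): a dict old -> new for unambiguous matches,
    and a list of old pages that had zero or several candidates.
    """
    new_set = set(new_paths)

    # Index the new pages by file name, e.g. "pyspark.sql.html".
    by_name = defaultdict(list)
    for path in sorted(new_set):
        by_name[PurePosixPath(path).name].append(path)

    redirects, unresolved = {}, []
    for old in sorted(set(old_paths) - new_set):
        candidates = by_name.get(PurePosixPath(old).name, [])
        if len(candidates) == 1:
            redirects[old] = candidates[0]
        else:
            unresolved.append(old)
    return redirects, unresolved


# Hypothetical listings (assumption: NOT the real contents of the docs site).
old_pages = ["index.html", "pyspark.sql.html", "pyspark.streaming.html"]
new_pages = [
    "index.html",
    "reference/pyspark.sql.html",
    "reference/pyspark.streaming.html",
]

redirects, unresolved = build_redirects(old_pages, new_pages)
for old, new in redirects.items():
    print(f"{old} -> {new}")
print("needs manual review:", unresolved)
```

The resulting mapping could then be turned into whatever redirect mechanism the web server in front of the docs supports; which mechanism that is remains an open question here.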

Links to potentially relevant tickets and PRs:
* https://issues.apache.org/jira/browse/SPARK-31851
* https://github.com/apache/spark/pull/29188
* https://issues.apache.org/jira/browse/SPARK-32188


> New PySpark documentation has different URLs
> --------------------------------------------
>
>                 Key: SPARK-34606
>                 URL: https://issues.apache.org/jira/browse/SPARK-34606
>             Project: Spark
>          Issue Type: Bug
>          Components: Documentation, PySpark
>    Affects Versions: 3.1.1
>            Reporter: Ondrej Kokes
>            Priority: Minor


