Posted to issues@solr.apache.org by GitBox <gi...@apache.org> on 2022/05/11 15:39:34 UTC

[GitHub] [solr] HoustonPutman commented on pull request #846: Make 'latest' remain in URL instead of `9_0`

HoustonPutman commented on PR #846:
URL: https://github.com/apache/solr/pull/846#issuecomment-1123943089

   > Maybe do it the other way round: If the canonical URL is always "latest" (as described in the documentation), then Google would forget all old versions and only show links to latest. That's actually a good thing and would solve our problems.
   
   +1, we want `latest` to be in the google results.
   
   I have tested this, and the current logic will set the canonical link to `latest`, not `9_0`. The version selection tool also links to the correct name of the page in each version, [if the page has been renamed](https://docs.antora.org/antora/latest/page/page-aliases/#page-aliases-attribute). So this is exactly the logic that we want.
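   Concretely, that means every versioned copy of a page carries a canonical link in its `<head>` pointing at the `latest` URL. A minimal sketch of what that tag looks like (the page path here is a made-up placeholder, not a real guide URL):
   
   ```html
   <head>
     <!-- Sketch: whichever versioned copy Google crawls, this tag tells it
          that the "latest" URL is the canonical one to index.
          The path "some-page.html" is a hypothetical placeholder. -->
     <link rel="canonical" href="https://solr.apache.org/guide/solr/latest/some-page.html">
   </head>
   ```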
   
   > * always redirecting to "latest" seems bad to me, as it makes it impossible to add permalinks. Or is there the possibility to get some "permalink" button on each page? So somebody citing a specific page can make a persistent link?
   
   We should certainly add this, and it shouldn't be hard to do. I am often annoyed with the AWS docs when trying to link to the specific version that is currently the latest.
   
   > We should maybe think of patching all old pages with a canonical link, too. Or much better, instead of patching, we could add an HTTP "Link:" header (see Google Docs above) to the `.htaccess`, where we maybe link all 8.x pages to the latest 8.11 refguide on the HTTP level. Same for 7.x and 6.x. This would at least remove all variants from Google except the latest version of each major release.
   
   We definitely need to do something about the Solr 6-8 releases. Generally, only the pages that don't exist in `latest` (after redirects) should be indexed by Google. That will be hard to do comprehensively, though we can go back and handle it case by case. In my opinion, we can probably start with a blanket `robots` file that says don't index the old ref-guide pages. It will make some information unsearchable, but there should be very few pages that were removed before 9.0.
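   For the HTTP `Link:` header idea quoted above, a hedged `.htaccess` sketch (the URL pattern is a guess at the layout, not our real one, and would need adjusting):
   
   ```apache
   # Sketch only: for any 8.x ref-guide page, capture the page path and send a
   # Link header pointing crawlers at the corresponding 8.11 page as canonical.
   # The "/guide/8_N/..." pattern is hypothetical.
   <IfModule mod_headers.c>
     SetEnvIf Request_URI "^/guide/8_[0-9]+/(.*)$" GUIDE_PAGE=$1
     Header set Link "<https://solr.apache.org/guide/8_11/%{GUIDE_PAGE}e>; rel=\"canonical\"" env=GUIDE_PAGE
   </IfModule>
   ```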
   
   Somewhat sane suggestion: we make a `robots.txt` that disallows scraping all 6-8 ref guides, then create exceptions for the pages that have been removed in a certain version. So if a page (like autoscaling) was removed in 9.0, we create an exception that allows scraping `old-guide/8_11/autoscaling.html`; if a page was removed in 8.5, we allow `old-guide/8_4/page.html`. There shouldn't be too many of these pages, so we can go through and add the exceptions manually.
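   A sketch of that `robots.txt`, reusing the example paths from above (the real guide layout may differ):
   
   ```
   # Sketch: block crawling of all old 6.x-8.x ref guides, with explicit
   # exceptions for pages that no longer exist in `latest`.
   User-agent: *
   Allow: /old-guide/8_11/autoscaling.html
   Allow: /old-guide/8_4/page.html
   Disallow: /old-guide/6_
   Disallow: /old-guide/7_
   Disallow: /old-guide/8_
   ```
   
   Note that major crawlers resolve `Allow`/`Disallow` conflicts by longest matching path, so the more specific `Allow` exceptions win over the broad `Disallow` prefixes regardless of ordering.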


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

