Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/10/25 09:48:26 UTC

[GitHub] [pulsar] visortelle opened a new issue, #18190: [Bug] It's hard to find latest docs using search engines like Google

visortelle opened a new issue, #18190:
URL: https://github.com/apache/pulsar/issues/18190

   ### Search before asking
   
   - [X] I searched in the [issues](https://github.com/apache/pulsar/issues) and found nothing similar.
   
   
   ### Version
   
   -
   
   ### Minimal reproduce step
   
   <img width="803" alt="Screen Shot 2022-10-25 at 11 39 54 AM" src="https://user-images.githubusercontent.com/9302460/197740073-41066585-fca0-410a-82cb-76809d39d3d0.png">
   
   <img width="1180" alt="Screen Shot 2022-10-25 at 11 40 17 AM" src="https://user-images.githubusercontent.com/9302460/197740131-8f50886b-ac0c-43c2-9a11-1d26ec48cc7f.png">
   
   
   ### What did you expect to see?
   
   Documentation for the latest version of Pulsar.
   
   ### What did you see instead?
   
   I constantly see the documentation for random old versions of Pulsar.
   
   ### Anything else?
   
   It should be possible to fix this by adding the following HTML meta tag to every docs page for old Pulsar versions.
   
   ```html
   <head>
     <meta name="robots" content="noindex, nofollow" />
   </head>
   ```
   
   If you are using Docusaurus, it should be doable by conditionally adding the required meta tag in its config file:
   
   https://docusaurus.io/docs/seo
   
   After a couple of weeks, Google should reindex these pages.
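
A minimal sketch of the conditional logic described above, assuming old versions are served under `/docs/<version>/...` while the latest version has no version segment in its path. The helper names and the version pattern are hypothetical, and wiring the result into the Docusaurus config or theme is not shown.

```typescript
// Hypothetical pattern for a version path segment such as "2.3.2" or "2.11.x".
const OLD_VERSION_SEGMENT = /^\d+\.\d+\.(\d+|x)$/;

// Returns the robots directive for a docs path, or null when the page
// should stay indexable (latest, unversioned docs and non-docs pages).
function robotsContentFor(pathname: string): string | null {
  const segments = pathname.split("/").filter((s) => s.length > 0);
  const root = segments[0];
  const candidate = segments[1];
  if (root !== "docs" || candidate === undefined) {
    return null;
  }
  return OLD_VERSION_SEGMENT.test(candidate) ? "noindex, nofollow" : null;
}

// Renders the meta tag from the issue, or an empty string when none is needed.
function robotsMetaTag(pathname: string): string {
  const content = robotsContentFor(pathname);
  return content === null ? "" : `<meta name="robots" content="${content}" />`;
}
```

With these assumptions, `robotsMetaTag("/docs/2.3.2/concepts-clients/")` yields the tag, while `robotsMetaTag("/docs/concepts-clients/")` yields an empty string.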
   
   ### Are you willing to submit a PR?
   
   - [X] I'm willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] tisonkun commented on issue #18190: [Bug] It's hard to find latest docs using search engines like Google

Posted by "tisonkun (via GitHub)" <gi...@apache.org>.
tisonkun commented on issue #18190:
URL: https://github.com/apache/pulsar/issues/18190#issuecomment-1504707158

   After a recheck: "client library pulsar" and "pulsar java admin client" now return the correct 2.11.x version, while "pulsar subscription type" still returns 2.3.2 as the featured result. Since we explicitly mark that page as "noIndex", I'm guessing featured results take longer to be downgraded.




[GitHub] [pulsar] tisonkun commented on issue #18190: [Bug] It's hard to find latest docs using search engines like Google

Posted by GitBox <gi...@apache.org>.
tisonkun commented on issue #18190:
URL: https://github.com/apache/pulsar/issues/18190#issuecomment-1290312950

   I'm glad to help with reviewing and verifying once you submit one :)




[GitHub] [pulsar] visortelle commented on issue #18190: [Bug] It's hard to find latest docs using search engines like Google

Posted by GitBox <gi...@apache.org>.
visortelle commented on issue #18190:
URL: https://github.com/apache/pulsar/issues/18190#issuecomment-1290309117

   But issues are disabled here. You probably meant to open a PR. :)
   
   <img width="735" alt="Screen Shot 2022-10-25 at 12 10 14 PM" src="https://user-images.githubusercontent.com/9302460/197746417-f0fb6e75-0fbc-416e-91fd-a2bbb4f80739.png">
   




[GitHub] [pulsar] momo-jun commented on issue #18190: [Bug] It's hard to find latest docs using search engines like Google

Posted by "momo-jun (via GitHub)" <gi...@apache.org>.
momo-jun commented on issue #18190:
URL: https://github.com/apache/pulsar/issues/18190#issuecomment-1477213289

   I've tried to insert the following tags into a specific .md file.
   
   ```
   <head>
       <link rel="canonical" href="https://pulsar.apache.org/docs/concepts-clients/" />
   </head>
   ```
   
   And the result shows as expected (I guess it overrides the default setting):
   
   <img width="1818" alt="image" src="https://user-images.githubusercontent.com/60642177/226507921-c44500d4-b4d0-4938-b6de-ea303f5b4afe.png">
   




[GitHub] [pulsar] urfreespace commented on issue #18190: [Bug] It's hard to find latest docs using search engines like Google

Posted by "urfreespace (via GitHub)" <gi...@apache.org>.
urfreespace commented on issue #18190:
URL: https://github.com/apache/pulsar/issues/18190#issuecomment-1477613803

   > To summarize the possible solutions based on my understanding:
   > 
   > 1. **Preferred**: Update the [logic of the existing global canonical URL implementation](https://github.com/apache/pulsar/issues/18190#issuecomment-1477202689) to use the latest version of the doc page, `https://pulsar.apache.org/docs/{topic-id}/`, as the canonical URL. ---- I don't know how to do this and need an expert's help.
   > 2. **Workaround**: Add [HTML tags](https://github.com/apache/pulsar/issues/18190#issuecomment-1477213289) to each doc file of all versions. ---- If the preferred solution is in place, this one is needed only for the few cases where the default canonical URL has to be overridden.
   
   @momo-jun I did some research: there is no global configuration that sets the `canonical` meta for all pages, so we would need to modify every .md page to add it manually. We could write a script to batch-modify them.
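
A rough sketch of the per-file step such a batch script might perform, assuming each doc is a Markdown file that may start with a `---` front matter block. The function name is hypothetical, and reading and writing the files is left out.

```typescript
// Inserts a <head> block with a canonical link right after the front matter
// (or at the top when there is none). Leaves already-tagged files untouched.
function withCanonicalHead(markdown: string, canonicalUrl: string): string {
  if (markdown.includes('rel="canonical"')) {
    return markdown; // already has a canonical link
  }
  const head = `<head>\n  <link rel="canonical" href="${canonicalUrl}" />\n</head>\n`;
  // Front matter: a leading block delimited by two lines of "---".
  const frontMatter = /^---\r?\n[\s\S]*?\r?\n---\r?\n/.exec(markdown);
  if (frontMatter === null) {
    return `${head}\n${markdown}`;
  }
  const end = frontMatter[0].length;
  return `${markdown.slice(0, end)}\n${head}${markdown.slice(end)}`;
}
```

Because the function returns its input unchanged once a canonical link is present, running the script twice over the same files should be harmless.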




[GitHub] [pulsar] jak78 commented on issue #18190: [Bug] It's hard to find latest docs using search engines like Google

Posted by "jak78 (via GitHub)" <gi...@apache.org>.
jak78 commented on issue #18190:
URL: https://github.com/apache/pulsar/issues/18190#issuecomment-1486912026

   Google search results look much better now. Good job 👏 




[GitHub] [pulsar] tisonkun commented on issue #18190: [Bug] It's hard to find latest docs using search engines like Google

Posted by "tisonkun (via GitHub)" <gi...@apache.org>.
tisonkun commented on issue #18190:
URL: https://github.com/apache/pulsar/issues/18190#issuecomment-1477653824

   Here is a patch https://github.com/apache/pulsar-site/pull/481




[GitHub] [pulsar] ishu-thakur commented on issue #18190: [Bug] It's hard to find latest docs using search engines like Google

Posted by GitBox <gi...@apache.org>.
ishu-thakur commented on issue #18190:
URL: https://github.com/apache/pulsar/issues/18190#issuecomment-1290298988

   sure @visortelle we can wait for their response and after that I can start working on it :)




[GitHub] [pulsar] visortelle commented on issue #18190: [Bug] It's hard to find latest docs using search engines like Google

Posted by GitBox <gi...@apache.org>.
visortelle commented on issue #18190:
URL: https://github.com/apache/pulsar/issues/18190#issuecomment-1290354646

   @ishu-thakur I think you should somehow conditionally configure the Docusaurus config instead of adding the meta tag in every html file.
   
   If you take it, it's your task to find out how to do it in a better way.




[GitHub] [pulsar] tisonkun commented on issue #18190: [Bug] It's hard to find latest docs using search engines like Google

Posted by GitBox <gi...@apache.org>.
tisonkun commented on issue #18190:
URL: https://github.com/apache/pulsar/issues/18190#issuecomment-1290312393

   @visortelle yes. Sorry to make the typo :P




[GitHub] [pulsar] github-actions[bot] commented on issue #18190: [Bug] It's hard to find latest docs using search engines like Google

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #18190:
URL: https://github.com/apache/pulsar/issues/18190#issuecomment-1328432937

   The issue had no activity for 30 days, mark with Stale label.




[GitHub] [pulsar] tisonkun commented on issue #18190: [Bug] It's hard to find latest docs using search engines like Google

Posted by "tisonkun (via GitHub)" <gi...@apache.org>.
tisonkun commented on issue #18190:
URL: https://github.com/apache/pulsar/issues/18190#issuecomment-1477624092

   @urfreespace perhaps you can eject something like `DocHead` by `yarn swizzle`.




[GitHub] [pulsar] momo-jun commented on issue #18190: [Bug] It's hard to find latest docs using search engines like Google

Posted by "momo-jun (via GitHub)" <gi...@apache.org>.
momo-jun commented on issue #18190:
URL: https://github.com/apache/pulsar/issues/18190#issuecomment-1477202689

   It looks like we do have a global setting regarding canonical URLs (see screenshot below), and it always returns the current version of the doc we are browsing, which just doesn't make sense. 
   Not sure if this is the global setting that we can improve to resolve the issue?
   
   <img width="1321" alt="image" src="https://user-images.githubusercontent.com/60642177/226505976-67d60853-c892-4315-a093-d698f07dc707.png">
   




[GitHub] [pulsar] ishu-thakur commented on issue #18190: [Bug] It's hard to find latest docs using search engines like Google

Posted by GitBox <gi...@apache.org>.
ishu-thakur commented on issue #18190:
URL: https://github.com/apache/pulsar/issues/18190#issuecomment-1290351302

   @visortelle should I fork the repo and start adding the HTML tag to every single .html file?




[GitHub] [pulsar] tisonkun commented on issue #18190: [Bug] It's hard to find latest docs using search engines like Google

Posted by GitBox <gi...@apache.org>.
tisonkun commented on issue #18190:
URL: https://github.com/apache/pulsar/issues/18190#issuecomment-1290306011

   Cool! I think you can open an issue against https://github.com/apache/pulsar-site under `site2/website-next` where we place the source code of the website.
   
   cc @urfreespace @michaeljmarshall @Anonymitaet @momo-jun 




[GitHub] [pulsar] ishu-thakur commented on issue #18190: [Bug] It's hard to find latest docs using search engines like Google

Posted by GitBox <gi...@apache.org>.
ishu-thakur commented on issue #18190:
URL: https://github.com/apache/pulsar/issues/18190#issuecomment-1290294511

   Hi @visortelle, I would like to contribute to this issue by raising a PR that adds the meta tag to the HTML.




[GitHub] [pulsar] visortelle commented on issue #18190: [Bug] It's hard to find latest docs using search engines like Google

Posted by GitBox <gi...@apache.org>.
visortelle commented on issue #18190:
URL: https://github.com/apache/pulsar/issues/18190#issuecomment-1290297898

   Hi @ishu-thakur.
   
   If docs maintainers don't mind, then do it. 
   Let's wait for some response from them first.




[GitHub] [pulsar] visortelle commented on issue #18190: [Bug] It's hard to find latest docs using search engines like Google

Posted by GitBox <gi...@apache.org>.
visortelle commented on issue #18190:
URL: https://github.com/apache/pulsar/issues/18190#issuecomment-1295281487

   I just took a look at how it works in Confluent docs.
   
   It seems like they don't use the tags I mentioned above, but use the canonical tag instead.
   
   For example:
   
   This page for v7.2.0 (a slightly outdated version of their product): `https://docs.confluent.io/platform/7.2.0/installation/docker/operations/index.html`
   
   has the following tag in the HTML head:
   
   `<link rel="canonical" href="https://docs.confluent.io/platform/current/installation/docker/operations/index.htm">` 
   
   For all versions, the canonical URL is the same for the same .md document, and Google shows them in its search results.
   
   This approach avoids conditional HTML tag insertion that depends on the latest version of the product.
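
A minimal sketch of how the same idea could map onto Pulsar docs URLs, assuming versioned pages live at `/docs/<version>/<topic-id>/` and the latest page at `/docs/<topic-id>/`. The function name, the constants, and the version pattern are hypothetical.

```typescript
// Assumed site origin and version segment shape (e.g. "2.3.2" or "2.11.x").
const DOCS_ORIGIN = "https://pulsar.apache.org";
const VERSION_PREFIX = /^\d+\.\d+\.(\d+|x)$/;

// Maps any docs path, versioned or not, to a single canonical URL by
// dropping the version segment when one is present.
function canonicalDocUrl(pathname: string): string {
  const segments = pathname.split("/").filter((s) => s.length > 0);
  const [root, first, ...rest] = segments;
  if (root !== "docs" || first === undefined) {
    return `${DOCS_ORIGIN}${pathname}`; // not a docs topic page; leave as-is
  }
  const topic = VERSION_PREFIX.test(first) ? rest : [first, ...rest];
  if (topic.length === 0) {
    return `${DOCS_ORIGIN}/docs/`; // version landing page
  }
  return `${DOCS_ORIGIN}/docs/${topic.join("/")}/`;
}
```

Every version of a page then shares one canonical URL, so nothing has to change when a new release becomes the latest.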
   
   I'm not an SEO expert, so it would probably be a good idea to consult a professional in this area before changing anything.
   
   ---
   
   
   - More info on canonical tags: https://developers.google.com/search/docs/crawling-indexing/consolidate-duplicate-urls
   
   - Relevant Docusaurus [issue](https://github.com/facebook/docusaurus/issues/2603) on how to insert tags into the `<head></head>` of a specific .md document.
   




[GitHub] [pulsar] urfreespace commented on issue #18190: [Bug] It's hard to find latest docs using search engines like Google

Posted by "urfreespace (via GitHub)" <gi...@apache.org>.
urfreespace commented on issue #18190:
URL: https://github.com/apache/pulsar/issues/18190#issuecomment-1477671765

   > Here is a patch [apache/pulsar-site#481](https://github.com/apache/pulsar-site/pull/481)
   
   so nice, and very cool, then I think my [PR](https://github.com/apache/pulsar-site/pull/482) could be closed.




[GitHub] [pulsar] michaeljmarshall commented on issue #18190: [Bug] It's hard to find latest docs using search engines like Google

Posted by GitBox <gi...@apache.org>.
michaeljmarshall commented on issue #18190:
URL: https://github.com/apache/pulsar/issues/18190#issuecomment-1295136198

   This is an important issue to solve. In addition to discoverability, we often expose old docs. For example, when I search `pulsar java admin client`, the second result is for incubator docs.
   
   ![Screen Shot 2022-10-28 at 10 25 31 AM](https://user-images.githubusercontent.com/47911938/198674697-5f8643f0-0076-4cca-a8c0-6f6166c9ab98.png)
   
   Note that the first link is for the java client, not the java admin client. If you dig into the java client page, it points to the admin client. However, I have heard from users that it was hard to find the admin client documentation.




[GitHub] [pulsar] tisonkun closed issue #18190: [Bug] It's hard to find latest docs using search engines like Google

Posted by "tisonkun (via GitHub)" <gi...@apache.org>.
tisonkun closed issue #18190: [Bug] It's hard to find latest docs using search engines like Google
URL: https://github.com/apache/pulsar/issues/18190




[GitHub] [pulsar] momo-jun commented on issue #18190: [Bug] It's hard to find latest docs using search engines like Google

Posted by "momo-jun (via GitHub)" <gi...@apache.org>.
momo-jun commented on issue #18190:
URL: https://github.com/apache/pulsar/issues/18190#issuecomment-1477240695

   To summarize the possible solutions based on my understanding:
   1. **Preferred**: Update the [logic of the existing canonical URL implementation](https://github.com/apache/pulsar/issues/18190#issuecomment-1477202689) to use the latest version of the doc page, `https://pulsar.apache.org/docs/{topic-id}/`. ---- I don't know how to do this and need an expert's help.
   2. **Workaround**: Add [HTML tags](https://github.com/apache/pulsar/issues/18190#issuecomment-1477213289) to each doc file of all versions. ---- If the preferred solution is in place, this one is needed only for the few cases where the default canonical URL has to be changed.




[GitHub] [pulsar] momo-jun commented on issue #18190: [Bug] It's hard to find latest docs using search engines like Google

Posted by "momo-jun (via GitHub)" <gi...@apache.org>.
momo-jun commented on issue #18190:
URL: https://github.com/apache/pulsar/issues/18190#issuecomment-1469269560

   I encountered the same issue, and before reporting it here, I found it was filed in this thread. 
   <img width="768" alt="image" src="https://user-images.githubusercontent.com/60642177/225199843-b2dcd2a3-b375-4361-bf19-ccbd2490e814.png">
   
   Not sure if there is a possible way to configure the canonical URLs globally across the site? 
   @urfreespace do you have any idea? I think it would be better than configuring the front matter of each markdown file.

