Posted to infrastructure-issues@apache.org by "Paul Querna (JIRA)" <ji...@apache.org> on 2009/12/10 19:20:19 UTC
[jira] Created: (INFRA-2372) Produce sitemaps for services on Brutus
Produce sitemaps for services on Brutus
---------------------------------------
Key: INFRA-2372
URL: https://issues.apache.org/jira/browse/INFRA-2372
Project: Infrastructure
Issue Type: Improvement
Security Level: public (Regular issues)
Components: Bugzilla, Confluence, JIRA
Reporter: Paul Querna
We are currently seeing a massive draw of bandwidth to brutus.apache.org, almost entirely from Googlebot / MSNbot.
To resolve this without blocking the robots, we should produce XML sitemaps for Bugzilla, Confluence, and Jira.
For BZ/Jira, it should be as simple as putting every issue into the sitemap, with a last modified time of the last comment.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (INFRA-2372) Produce sitemaps for services on Brutus
Posted by "Henri Yandell (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/INFRA-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Henri Yandell updated INFRA-2372:
---------------------------------
Component/s: Bugzilla
Adding BZ back on. I think this was lost in the security migration.
I have copies of the JIRA sitemap code, but not the Bugzilla sitemap code.
Looking on thor, the BZ sitemaps are there but haven't been updated since Apr 18. The JIRA sitemaps don't seem to have made it there.
[jira] Commented: (INFRA-2372) Produce sitemaps for services on Brutus
Posted by "Mark Thomas (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/INFRA-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788845#action_12788845 ]
Mark Thomas commented on INFRA-2372:
------------------------------------
BZ already has sitemaps configured for SA and the main instance.
[jira] Commented: (INFRA-2372) Produce sitemaps for services on Brutus
Posted by "Henri Yandell (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/INFRA-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794393#action_12794393 ]
Henri Yandell commented on INFRA-2372:
--------------------------------------
First attempt done for the JIRAs.
On Brutus:
/home/jira/sitemap/sitemaps
Assuming they look good, going live would be:
a) Switching the output directory so that a crontab generates the files into the directory served at http://issues.apache.org/sitemaps/ (i.e. the Bugzilla content dir)
b) Adding "Sitemap: http://issues.apache.org/sitemaps/jira_sitemap_index.xml" to the robots.txt
That assumes that we're happy having two indexes in robots.txt.
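As a rough illustration of step b), here is a minimal Python sketch of what an index file and the matching robots.txt lines could look like. The function names and the sitemap file names passed in are hypothetical; only the URL prefix and the jira_sitemap_index.xml name come from the comment above. The sitemap protocol permits more than one "Sitemap:" line in robots.txt, so two indexes can coexist.

```python
from xml.sax.saxutils import escape

# URL prefix quoted in the comment above; everything else here is illustrative.
SITEMAP_BASE = "http://issues.apache.org/sitemaps/"


def build_sitemap_index(file_names, base=SITEMAP_BASE):
    """Return a sitemap index document with one <sitemap> entry per file name."""
    entries = "".join(
        f"  <sitemap>\n    <loc>{escape(base + name)}</loc>\n  </sitemap>\n"
        for name in file_names
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}</sitemapindex>\n"
    )


def robots_sitemap_lines(index_names, base=SITEMAP_BASE):
    """Return one 'Sitemap:' line per index file, ready to append to robots.txt."""
    return [f"Sitemap: {base}{name}" for name in index_names]
```

Calling robots_sitemap_lines(["jira_sitemap_index.xml"]) yields the single line proposed in b); a second index would simply add a second line.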
[jira] Resolved: (INFRA-2372) Produce sitemaps for services on Brutus
Posted by "Jeff Turner (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/INFRA-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeff Turner resolved INFRA-2372.
--------------------------------
Resolution: Fixed
Fixed, r778959
[jira] Commented: (INFRA-2372) Produce sitemaps for services on Brutus
Posted by "Henri Yandell (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/INFRA-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794792#action_12794792 ]
Henri Yandell commented on INFRA-2372:
--------------------------------------
JIRA sitemapping now hooked in to the bugzilla sitemaps.
Google Webmaster Tools validates the changes - submitted URLs went up from 38k to 187k. The indexed numbers are still pending.
[jira] Commented: (INFRA-2372) Produce sitemaps for services on Brutus
Posted by "Henri Yandell (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/INFRA-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888186#action_12888186 ]
Henri Yandell commented on INFRA-2372:
--------------------------------------
Code for JIRA copied onto thor (jira user). Needs work as the psql cmd currently barfs. Fun Solaris I assume :)
[jira] Commented: (INFRA-2372) Produce sitemaps for services on Brutus
Posted by "Henri Yandell (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/INFRA-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795018#action_12795018 ]
Henri Yandell commented on INFRA-2372:
--------------------------------------
Modified robots.txt:
/jira/secure/attachment -> /jira/secure
This stops Google looking at attachment pages and search result pages, which seem to be of little value to a user.
Also moved to disallowing roller/ and cayenne/ as those JIRAs no longer exist. Added a disallow on /jira/browse/*?page= to stop dupes there.
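To see which paths the two rules above would cover, here is a small Python sketch of wildcard-style rule matching. It is a simplification under stated assumptions: it treats "*" as "any run of characters" and a trailing "$" as an end anchor, which is an extension honored by major crawlers such as Googlebot rather than part of the original robots.txt convention, and it ignores Allow rules and rule precedence. The function names and the attachment path in the usage note are hypothetical.

```python
import re


def rule_to_regex(rule):
    """Translate a robots.txt path rule ('*' wildcard, optional trailing '$') to a regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and join them with '.*' where the rule had '*'.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + pattern + ("$" if anchored else ""))


def is_disallowed(path, disallow_rules):
    """Return True if any Disallow rule matches from the start of the path."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)


# The two rules described in the comment above.
RULES = ["/jira/secure", "/jira/browse/*?page="]
```

With these rules, a path such as /jira/secure/attachment/12345/patch.txt and any /jira/browse/INFRA-2372?page=... tab view are blocked, while the plain issue page /jira/browse/INFRA-2372 stays crawlable.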
[jira] Updated: (INFRA-2372) Produce sitemaps for services on Brutus
Posted by "Mark Thomas (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/INFRA-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark Thomas updated INFRA-2372:
-------------------------------
Component/s: (was: Bugzilla)
Removing BZ since it already has this.
[jira] Assigned: (INFRA-2372) Produce sitemaps for services on Brutus
Posted by "Jeff Turner (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/INFRA-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeff Turner reassigned INFRA-2372:
----------------------------------
Assignee: Jeff Turner
[jira] Commented: (INFRA-2372) Produce sitemaps for services on Brutus
Posted by "Henri Yandell (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/INFRA-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795260#action_12795260 ]
Henri Yandell commented on INFRA-2372:
--------------------------------------
Total: 187,812, Indexed: 131,583
Sitemap                             Status  Type     Date          Submitted  Indexed
sitemaps/sitemap_activemq_1.xml.gz  OK      Sitemap  Dec 30, 2009      8,252    7,857
sitemaps/sitemap_bugs.xml.gz        OK      Sitemap  Dec 29, 2009     31,955   12,347
sitemaps/sitemap_jira_1.xml.gz      OK      Sitemap  Dec 29, 2009     50,000   32,110
sitemaps/sitemap_jira_2.xml.gz      OK      Sitemap  Dec 29, 2009     50,000   39,346
sitemaps/sitemap_jira_3.xml.gz      OK      Sitemap  Dec 29, 2009     33,634   30,583
sitemaps/sitemap_sabugs.xml.gz      OK      Sitemap  Dec 29, 2009      6,266    3,624
sitemaps/sitemap_struts_1.xml.gz    OK      Sitemap  Dec 29, 2009      7,705    5,716
Mark suggested that the non-indexed items are the older ones, and I think this makes sense: across the main JIRA sitemaps, the percentage indexed goes up as they get newer.
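The trend can be checked with a few lines of Python. This sketch assumes the last two figures on each report line are submitted and indexed URL counts (their sums match the totals quoted above) and that the numbered JIRA files run from oldest to newest.

```python
# (submitted, indexed) URL counts copied from the report lines above.
JIRA_SITEMAPS = {
    "sitemap_jira_1.xml.gz": (50_000, 32_110),
    "sitemap_jira_2.xml.gz": (50_000, 39_346),
    "sitemap_jira_3.xml.gz": (33_634, 30_583),
}


def coverage(submitted, indexed):
    """Percentage of submitted URLs that were indexed, rounded to one decimal."""
    return round(100 * indexed / submitted, 1)


for name, (submitted, indexed) in JIRA_SITEMAPS.items():
    print(f"{name}: {coverage(submitted, indexed)}%")
```

Under those assumptions the three files come out at roughly 64%, 79%, and 91% indexed, in that order.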
[jira] Updated: (INFRA-2372) Produce sitemaps for services on Brutus
Posted by "Mark Thomas (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/INFRA-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark Thomas updated INFRA-2372:
-------------------------------
Component/s: (was: Bugzilla)
BZ has been fixed for a while now.
[jira] Commented: (INFRA-2372) Produce sitemaps for services on Brutus
Posted by "Henri Yandell (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/INFRA-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794793#action_12794793 ]
Henri Yandell commented on INFRA-2372:
--------------------------------------
I suspect we need more Disallow rules in robots.txt, for example for IssueNavigator and ConfigureReport.
[jira] Commented: (INFRA-2372) Produce sitemaps for services on Brutus
Posted by "Henri Yandell (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/INFRA-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847832#action_12847832 ]
Henri Yandell commented on INFRA-2372:
--------------------------------------
Note, we're about to start our 4th jira sitemap file. Need to double check at some point that that rollover works happily.
I also just removed the struts sitemap file, given its jira is now gone.
[jira] Commented: (INFRA-2372) Produce sitemaps for services on Brutus
Posted by "Henri Yandell (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/INFRA-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793547#action_12793547 ]
Henri Yandell commented on INFRA-2372:
--------------------------------------
For JIRA it's basically:
select pkey, updated from jiraissue;
Need to add the text to output the timestamp correctly, build the XML, and then split the large file into multiple 10M files. Could use the updated column as an optimization to avoid rebuilding things every time: the first time, sort by updated; after that, updates could be handled by a where clause.
Feels like there should be a tool to do this - plug in the initial SQL and the SQL with where clause and away it goes. Anyone know of such a thing?
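A minimal Python sketch of the approach described above, not the code that was actually deployed. It assumes the rows from the query arrive as (pkey, updated) pairs with timezone-aware datetimes, that issue pages live under the /jira/browse/ URL used in this thread, and that files are split by the sitemap protocol's limit of 50,000 URLs per file rather than by size. All function names and the file prefix are hypothetical.

```python
import gzip
from datetime import datetime, timezone
from xml.sax.saxutils import escape

# Assumption: issue URL prefix as seen in the links in this thread.
BASE_URL = "https://issues.apache.org/jira/browse/"
MAX_URLS_PER_FILE = 50_000  # per-file URL limit in the sitemap protocol


def url_entry(pkey: str, updated: datetime) -> str:
    """Render one <url> element with lastmod in W3C datetime format (UTC)."""
    lastmod = updated.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        "  <url>\n"
        f"    <loc>{escape(BASE_URL + pkey)}</loc>\n"
        f"    <lastmod>{lastmod}</lastmod>\n"
        "  </url>\n"
    )


def build_sitemaps(rows, max_urls=MAX_URLS_PER_FILE):
    """Split (pkey, updated) rows into sitemap documents of at most max_urls entries."""
    # Oldest first, so earlier files tend to change less often between runs.
    rows = sorted(rows, key=lambda row: row[1])
    documents = []
    for start in range(0, len(rows), max_urls):
        chunk = rows[start:start + max_urls]
        body = "".join(url_entry(pkey, updated) for pkey, updated in chunk)
        documents.append(
            '<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            f"{body}</urlset>\n"
        )
    return documents


def write_sitemaps(documents, directory, prefix="sitemap_jira"):
    """Write each document as <prefix>_<n>.xml.gz and return the file names."""
    names = []
    for number, document in enumerate(documents, start=1):
        name = f"{prefix}_{number}.xml.gz"
        with gzip.open(f"{directory}/{name}", "wt", encoding="utf-8") as handle:
            handle.write(document)
        names.append(name)
    return names
```

An incremental run could pass only the rows selected by the where clause on updated and rewrite the affected files; that bookkeeping is left out of this sketch.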