Posted to dev@manifoldcf.apache.org by "Tobias Wunderlich (Created) (JIRA)" <ji...@apache.org> on 2011/10/18 15:10:10 UTC
[jira] [Created] (CONNECTORS-277) WikiConnector - option to limit
crawl by namespace
WikiConnector - option to limit crawl by namespace
--------------------------------------------------
Key: CONNECTORS-277
URL: https://issues.apache.org/jira/browse/CONNECTORS-277
Project: ManifoldCF
Issue Type: Improvement
Components: Wiki connector
Affects Versions: ManifoldCF next
Reporter: Tobias Wunderlich
Priority: Minor
At the moment, the WikiConnector crawls the whole Wiki, which can take a lot of time. For testing purposes, an option to limit the pages to crawl by namespace (title) would be great.
Tobias
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CONNECTORS-277) WikiConnector - option to limit
crawl by namespace
Posted by "Karl Wright (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CONNECTORS-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129836#comment-13129836 ]
Karl Wright commented on CONNECTORS-277:
----------------------------------------
Do you know if there is an API-based way to get the names of all the pertinent namespaces?
[jira] [Commented] (CONNECTORS-277) WikiConnector - option to limit
crawl by namespace
Posted by "Karl Wright (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CONNECTORS-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132524#comment-13132524 ]
Karl Wright commented on CONNECTORS-277:
----------------------------------------
r1187220 to correct this latest issue.
[jira] [Commented] (CONNECTORS-277) WikiConnector - option to limit
crawl by namespace
Posted by "Tobias Wunderlich (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CONNECTORS-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132500#comment-13132500 ]
Tobias Wunderlich commented on CONNECTORS-277:
----------------------------------------------
Changes to namespace and title are not applied to the job correctly. Although the changes are displayed after clicking "save", they don't show up when re-editing the job.
[jira] [Commented] (CONNECTORS-277) WikiConnector - option to limit
crawl by namespace
Posted by "Karl Wright (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CONNECTORS-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13130454#comment-13130454 ]
Karl Wright commented on CONNECTORS-277:
----------------------------------------
Thank you for the clarification.
[jira] [Resolved] (CONNECTORS-277) WikiConnector - option to limit
crawl by namespace
Posted by "Karl Wright (Resolved) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CONNECTORS-277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Wright resolved CONNECTORS-277.
------------------------------------
Resolution: Fixed
r1187038
[jira] [Commented] (CONNECTORS-277) WikiConnector - option to limit
crawl by namespace
Posted by "Tobias Wunderlich (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CONNECTORS-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13130432#comment-13130432 ]
Tobias Wunderlich commented on CONNECTORS-277:
----------------------------------------------
I'm sorry, I may have used the wrong word with "namespace". What I meant was that we should have the option to crawl the Wiki by title. I guess that won't be a big problem, since the API calls you used support "from=<title>" and "to=<title>" params?
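A minimal sketch of what such a title-limited request could look like. It assumes the crawl enumerates pages through the MediaWiki "list=allpages" query, where the from/to limits are spelled "apfrom" and "apto" and the namespace filter is "apnamespace"; the thread does not say which call the connector actually uses. The function name and the example host are hypothetical.

```python
from urllib.parse import urlencode


def build_allpages_url(api_base, title_from=None, title_to=None,
                       namespace=None, limit=500):
    """Build a MediaWiki allpages query URL restricted to a title range.

    Assumption: the wiki exposes the standard api.php endpoint and pages
    are enumerated with list=allpages (not confirmed by the thread).
    """
    params = {
        "action": "query",
        "list": "allpages",
        "format": "xml",
        "aplimit": str(limit),
    }
    if title_from is not None:
        params["apfrom"] = title_from      # first title to enumerate
    if title_to is not None:
        params["apto"] = title_to          # last title to enumerate
    if namespace is not None:              # 0 is valid (main namespace)
        params["apnamespace"] = str(namespace)
    return api_base + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical wiki host, used only for illustration.
    print(build_allpages_url("http://wiki.example.org/w/api.php",
                             title_from="A", title_to="B b", namespace=0))
```

Leaving all three limits unset produces an unrestricted query, which corresponds to the current crawl-everything behaviour.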
[jira] [Commented] (CONNECTORS-277) WikiConnector - option to limit
crawl by namespace
Posted by "Karl Wright (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CONNECTORS-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13130060#comment-13130060 ]
Karl Wright commented on CONNECTORS-277:
----------------------------------------
nvm, I found a URL they claim will return the namespace list:
http://www.mediawiki.org/w/api.php?action=query&meta=siteinfo&siprop=namespaces
This seems to return XML of the following form:
<?xml version="1.0"?>
<api>
  <query>
    <namespaces>
      <ns id="-2" case="first-letter" canonical="Media" xml:space="preserve">Media</ns>
      <ns id="-1" case="first-letter" canonical="Special" xml:space="preserve">Special</ns>
      <ns id="0" case="first-letter" subpages="" content="" xml:space="preserve" />
      <ns id="1" case="first-letter" subpages="" canonical="Talk" xml:space="preserve">Talk</ns>
      <ns id="2" case="first-letter" subpages="" canonical="User" xml:space="preserve">User</ns>
      <ns id="3" case="first-letter" subpages="" canonical="User talk" xml:space="preserve">User talk</ns>
      <ns id="4" case="first-letter" subpages="" canonical="Project" xml:space="preserve">Project</ns>
      <ns id="5" case="first-letter" subpages="" canonical="Project talk" xml:space="preserve">Project talk</ns>
      <ns id="6" case="first-letter" canonical="File" xml:space="preserve">File</ns>
      <ns id="7" case="first-letter" subpages="" canonical="File talk" xml:space="preserve">File talk</ns>
      <ns id="8" case="first-letter" canonical="MediaWiki" xml:space="preserve">MediaWiki</ns>
      <ns id="9" case="first-letter" subpages="" canonical="MediaWiki talk" xml:space="preserve">MediaWiki talk</ns>
      <ns id="10" case="first-letter" subpages="" canonical="Template" xml:space="preserve">Template</ns>
      <ns id="11" case="first-letter" subpages="" canonical="Template talk" xml:space="preserve">Template talk</ns>
      <ns id="12" case="first-letter" subpages="" canonical="Help" xml:space="preserve">Help</ns>
      <ns id="13" case="first-letter" subpages="" canonical="Help talk" xml:space="preserve">Help talk</ns>
      <ns id="14" case="first-letter" subpages="" canonical="Category" xml:space="preserve">Category</ns>
      <ns id="15" case="first-letter" canonical="Category talk" xml:space="preserve">Category talk</ns>
      <ns id="90" case="first-letter" canonical="Thread" xml:space="preserve">Thread</ns>
      <ns id="91" case="first-letter" canonical="Thread talk" xml:space="preserve">Thread talk</ns>
      <ns id="92" case="first-letter" canonical="Summary" xml:space="preserve">Summary</ns>
      <ns id="93" case="first-letter" canonical="Summary talk" xml:space="preserve">Summary talk</ns>
      <ns id="100" case="first-letter" subpages="" canonical="Manual" content="" xml:space="preserve">Manual</ns>
      <ns id="101" case="first-letter" subpages="" canonical="Manual talk" xml:space="preserve">Manual talk</ns>
      <ns id="102" case="first-letter" subpages="" canonical="Extension" content="" xml:space="preserve">Extension</ns>
      <ns id="103" case="first-letter" subpages="" canonical="Extension talk" xml:space="preserve">Extension talk</ns>
      <ns id="104" case="first-letter" subpages="" canonical="API" xml:space="preserve">API</ns>
      <ns id="105" case="first-letter" subpages="" canonical="API talk" xml:space="preserve">API talk</ns>
    </namespaces>
  </query>
</api>
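For illustration, a response of this shape could be turned into a list of selectable namespaces roughly as follows. This is only a sketch using Python's standard library, not the connector's code; the function name is hypothetical and the sample is an abbreviated copy of the XML quoted above. Note that the main namespace (id 0) is an empty element, so its name comes back as an empty string.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the response quoted above (only a few <ns> entries kept).
SAMPLE_RESPONSE = """<?xml version="1.0"?>
<api>
  <query>
    <namespaces>
      <ns id="-1" case="first-letter" canonical="Special" xml:space="preserve">Special</ns>
      <ns id="0" case="first-letter" subpages="" content="" xml:space="preserve" />
      <ns id="1" case="first-letter" subpages="" canonical="Talk" xml:space="preserve">Talk</ns>
      <ns id="100" case="first-letter" subpages="" canonical="Manual" content="" xml:space="preserve">Manual</ns>
    </namespaces>
  </query>
</api>"""


def parse_namespaces(xml_text):
    """Return {namespace id: display name} from a siteinfo namespaces response.

    The main namespace (id 0) has no element text and maps to "".
    """
    root = ET.fromstring(xml_text)          # root element is <api>
    namespaces = {}
    for ns in root.findall("./query/namespaces/ns"):
        namespaces[int(ns.get("id"))] = ns.text or ""
    return namespaces


if __name__ == "__main__":
    for ns_id, name in sorted(parse_namespaces(SAMPLE_RESPONSE).items()):
        print(ns_id, repr(name))
```

Keying the result by the numeric id rather than the name seems the safer choice, since the display names are localized per wiki while the ids of the built-in namespaces are not.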
[jira] [Updated] (CONNECTORS-277) WikiConnector - option to limit
crawl by namespace
Posted by "Karl Wright (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CONNECTORS-277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Wright updated CONNECTORS-277:
-----------------------------------
Affects Version/s: ManifoldCF 0.4 (was: ManifoldCF next)
Fix Version/s: ManifoldCF 0.4
Assignee: Karl Wright