Posted to dev@manifoldcf.apache.org by "Tobias Wunderlich (Created) (JIRA)" <ji...@apache.org> on 2011/10/18 15:10:10 UTC

[jira] [Created] (CONNECTORS-277) WikiConnector - option to limit crawl by namespace

WikiConnector - option to limit crawl by namespace
--------------------------------------------------

                 Key: CONNECTORS-277
                 URL: https://issues.apache.org/jira/browse/CONNECTORS-277
             Project: ManifoldCF
          Issue Type: Improvement
          Components: Wiki connector
    Affects Versions: ManifoldCF next
            Reporter: Tobias Wunderlich
            Priority: Minor


At the moment, the WikiConnector crawls the whole Wiki. This can take up a lot of time. For testing purposes, an option to limit the pages to crawl by namespace (title) would be great.

Tobias

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CONNECTORS-277) WikiConnector - option to limit crawl by namespace

Posted by "Karl Wright (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CONNECTORS-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129836#comment-13129836 ] 

Karl Wright commented on CONNECTORS-277:
----------------------------------------

Do you know if there is an API-based way to get the names of all the pertinent namespaces?

                

        

[jira] [Commented] (CONNECTORS-277) WikiConnector - option to limit crawl by namespace

Posted by "Karl Wright (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CONNECTORS-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132524#comment-13132524 ] 

Karl Wright commented on CONNECTORS-277:
----------------------------------------

r1187220 to correct this latest issue.

                

        

[jira] [Commented] (CONNECTORS-277) WikiConnector - option to limit crawl by namespace

Posted by "Tobias Wunderlich (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CONNECTORS-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132500#comment-13132500 ] 

Tobias Wunderlich commented on CONNECTORS-277:
----------------------------------------------

Changes to namespace and title are not applied to the job correctly. Although the changes are displayed after clicking "save", they don't show up when re-editing the job.
                

        

[jira] [Commented] (CONNECTORS-277) WikiConnector - option to limit crawl by namespace

Posted by "Karl Wright (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CONNECTORS-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13130454#comment-13130454 ] 

Karl Wright commented on CONNECTORS-277:
----------------------------------------

Thank you for the clarification.

                

        

[jira] [Resolved] (CONNECTORS-277) WikiConnector - option to limit crawl by namespace

Posted by "Karl Wright (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CONNECTORS-277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wright resolved CONNECTORS-277.
------------------------------------

    Resolution: Fixed

r1187038

                

        

[jira] [Commented] (CONNECTORS-277) WikiConnector - option to limit crawl by namespace

Posted by "Tobias Wunderlich (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CONNECTORS-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13130432#comment-13130432 ] 

Tobias Wunderlich commented on CONNECTORS-277:
----------------------------------------------

I'm sorry, I may have used the wrong word with "namespace". What I meant was that we should have the option to crawl the Wiki by title. I guess that won't be a big problem, since the API calls you used support "from=<title>" and "to=<title>" params?
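
For illustration, a title-bounded page listing might be requested roughly as follows. This is only a sketch: it assumes the pages are enumerated through the MediaWiki list=allpages query, where the range parameters are spelled apfrom and apto. The helper name build_allpages_url and the example.org endpoint are made up for this note and are not taken from the connector's code.

```python
from urllib.parse import urlencode


def build_allpages_url(api_url, from_title=None, to_title=None, limit=500):
    """Build a MediaWiki allpages query URL restricted to a title range.

    Hypothetical helper: the connector's real request code may differ.
    """
    params = {
        "action": "query",
        "list": "allpages",
        "format": "xml",
        "aplimit": str(limit),
    }
    # Only add the range bounds that were actually supplied.
    if from_title:
        params["apfrom"] = from_title
    if to_title:
        params["apto"] = to_title
    return api_url + "?" + urlencode(params)


# Example endpoint is a placeholder, not a real wiki.
url = build_allpages_url("http://example.org/w/api.php", "Main Page", "Manual")
```

Leaving both bounds unset falls back to listing every page, which matches the current whole-wiki behaviour described in the issue.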
                

        

[jira] [Commented] (CONNECTORS-277) WikiConnector - option to limit crawl by namespace

Posted by "Karl Wright (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CONNECTORS-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13130060#comment-13130060 ] 

Karl Wright commented on CONNECTORS-277:
----------------------------------------

Never mind, I found a URL they claim will return the namespace list:


http://www.mediawiki.org/w/api.php?action=query&meta=siteinfo&siprop=namespaces 

This seems to return XML of the following form:

<?xml version="1.0"?>
<api>
  <query>
    <namespaces>
      <ns id="-2" case="first-letter" canonical="Media" xml:space="preserve">Media</ns>
      <ns id="-1" case="first-letter" canonical="Special" xml:space="preserve">Special</ns>
      <ns id="0" case="first-letter" subpages="" content="" xml:space="preserve" />
      <ns id="1" case="first-letter" subpages="" canonical="Talk" xml:space="preserve">Talk</ns>
      <ns id="2" case="first-letter" subpages="" canonical="User" xml:space="preserve">User</ns>
      <ns id="3" case="first-letter" subpages="" canonical="User talk" xml:space="preserve">User talk</ns>
      <ns id="4" case="first-letter" subpages="" canonical="Project" xml:space="preserve">Project</ns>
      <ns id="5" case="first-letter" subpages="" canonical="Project talk" xml:space="preserve">Project talk</ns>
      <ns id="6" case="first-letter" canonical="File" xml:space="preserve">File</ns>
      <ns id="7" case="first-letter" subpages="" canonical="File talk" xml:space="preserve">File talk</ns>
      <ns id="8" case="first-letter" canonical="MediaWiki" xml:space="preserve">MediaWiki</ns>
      <ns id="9" case="first-letter" subpages="" canonical="MediaWiki talk" xml:space="preserve">MediaWiki talk</ns>
      <ns id="10" case="first-letter" subpages="" canonical="Template" xml:space="preserve">Template</ns>
      <ns id="11" case="first-letter" subpages="" canonical="Template talk" xml:space="preserve">Template talk</ns>
      <ns id="12" case="first-letter" subpages="" canonical="Help" xml:space="preserve">Help</ns>
      <ns id="13" case="first-letter" subpages="" canonical="Help talk" xml:space="preserve">Help talk</ns>
      <ns id="14" case="first-letter" subpages="" canonical="Category" xml:space="preserve">Category</ns>
      <ns id="15" case="first-letter" canonical="Category talk" xml:space="preserve">Category talk</ns>
      <ns id="90" case="first-letter" canonical="Thread" xml:space="preserve">Thread</ns>
      <ns id="91" case="first-letter" canonical="Thread talk" xml:space="preserve">Thread talk</ns>
      <ns id="92" case="first-letter" canonical="Summary" xml:space="preserve">Summary</ns>
      <ns id="93" case="first-letter" canonical="Summary talk" xml:space="preserve">Summary talk</ns>
      <ns id="100" case="first-letter" subpages="" canonical="Manual" content="" xml:space="preserve">Manual</ns>
      <ns id="101" case="first-letter" subpages="" canonical="Manual talk" xml:space="preserve">Manual talk</ns>
      <ns id="102" case="first-letter" subpages="" canonical="Extension" content="" xml:space="preserve">Extension</ns>
      <ns id="103" case="first-letter" subpages="" canonical="Extension talk" xml:space="preserve">Extension talk</ns>
      <ns id="104" case="first-letter" subpages="" canonical="API" xml:space="preserve">API</ns>
      <ns id="105" case="first-letter" subpages="" canonical="API talk" xml:space="preserve">API talk</ns>
    </namespaces>
  </query>
</api>
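
A response of this shape could be turned into an id-to-name lookup along the following lines. This is a minimal sketch, assuming the XML layout shown above. The function name parse_namespaces and the shortened SAMPLE document are illustrative only, and this is not how the connector itself necessarily parses the response.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the response above, used only for demonstration.
SAMPLE = """<?xml version="1.0"?>
<api>
  <query>
    <namespaces>
      <ns id="-1" case="first-letter" canonical="Special" xml:space="preserve">Special</ns>
      <ns id="0" case="first-letter" subpages="" content="" xml:space="preserve" />
      <ns id="1" case="first-letter" subpages="" canonical="Talk" xml:space="preserve">Talk</ns>
    </namespaces>
  </query>
</api>"""


def parse_namespaces(xml_text):
    """Return a dict mapping namespace id (int) to its display name."""
    root = ET.fromstring(xml_text)  # root element is <api>
    result = {}
    for ns in root.iterfind("./query/namespaces/ns"):
        # The main namespace (id 0) has no text, so fall back to "".
        result[int(ns.get("id"))] = ns.text or ""
    return result


namespaces = parse_namespaces(SAMPLE)
```

Note that the main namespace (id 0) comes back with an empty name, so any selection list built from this data would need its own label for that entry.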


                

        

[jira] [Updated] (CONNECTORS-277) WikiConnector - option to limit crawl by namespace

Posted by "Karl Wright (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CONNECTORS-277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wright updated CONNECTORS-277:
-----------------------------------

    Affects Version/s:     (was: ManifoldCF next)
                       ManifoldCF 0.4
        Fix Version/s: ManifoldCF 0.4
             Assignee: Karl Wright
    