Posted to dev@manifoldcf.apache.org by Karl Wright <da...@gmail.com> on 2011/10/11 10:22:15 UTC

Wiki connector limping along now

Tobias, et al: I have the first version of a wiki connector limping
along in branches/CONNECTORS-256.  For the people out there who have
wanted to crawl wikis this is a chance to get involved in the
connector development.  Just check out the branch and type "ant
build-dev" and you should be able to run the Quick Start against a
wiki of your choice.

Some caveats:

(1) Since this is meant to be used against only one site, there is no
native throttling in the connector itself.  I strongly suggest,
therefore, setting the maximum number of connections to 2 and adding a
throttle that limits the average number of fetches per minute to 20 in
the wiki connection definition.  Otherwise we'll make wiki owners angry
at us.

(2) There's no way currently to limit how many documents to crawl.
Basically you get the whole wiki.  I'd like to hear ideas of ways we
could specify fewer documents, consistent with the way people intend
to use the connector.

(3) Because there's no ability to crawl anything less than the entire
wiki, and because the only wiki I've tried so far is the public one,
I've not actually debugged beyond the document discovery phase.  If
anyone can point me to a small public wiki that I could crawl against,
that would be great.  I'm writing unit tests but those are not going
to be ready for a couple of days.
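
To make the numbers in caveat (1) concrete, here is a rough sketch of
what "2 connections, 20 fetches per minute" works out to: at most two
fetches in flight, with fetch starts spaced at least 3 seconds apart.
This is only an illustration and not the ManifoldCF throttling code;
the class PoliteThrottle and all of its methods are hypothetical names
made up for this example.

```java
import java.util.concurrent.Semaphore;

// Hypothetical illustration only; not part of ManifoldCF.
public class PoliteThrottle {
    private final Semaphore connections;   // caps concurrent fetches
    private final long minIntervalMillis;  // spacing between fetch starts
    private long nextAllowedMillis = 0L;   // earliest start time of the next fetch

    public PoliteThrottle(int maxConnections, int fetchesPerMinute) {
        this.connections = new Semaphore(maxConnections);
        this.minIntervalMillis = minIntervalMillis(fetchesPerMinute);
    }

    // 20 fetches per minute means one fetch start every 3000 ms.
    public static long minIntervalMillis(int fetchesPerMinute) {
        return 60_000L / fetchesPerMinute;
    }

    // Reserves the next start slot and returns how long the caller
    // should wait before fetching, given the current time.
    public synchronized long reserveDelayMillis(long nowMillis) {
        long start = Math.max(nowMillis, nextAllowedMillis);
        nextAllowedMillis = start + minIntervalMillis;
        return start - nowMillis;
    }

    // Runs one fetch while holding a connection slot and honoring the spacing.
    public void fetch(Runnable fetchOnePage) throws InterruptedException {
        connections.acquire();
        try {
            long delay = reserveDelayMillis(System.currentTimeMillis());
            if (delay > 0) {
                Thread.sleep(delay);
            }
            fetchOnePage.run();
        } finally {
            connections.release();
        }
    }
}
```

With new PoliteThrottle(2, 20), a full crawl of a wiki with N pages
takes at least roughly 3 * N seconds, which is worth keeping in mind
given caveat (2).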

Karl