Posted to droids-dev@incubator.apache.org by Julien Nioche <li...@gmail.com> on 2011/07/06 21:07:48 UTC

[ANN] Release crawler-commons 0.1

[Apologies for cross-posting]

The initial release of crawler-commons is available from:
http://code.google.com/p/crawler-commons/downloads/list

The purpose of this project is to develop a set of reusable Java components
that implement functionality common to any web crawler. These components
would benefit from collaboration among various existing web crawler
projects, and reduce duplication of effort.
The current version contains:
- a robots.txt parser
- a sitemap parser
- a URL analyzer that returns top-level domains
- a simple HttpFetcher

This release is available on Sonatype's OSS Nexus repository
[https://oss.sonatype.org/content/repositories/releases/com/google/code/crawler-commons/]
and should be available on Maven Central soon.

Please send your questions, comments or suggestions to
http://groups.google.com/group/crawler-commons

Best regards,

Julien


-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com