Posted to dev@tika.apache.org by Julien Nioche <li...@gmail.com> on 2013/10/11 20:20:22 UTC

[ANNOUNCEMENT] 0.3 release of crawler-commons

Hi,

Just to let you know that we have released version 0.3 of
crawler-commons. Crawler-commons is a set of reusable Java components that
implement functionality common to any web crawler. These components benefit
from collaboration among various existing web crawler projects, and reduce
duplication of effort. The main components are parsers for robots.txt and
sitemap files, domain utilities, and fetchers.

Crawler-commons is used in Bixo and Apache Nutch for parsing robots.txt
files.

 *Project* -> https://code.google.com/p/crawler-commons/

 *Release notes* ->
http://crawler-commons.googlecode.com/svn/tags/crawler-commons-0.3/CHANGES.txt

 *Info about artifacts* ->
http://search.maven.org/#artifactdetails|com.google.code.crawler-commons|crawler-commons|0.3|jar

Thanks!

Julien

-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble