Posted to dev@tika.apache.org by Julien Nioche <li...@gmail.com> on 2013/10/11 20:20:22 UTC
[ANNOUNCEMENT] 0.3 release of crawler-commons
Hi,
Just to let you know that we have just released version 0.3 of
crawler-commons. Crawler-commons is a set of reusable Java components that
implement functionality common to any web crawler. These components benefit
from collaboration among various existing web crawler projects, and reduce
duplication of effort. The main components are parsers for robots.txt and
sitemap files, domain utilities, and fetchers.
Crawler-commons is used in Bixo and Apache Nutch for parsing robots.txt
files.
*Project* -> https://code.google.com/p/crawler-commons/
*Release notes* ->
http://crawler-commons.googlecode.com/svn/tags/crawler-commons-0.3/CHANGES.txt
*Info about artifacts* ->
http://search.maven.org/#artifactdetails|com.google.code.crawler-commons|crawler-commons|0.3|jar
Thanks!
Julien
--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble