Posted to dev@nutch.apache.org by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov> on 2015/10/16 18:59:48 UTC

nutch-python

Hey Folks,

My team at JPL & USC and Continuum Analytics have been building a
Python-based interface to Nutch that uses the REST API.
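
For readers new to the REST approach, here is a minimal sketch of the kind of
request such a client would make. It uses only the Python standard library and
is not taken from nutch-python itself. The server address (localhost:8081),
the /job/create path, and the payload field names are assumptions about a
typical Nutch REST server setup, and build_job_request is a hypothetical
helper, so check them against your own installation.

```python
import json
from urllib.request import Request

# Assumption: a Nutch REST server running locally on port 8081.
NUTCH_SERVER = "http://localhost:8081"


def build_job_request(crawl_id, job_type, conf_id="default", args=None,
                      server=NUTCH_SERVER):
    """Build (but do not send) a POST request asking the server to create a job.

    Hypothetical helper: the endpoint path and JSON field names below are
    assumptions, not something confirmed by this announcement.
    """
    payload = {
        "crawlId": crawl_id,   # identifier grouping the steps of one crawl
        "type": job_type,      # e.g. "INJECT", "GENERATE", "FETCH"
        "confId": conf_id,     # name of the server-side configuration
        "args": args or {},    # job-specific arguments
    }
    return Request(
        url=server.rstrip("/") + "/job/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_job_request("demo-crawl", "GENERATE")
# Nothing is sent here. To submit, pass req to urllib.request.urlopen()
# while a server is running at NUTCH_SERVER.
```

Building the request separately from sending it keeps the sketch runnable
without a live server; a real client would also poll the job status until it
finishes.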

The initial version is pretty much done:

http://github.com/chrismattmann/nutch-python/

We even have bin/crawl-like functionality, crawl.py, here:

https://github.com/chrismattmann/nutch-python/tree/master/nutch/crawl.py


The README is here:

https://github.com/chrismattmann/nutch-python/tree/master/nutch


Feedback is welcome! Installation is as simple as:

pip install nutch

ALv2 licensed! Pull requests and help are welcome!

Enjoy!

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++