Posted to dev@nutch.apache.org by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov> on 2015/10/16 18:59:48 UTC
nutch-python
Hey Folks,
My team at JPL & USC and Continuum Analytics have been building a
Python-based interface to Nutch that uses the REST API.
The initial version is pretty much done:
http://github.com/chrismattmann/nutch-python/
We even have bin/crawl-like functionality, crawl.py, here:
https://github.com/chrismattmann/nutch-python/tree/master/nutch/crawl.py
README is here:
https://github.com/chrismattmann/nutch-python/tree/master/nutch
Feedback is welcome! Installation is as simple as:
pip install nutch
ALv2 licensed! Pull Requests and help welcomed!
Enjoy!
Cheers,
Chris
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++