Posted to dev@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2015/02/22 05:08:16 UTC

[Nutch Wiki] Update of "Nutch_1.X_RESTAPI" by SujenShah

The "Nutch_1.X_RESTAPI" page has been changed by SujenShah:
https://wiki.apache.org/nutch/Nutch_1.X_RESTAPI

New page:
= Nutch 1.x REST API =

<<TableOfContents(4)>>

== Introduction ==
This page documents the Nutch 1.x REST API.

It describes the REST calls that can be made to the Nutch 1.x REST API. Many of the API points are adapted from those of the [[https://wiki.apache.org/nutch/NutchRESTAPI|Nutch 2.x REST API]]. One motivation for the REST API is to integrate D3, so that visualizations of a running Nutch crawl can be shown.


== REST API Calls ==
=== Administration ===
This API point is used to get the server status and to manage the server's state.
==== Get server status ====
{{{{
GET /admin
}}}}

__Response__ contains the server startup date, the available configuration names, the job history and the currently running jobs.
{{{{
{
   "startDate":1424572500000,
   "configuration":[
      "default"
   ],
   "jobs":[

   ],
   "runningJobs":[

   ]
}
}}}}
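Below is a minimal Python sketch of calling this API point and reading its response. It makes two assumptions: the server listens on `http://localhost:8081` (a hypothetical address, so adjust it for your deployment), and `startDate` is a Unix timestamp in milliseconds, which is consistent with the sample above. The helper names are illustrative and are not part of Nutch.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Hypothetical address; change it to match your Nutch REST server.
BASE_URL = "http://localhost:8081"


def get_admin_status(base_url=BASE_URL, timeout=10):
    """Send GET /admin and return the decoded JSON body."""
    with urllib.request.urlopen(base_url + "/admin", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize_status(status):
    """Reduce a GET /admin response to a few readable values."""
    # Assumption: startDate is milliseconds since the Unix epoch.
    started = datetime.fromtimestamp(status["startDate"] / 1000, tz=timezone.utc)
    return {
        "started": started.isoformat(),
        "configurations": list(status["configuration"]),
        "job_history": len(status["jobs"]),
        "running_jobs": len(status["runningJobs"]),
    }


# Offline usage with the sample response shown on this page.
sample = json.loads('{"startDate":1424572500000,"configuration":["default"],'
                    '"jobs":[],"runningJobs":[]}')
print(summarize_status(sample))
```

Under that assumption, the sample's `startDate` corresponds to 2015-02-22 02:35 UTC.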

==== Stop server ====
A running server can be stopped using ''/admin/stop''.
{{{{
GET /admin/stop
}}}}

__Response__
{{{{
Stopping in 5 seconds.
}}}}
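The stop call replies with plain text, not JSON. The sketch below uses the same hypothetical `BASE_URL`. It sends the request and reads the delay out of a message shaped like the one above. That message format is taken from this page and is an assumption: it may differ between versions.

```python
import re
import urllib.request

# Hypothetical address; change it to match your Nutch REST server.
BASE_URL = "http://localhost:8081"


def stop_server(base_url=BASE_URL, timeout=10):
    """Send GET /admin/stop and return the plain-text reply."""
    with urllib.request.urlopen(base_url + "/admin/stop", timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()


def parse_stop_delay(message):
    """Return N from a reply like 'Stopping in N seconds.', or None."""
    # Assumption: the reply keeps the wording shown on this page.
    match = re.search(r"Stopping in (\d+) seconds?", message)
    return int(match.group(1)) if match else None


print(parse_stop_delay("Stopping in 5 seconds."))
```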

=== Jobs ===
This API point handles job management: creating a job, getting job information, and stopping or killing a job.
==== Listing all jobs ====
{{{{
GET /job
}}}}

__Response__ contains a list of all jobs (both running jobs and the job history).
{{{{
[
   {
      "id":"job-id-5977",
      "type":"FETCH",
      "confId":"default",
      "args":null,
      "result":null,
      "state":"FINISHED",
      "msg":"",
      "crawlId":"crawl-01"
   },
   {
      "id":"job-id-5978",
      "type":"PARSE",
      "confId":"default",
      "args":null,
      "result":null,
      "state":"RUNNING",
      "msg":"",
      "crawlId":"crawl-01"
   }
]
}}}}
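The listing mixes running jobs with finished ones, so a client may want to group it. The sketch below, which again uses a hypothetical `BASE_URL`, fetches ''/job'' and groups job IDs by their `state` field. The sample shows only `RUNNING` and `FINISHED`, so the code does not hard-code any list of states.

```python
import json
import urllib.request
from collections import defaultdict

# Hypothetical address; change it to match your Nutch REST server.
BASE_URL = "http://localhost:8081"


def list_jobs(base_url=BASE_URL, timeout=10):
    """Send GET /job and return the decoded list of job objects."""
    with urllib.request.urlopen(base_url + "/job", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def jobs_by_state(jobs):
    """Group job ids by whatever values appear in their "state" field."""
    grouped = defaultdict(list)
    for job in jobs:
        grouped[job["state"]].append(job["id"])
    return dict(grouped)


# Offline usage with the sample listing shown on this page.
sample = json.loads("""[
  {"id":"job-id-5977","type":"FETCH","confId":"default","args":null,
   "result":null,"state":"FINISHED","msg":"","crawlId":"crawl-01"},
  {"id":"job-id-5978","type":"PARSE","confId":"default","args":null,
   "result":null,"state":"RUNNING","msg":"","crawlId":"crawl-01"}
]""")
print(jobs_by_state(sample))
```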

==== Get job info ====
{{{{
GET /job/job-id-5977
}}}}

__Response__
{{{{
   {
      "id":"job-id-5977",
      "type":"FETCH",
      "confId":"default",
      "args":null,
      "result":null,
      "state":"FINISHED",
      "msg":"",
      "crawlId":"crawl-01"
   }
}}}}
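Below is a sketch of fetching one job's info and polling it until the job leaves the `RUNNING` state. Treating every other state as "done" is an assumption: a job that has not started yet might report some other state. The fetch function is passed in as a parameter, so the polling logic can be tried without a server. `BASE_URL` and all function names are hypothetical.

```python
import json
import time
import urllib.parse
import urllib.request

# Hypothetical address; change it to match your Nutch REST server.
BASE_URL = "http://localhost:8081"


def job_url(job_id, base_url=BASE_URL):
    """Build the GET /job/{id} URL, quoting the id as one path segment."""
    return f"{base_url}/job/{urllib.parse.quote(job_id, safe='')}"


def get_job(job_id, base_url=BASE_URL, timeout=10):
    """Send GET /job/{id} and return the decoded job object."""
    with urllib.request.urlopen(job_url(job_id, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_while_running(job_id, fetch=get_job, interval=5.0, max_polls=120):
    """Poll until the job's state is no longer RUNNING; return the last info."""
    for _ in range(max_polls):
        info = fetch(job_id)
        if info["state"] != "RUNNING":
            return info
        time.sleep(interval)
    raise TimeoutError(f"{job_id} still RUNNING after {max_polls} polls")
```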

==== Stop job ====
{{{{
GET /job/job-id-5977/stop
}}}}

__Response__
{{{{
  true
}}}}


==== Kill job ====
{{{{
GET /job/job-id-5977/abort
}}}}

__Response__
{{{{
  true
}}}}
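Both of these calls reply with a bare JSON boolean. The sketch below builds either request and checks that the reply really is a boolean. The `stop`/`abort` choice mirrors the two paths above. `BASE_URL` and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical address; change it to match your Nutch REST server.
BASE_URL = "http://localhost:8081"


def control_request(job_id, action, base_url=BASE_URL):
    """Build (but do not send) GET /job/{id}/stop or GET /job/{id}/abort."""
    if action not in ("stop", "abort"):
        raise ValueError("action must be 'stop' or 'abort'")
    path = f"/job/{urllib.parse.quote(job_id, safe='')}/{action}"
    return urllib.request.Request(base_url + path, method="GET")


def parse_bool_reply(body):
    """Decode a reply such as 'true' and insist that it is a boolean."""
    value = json.loads(body)
    if not isinstance(value, bool):
        raise ValueError(f"expected a boolean reply, got {body!r}")
    return value


def control_job(job_id, action, base_url=BASE_URL, timeout=10):
    """Send the stop/abort request and return the server's boolean reply."""
    req = control_request(job_id, action, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_bool_reply(resp.read().decode("utf-8"))
```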

==== Create job ====
Creates a job with the given parameters. Specify either a job type (such as INJECT, GENERATE, FETCH or PARSE) or a jobClassName.
{{{{
POST /job/create
   {
      "crawlId":"crawl-01",
      "type":"FETCH",
      "confId":"default",
      "args":{"someParam":"someValue"}
   }

POST /job/create
   {
      "crawlId":"crawl-01",
      "jobClassName":"org.apache.nutch.fetcher.FetcherJob",
      "confId":"default",
      "args":{"someParam":"someValue"}
   }
}}}}

__Response__ is the ID of the created job.
{{{{
    job-id-43243
}}}}
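Below is a sketch of building the ''POST /job/create'' request with Python's standard library. It makes three choices of its own:
 * It sends the body as JSON with a `Content-Type: application/json` header (an assumption about what the server expects).
 * It requires exactly one of a job type or a `jobClassName`.
 * It sends `args` as an empty object when none are given.
It reads the new job's ID from the plain-text reply. `BASE_URL` and the function names are hypothetical.

```python
import json
import urllib.request

# Hypothetical address; change it to match your Nutch REST server.
BASE_URL = "http://localhost:8081"


def create_job_request(crawl_id, conf_id="default", job_type=None,
                       job_class_name=None, args=None, base_url=BASE_URL):
    """Build (but do not send) a POST /job/create request."""
    # This page says to give either a job type or a jobClassName;
    # this sketch requires exactly one of them.
    if (job_type is None) == (job_class_name is None):
        raise ValueError("specify exactly one of job_type or job_class_name")
    body = {"crawlId": crawl_id, "confId": conf_id, "args": args or {}}
    if job_type is not None:
        body["type"] = job_type
    else:
        body["jobClassName"] = job_class_name
    return urllib.request.Request(
        base_url + "/job/create",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_job(timeout=10, **kwargs):
    """Send the request; the reply shown on this page is the new job's ID."""
    with urllib.request.urlopen(create_job_request(**kwargs), timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()
```

For example, `create_job(crawl_id="crawl-01", job_type="FETCH")` would submit a FETCH job with the `default` configuration.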

=== URL ===

This API point returns the information about a URL, or a list of URLs, that is needed to generate a D3 visualization. The information obtained from this API point will help 
{{{{
GET /url/{filtered-url}
}}}}
__Response__ contains information about the URL, taken from the CrawlDbReader.java class. The fields are:
{{{{
   {
      "url" : "",
      "statusCode" : "",
      "fetchTime" : "",
      "score" : "",
      "numOfInlinks" : "",
      "numOfOutlinks" : ""
   }
}}}}
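The target URL is itself part of the request path, so it has to be percent-encoded. The sketch below makes three assumptions this page does not spell out:
 * The whole URL, slashes included, is encoded as a single path segment.
 * Only the fields listed above are kept.
 * Values are returned unconverted, because their types are not documented here.
`BASE_URL` and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical address; change it to match your Nutch REST server.
BASE_URL = "http://localhost:8081"

# The response fields documented on this page.
FIELDS = ("url", "statusCode", "fetchTime", "score",
          "numOfInlinks", "numOfOutlinks")


def url_info_endpoint(page_url, base_url=BASE_URL):
    """Build GET /url/{...}, encoding the target URL as one path segment."""
    return f"{base_url}/url/{urllib.parse.quote(page_url, safe='')}"


def parse_url_info(body):
    """Keep only the documented fields and fail loudly if one is missing."""
    data = json.loads(body)
    missing = [f for f in FIELDS if f not in data]
    if missing:
        raise KeyError(f"response lacks fields: {missing}")
    return {f: data[f] for f in FIELDS}


def get_url_info(page_url, base_url=BASE_URL, timeout=10):
    """Send the request and return the documented fields for one URL."""
    with urllib.request.urlopen(url_info_endpoint(page_url, base_url),
                                timeout=timeout) as resp:
        return parse_url_info(resp.read().decode("utf-8"))
```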

== More ==
Description of more API points coming soon.