Posted to user@nutch.apache.org by "pesmadhu ." <pe...@gmail.com> on 2016/04/06 11:42:28 UTC

Apache Nutch : query

Hi,

   We have a requirement to scrape pages that contain tabular data. We need
to read the table content and, depending on the value in a particular
column, download the corresponding file.

Example URLs: http://exporter.nih.gov/ExPORTER_Catalog.aspx

http://exporter.nih.gov/ExPORTER_Catalog.aspx?sid=3&index=0

http://exporter.nih.gov/ExPORTER_Catalog.aspx?sid=0&index=1


Please check and let us know whether this can be achieved using Apache Nutch.

I have one more question: what is the main use of Apache Nutch?

-- 
Regards,
Madhusudhan.