Posted to user@nutch.apache.org by Hannes Carl Meyer <ha...@googlemail.com> on 2011/01/06 17:15:01 UTC

Re: If-Modified-Since header with Nutch

Hi,

Did you solve the problem yourself?
I'm running into the same issue...

Maybe someone else could help here?

Regards

Hannes

On Wed, Oct 27, 2010 at 12:28 PM, Davide Cavalaglio <
davide.cavalaglio@desktopsrl.com> wrote:

> Hi,
> I have a problem with the If-Modified-Since option in Nutch.
> I want to crawl a web site every day, so I have set the property
> db.fetch.interval.default accordingly in nutch-site.xml.
> But I want to limit Nutch to fetching only pages that have changed, using
> the If-Modified-Since header.
>
> I found some resources on the web for this task, but when I recrawl a page
> after the fetch interval, Nutch downloads all pages again. I use Nutch 1.0
> with the http protocol. I don't use the adaptive scheduler. In
> HttpResponse.java I added the code:
> if (datum.getModifiedTime() > 0) {
>     String httpDate = HttpDateFormat.toString(datum.getModifiedTime());
>     Http.LOG.debug("modified time: " + httpDate);
>     reqStr.append("If-Modified-Since: " + httpDate);
>     reqStr.append("\r\n");
> } else if (datum.getFetchTime() > 0) {
>     String httpDate = HttpDateFormat.toString(datum.getFetchTime());
>     Http.LOG.debug("modified time: " + httpDate);
>     reqStr.append("If-Modified-Since: " + httpDate);
>     reqStr.append("\r\n");
> }
>
> reqStr.append("\r\n");
>
> because there was a bug that prevented the use of If-Modified-Since.
> I also changed Fetcher.java so that the CrawlDb holds the correct
> LastModified value.
> I tried crawling other web sites because I wanted to find out whether the
> problem is that my web server does not support If-Modified-Since. But in
> every test I always get response code 200, even if the lastModified
> of the web page is older than the LastModified in the CrawlDb.
>
> Can anyone tell me how to use If-Modified-Since correctly?
> Thanks,
> Cavalaglio Davide
>
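For reference, the header logic quoted above can be tried outside Nutch.
The following is only a minimal sketch in plain Java: the class and the
method names (ConditionalGetSketch, toHttpDate, buildRequest,
isNotModified) are hypothetical and are not part of Nutch, and
HttpDateFormat is replaced here by a java.time formatter. It builds the
raw request the same way (modified time first, fetch time as fallback)
and checks a status line for 304. Note that a server is allowed to ignore
If-Modified-Since and answer 200 anyway, which is common for dynamically
generated pages, so a 200 by itself does not prove the header was missing
from the request.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical stand-alone helper for illustration; not part of Nutch.
class ConditionalGetSketch {

    // HTTP-date in its fixed-length form, always expressed in GMT.
    private static final DateTimeFormatter HTTP_DATE =
        DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                         .withZone(ZoneOffset.UTC);

    // Formats milliseconds since the epoch as an HTTP-date string.
    static String toHttpDate(long epochMillis) {
        return HTTP_DATE.format(Instant.ofEpochMilli(epochMillis));
    }

    // Builds a raw GET request. The If-Modified-Since header is added only
    // when a modified time or, failing that, a fetch time is known (> 0).
    static String buildRequest(String host, String path,
                               long modifiedTime, long fetchTime) {
        StringBuilder req = new StringBuilder();
        req.append("GET ").append(path).append(" HTTP/1.0\r\n");
        req.append("Host: ").append(host).append("\r\n");
        long since = modifiedTime > 0 ? modifiedTime : fetchTime;
        if (since > 0) {
            req.append("If-Modified-Since: ")
               .append(toHttpDate(since))
               .append("\r\n");
        }
        req.append("\r\n"); // empty line terminates the header section
        return req.toString();
    }

    // True if the status line carries code 304 (Not Modified).
    static boolean isNotModified(String statusLine) {
        String[] parts = statusLine.trim().split("\\s+");
        return parts.length >= 2 && parts[1].equals("304");
    }

    public static void main(String[] args) {
        // One day after the epoch, used only as a fixed sample timestamp.
        System.out.print(buildRequest("example.org", "/index.html", 86_400_000L, 0L));
        System.out.println(isNotModified("HTTP/1.1 304 Not Modified"));
    }
}
```

Printing the generated request next to the fetcher's debug log should make
it easier to see whether the header really goes out on the wire and whether
the date is the one stored in the CrawlDb.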