Posted to user@nutch.apache.org by EM <em...@cpuedge.com> on 2005/09/05 12:25:33 UTC

RE: [Nutch-general] DMOZ Web coverage

DMOZ isn't huge, but it isn't small either: about 5-10% of my target sites
are listed in it. It's a nice starting point for large crawls.

-----Original Message-----
From: ogjunk-nutch@yahoo.com [mailto:ogjunk-nutch@yahoo.com] 
Sent: Wednesday, August 31, 2005 5:13 PM
To: user@nutch.org
Subject: Re: [Nutch-general] DMOZ Web coverage

I imagine the only people who can answer this question are those who
have crawled a laaaaarge portion of the Web (e.g. Google, Yahoo,
Teoma...), and I don't think they'll care to share :(

Otis

--- Chetan Sahasrabudhe <Ch...@KPITCummins.com> wrote:

> Hello,
> 
> I am trying to figure out how much web coverage is achievable
> through the dmoz file.
> If I want to crawl the whole web, how much time would it take, and
> what should the approach be?
> 
> The parameters I am interested in are:
> 
> 1. Size of the whole-web index.
> 2. Time needed to generate the whole-web index.
> 3. How much web coverage the dmoz file provides.
> 
> Regards
> Chetan