Posted to user@nutch.apache.org by cesar voulgaris <ce...@gmail.com> on 2007/05/12 09:58:36 UTC
problem indexing by ip
Hi all, I'm trying to index pages inside my country. I set the
regex-urlfilter to crawl within my country's domain (.uy).
The problem, of course, is that there are sites in the country whose URLs
do not necessarily end in .uy.
I tried putting a regular expression (even a single IP!) in the
regex-urlfilter with IPs in my country's range (e.g.
http://201.111.103.1/). The
crawl seems to work OK, but when I check the fetched pages (with readdb)
there is nothing: the db seems to be empty (the readdb command gives a
null pointer exception).
Can I set a pattern directly with IPs in the regex-urlfilter?
If not, how can I crawl a range of IPs?
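To make the question concrete, here is a minimal sketch of the kind of rule set meant above, written in Python only so the patterns can be tried outside Nutch. The rule list, the accepts helper, and the 201.111.103.0-255 range are illustrative assumptions, and the "first matching rule decides" behaviour is my understanding of how regex-urlfilter works, not something confirmed here. Note that such a filter only looks at the URL text, so a site reached through a hostname such as www.example.com would not match an IP pattern even if it is hosted on one of these addresses.

```python
import re

# Hypothetical rules in the style of regex-urlfilter entries:
# '+' accepts, '-' rejects, and the first rule whose pattern is found
# in the URL decides (assumed behaviour, not taken from Nutch itself).
OCTET = r"(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"
RULES = [
    # URLs whose host is a literal IP in 201.111.103.0-255 (example range)
    ("+", r"^https?://201\.111\.103\." + OCTET + r"(:[0-9]+)?(/|$)"),
    # URLs whose host ends in the .uy country domain
    ("+", r"^https?://([a-z0-9-]+\.)*uy(:[0-9]+)?(/|$)"),
    # everything else is rejected
    ("-", r"."),
]


def accepts(url):
    """Return True if the first rule matching the URL is an accept rule."""
    for sign, pattern in RULES:
        if re.search(pattern, url):
            return sign == "+"
    return False


if __name__ == "__main__":
    for url in ("http://201.111.103.1/",
                "http://201.111.104.1/",
                "http://www.example.uy/index.html",
                "http://www.example.com/"):
        print(url, accepts(url))
```

The dots in the IP are escaped so that they match literally, and the pattern is anchored at the host so that an address appearing later in the path or query string does not count as a match.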
Thanks in advance