Posted to user@nutch.apache.org by cesar voulgaris <ce...@gmail.com> on 2007/05/12 09:58:36 UTC

problem indexing by ip

Hi all, I'm trying to index pages inside my country. I set the
regex-urlfilter to crawl within my country's domain (.uy).
The problem, of course, is that some sites in the country do not necessarily
have a URL ending in .uy.
I tried putting a regular expression (even a single IP!) in the
regex-urlfilter, matching IPs in my country's range (e.g.
http://201.111.103.1/). The
crawl seems to work OK, but when I check the fetched pages (with readdb)
there is nothing; the db seems to be empty (the readdb command fails with
a null pointer exception).

Can I set a pattern directly with IPs in the regex-urlfilter?
If not, how can I crawl a range of IPs?
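
To make the question concrete, here is a rough Python sketch of the kind of
rules I mean. It only imitates the "+regex / -regex" style of the
regex-urlfilter file, assuming the first matching rule decides and that a
URL matching no rule is rejected. The rule list, the accepts() helper and
the 201.111.103.x range are made up for illustration; this is not Nutch's
actual code.

```python
import re

# Hypothetical rules, written in the "+regex / -regex" style.
# "+" means accept the URL, "-" means reject it.
RULES = [
    # Accept URLs whose host is literally an IP in 201.111.103.x (example range).
    ("+", r"^http://201\.111\.103\.\d{1,3}(:\d+)?/"),
    # Accept URLs whose hostname ends in .uy.
    ("+", r"^http://([a-z0-9-]+\.)*[a-z0-9-]+\.uy(:\d+)?/"),
    # Reject everything else.
    ("-", r"."),
]


def accepts(url, rules=RULES):
    """Return True if the first rule that matches the URL is a '+' rule.

    Assumption: rules are tried in order and the first match wins;
    a URL that matches no rule at all is rejected.
    """
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False


if __name__ == "__main__":
    for url in [
        "http://201.111.103.1/",
        "http://www.example.uy/index.html",
        "http://www.example.com/",
    ]:
        print(url, accepts(url))
```

Note that rules like these only look at the URL text. A site hosted on an IP
in the range but linked by a name such as www.example.com would not match
the IP rule, because the hostname is never resolved here.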

Thanks in advance