Posted to user@nutch.apache.org by Savannah Beckett <sa...@yahoo.com> on 2010/08/05 08:02:53 UTC

why doesn't nutch fetch any job links?

I am trying to get Nutch to fetch all the job links on the following page, but
it never does, even though it fetches the other links on that page.
http://seeker.dice.com/jobsearch/servlet/JobSearch?op=300&N=0&Hf=0&NUM_PER_PAGE=30&Ntk=JobSearchRanking&Ntx=mode+matchall&AREA_CODES=&AC_COUNTRY=1525&QUICK=1&ZIPCODE=&RADIUS=64.37376&ZC_COUNTRY=0&COUNTRY=1525&STAT_PROV=0&METRO_AREA=33.78715899%2C-84.39164034&TRAVEL=0&TAXTERM=0&SORTSPEC=0&FRMT=0&DAYSBACK=30&LOCATION_OPTION=2&FREE_TEXT=engineer&WHERE=


I used all the default settings, except that I set it to fetch both internal
and external links.  Does anyone know why?
Thanks.


      

Re: why doesn't nutch fetch any job links?

Posted by Savannah Beckett <sa...@yahoo.com>.
I already made sure the filter allows URLs containing "?".  Nutch can fetch all
the links on the left sidebar, and all of them have "?" in their URLs.  I also
made sure that it can fetch unlimited outlinks.  Any more suggestions?





________________________________
From: Alex McLintock <al...@gmail.com>
To: user@nutch.apache.org
Sent: Thu, August 5, 2010 12:03:51 AM
Subject: Re: why doesn't nutch fetch any job links?

Have you checked the regular expression filters? I believe that by
default it excludes anything with a '?' in the name because that
implies parameters - which may be unnecessary.

Of course for you they presumably are necessary.

Alex





      

Re: why doesn't nutch fetch any job links?

Posted by Alex McLintock <al...@gmail.com>.
Have you checked the regular expression filters? I believe that by
default it excludes anything with a '?' in the name because that
implies parameters - which may be unnecessary.

Of course for you they presumably are necessary.

Alex
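
To make the filter suggestion concrete, here is a minimal Python sketch of how
a first-match-wins list of +/- regex rules treats a query-style URL. It is an
illustration only, not Nutch's own code. The rule lists are abbreviated
assumptions modelled on a stock conf/regex-urlfilter.txt (check your own copy),
the accepts() helper is hypothetical, and the URL is a shortened form of the
one in the question. Note that the character class in the stock-style rule
covers "=" as well as "?", so both would need to be allowed for this URL.

```python
import re

# Abbreviated rules in the style of Nutch's conf/regex-urlfilter.txt.
# Assumption: these resemble the stock file; verify against your own copy.
DEFAULT_RULES = [
    "-^(file|ftp|mailto):",  # skip non-HTTP schemes
    "-[?*!@=]",              # skip URLs that look like queries
    "+.",                    # accept anything else
]

# The same list with the query-character rule removed.
RELAXED_RULES = [
    "-^(file|ftp|mailto):",
    "+.",
]


def accepts(url, rules):
    """Return True if the first rule whose pattern is found in url starts with '+'.

    Hypothetical helper: the first matching rule decides, and a URL that
    matches no rule is rejected.
    """
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False


# Shortened form of the job search URL from the question (assumption).
JOB_SEARCH_URL = "http://seeker.dice.com/jobsearch/servlet/JobSearch?op=300&N=0"

if __name__ == "__main__":
    print("default rules:", accepts(JOB_SEARCH_URL, DEFAULT_RULES))
    print("relaxed rules:", accepts(JOB_SEARCH_URL, RELAXED_RULES))
```

Running a job link through a check like this before crawling shows whether the
filter alone would drop it. If the link passes and is still not fetched, the
cause lies elsewhere.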

