Posted to user@nutch.apache.org by Adamantios Corais <ad...@gmail.com> on 2015/03/22 15:35:50 UTC
How to configure seed and urlfilter config files in Apache Nutch
I would like to set up Nutch so that it goes through all
http://www.domain.com/classifieds/something/?pg=<page> pages, where <page>
goes from 1 to 200, and stores the URLs of the form
http://www.domain.com/classifieds/something/view/<number>/, where <number> is
a long number. Then, I would like to print out all these URLs in my terminal.
I am using Apache Nutch 1.9 and Apache Solr 4.10.4.
*// Adamantios*
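One way to prepare the seed list for the 200 listing pages described above is to generate it with a small script. This is only a sketch: the function name build_seed_urls and the idea of saving the output as urls/seed.txt are assumptions for illustration, not something stated in the thread. Nutch itself only needs a plain text file with one URL per line in the directory passed to the inject step.

```python
# Sketch: build the seed URLs for listing pages 1..200.
# BASE comes from the question; build_seed_urls is a hypothetical helper.

BASE = "http://www.domain.com/classifieds/something/"


def build_seed_urls(base, first_page=1, last_page=200):
    """Return one listing-page URL per page number, inclusive on both ends."""
    return [f"{base}?pg={page}" for page in range(first_page, last_page + 1)]


seed_urls = build_seed_urls(BASE)

# One URL per line, which is the layout a Nutch seed file uses.
seed_text = "\n".join(seed_urls) + "\n"

if __name__ == "__main__":
    # Redirect this output into a seed file, e.g. urls/seed.txt (assumed path).
    print(seed_text, end="")
```

Redirecting the output into a file under the seed directory gives a list that can be injected as usual.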
Re: How to configure seed and urlfilter config files in Apache Nutch
Posted by Siddharth Shah <ia...@gmail.com>.
Hello,
I think you might need to remove the following line from
your conf/regex-urlfilter.txt. Your seed URLs contain "?" and "=", so
otherwise they will be filtered out when they are injected.
# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]
Give it a try and let me know if this works.
Thank you,
Sidharth
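To see why that rule drops the seed URLs, the following sketch mimics how regex-urlfilter.txt is evaluated, as far as I understand it: rules are tried top to bottom, the first pattern found anywhere in the URL decides ("+" accepts, "-" rejects), and a URL that matches no rule is rejected. The helper names parse_rules and accepts, and the two "+" patterns in CUSTOM_RULES, are assumptions made up for this illustration. They are not taken from the thread or from Nutch's code, and Nutch uses Java regular expressions, which may differ from Python's in corner cases.

```python
import re

# Stock-style rules: the "-[?*!@=]" line comes before the catch-all "+.".
DEFAULT_RULES = r"""
# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]
+.
"""

# Hypothetical replacement rules: allow the listing pages and the
# detail pages from the question, and reject everything else.
CUSTOM_RULES = r"""
+^http://www\.domain\.com/classifieds/something/\?pg=\d+$
+^http://www\.domain\.com/classifieds/something/view/\d+/$
-.
"""


def parse_rules(text):
    """Turn filter-file text into a list of (allow, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """First rule whose pattern is found in the URL decides; no match rejects."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False


seed = "http://www.domain.com/classifieds/something/?pg=1"
detail = "http://www.domain.com/classifieds/something/view/123456/"
other = "http://www.domain.com/about/"

default_rules = parse_rules(DEFAULT_RULES)
custom_rules = parse_rules(CUSTOM_RULES)

if __name__ == "__main__":
    for url in (seed, detail, other):
        print(url, accepts(default_rules, url), accepts(custom_rules, url))
```

With the stock-style rules the seed URL is rejected because it contains "?" and "=", while the narrower rules keep both the listing pages and the detail pages and drop the rest of the site.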
On Mon, Mar 23, 2015 at 3:58 PM, Adamantios Corais <
adamantios.corais@gmail.com> wrote:
> Apologies for insisting, but any help would be highly appreciated since I
> am a newbie to Apache Nutch. Thank you!
Re: How to configure seed and urlfilter config files in Apache Nutch
Posted by Adamantios Corais <ad...@gmail.com>.
Apologies for insisting, but any help would be highly appreciated since I am
a newbie to Apache Nutch. Thank you!
*// Adamantios*