Posted to user@nutch.apache.org by lewis john mcgibbney <le...@gmail.com> on 2011/06/26 05:18:10 UTC
Nutch Gotchas as of release 1.3
Hello list,
Do we have any suggestions we wish to discuss regarding the above?
thanks
--
*Lewis*
Re: Nutch Gotchas as of release 1.3
Posted by Julien Nioche <li...@gmail.com>.
Great, thanks!
On 12 July 2011 10:56, lewis john mcgibbney <le...@gmail.com> wrote:
> Hi
>
> I have duly updated both the Nutch Gotchas [1] and the tutorial [2] to
> incorporate these gotchas which have been highlighted. Thanks for pointing
> these out.
>
> [1] http://wiki.apache.org/nutch/NutchGotchas
> [2] http://wiki.apache.org/nutch/RunningNutchAndSolr
--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
Re: Nutch Gotchas as of release 1.3
Posted by lewis john mcgibbney <le...@gmail.com>.
Hi
I have duly updated both the Nutch Gotchas [1] and the tutorial [2] to
incorporate these gotchas which have been highlighted. Thanks for pointing
these out.
[1] http://wiki.apache.org/nutch/NutchGotchas
[2] http://wiki.apache.org/nutch/RunningNutchAndSolr
On Tue, Jul 12, 2011 at 12:03 AM, Jerry E. Craig, Jr. <jcraig@inforeverse.com> wrote:
> Just from a total noob standpoint (just installed my first LAMP box over
> the last month) realizing that I needed to look in the Runtime folder when I
> downloaded the tar.gz file was a HUGE step.
>
> Then we all run the Crawl at least to make sure things work. The main
> tutorial was missing the [-solr] part of the crawl command line to get that
> to index. It wasn't until someone helped me here and pointed me to the
> actual documents that I found it.
>
> Those were the 2 big things for me as a total noob, otherwise I'm really
> happy to have at least that part working. Now, my stupid CentOS install
> only has libxml2 2.6.15 and I need 2.6.17 for php and I'm a few revisions
> off on libcurl also. I have NO idea how to go back and fix that. Not sure
> if I should just try to upgrade to php53 and hope for the best or what.
> But, that's more of a solr / php question than a Nutch question I think.
--
*Lewis*
RE: Nutch Gotchas as of release 1.3
Posted by "Jerry E. Craig, Jr." <jc...@inforeverse.com>.
Just from a total noob standpoint (I just installed my first LAMP box over the last month), realizing that I needed to look in the runtime folder when I downloaded the tar.gz file was a HUGE step.
Then we all run the crawl, at least to make sure things work. The main tutorial was missing the [-solr] part of the crawl command line that gets it to index. It wasn't until someone helped me here and pointed me to the actual documents that I found it.
Those were the 2 big things for me as a total noob; otherwise I'm really happy to have at least that part working. Now, my stupid CentOS install only has libxml2 2.6.15 and I need 2.6.17 for PHP, and I'm a few revisions off on libcurl as well. I have NO idea how to go back and fix that. Not sure if I should just try to upgrade to php53 and hope for the best or what. But that's more of a Solr / PHP question than a Nutch question, I think.
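Jerry's two stumbling blocks were working from the runtime folder of the unpacked tarball and passing -solr to the one-step crawl. Both can be sketched as a small Python wrapper. This is a hypothetical helper, not part of Nutch. The function names, the runtime/local subdirectory, the seed directory name `urls`, the Solr URL and the depth/topN defaults are all assumptions for illustration. Compare the argument order against the usage message your own `bin/nutch crawl` prints before relying on it.

```python
import shlex
import subprocess
from pathlib import Path


def build_crawl_command(seed_dir="urls",
                        solr_url="http://localhost:8983/solr/",
                        depth=3, top_n=5):
    """Return a Nutch 1.3 one-step crawl command as an argument list.

    The -solr argument is what makes the crawl index into Solr; leaving it
    out is the gotcha described above. All default values are hypothetical.
    """
    return [
        "bin/nutch", "crawl", seed_dir,
        "-solr", solr_url,
        "-depth", str(depth),
        "-topN", str(top_n),
    ]


def run_crawl(nutch_home, **kwargs):
    """Run the crawl from inside the runtime folder of the unpacked tarball.

    Assumption: the runnable bin/nutch script sits under runtime/local.
    """
    runtime_local = Path(nutch_home) / "runtime" / "local"
    cmd = build_crawl_command(**kwargs)
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, cwd=runtime_local, check=True)
```

As a usage example, `run_crawl("/opt/apache-nutch-1.3", depth=2)` would run the crawl from `/opt/apache-nutch-1.3/runtime/local`, a hypothetical install path. Building the argument list separately from running it makes it easy to print or inspect the exact command, including the -solr part, before anything is fetched.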
-----Original Message-----
From: Markus Jelsma [mailto:markus.jelsma@openindex.io]
Sent: Monday, July 11, 2011 3:19 PM
To: user@nutch.apache.org
Cc: lewis john mcgibbney
Subject: Re: Nutch Gotchas as of release 1.3
Well, now that I think of it, yes.
- three people (including myself) have mentioned the problem described in NUTCH-1016;
- a few users seem to miss the part of the tutorial telling them to add their robot's name to the config;
- the missing crawl-urlfilter;
- mails about a missing solrUrl.
I think quite a few users still rely on the crawl command instead of running a script.
Re: Nutch Gotchas as of release 1.3
Posted by Markus Jelsma <ma...@openindex.io>.
Well, now that I think of it, yes.
- three people (including myself) have mentioned the problem described in
NUTCH-1016;
- a few users seem to miss the part of the tutorial telling them to add
their robot's name to the config;
- the missing crawl-urlfilter;
- mails about a missing solrUrl.
I think quite a few users still rely on the crawl command instead of running a
script.
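One item on Markus's list, the robot name, is a configuration setting. In Nutch it is the `http.agent.name` property, normally set in `conf/nutch-site.xml`, and the fetcher refuses to start while it is empty. Below is a minimal, hypothetical pre-flight check that uses only the Python standard library. The function names are invented for illustration. The check only looks at the XML you pass in, so a value supplied some other way would not be seen.

```python
import xml.etree.ElementTree as ET


def get_property(conf_xml, name):
    """Return the stripped <value> of the named <property>, or None if absent.

    Works on Hadoop-style configuration XML such as nutch-site.xml.
    """
    root = ET.fromstring(conf_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def check_agent_name(conf_xml):
    """Return a list of problems with http.agent.name; empty means it is set.

    Hypothetical pre-flight check, not part of Nutch itself.
    """
    value = get_property(conf_xml, "http.agent.name")
    if value is None:
        return ["http.agent.name is not defined"]
    if not value:
        return ["http.agent.name is empty"]
    return []
```

To check a real file, you could pass in `Path("conf/nutch-site.xml").read_bytes()`, run from the runtime folder. Passing bytes lets the parser honour the file's own XML declaration. An empty result means the robot name is at least present. It says nothing about whether the name is a sensible one.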
> Hello list,
>
> Do we have any suggestions we wish to discuss regarding the above?
>
> thanks