Posted to user@nutch.apache.org by lewis john mcgibbney <le...@gmail.com> on 2011/06/26 05:18:10 UTC

Nutch Gotchas as of release 1.3

Hello list,

Do we have any suggestions we wish to discuss regarding the above?

thanks

-- 
*Lewis*

Re: Nutch Gotchas as of release 1.3

Posted by Julien Nioche <li...@gmail.com>.
Great, thanks!

On 12 July 2011 10:56, lewis john mcgibbney <le...@gmail.com>wrote:

> Hi
>
> I have duly updated both the Nutch Gotchas [1] and the tutorial [2] to
> incorporate these gotchas which have been highlighted. Thanks for pointing
> these out.
>
> [1] http://wiki.apache.org/nutch/NutchGotchas
> [2] http://wiki.apache.org/nutch/RunningNutchAndSolr



-- 
*Open Source Solutions for Text Engineering*

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com

Re: Nutch Gotchas as of release 1.3

Posted by lewis john mcgibbney <le...@gmail.com>.
Hi

I have duly updated both the Nutch Gotchas [1] and the tutorial [2] to
incorporate these gotchas which have been highlighted. Thanks for pointing
these out.

[1] http://wiki.apache.org/nutch/NutchGotchas
[2] http://wiki.apache.org/nutch/RunningNutchAndSolr

On Tue, Jul 12, 2011 at 12:03 AM, Jerry E. Craig, Jr. <
jcraig@inforeverse.com> wrote:

> Just from a total noob standpoint (just installed my first LAMP box over
> the last month) realizing that I needed to look in the Runtime folder when I
> downloaded the tar.gz file was a HUGE step.
>
> Then we all run the crawl command, at least to make sure things work. The
> main tutorial was missing the [-solr] part of the crawl command line needed
> to get it to index. It wasn't until someone here helped me and pointed me to
> the actual documents that I found it.
>
> Those were the 2 big things for me as a total noob, otherwise I'm really
> happy to have at least that part working.  Now, my stupid CentOS install
> only has libxml2 2.6.15 and I need 2.6.17 for php and I'm a few revisions
> off on libcurl also.  I have NO idea how to go back and fix that.  Not sure
> if I should just try to upgrade to php53 and hope for the best or what.
>  But, that's more of a solr / php question than a Nutch question I think.



-- 
*Lewis*

RE: Nutch Gotchas as of release 1.3

Posted by "Jerry E. Craig, Jr." <jc...@inforeverse.com>.
Just from a total noob standpoint (I only installed my first LAMP box over the last month), realizing that I needed to look in the runtime folder after downloading the tar.gz file was a HUGE step.

Then we all run the crawl command, at least to make sure things work. The main tutorial was missing the [-solr] part of the crawl command line needed to get it to index. It wasn't until someone here helped me and pointed me to the actual documents that I found it.

Those were the 2 big things for me as a total noob; otherwise I'm really happy to have at least that part working. Now, my stupid CentOS install only has libxml2 2.6.15, I need 2.6.17 for PHP, and I'm a few revisions off on libcurl too. I have NO idea how to go back and fix that. Not sure if I should just try to upgrade to php53 and hope for the best, or what. But that's more of a Solr/PHP question than a Nutch question, I think.


-----Original Message-----
From: Markus Jelsma [mailto:markus.jelsma@openindex.io] 
Sent: Monday, July 11, 2011 3:19 PM
To: user@nutch.apache.org
Cc: lewis john mcgibbney
Subject: Re: Nutch Gotchas as of release 1.3

Well, now that I think of it: yes.

- three people (including myself) mentioned the problem described in NUTCH-1016;
- a few users don't seem to catch the part of the tutorial telling them to add their robot's name to the config;
- missing crawl-urlfilter;
- mails about a missing solrUrl.

I think quite a few users still rely on the crawl command instead of running a script.


Re: Nutch Gotchas as of release 1.3

Posted by Markus Jelsma <ma...@openindex.io>.
Well, now that I think of it: yes.

- three people (including myself) mentioned the problem described in
NUTCH-1016;
- a few users don't seem to catch the part of the tutorial telling them to add
their robot's name to the config;
- missing crawl-urlfilter;
- mails about a missing solrUrl.

I think quite a few users still rely on the crawl command instead of running a
script.
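On the "add their robot's name to the config" item: in Nutch 1.x this means giving the http.agent.name property a non-empty value in conf/nutch-site.xml. Its default in nutch-default.xml is empty, and the fetcher stops with an error while it stays empty. A minimal pre-crawl sanity check could look like the Python sketch that follows. The helper names get_property and agent_name_is_set, and the "My Nutch Spider" value, are illustrative assumptions, not part of Nutch.

```python
import xml.etree.ElementTree as ET

def get_property(xml_text, name):
    """Return the trimmed <value> of property `name` in a Hadoop-style
    configuration document, or None if the property is not defined.
    (Hypothetical helper, not a Nutch API.)"""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None

def agent_name_is_set(site_xml):
    """True if http.agent.name has a non-empty value in nutch-site.xml text."""
    return bool(get_property(site_xml, "http.agent.name"))

# Illustrative nutch-site.xml content; the agent name is just an example.
SITE_XML = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>My Nutch Spider</value>
  </property>
</configuration>
"""

print(agent_name_is_set(SITE_XML))            # True
print(agent_name_is_set("<configuration/>"))  # False: property not defined
```

This only inspects nutch-site.xml, which is where the tutorial has you override the empty default. It does not merge in nutch-default.xml the way Nutch's own configuration loading does.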

> Hello list,
> 
> Do we have any suggestions we wish to discuss regarding the above?
> 
> thanks
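On the "missing crawl-urlfilter" item: as far as I know, 1.3 no longer ships conf/crawl-urlfilter.txt, and the URL filter rules go in conf/regex-urlfilter.txt instead. Each rule line there is '+' (include) or '-' (exclude) followed by a regular expression. The first pattern that matches decides the outcome, and a URL that matches no pattern is dropped. The following Python sketch approximates that evaluation order so a rule set can be tried out before crawling. It is not Nutch's code: Nutch uses Java regular expressions, whose syntax differs from Python's in some corners. The function names parse_rules and accepts are hypothetical, and the example domain is illustrative.

```python
import re

def parse_rules(text):
    """Parse regex-urlfilter.txt-style text into (include, pattern) pairs.

    Approximation: blank lines and lines starting with '#' or a space are
    skipped; every other line must start with '+' (include) or '-' (exclude).
    """
    rules = []
    for line in text.splitlines():
        if not line or line[0] in "# ":
            continue
        if line[0] not in "+-":
            raise ValueError(f"invalid first character in rule: {line!r}")
        rules.append((line[0] == "+", re.compile(line[1:])))
    return rules

def accepts(rules, url):
    """The first rule whose pattern matches anywhere in the URL decides;
    a URL that matches no rule is rejected."""
    for include, pattern in rules:
        if pattern.search(url):
            return include
    return False

# Example rules in the style of the tutorial's example (domain is illustrative).
RULES = r"""
# skip file:, ftp: and mailto: URLs
-^(file|ftp|mailto):
# accept pages on nutch.apache.org and its subdomains
+^http://([a-z0-9]*\.)*nutch.apache.org/
"""

rules = parse_rules(RULES)
for url in ("http://nutch.apache.org/", "ftp://nutch.apache.org/",
            "http://wiki.apache.org/nutch/"):
    print(url, accepts(rules, url))
```

Because a URL that matches nothing is rejected, a filter file without a final catch-all '+' rule only lets through what its '+' patterns explicitly name.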