Posted to user@nutch.apache.org by matty2012 <mt...@usa.com> on 2011/09/01 05:13:30 UTC

Nutch 1.3 and Hadoop config

I am a newbie to Nutch and Hadoop.

I am trying to follow the tutorial at
http://wiki.apache.org/nutch/NutchHadoopTutorial.

I am using the Nutch 1.3 release.

Even though Hadoop is included in Nutch, I did not see any of the .sh or
.xml files referred to in the tutorial under /nutch/search/conf after the
build.

I was wondering whether I have to set up Hadoop first in the same directory
structure, or copy over the Hadoop config files, before proceeding with the
Nutch setup.

Can anyone please point me in the right direction? I am pretty sure that I am
lost :-(

Thanks in advance


--
View this message in context: http://lucene.472066.n3.nabble.com/Nutch-1-3-and-Hadoop-config-tp3300212p3300212.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: Nutch 1.3 and Hadoop config

Posted by Markus Jelsma <ma...@openindex.io>.
In 1.4-dev you'll only need to have the hadoop executable on your PATH; see
https://issues.apache.org/jira/browse/NUTCH-1085
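A minimal sketch of the check Markus describes for 1.4-dev, assuming Python 3 and that the Hadoop launcher is called hadoop. The helper name find_hadoop_executable is hypothetical and not part of Nutch; it only reports whether the executable can be found on the PATH (or on an explicit search path) and does not start anything.

```python
import shutil
from typing import Optional


def find_hadoop_executable(search_path: Optional[str] = None) -> Optional[str]:
    """Return the full path of the 'hadoop' launcher, or None if not found.

    Hypothetical helper: with search_path=None it searches the current PATH,
    which is what the 1.4-dev change (NUTCH-1085) relies on.
    """
    # shutil.which only returns files that exist and are executable.
    return shutil.which("hadoop", path=search_path)


if __name__ == "__main__":
    location = find_hadoop_executable()
    if location is None:
        print("hadoop was not found on the PATH")
    else:
        print(f"found hadoop at {location}")
```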

On Thursday 01 September 2011 15:10:47 Ferdy Galema wrote:
> The guide is a bit outdated, I guess. Here's what I know:
> 
> There are basically two modes for running Nutch: distributed and local.
> When you build Nutch, the 'runtime' folder contains two subfolders,
> 'deploy' and 'local', for distributed and local mode respectively.
> Running distributed requires a Hadoop deployment, which is no longer
> included in Nutch. You need to install it separately and set HADOOP_HOME
> to point to it; you can then submit jobs to it. Running Nutch distributed
> is recommended when you plan to run large, scalable crawls. If you just
> want to run some tests or otherwise small crawls, running locally is
> perfectly fine.

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

Re: Nutch 1.3 and Hadoop config

Posted by Julien Nioche <li...@gmail.com>.
http://wiki.apache.org/nutch/NutchHadoopTutorial is outdated and has
disappeared from the main page of the wiki.

http://wiki.apache.org/nutch/RunningNutchAndSolr is the most up-to-date
how-to, but it only shows how to run Nutch in local mode.

There is http://wiki.apache.org/nutch/RunningNutchInDeployMode, which is
meant to illustrate how to use Nutch on an existing Hadoop cluster, but it
is currently empty; you are more than welcome to contribute to it if you
want to.

Julien

-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com

Re: Nutch 1.3 and Hadoop config

Posted by matty2012 <mt...@usa.com>.
OK, thank you.

I was under the impression that Hadoop is distributed as part of Nutch.
Maybe I misread the Nutch wiki.

Anyway, I will set up Hadoop first and then try again.


Re: Nutch 1.3 and Hadoop config

Posted by Ferdy Galema <fe...@kalooga.com>.
The guide is a bit outdated, I guess. Here's what I know:

There are basically two modes for running Nutch: distributed and local.
When you build Nutch, the 'runtime' folder contains two subfolders,
'deploy' and 'local', for distributed and local mode respectively.
Running distributed requires a Hadoop deployment, which is no longer
included in Nutch. You need to install it separately and set HADOOP_HOME
to point to it; you can then submit jobs to it. Running Nutch distributed
is recommended when you plan to run large, scalable crawls. If you just
want to run some tests or otherwise small crawls, running locally is
perfectly fine.
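The choice between the two runtime folders described above can be sketched as a small Python helper, under these assumptions: nutch_home is the directory where Nutch was built (so it contains runtime/deploy and runtime/local), select_runtime is a hypothetical name rather than anything shipped with Nutch, and the only precondition checked for distributed mode is that HADOOP_HOME is set, as in 1.3. It does not launch any job; it only picks the folder to work from.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def select_runtime(nutch_home: str, distributed: bool,
                   env: Optional[Mapping[str, str]] = None) -> Path:
    """Return runtime/deploy or runtime/local under a Nutch build directory.

    Hypothetical helper: distributed mode is only allowed when HADOOP_HOME
    points at a separately installed Hadoop (Nutch 1.3 no longer bundles one).
    """
    env = os.environ if env is None else env
    if distributed:
        if not env.get("HADOOP_HOME"):
            raise RuntimeError(
                "distributed (deploy) mode needs HADOOP_HOME set to a Hadoop install")
        return Path(nutch_home) / "runtime" / "deploy"
    # Local mode needs no separate Hadoop deployment; fine for tests and small crawls.
    return Path(nutch_home) / "runtime" / "local"
```

For example, with the hypothetical build directory /opt/nutch, select_runtime("/opt/nutch", distributed=False) returns /opt/nutch/runtime/local whether or not Hadoop is installed, while asking for distributed mode without HADOOP_HOME fails early instead of partway through a crawl.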
