Posted to user@nutch.apache.org by Jeremy Villalobos <je...@gmail.com> on 2012/03/01 22:26:00 UTC
multiple small crawlers on single machine conflict at /tmp/hadoop-username/mapred
Hello:
I am running multiple small crawls on one machine. I notice that they are
conflicting because they all access
/tmp/hadoop-username/mapred
How do I change the location of this folder?
Do I have to use Hadoop to run multiple crawlers, each specific to a site?
thanks
Jeremy
Re: multiple small crawlers on single machine conflict at /tmp/hadoop-username/mapred
Posted by Markus Jelsma <ma...@openindex.io>.
You can also pass it to most jobs with $ nutch <job> -Dhadoop.tmp.dir=<dir> <args>. This can even be automated with some shell scripting.
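A minimal sketch of that kind of scripting, assuming two site-specific crawls (the site names, seed-list paths, and crawldb paths are placeholders, and the commands are echoed as a dry run rather than executed):

```shell
#!/bin/sh
# Give each concurrent crawl its own Hadoop temp dir so local jobs
# no longer collide in /tmp/hadoop-$USER/mapred.
# Dry run: commands are echoed, not executed; names are placeholders.
for site in site1 site2; do
  tmpdir="/tmp/hadoop-$USER-$site"   # distinct temp dir per job
  mkdir -p "$tmpdir"
  # Pass hadoop.tmp.dir before the job's positional arguments.
  echo nutch inject -Dhadoop.tmp.dir="$tmpdir" \
    "crawl-$site/crawldb" "urls-$site"
done
```

Dropping the `echo` runs the real jobs; backgrounding each loop body with `&` would run them concurrently.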
--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350
Re: multiple small crawlers on single machine conflict at /tmp/hadoop-username/mapred
Posted by Jeremy Villalobos <je...@gmail.com>.
It is a small number of crawlers, so I copied a runtime for each; therefore each has its own configuration files.
Jeremy
Re: multiple small crawlers on single machine conflict at /tmp/hadoop-username/mapred
Posted by remi tassing <ta...@gmail.com>.
How did you define that property so it's different for each job?
Remi
Re: multiple small crawlers on single machine conflict at /tmp/hadoop-username/mapred
Posted by Jeremy Villalobos <je...@gmail.com>.
That is what I was looking for, thank you.
The property was added to:
$NUTCH_DIR/runtime/local/conf/nutch-site.xml
Jeremy
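For reference, the entry in each runtime copy's nutch-site.xml would look something like this (the value shown is an example path; each copy points at a different directory):

```xml
<!-- In $NUTCH_DIR/runtime/local/conf/nutch-site.xml; one distinct value per runtime copy -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop-username/site1</value>
  <description>Per-crawler temp dir to avoid clashes between concurrent local jobs.</description>
</property>
```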
Re: multiple small crawlers on single machine conflict at /tmp/hadoop-username/mapred
Posted by Markus Jelsma <ma...@openindex.io>.
You can either:
1. run on Hadoop
2. not run multiple concurrent jobs on a local machine
3. set a hadoop.tmp.dir per job
4. merge all crawls into a single crawl
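For the last option, Nutch ships merge tools (mergedb for crawldbs, mergesegs for segments). A rough sketch, with placeholder paths and the commands echoed as a dry run rather than executed:

```shell
#!/bin/sh
# Sketch of merging two crawls into one (paths are placeholders;
# commands are echoed, not executed).
# Merge the per-site crawldbs into a single crawldb:
echo nutch mergedb crawl-merged/crawldb crawl-site1/crawldb crawl-site2/crawldb
# Merge one crawl's segments into the combined segments dir:
echo nutch mergesegs crawl-merged/segments -dir crawl-site1/segments
```

Dropping the `echo` runs the real tools; check `bin/nutch` usage output for the exact argument order in your Nutch version.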
--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350