Posted to user@nutch.apache.org by Jeremy Villalobos <je...@gmail.com> on 2012/03/01 22:26:00 UTC

multiple small crawlers on single machine conflict at /tmp/hadoop-username/mapred

Hello:

I am running multiple small crawls on one machine. I notice that they
conflict because they all access

/tmp/hadoop-username/mapred

How do I change the location of this folder?

Do I have to use Hadoop to run multiple crawlers, each specific to a site?

thanks

Jeremy

Re: multiple small crawlers on single machine conflict at /tmp/hadoop-username/mapred

Posted by Markus Jelsma <ma...@openindex.io>.
You can also pass it to most jobs with $ nutch <job>
-Dhadoop.tmp.dir=<dir> <args>. This can even be automated with some shell
scripting.
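
A minimal sketch of that idea, assuming the Nutch 1.x "bin/nutch crawl"
command and hypothetical seed and crawl directory names; as noted above,
the generic -D option goes before the job's own arguments:

  #!/bin/sh
  # One crawl per site, each with its own Hadoop tmp dir, so the
  # concurrent local jobs do not collide in /tmp/hadoop-$USER/mapred.
  for site in siteA siteB siteC; do   # hypothetical site names
    bin/nutch crawl -Dhadoop.tmp.dir="/tmp/hadoop-$USER-$site" \
      "seeds/$site" -dir "crawls/$site" -depth 3 &
  done
  wait   # block until every background crawl has finished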



-- 
 Markus Jelsma - CTO - Openindex
 http://www.linkedin.com/in/markus17
 050-8536600 / 06-50258350

Re: multiple small crawlers on single machine conflict at /tmp/hadoop-username/mapred

Posted by Jeremy Villalobos <je...@gmail.com>.
It is a small number of crawlers, so I copied a runtime for each, and
therefore each has its own configuration files.
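
Roughly, assuming the standard Nutch runtime layout (the per-site
directory names are hypothetical):

  cp -r runtime/local runtime/siteA   # one copy per site
  cp -r runtime/local runtime/siteB   # then edit each copy's conf/nutch-site.xml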

Jeremy


Re: multiple small crawlers on single machine conflict at /tmp/hadoop-username/mapred

Posted by remi tassing <ta...@gmail.com>.
How did you define that property so it's different for each job?

Remi


Re: multiple small crawlers on single machine conflict at /tmp/hadoop-username/mapred

Posted by Jeremy Villalobos <je...@gmail.com>.
That is what I was looking for, thank you.

The property was added to:
$NUTCH_DIR/runtime/local/conf/nutch-site.xml
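
For reference, a rough sketch of the entry in each copied runtime's
nutch-site.xml; the value path is illustrative, the point is that every
runtime gets its own:

  <?xml version="1.0"?>
  <configuration>
    <!-- Per-runtime temp dir so concurrent local jobs do not collide
         in Hadoop's default /tmp/hadoop-${user.name} -->
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/tmp/hadoop-crawler1</value>
    </property>
  </configuration>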

Jeremy


Re: multiple small crawlers on single machine conflict at /tmp/hadoop-username/mapred

Posted by Markus Jelsma <ma...@openindex.io>.
You can either:

1. run on Hadoop
2. not run multiple concurrent jobs on a local machine
3. set a hadoop.tmp.dir per job
4. merge all crawls into a single crawl (see the sketch below)
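
For option 4, a rough sketch using Nutch's merge commands; the paths are
hypothetical and flags vary by version, so run each command without
arguments to see the exact usage:

  # Combine per-site crawls into a single crawl directory afterwards.
  bin/nutch mergedb crawl/crawldb crawl-siteA/crawldb crawl-siteB/crawldb
  bin/nutch mergesegs crawl/segments crawl-siteA/segments/* crawl-siteB/segments/*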


-- 
 Markus Jelsma - CTO - Openindex
 http://www.linkedin.com/in/markus17
 050-8536600 / 06-50258350