Posted to user@nutch.apache.org by Sourajit Basak <so...@gmail.com> on 2012/12/27 13:22:27 UTC

code changes not reflecting when deployed on hadoop

We have made some changes to Fetcher (v1.5). However, when we build the .job
(jar) and deploy it on Hadoop, none of the changes seem to be picked up. This
is how we run it:

>> ./hadoop jar ../nutch/apache-nutch-1.5.1.job org.apache.nutch.fetcher.Fetcher <segment on hdfs> -threads 4

However, if we modify any of the plugins, the changes are picked up properly.

Initially, I suspected that our logic simply wasn't being hit. To cross-check,
we removed Fetcher.class from the .job file and re-ran the command. It still
seemed to run an old version of the code.

I strongly suspect I am missing some step that needs to be done after a code
change.

Re: code changes not reflecting when deployed on hadoop

Posted by Ferdy Galema <fe...@kalooga.com>.
For the record: this no longer seems to be the case on trunk (at least when
you run ant clean before building).


-- 
*Ferdy Galema*
Kalooga Development


Re: code changes not reflecting when deployed on hadoop

Posted by Sourajit Basak <so...@gmail.com>.
It turns out that apache-nutch*.jar was packed inside the job file's 'lib'
directory alongside the classes, and Hadoop picked up the Fetcher class from
the jar inside 'lib' rather than the top-level copy.
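A quick way to catch this kind of shadowing before deploying is to look inside the .job file for class files that appear both at its top level and inside a jar under lib/. The sketch below assumes the .job file is an ordinary zip archive with the compiled classes at its root and dependency jars under lib/, as described above; the class name JobShadowCheck is hypothetical and not part of Nutch or Hadoop. It only reports duplicates; it does not determine which copy Hadoop will actually load.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: finds .class entries duplicated between a job file's root and its lib/*.jar files. */
public class JobShadowCheck {

    /** Maps each duplicated class entry to a lib/ jar that also contains it (one jar recorded per class). */
    public static Map<String, String> findDuplicates(String jobPath) throws IOException {
        Map<String, String> duplicates = new TreeMap<>();
        try (ZipFile job = new ZipFile(jobPath)) {
            // First pass: collect class files that live outside lib/ (assumed to be the job's own classes).
            Set<String> topLevel = new HashSet<>();
            for (Enumeration<? extends ZipEntry> e = job.entries(); e.hasMoreElements();) {
                String name = e.nextElement().getName();
                if (name.endsWith(".class") && !name.startsWith("lib/")) {
                    topLevel.add(name);
                }
            }
            // Second pass: open every nested lib/*.jar and look for the same class entry names.
            for (Enumeration<? extends ZipEntry> e = job.entries(); e.hasMoreElements();) {
                ZipEntry entry = e.nextElement();
                String name = entry.getName();
                if (!name.startsWith("lib/") || !name.endsWith(".jar")) {
                    continue;
                }
                try (InputStream raw = job.getInputStream(entry);
                     ZipInputStream nested = new ZipInputStream(raw)) {
                    ZipEntry inner;
                    while ((inner = nested.getNextEntry()) != null) {
                        if (topLevel.contains(inner.getName())) {
                            duplicates.put(inner.getName(), name);
                        }
                    }
                }
            }
        }
        return duplicates;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JobShadowCheck <path-to-job-file>");
            System.exit(2);
        }
        Map<String, String> dups = findDuplicates(args[0]);
        dups.forEach((cls, jar) -> System.out.println(cls + " also in " + jar));
        // Non-zero exit status when anything is shadowed, so a build script can fail on it.
        System.exit(dups.isEmpty() ? 0 : 1);
    }
}
```

For example, running it against apache-nutch-1.5.1.job and seeing org/apache/nutch/fetcher/Fetcher.class reported next to a lib/apache-nutch*.jar entry would point at the problem described in this thread.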


Re: code changes not reflecting when deployed on hadoop

Posted by Sourajit Basak <so...@gmail.com>.
Maybe on Hadoop 1.1 any job submitted via ToolRunner is stored in the
distributed cache. I will keep the thread updated.


Re: code changes not reflecting when deployed on hadoop

Posted by Sourajit Basak <so...@gmail.com>.
This is what I did.

Our nutch directory contains only the following structure; basically, the
script does what I was doing previously.

apache-nutch-1.5.1.job
+bin
   nutch

Even in this case, after I deleted the entire fetcher package, the fetch
command still worked!

Could anyone try to repeat this exercise? For example, change a LOG.info(..)
message in Fetcher and see what happens.
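Instead of changing a log message and guessing, one could ask the JVM directly which archive supplied a class. The sketch below uses the standard Class.getProtectionDomain().getCodeSource() API; the class name WhereLoaded is hypothetical. Run on the same classpath as the job, it prints the jar or directory the class came from; the same expression could be placed in a LOG.info(..) call inside Fetcher so the task logs show which copy each task tracker loaded. This is a diagnostic sketch, not something Nutch ships.

```java
import java.net.URL;
import java.security.CodeSource;

/** Hypothetical diagnostic: prints where a class was loaded from. */
public class WhereLoaded {

    /** Returns the location a class was loaded from, or a marker if the JVM reports none. */
    public static String origin(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        URL location = (source == null) ? null : source.getLocation();
        // Bootstrap classes such as java.lang.String have no code source.
        return (location == null) ? "<no code source>" : location.toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Defaults to the Fetcher class discussed in this thread; pass another name to check it instead.
        String name = args.length > 0 ? args[0] : "org.apache.nutch.fetcher.Fetcher";
        // Load without running static initializers, so missing runtime dependencies are less likely to interfere.
        Class<?> cls = Class.forName(name, false, WhereLoaded.class.getClassLoader());
        System.out.println(name + " loaded from " + origin(cls));
    }
}
```

If the printed location is a jar under the job's lib directory rather than the job's own top-level classes, the stale copy is the one being used.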


On Thu, Dec 27, 2012 at 7:42 PM, Sourajit Basak <so...@gmail.com>wrote:

> Are you saying that I put hadoop binary on the path and use the nutch
> script like on local.
>
>
> On Thu, Dec 27, 2012 at 7:35 PM, Sourajit Basak <so...@gmail.com>wrote:
>
>> Didn't understand.
>> Lets say I put the job file in HADOOP_HOME/bin. What commands do I fire ?
>>
>>
>>
>> On Thu, Dec 27, 2012 at 7:27 PM, Markus Jelsma <
>> markus.jelsma@openindex.io> wrote:
>>
>>> CWD
>>
>>
>>
>

Re: code changes not reflecting when deployed on hadoop

Posted by Sourajit Basak <so...@gmail.com>.
Are you saying that I should put the hadoop binary on the PATH and use the
nutch script just as in local mode?


Re: code changes not reflecting when deployed on hadoop

Posted by Sourajit Basak <so...@gmail.com>.
I didn't understand.
Let's say I put the job file in HADOOP_HOME/bin. Which commands do I run?

RE: code changes not reflecting when deployed on hadoop

Posted by Markus Jelsma <ma...@openindex.io>.
It works the same as in local mode: just have the job file in the current
working directory (CWD).

Re: code changes not reflecting when deployed on hadoop

Posted by Sourajit Basak <so...@gmail.com>.
We are using Hadoop 1.1.


Re: code changes not reflecting when deployed on hadoop

Posted by Sourajit Basak <so...@gmail.com>.
How do you use the nutch script on a cluster?


RE: code changes not reflecting when deployed on hadoop

Posted by Markus Jelsma <ma...@openindex.io>.
It seems the job file is not being deployed to all task trackers, and I'm not sure why. Can you try using the nutch script to run your fetcher?