Posted to user@nutch.apache.org by Sourajit Basak <so...@gmail.com> on 2012/12/27 13:22:27 UTC
code changes not reflecting when deployed on hadoop
We have made some changes to Fetcher (v1.5). However, when we build a .job
(jar) and deploy it on hadoop it doesn't seem to pick up any changes. This
is how we are running it.
>> ./hadoop jar ../nutch/apache-nutch-1.5.1.job org.apache.nutch.fetcher.Fetcher <segment on hdfs> -threads 4
However, if we modify any of the plugins, it picks up the changes properly.
Initially, I suspected that our logic wasn't getting hit. To cross-check, we
removed Fetcher.class from the .job file and re-executed. It still seems to
run an old version of the code.
I strongly suspect I am missing something that needs to be done after
a code change.
Re: code changes not reflecting when deployed on hadoop
Posted by Ferdy Galema <fe...@kalooga.com>.
For the record: this no longer seems to be the case for trunk (at least
when you properly run "ant clean" prior to building).
On Fri, Dec 28, 2012 at 12:25 PM, Sourajit Basak
<so...@gmail.com> wrote:
> Turns out that apache-nutch*.jar was packed inside the jobfile's 'lib'
> directory along with the classes. And hadoop picked the Fetcher class from
> the jar inside 'lib'.
--
*Ferdy Galema*
Kalooga Development
Re: code changes not reflecting when deployed on hadoop
Posted by Sourajit Basak <so...@gmail.com>.
Turns out that apache-nutch*.jar was packed inside the jobfile's 'lib'
directory along with the classes. And hadoop picked the Fetcher class from
the jar inside 'lib'.
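
One way to check for this kind of shadowing is to list every place a class appears inside the job file, both at the top level and inside the jars bundled under 'lib'. Below is a minimal sketch in Python, assuming the .job file is an ordinary zip archive with its dependency jars under lib/. The function names find_class_copies and duplicated are made up for illustration and are not part of Nutch or Hadoop. The script only reports where the copies are; it does not say which copy Hadoop will load.

```python
import io
import zipfile
from collections import defaultdict


def find_class_copies(job_path, class_suffix):
    """Map each matching class entry to the places in the job file that hold it.

    job_path     -- path to the .job (zip) file
    class_suffix -- e.g. "org/apache/nutch/fetcher/Fetcher.class"
    """
    locations = defaultdict(list)
    with zipfile.ZipFile(job_path) as job:
        for name in job.namelist():
            if name.endswith(class_suffix):
                # Class stored directly in the job file.
                locations[name].append("<job root>")
            elif name.startswith("lib/") and name.endswith(".jar"):
                # Jar bundled under lib/: open it in memory and look inside.
                with zipfile.ZipFile(io.BytesIO(job.read(name))) as nested:
                    for inner in nested.namelist():
                        if inner.endswith(class_suffix):
                            locations[inner].append(name)
    return dict(locations)


def duplicated(locations):
    """Keep only the classes that occur in more than one place."""
    return {cls: places for cls, places in locations.items() if len(places) > 1}
```

Calling find_class_copies("apache-nutch-1.5.1.job", "org/apache/nutch/fetcher/Fetcher.class") and passing the result to duplicated(...) would show whether a second copy of Fetcher sits in a jar under 'lib', which is the situation described above.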
On Thu, Dec 27, 2012 at 11:46 PM, Sourajit Basak
<so...@gmail.com> wrote:
> Maybe on hadoop 1.1, any job submitted via ToolRunner is stored in the
> distributed cache.
> Will keep the thread updated.
Re: code changes not reflecting when deployed on hadoop
Posted by Sourajit Basak <so...@gmail.com>.
Maybe on hadoop 1.1, any job submitted via ToolRunner is stored in the
distributed cache.
Will keep the thread updated.
On Thu, Dec 27, 2012 at 8:24 PM, Sourajit Basak <so...@gmail.com> wrote:
> This is what I did.
>
> Our nutch directory only contains the following structure. Basically the
> script does what I was doing previously.
>
> apache-nutch-1.5.1.job
> +bin
> nutch
>
> Even in this case, I deleted the entire fetcher package. The fetch command
> worked !!!
>
> Is anyone in a position to repeat this exercise ? Maybe change a
> LOG.info(..) in Fetcher and see what happens ?
Re: code changes not reflecting when deployed on hadoop
Posted by Sourajit Basak <so...@gmail.com>.
This is what I did.
Our nutch directory only contains the following structure. Basically the
script does what I was doing previously.
  apache-nutch-1.5.1.job
  +bin
      nutch
Even in this case, I deleted the entire fetcher package. The fetch command
still worked!
Is anyone in a position to repeat this exercise? Maybe change a
LOG.info(..) in Fetcher and see what happens?
On Thu, Dec 27, 2012 at 7:42 PM, Sourajit Basak <so...@gmail.com> wrote:
> Are you saying that I put hadoop binary on the path and use the nutch
> script like on local.
Re: code changes not reflecting when deployed on hadoop
Posted by Sourajit Basak <so...@gmail.com>.
Are you saying that I should put the hadoop binary on the path and use the
nutch script as in local mode?
On Thu, Dec 27, 2012 at 7:35 PM, Sourajit Basak <so...@gmail.com> wrote:
> Didn't understand.
> Lets say I put the job file in HADOOP_HOME/bin. What commands do I fire ?
Re: code changes not reflecting when deployed on hadoop
Posted by Sourajit Basak <so...@gmail.com>.
I didn't understand.
Let's say I put the job file in HADOOP_HOME/bin. What commands do I run?
On Thu, Dec 27, 2012 at 7:27 PM, Markus Jelsma
<ma...@openindex.io> wrote:
> CWD
RE: code changes not reflecting when deployed on hadoop
Posted by Markus Jelsma <ma...@openindex.io>.
It works the same as in local mode, just have the job file in the CWD.
-----Original message-----
> From:Sourajit Basak <so...@gmail.com>
> Sent: Thu 27-Dec-2012 14:51
> To: user@nutch.apache.org
> Subject: Re: code changes not reflecting when deployed on hadoop
>
> We are using hadoop 1.1
Re: code changes not reflecting when deployed on hadoop
Posted by Sourajit Basak <so...@gmail.com>.
We are using Hadoop 1.1.
On Thu, Dec 27, 2012 at 7:13 PM, Sourajit Basak <so...@gmail.com> wrote:
> How do you use the nutch script on a cluster ?
Re: code changes not reflecting when deployed on hadoop
Posted by Sourajit Basak <so...@gmail.com>.
How do you use the nutch script on a cluster?
On Thu, Dec 27, 2012 at 6:25 PM, Markus Jelsma
<ma...@openindex.io> wrote:
> Can you try using the nutch script to run your fetcher?
RE: code changes not reflecting when deployed on hadoop
Posted by Markus Jelsma <ma...@openindex.io>.
It seems the job file is not deployed to all task trackers, and I'm not sure why. Can you try using the nutch script to run your fetcher?
-----Original message-----
> From:Sourajit Basak <so...@gmail.com>
> Sent: Thu 27-Dec-2012 13:29
> To: user@nutch.apache.org
> Subject: code changes not reflecting when deployed on hadoop
>
> We have made some changes to Fetcher (v1.5). However, when we build a .job
> (jar) and deploy it on hadoop it doesn't seem to pick up any changes. This
> is how we are running it.
>
> >> ./hadoop jar ../nutch/apache-nutch-1.5.1.job
> org.apache.nutch.fetcher.Fetcher <segment on hdfs> -threads 4
>
> However, if we modify any of the plugins, it picks up the changes properly.
>
> Initially, I doubted that our logic wasn't getting hit. To cross check, we
> removed Fetcher.class from the .job file and re-executed. Still it seems to
> run an old version of the code.
>
> I strongly suspect, I am missing out something which needs to be done after
> a code change.
>