Posted to user@nutch.apache.org by jitendra rajput <je...@gmail.com> on 2010/09/15 19:55:54 UTC

Hadoop log not getting generated on ec2.

Hi,

The Hadoop log file is not generated when I run the Nutch job jar on a Hadoop
EC2 cluster. I start the job with the following command:

hadoop jar -DHadoop.log.dir=logs -Dhadoop.log.file=hadoop.log  <jar
name>.jar urls -dir crawl -depth 1 -topN 10

Could anyone point out what is going wrong here?

-- 
Thanks and regards

Jitendra Singh

Re: Hadoop log not getting generated on ec2.

Posted by Jitendra <je...@gmail.com>.
Hi Ken,

You are God among men!!!
Thanks for the reply. We were able to nail the problem down to a missing jar
for a plugin; it surfaced as a runtime ClassNotFoundException. However, the job
reported no errors at the master node and completed successfully,
so we need to find a way to catch such errors.
Anyway, thanks again.

-Nayn/Jitendra
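
Note (a sketch, not from the original posters): one way to catch failures like
the ClassNotFoundException above, which appeared only in the task logs, is to
scan every task syslog for exception lines. The function name
find_task_errors is made up for illustration, and the default directory is the
EC2 log location mentioned elsewhere in this thread; other clusters may keep
task logs in a different place.

```shell
#!/bin/sh
# find_task_errors [USERLOGS_DIR]
# Lists the task syslog files under USERLOGS_DIR that contain the word
# "Exception". The default directory is an assumption taken from this thread.
find_task_errors() {
  log_root="${1:-/mnt/hadoop/logs/userlogs}"
  if [ ! -d "$log_root" ]; then
    echo "no such directory: $log_root" >&2
    return 1
  fi
  # Each task has its own subdirectory holding a file named "syslog".
  # grep -l prints only the names of the files that match.
  find "$log_root" -type f -name syslog -exec grep -l 'Exception' {} +
}
```

Run on a slave node, find_task_errors with no argument scans the default
location; the files it lists can then be opened to read the full stack trace.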


-- 
View this message in context: http://lucene.472066.n3.nabble.com/Hadoop-log-not-getting-generated-on-ec2-tp1481389p1488482.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: Hadoop log not getting generated on ec2.

Posted by Ken Krugler <kk...@transpac.com>.
Hi Jitendra,

> Thanks a lot for the reply.
> I am quite new to Nutch. Could you please post a sample
> mapred-site.xml for it?
> My aim is to generate logs and see where my job is going wrong.

A typical location for Hadoop logs in EC2 is
/mnt/hadoop/logs/userlogs/<task-name>

Inside of this directory, the syslog has output from calls made to the  
Log4J logging subsystem.

To get onto one of the slaves, you'd ssh onto the master using the  
public name, then ssh root@<private slave name>

-- Ken
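
Note (a sketch, not part of Ken's reply): once logged in on a slave, the
inspection step described above could look like the helper below, which prints
the tail of one task's syslog. The name show_task_syslog is hypothetical, and
the default root is simply the path given in this reply; adjust both for your
cluster.

```shell
#!/bin/sh
# show_task_syslog TASK_NAME [USERLOGS_DIR] [LINES]
# Prints the last LINES lines (default 50) of the syslog file for one task.
# USERLOGS_DIR defaults to the EC2 location from this thread (an assumption
# for any other setup).
show_task_syslog() {
  task="$1"
  root="${2:-/mnt/hadoop/logs/userlogs}"
  lines="${3:-50}"
  file="$root/$task/syslog"
  if [ ! -r "$file" ]; then
    echo "no readable syslog at $file" >&2
    return 1
  fi
  tail -n "$lines" "$file"
}
```

For example, show_task_syslog <task-name> prints the most recent Log4J output
for that task, which is where a stack trace from a failing plugin would be
expected to appear.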



--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g






Re: Hadoop log not getting generated on ec2.

Posted by Jitendra <je...@gmail.com>.
Thanks a lot for the reply.
I am quite new to Nutch. Could you please post a sample
mapred-site.xml for it?
My aim is to generate logs and see where my job is going wrong.

Thanks

Re: Hadoop log not getting generated on ec2.

Posted by Andrzej Bialecki <ab...@getopt.org>.
On 2010-09-16 15:10, Jitendra wrote:
>
> Hi,
> Could any one please help me here. My Job is not running properly and
> because of absence of log files, I am not able to nail it down.
>
> Any help would be appreciated.
>
> Thanks
>
> On Wed, Sep 15, 2010 at 11:26 PM, Jitendra [via Lucene] wrote:
>
>> Hi,
>>
>> Hadoop log file is not getting generated when I run nutch job jar on hadoop
>> EC2 cluster. I am starting job with following command.
>>
>> hadoop jar -DHadoop.log.dir=logs -Dhadoop.log.file=hadoop.log <jar
>> name>.jar urls -dir crawl -depth 1 -topN 10

The -D flags are not passed to distributed tasks, so in effect these
command-line switches only have meaning for the locally running process,
i.e. RunJar. If you want these defines to be applied to all
distributed tasks, then you need to add them to mapred.child.java.opts in
conf/mapred-site.xml.


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
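
Note (a sketch, not part of Andrzej's reply): the suggestion above could be
tried with a mapred-site.xml along the following lines. It assumes a Hadoop
0.20-era release, where the option for task JVM arguments is named
mapred.child.java.opts and defaults to -Xmx200m. The scratch file path is
hypothetical. Whether the task JVMs honor these particular logging defines is
not confirmed anywhere in this thread, since the task runner may set its own
logging properties, so treat this as untested.

```shell
#!/bin/sh
# Writes a sample mapred-site.xml to a scratch file (hypothetical path) so it
# can be reviewed before merging the property into conf/mapred-site.xml.
sample="${TMPDIR:-/tmp}/mapred-site.sample.xml"
cat > "$sample" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <!-- JVM options applied to every map and reduce task JVM -->
    <name>mapred.child.java.opts</name>
    <!-- -Xmx200m is the stock default heap setting; the two -D defines are
         the ones from this thread, spelled in lowercase as in log4j.properties -->
    <value>-Xmx200m -Dhadoop.log.dir=logs -Dhadoop.log.file=hadoop.log</value>
  </property>
</configuration>
EOF
echo "sample written to $sample"
```

Writing to a scratch file first keeps an existing conf/mapred-site.xml intact;
only the property element needs to be copied into the real file.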


Re: Hadoop log not getting generated on ec2.

Posted by Jitendra <je...@gmail.com>.
Hi,
Could anyone please help me here? My job is not running properly, and
because the log files are absent I am not able to nail it down.

Any help would be appreciated.

Thanks
