Posted to user@nutch.apache.org by Reza Harditya <ha...@gmail.com> on 2007/05/14 01:41:08 UTC

Nutch Crawling error

Hi,

I'm a new Nutch user, currently using Nutch 0.8.1. When I try to start crawling according to the tutorial, I always get the following error:

Injector: starting
Injector: crawlDb: crawl2/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:357)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:138)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:105)
------------------------------------------------------------------------------------------------------------

From the log, I found a more detailed description:

2007-05-14 09:32:57,977 INFO  crawl.Injector - Injector: starting
2007-05-14 09:32:57,978 INFO  crawl.Injector - Injector: crawlDb: crawl2/crawldb
2007-05-14 09:32:57,978 INFO  crawl.Injector - Injector: urlDir: urls
2007-05-14 09:32:57,978 INFO  crawl.Injector - Injector: Converting injected urls to crawl db entries.
2007-05-14 09:32:58,908 WARN  mapred.LocalJobRunner - job_lzlk81
java.lang.RuntimeException: java.net.UnknownHostException: dhcppc0: dhcppc0
        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:76)
        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:89)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:77)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:91)
Caused by: java.net.UnknownHostException: dhcppc0: dhcppc0
        at java.net.InetAddress.getLocalHost(InetAddress.java:1308)
        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:73)
        ... 3 more


At first I suspected that the error was caused by Tomcat not running properly,
but after doing some checking I confirmed that Tomcat is indeed running.

Could somebody let me know what I might be doing wrong here?

Cheers,

Re: Nutch Crawling error

Posted by Doğacan Güney <do...@gmail.com>.
Hi,

On 5/15/07, Reza Harditya <ha...@gmail.com> wrote:
> Thanks Dennis, Worked like a charm :)
>
> Forgive me for running in tangent in this thread here, but I just don't
> understand from which crawl directory does the search engine fetch the
> search result from?
>
> I mean, let's say I ran the crawl from the root of Nutch installation and
> put the crawl result in a directory called 'my.crawl'. And I know that the
> search engine itself is fetching the search result from the 'crawl'
> directory under webapps when using the web interface. So how does the
> content of 'my.crawl' gets copied to 'crawl'? Do I have to do it manually
> for every crawl?

Check "searcher.dir" configuration setting. Your webapp reads this
setting and fetches results from this directory. If it is a relative
path, then it is relative to where you started your webapp.
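
For example, a minimal override of that property in the nutch-site.xml the webapp actually reads (for a deployed war this is typically the copy under WEB-INF/classes; the absolute path below is only a placeholder) could look like this:

<property>
  <name>searcher.dir</name>
  <value>/path/to/nutch/my.crawl</value>
</property>

The default value of searcher.dir shipped in conf/nutch-default.xml is "crawl", which would explain why the webapp appears to read from a directory named 'crawl' when nothing is overridden.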

>
> Reza
>
>
> On 5/14/07, Dennis Kubes <nu...@dragonflymc.com> wrote:
> >
> > It should look like this but change out domain for your domain.  Try
> > this and let me know if it works.
> >
> > 127.0.0.1               dhcppc0.domain.com dhcppc0
> > localhost.localdomain localhost
> >
> > Dennis Kubes
> >
> >
>


-- 
Doğacan Güney

Re: Nutch Crawling error

Posted by Reza Harditya <ha...@gmail.com>.
Thanks Dennis, Worked like a charm :)

Forgive me for going off on a tangent in this thread, but I just don't
understand which crawl directory the search engine fetches the search
results from.

I mean, let's say I ran the crawl from the root of the Nutch installation and
put the crawl result in a directory called 'my.crawl'. I know that the
search engine itself fetches the search results from the 'crawl'
directory under webapps when using the web interface. So how does the
content of 'my.crawl' get copied to 'crawl'? Do I have to do it manually
for every crawl?

Reza


On 5/14/07, Dennis Kubes <nu...@dragonflymc.com> wrote:
>
> It should look like this but change out domain for your domain.  Try
> this and let me know if it works.
>
> 127.0.0.1               dhcppc0.domain.com dhcppc0
> localhost.localdomain localhost
>
> Dennis Kubes
>
>

Re: Nutch Crawling error

Posted by Dennis Kubes <nu...@dragonflymc.com>.
It should look like this, but change out domain.com for your own domain.  Try
this and let me know if it works.

127.0.0.1               dhcppc0.domain.com dhcppc0 localhost.localdomain localhost
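
A quick way to verify the fix, without re-running the whole crawl, is to check that the JVM can now resolve the local hostname, since InetAddress.getLocalHost() is exactly the call that Hadoop's SequenceFile writer fails on in the stack trace above. A minimal sketch (the class name is just for illustration):

import java.net.InetAddress;

// Minimal check that the local hostname resolves; this is the same
// lookup that throws "UnknownHostException: dhcppc0" in the log above.
public class HostnameCheck {
    public static void main(String[] args) throws Exception {
        InetAddress local = InetAddress.getLocalHost();
        System.out.println("hostname = " + local.getHostName());
        System.out.println("address  = " + local.getHostAddress());
    }
}

If this prints a hostname and an address instead of throwing, the Injector should get past the SequenceFile step.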

Dennis Kubes

Reza Harditya wrote:
> Hi Dennis,
> 
> Yes dhcppc0 is the machine that Nutch is on. And yes it is already pointing
> to 127.0.0.1.
> And my hosts file is already looking like this:
> 127.0.0.1       loacalhost.localdomain  localhost
> 
> However, I don't quite follow what you mean with "127.0.0.1
> yourhost.domain.com yourhost
> localhost.localdomain localhost". What should I put in yourhost.domain.com?
> Is it dhcppc0?
> 
> Cheers,
> 
> Reza
> 
> 
> On 5/14/07, Dennis Kubes <nu...@dragonflymc.com> wrote:
>>
>> If dhcppc0 is the host that you are on you might want to check that your
>> hosts file has the localhost line pointing to 127.0.0.1 and that dhcppc0
>> is also pointing to 127.0.0.1.  Something like this.
>>
>> 127.0.0.1               yourhost.domain.com yourhost
>> localhost.localdomain localhost
>>
>> Dennis Kubes
>>
>> Reza Harditya wrote:
>> > Caused by: java.net.UnknownHostException: dhcppc0: dhcppc0
>> >        at java.net.InetAddress.getLocalHost(InetAddress.java:1308)
>> >        at org.apache.hadoop.io.SequenceFile$Writer.<init>(
>> SequenceFile.java
>> > :73)
>> >
>> > Could it be that it is because I have an installation of apache and
>> tomcat
>> > in the host that I've installed Nutch and it cannot determine whether
>> > 'localhost' points to the apache or tomcat? Or does it matter anyway?
>> >
>> > I have both servers(apache and tomcat) listening on the default port#
>> which
>> > is 80 and 8080.
>> >
>> >
>> >
>> >
>> > On 5/14/07, Reza Harditya <ha...@gmail.com> wrote:
>> >>
>> >> I have checked and confirmed that the hosts I'm trying to fetch are
>> >> actually accessible (ping requests and loading the site itself).
>> >> However, I
>> >> still get the same error.
>> >>
>> >> Any other alternatives?
>> >>
>> >>
>> >> On 5/14/07, Dennis Kubes <nu...@dragonflymc.com> wrote:
>> >> >
>> >> > For some reason the nutch process can't resolve the hosts.  This
>> could
>> >> > be due to incorrect setup of dns on the machine or a firewall or
>> proxy
>> >> > in place.  See if you can ping one of the urls (hosts) that you are
>> >> > trying to fetch.
>> >> >
>> >> > Dennis Kubes
>> >> >
>> >> > Reza Harditya wrote:
>> >> > > Hi,
>> >> > >
>> >> > > I'm a new nutch user. Currently I'm using Nutch 0.8.1. When I
>> wanted
>> >> > to
>> >> > > start crawling according to the tutorial, I always get the
>> following
>> >> > error:
>> >> > >
>> >> > > Injector: starting
>> >> > > Injector: crawlDb: crawl2/crawldb
>> >> > > Injector: urlDir: urls
>> >> > > Injector: Converting injected urls to crawl db entries.
>> >> > > Exception in thread "main" java.io.IOException : Job failed!
>> >> > >        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java
>> >> > :357)
>> >> > >        at 
>> org.apache.nutch.crawl.Injector.inject(Injector.java:138)
>> >> > >        at org.apache.nutch.crawl.Crawl.main (Crawl.java:105)
>> >> > >
>> >> >
>> >>
>> ------------------------------------------------------------------------------------------------------------ 
>>
>> >>
>> >> > >
>> >> > >
>> >> > >  From the log, I found a more detailed description which is:
>> >> > >
>> >> > > 2007-05-14 09:32:57,977 INFO  crawl.Injector - Injector: starting
>> >> > > 2007-05-14 09:32:57,978 INFO  crawl.Injector - Injector: crawlDb:
>> >> > > crawl2/crawldb
>> >> > > 2007-05-14 09:32:57,978 INFO  crawl.Injector - Injector: urlDir:
>> urls
>> >> > > 2007-05-14 09:32:57,978 INFO  crawl.Injector - Injector: 
>> Converting
>> >> > > injected
>> >> > > urls to crawl db entries.
>> >> > > 2007-05-14 09:32:58,908 WARN  mapred.LocalJobRunner - job_lzlk81
>> >> > > java.lang.RuntimeException: java.net.UnknownHostException: 
>> dhcppc0:
>> >> > dhcppc0
>> >> > >        at org.apache.hadoop.io.SequenceFile$Writer.<init>(
>> >> > SequenceFile.java
>> >> > > :76)
>> >> > >        at org.apache.hadoop.io.SequenceFile$Writer .<init>(
>> >> > SequenceFile.java
>> >> > > :89)
>> >> > >        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:77)
>> >> > >        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(
>> >> > > LocalJobRunner.java:91)
>> >> > > Caused by: java.net.UnknownHostException: dhcppc0: dhcppc0
>> >> > >        at java.net.InetAddress.getLocalHost(InetAddress.java:1308)
>> >> > >        at org.apache.hadoop.io.SequenceFile$Writer.<init>(
>> >> > SequenceFile.java
>> >> > > :73)
>> >> > >        ... 3 more
>> >> > >
>> >> > >
>> >> > > At first I suspect that the error was caused by tomcat not running
>> >> > > properly,
>> >> > > but after doing some checking I am confirmed that tomcat is indeed
>> >> > running.
>> >> > >
>> >> > > Could somebody let me know what I might be doing wrong here?
>> >> > >
>> >> > > Cheers,
>> >> > >
>> >> >
>> >>
>> >>
>> >
>>
> 

Re: Nutch Crawling error

Posted by Reza Harditya <ha...@gmail.com>.
Hi Dennis,

Yes, dhcppc0 is the machine that Nutch is on, and yes, it is already pointing
to 127.0.0.1.
My hosts file already looks like this:
127.0.0.1       localhost.localdomain  localhost

However, I don't quite follow what you mean by "127.0.0.1
yourhost.domain.com yourhost
localhost.localdomain localhost". What should I put for yourhost.domain.com?
Is it dhcppc0?

Cheers,

Reza


On 5/14/07, Dennis Kubes <nu...@dragonflymc.com> wrote:
>
> If dhcppc0 is the host that you are on you might want to check that your
> hosts file has the localhost line pointing to 127.0.0.1 and that dhcppc0
> is also pointing to 127.0.0.1.  Something like this.
>
> 127.0.0.1               yourhost.domain.com yourhost
> localhost.localdomain localhost
>
> Dennis Kubes
>
> Reza Harditya wrote:
> > Caused by: java.net.UnknownHostException: dhcppc0: dhcppc0
> >        at java.net.InetAddress.getLocalHost(InetAddress.java:1308)
> >        at org.apache.hadoop.io.SequenceFile$Writer.<init>(
> SequenceFile.java
> > :73)
> >
> > Could it be that it is because I have an installation of apache and
> tomcat
> > in the host that I've installed Nutch and it cannot determine whether
> > 'localhost' points to the apache or tomcat? Or does it matter anyway?
> >
> > I have both servers(apache and tomcat) listening on the default port#
> which
> > is 80 and 8080.
> >
> >
> >
> >
> > On 5/14/07, Reza Harditya <ha...@gmail.com> wrote:
> >>
> >> I have checked and confirmed that the hosts I'm trying to fetch are
> >> actually accessible (ping requests and loading the site itself).
> >> However, I
> >> still get the same error.
> >>
> >> Any other alternatives?
> >>
> >>
> >> On 5/14/07, Dennis Kubes <nu...@dragonflymc.com> wrote:
> >> >
> >> > For some reason the nutch process can't resolve the hosts.  This
> could
> >> > be due to incorrect setup of dns on the machine or a firewall or
> proxy
> >> > in place.  See if you can ping one of the urls (hosts) that you are
> >> > trying to fetch.
> >> >
> >> > Dennis Kubes
> >> >
> >> > Reza Harditya wrote:
> >> > > Hi,
> >> > >
> >> > > I'm a new nutch user. Currently I'm using Nutch 0.8.1. When I
> wanted
> >> > to
> >> > > start crawling according to the tutorial, I always get the
> following
> >> > error:
> >> > >
> >> > > Injector: starting
> >> > > Injector: crawlDb: crawl2/crawldb
> >> > > Injector: urlDir: urls
> >> > > Injector: Converting injected urls to crawl db entries.
> >> > > Exception in thread "main" java.io.IOException : Job failed!
> >> > >        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java
> >> > :357)
> >> > >        at org.apache.nutch.crawl.Injector.inject(Injector.java:138)
> >> > >        at org.apache.nutch.crawl.Crawl.main (Crawl.java:105)
> >> > >
> >> >
> >>
> ------------------------------------------------------------------------------------------------------------
> >>
> >> > >
> >> > >
> >> > >  From the log, I found a more detailed description which is:
> >> > >
> >> > > 2007-05-14 09:32:57,977 INFO  crawl.Injector - Injector: starting
> >> > > 2007-05-14 09:32:57,978 INFO  crawl.Injector - Injector: crawlDb:
> >> > > crawl2/crawldb
> >> > > 2007-05-14 09:32:57,978 INFO  crawl.Injector - Injector: urlDir:
> urls
> >> > > 2007-05-14 09:32:57,978 INFO  crawl.Injector - Injector: Converting
> >> > > injected
> >> > > urls to crawl db entries.
> >> > > 2007-05-14 09:32:58,908 WARN  mapred.LocalJobRunner - job_lzlk81
> >> > > java.lang.RuntimeException: java.net.UnknownHostException: dhcppc0:
> >> > dhcppc0
> >> > >        at org.apache.hadoop.io.SequenceFile$Writer.<init>(
> >> > SequenceFile.java
> >> > > :76)
> >> > >        at org.apache.hadoop.io.SequenceFile$Writer .<init>(
> >> > SequenceFile.java
> >> > > :89)
> >> > >        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:77)
> >> > >        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(
> >> > > LocalJobRunner.java:91)
> >> > > Caused by: java.net.UnknownHostException: dhcppc0: dhcppc0
> >> > >        at java.net.InetAddress.getLocalHost(InetAddress.java:1308)
> >> > >        at org.apache.hadoop.io.SequenceFile$Writer.<init>(
> >> > SequenceFile.java
> >> > > :73)
> >> > >        ... 3 more
> >> > >
> >> > >
> >> > > At first I suspect that the error was caused by tomcat not running
> >> > > properly,
> >> > > but after doing some checking I am confirmed that tomcat is indeed
> >> > running.
> >> > >
> >> > > Could somebody let me know what I might be doing wrong here?
> >> > >
> >> > > Cheers,
> >> > >
> >> >
> >>
> >>
> >
>

Re: Nutch Crawling error

Posted by Dennis Kubes <nu...@dragonflymc.com>.
If dhcppc0 is the host that you are on, you might want to check that your 
hosts file has the localhost line pointing to 127.0.0.1 and that dhcppc0 
is also pointing to 127.0.0.1.  Something like this:

127.0.0.1               yourhost.domain.com yourhost localhost.localdomain localhost

Dennis Kubes

Reza Harditya wrote:
> Caused by: java.net.UnknownHostException: dhcppc0: dhcppc0
>        at java.net.InetAddress.getLocalHost(InetAddress.java:1308)
>        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java
> :73)
> 
> Could it be that it is because I have an installation of apache and tomcat
> in the host that I've installed Nutch and it cannot determine whether
> 'localhost' points to the apache or tomcat? Or does it matter anyway?
> 
> I have both servers(apache and tomcat) listening on the default port# which
> is 80 and 8080.
> 
> 
> 
> 
> On 5/14/07, Reza Harditya <ha...@gmail.com> wrote:
>>
>> I have checked and confirmed that the hosts I'm trying to fetch are
>> actually accessible (ping requests and loading the site itself). 
>> However, I
>> still get the same error.
>>
>> Any other alternatives?
>>
>>
>> On 5/14/07, Dennis Kubes <nu...@dragonflymc.com> wrote:
>> >
>> > For some reason the nutch process can't resolve the hosts.  This could
>> > be due to incorrect setup of dns on the machine or a firewall or proxy
>> > in place.  See if you can ping one of the urls (hosts) that you are
>> > trying to fetch.
>> >
>> > Dennis Kubes
>> >
>> > Reza Harditya wrote:
>> > > Hi,
>> > >
>> > > I'm a new nutch user. Currently I'm using Nutch 0.8.1. When I wanted
>> > to
>> > > start crawling according to the tutorial, I always get the following
>> > error:
>> > >
>> > > Injector: starting
>> > > Injector: crawlDb: crawl2/crawldb
>> > > Injector: urlDir: urls
>> > > Injector: Converting injected urls to crawl db entries.
>> > > Exception in thread "main" java.io.IOException : Job failed!
>> > >        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java
>> > :357)
>> > >        at org.apache.nutch.crawl.Injector.inject(Injector.java:138)
>> > >        at org.apache.nutch.crawl.Crawl.main (Crawl.java:105)
>> > >
>> > 
>> ------------------------------------------------------------------------------------------------------------ 
>>
>> > >
>> > >
>> > >  From the log, I found a more detailed description which is:
>> > >
>> > > 2007-05-14 09:32:57,977 INFO  crawl.Injector - Injector: starting
>> > > 2007-05-14 09:32:57,978 INFO  crawl.Injector - Injector: crawlDb:
>> > > crawl2/crawldb
>> > > 2007-05-14 09:32:57,978 INFO  crawl.Injector - Injector: urlDir: urls
>> > > 2007-05-14 09:32:57,978 INFO  crawl.Injector - Injector: Converting
>> > > injected
>> > > urls to crawl db entries.
>> > > 2007-05-14 09:32:58,908 WARN  mapred.LocalJobRunner - job_lzlk81
>> > > java.lang.RuntimeException: java.net.UnknownHostException: dhcppc0:
>> > dhcppc0
>> > >        at org.apache.hadoop.io.SequenceFile$Writer.<init>(
>> > SequenceFile.java
>> > > :76)
>> > >        at org.apache.hadoop.io.SequenceFile$Writer .<init>(
>> > SequenceFile.java
>> > > :89)
>> > >        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:77)
>> > >        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(
>> > > LocalJobRunner.java:91)
>> > > Caused by: java.net.UnknownHostException: dhcppc0: dhcppc0
>> > >        at java.net.InetAddress.getLocalHost(InetAddress.java:1308)
>> > >        at org.apache.hadoop.io.SequenceFile$Writer.<init>(
>> > SequenceFile.java
>> > > :73)
>> > >        ... 3 more
>> > >
>> > >
>> > > At first I suspect that the error was caused by tomcat not running
>> > > properly,
>> > > but after doing some checking I am confirmed that tomcat is indeed
>> > running.
>> > >
>> > > Could somebody let me know what I might be doing wrong here?
>> > >
>> > > Cheers,
>> > >
>> >
>>
>>
> 

Re: Nutch Crawling error

Posted by Reza Harditya <ha...@gmail.com>.
Caused by: java.net.UnknownHostException: dhcppc0: dhcppc0
        at java.net.InetAddress.getLocalHost(InetAddress.java:1308)
        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:73)

Could it be because I have installations of Apache and Tomcat on the host
where I've installed Nutch, and it cannot determine whether 'localhost'
points to Apache or to Tomcat? Or does that matter at all?

I have both servers (Apache and Tomcat) listening on their default ports,
80 and 8080.




On 5/14/07, Reza Harditya <ha...@gmail.com> wrote:
>
> I have checked and confirmed that the hosts I'm trying to fetch are
> actually accessible (ping requests and loading the site itself). However, I
> still get the same error.
>
> Any other alternatives?
>
>
> On 5/14/07, Dennis Kubes <nu...@dragonflymc.com> wrote:
> >
> > For some reason the nutch process can't resolve the hosts.  This could
> > be due to incorrect setup of dns on the machine or a firewall or proxy
> > in place.  See if you can ping one of the urls (hosts) that you are
> > trying to fetch.
> >
> > Dennis Kubes
> >
> > Reza Harditya wrote:
> > > Hi,
> > >
> > > I'm a new nutch user. Currently I'm using Nutch 0.8.1. When I wanted
> > to
> > > start crawling according to the tutorial, I always get the following
> > error:
> > >
> > > Injector: starting
> > > Injector: crawlDb: crawl2/crawldb
> > > Injector: urlDir: urls
> > > Injector: Converting injected urls to crawl db entries.
> > > Exception in thread "main" java.io.IOException : Job failed!
> > >        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java
> > :357)
> > >        at org.apache.nutch.crawl.Injector.inject(Injector.java:138)
> > >        at org.apache.nutch.crawl.Crawl.main (Crawl.java:105)
> > >
> > ------------------------------------------------------------------------------------------------------------
> > >
> > >
> > >  From the log, I found a more detailed description which is:
> > >
> > > 2007-05-14 09:32:57,977 INFO  crawl.Injector - Injector: starting
> > > 2007-05-14 09:32:57,978 INFO  crawl.Injector - Injector: crawlDb:
> > > crawl2/crawldb
> > > 2007-05-14 09:32:57,978 INFO  crawl.Injector - Injector: urlDir: urls
> > > 2007-05-14 09:32:57,978 INFO  crawl.Injector - Injector: Converting
> > > injected
> > > urls to crawl db entries.
> > > 2007-05-14 09:32:58,908 WARN  mapred.LocalJobRunner - job_lzlk81
> > > java.lang.RuntimeException: java.net.UnknownHostException: dhcppc0:
> > dhcppc0
> > >        at org.apache.hadoop.io.SequenceFile$Writer.<init>(
> > SequenceFile.java
> > > :76)
> > >        at org.apache.hadoop.io.SequenceFile$Writer .<init>(
> > SequenceFile.java
> > > :89)
> > >        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:77)
> > >        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(
> > > LocalJobRunner.java:91)
> > > Caused by: java.net.UnknownHostException: dhcppc0: dhcppc0
> > >        at java.net.InetAddress.getLocalHost(InetAddress.java:1308)
> > >        at org.apache.hadoop.io.SequenceFile$Writer.<init>(
> > SequenceFile.java
> > > :73)
> > >        ... 3 more
> > >
> > >
> > > At first I suspect that the error was caused by tomcat not running
> > > properly,
> > > but after doing some checking I am confirmed that tomcat is indeed
> > running.
> > >
> > > Could somebody let me know what I might be doing wrong here?
> > >
> > > Cheers,
> > >
> >
>
>

Re: Nutch Crawling error

Posted by Reza Harditya <ha...@gmail.com>.
I have checked and confirmed that the hosts I'm trying to fetch are actually
accessible (via ping and by loading the sites themselves). However, I still get
the same error.

Any other alternatives?


On 5/14/07, Dennis Kubes <nu...@dragonflymc.com> wrote:
>
> For some reason the nutch process can't resolve the hosts.  This could
> be due to incorrect setup of dns on the machine or a firewall or proxy
> in place.  See if you can ping one of the urls (hosts) that you are
> trying to fetch.
>
> Dennis Kubes
>
> Reza Harditya wrote:
> > Hi,
> >
> > I'm a new nutch user. Currently I'm using Nutch 0.8.1. When I wanted to
> > start crawling according to the tutorial, I always get the following
> error:
> >
> > Injector: starting
> > Injector: crawlDb: crawl2/crawldb
> > Injector: urlDir: urls
> > Injector: Converting injected urls to crawl db entries.
> > Exception in thread "main" java.io.IOException: Job failed!
> >        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:357)
> >        at org.apache.nutch.crawl.Injector.inject(Injector.java:138)
> >        at org.apache.nutch.crawl.Crawl.main(Crawl.java:105)
> >
> ------------------------------------------------------------------------------------------------------------
> >
> >
> >  From the log, I found a more detailed description which is:
> >
> > 2007-05-14 09:32:57,977 INFO  crawl.Injector - Injector: starting
> > 2007-05-14 09:32:57,978 INFO  crawl.Injector - Injector: crawlDb:
> > crawl2/crawldb
> > 2007-05-14 09:32:57,978 INFO  crawl.Injector - Injector: urlDir: urls
> > 2007-05-14 09:32:57,978 INFO  crawl.Injector - Injector: Converting
> > injected
> > urls to crawl db entries.
> > 2007-05-14 09:32:58,908 WARN  mapred.LocalJobRunner - job_lzlk81
> > java.lang.RuntimeException: java.net.UnknownHostException: dhcppc0:
> dhcppc0
> >        at org.apache.hadoop.io.SequenceFile$Writer.<init>(
> SequenceFile.java
> > :76)
> >        at org.apache.hadoop.io.SequenceFile$Writer.<init>(
> SequenceFile.java
> > :89)
> >        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:77)
> >        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(
> > LocalJobRunner.java:91)
> > Caused by: java.net.UnknownHostException: dhcppc0: dhcppc0
> >        at java.net.InetAddress.getLocalHost(InetAddress.java:1308)
> >        at org.apache.hadoop.io.SequenceFile$Writer.<init>(
> SequenceFile.java
> > :73)
> >        ... 3 more
> >
> >
> > At first I suspect that the error was caused by tomcat not running
> > properly,
> > but after doing some checking I am confirmed that tomcat is indeed
> running.
> >
> > Could somebody let me know what I might be doing wrong here?
> >
> > Cheers,
> >
>

Re: Nutch Crawling error

Posted by Dennis Kubes <nu...@dragonflymc.com>.
For some reason the Nutch process can't resolve hosts.  This could 
be due to an incorrect DNS setup on the machine, or a firewall or proxy 
in place.  See if you can ping one of the URLs (hosts) that you are 
trying to fetch.
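
A DNS-level version of that ping test, run through the same Java resolver that Nutch uses, is a quick way to see what the JVM can actually resolve. A rough sketch (the class name and example host are placeholders, not taken from the seed list):

import java.net.InetAddress;
import java.net.UnknownHostException;

// Rough equivalent of the ping test: check whether a given host name
// resolves from the JVM's point of view.
public class ResolveCheck {
    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "www.example.org";
        try {
            InetAddress addr = InetAddress.getByName(host);
            System.out.println(host + " -> " + addr.getHostAddress());
        } catch (UnknownHostException e) {
            System.out.println("cannot resolve " + host + ": " + e.getMessage());
        }
    }
}

Note that in the stack trace quoted below, the name that actually fails to resolve is the machine's own hostname (dhcppc0) rather than one of the fetch hosts, which is what the hosts-file suggestions elsewhere in the thread address.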

Dennis Kubes

Reza Harditya wrote:
> Hi,
> 
> I'm a new nutch user. Currently I'm using Nutch 0.8.1. When I wanted to
> start crawling according to the tutorial, I always get the following error:
> 
> Injector: starting
> Injector: crawlDb: crawl2/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Exception in thread "main" java.io.IOException: Job failed!
>        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:357)
>        at org.apache.nutch.crawl.Injector.inject(Injector.java:138)
>        at org.apache.nutch.crawl.Crawl.main(Crawl.java:105)
> ------------------------------------------------------------------------------------------------------------ 
> 
> 
>  From the log, I found a more detailed description which is:
> 
> 2007-05-14 09:32:57,977 INFO  crawl.Injector - Injector: starting
> 2007-05-14 09:32:57,978 INFO  crawl.Injector - Injector: crawlDb:
> crawl2/crawldb
> 2007-05-14 09:32:57,978 INFO  crawl.Injector - Injector: urlDir: urls
> 2007-05-14 09:32:57,978 INFO  crawl.Injector - Injector: Converting 
> injected
> urls to crawl db entries.
> 2007-05-14 09:32:58,908 WARN  mapred.LocalJobRunner - job_lzlk81
> java.lang.RuntimeException: java.net.UnknownHostException: dhcppc0: dhcppc0
>        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java
> :76)
>        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java
> :89)
>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:77)
>        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(
> LocalJobRunner.java:91)
> Caused by: java.net.UnknownHostException: dhcppc0: dhcppc0
>        at java.net.InetAddress.getLocalHost(InetAddress.java:1308)
>        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java
> :73)
>        ... 3 more
> 
> 
> At first I suspect that the error was caused by tomcat not running 
> properly,
> but after doing some checking I am confirmed that tomcat is indeed running.
> 
> Could somebody let me know what I might be doing wrong here?
> 
> Cheers,
>