Posted to user@nutch.apache.org by Feng Ji <fe...@gmail.com> on 2006/08/08 03:00:20 UTC

nutch08 indexer error

Hi there,

I ran into a problem running "bin/nutch index ...". I checked out the latest
Nutch from SVN, so I am running Nutch 0.8.

I searched the archived emails, and one of them mentioned that "index-basic"
must be in the index configuration XML. I checked my config and it is already
included.

1.
In one case, the indexing log shows:
"
Indexing [http://calendar.ufl.edu/] with analyzer
org.apache.nutch.analysis.NutchDocumentAnalyzer@53fb57 (null)
"

Still, indexing finished and searching worked afterwards, which is a bit
odd.

2.
In the other crawl, I indexed multiple segments and ran into a fatal
error:

"
Indexer: starting
Indexer: linkdb: crawl/linkdb
Indexer: adding segment: crawl/segments/20060807202736
Indexer: adding segment: crawl/segments/20060807202824
Optimizing index.
Exception in thread "main" java.io.IOException: Job failed!
 at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:357)
 at org.apache.nutch.indexer.Indexer.index(Indexer.java:296)
 at org.apache.nutch.indexer.Indexer.main(Indexer.java:313)
"

What causes this error?

3.
Is the downloadable Nutch 0.8 release package more stable than the version
checked out from SVN?

Thanks for your time,

Feng Ji

Re: nutch08 indexer error

Posted by Feng Ji <fe...@gmail.com>.
I tried the Nutch 0.8 release:
http://lucene.apache.org/nutch/#25+July+2006%3A+Nutch+0.8+Released

Everything is working fine. I guess the instability of the version checked
out from SVN is due to the ongoing development of Nutch 0.9.

Michael,



RE: nutch08 indexer error

Posted by Teruhiko Kurosaka <Ku...@basistech.com>.
You don't need to recompile Nutch.
Just move hadoop-0.4.0.jar out of nutch/lib and put hadoop-0.5.0.jar
there instead.
The error message suggests that the .jar file is not on the CLASSPATH, which
the nutch script (bin/nutch) builds by grabbing every lib/*.jar.
org/apache/hadoop/conf/Configured.class can be found in hadoop-0.5.0.jar;
at least the version I've got has this class.
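
One rough way to check which jar in lib actually provides the missing class
is to scan the jars for the class entry. The sketch below is illustrative
only and is not part of Nutch: the function name jars_containing is
hypothetical, and it assumes the jars are ordinary zip archives sitting
directly in the given directory. Only the class entry name comes from the
error quoted in this thread.

```python
import zipfile
from pathlib import Path


def jars_containing(lib_dir, class_entry):
    """Return the sorted names of jars in lib_dir that contain class_entry.

    Hypothetical helper for illustration; bin/nutch itself does no such check.
    """
    matches = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                # Jar entries use forward slashes, e.g. org/apache/.../Foo.class
                if class_entry in zf.namelist():
                    matches.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that are not valid jar/zip archives.
            continue
    return matches
```

Calling jars_containing("nutch/lib", "org/apache/hadoop/conf/Configured.class")
should list the Hadoop jar if it is in place; an empty list would be
consistent with the NoClassDefFoundError above.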



Re: nutch08 indexer error

Posted by Feng Ji <fe...@gmail.com>.
Hi Teruhiko,

I replaced the Hadoop 0.4 jar with the 0.5 version and recompiled Nutch, but
Nutch gives me the error below when I run the seed inject at the very
beginning.

Any thoughts?

Thanks for your time,

java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configured
 at java.lang.ClassLoader.defineClass1(Native Method)
 at java.lang.ClassLoader.defineClass(ClassLoader.java:620)
 at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
 at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
 at java.net.URLClassLoader.access$100(URLClassLoader.java:56)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:268)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
 at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
Exception in thread "main"



RE: nutch08 indexer error

Posted by Teruhiko Kurosaka <Ku...@basistech.com>.
Problem #2 might be due to
http://issues.apache.org/jira/browse/NUTCH-266

Download the latest Hadoop and replace the hadoop-*.jar in lib with that
version.
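
The jar swap could be scripted along these lines. This is a minimal sketch
under stated assumptions, not a Nutch tool: swap_hadoop_jar is a
hypothetical name, the new jar is assumed to live outside lib, and old jars
are moved to a backup directory rather than deleted so the change can be
undone.

```python
import shutil
from pathlib import Path


def swap_hadoop_jar(lib_dir, new_jar, backup_dir):
    """Move existing hadoop-*.jar files out of lib_dir, then copy new_jar in.

    Assumes new_jar is located outside lib_dir. Returns the names of the
    jars that were moved to backup_dir.
    """
    lib, backup = Path(lib_dir), Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)
    moved = []
    for old in sorted(lib.glob("hadoop-*.jar")):
        # Keep the old jar around instead of deleting it.
        shutil.move(str(old), str(backup / old.name))
        moved.append(old.name)
    shutil.copy2(str(new_jar), str(lib / Path(new_jar).name))
    return moved
```

Moving the old jar out matters because bin/nutch puts every lib/*.jar on the
classpath, so leaving two Hadoop versions side by side could load classes
from either one.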
