common-terms.utf8 not found in class path when using Nutch from WAR file
Posted to user@nutch.apache.org by Björn Wilmsmann <bj...@wilmsmann.de> on 2008/01/29 02:37:29 UTC
Hello everybody,
I have run into a rather weird problem that occurs when deploying a
Grails (http://grails.codehaus.org/) app as a WAR file in Tomcat. My
app instantiates a NutchDocumentAnalyzer during startup as a Spring
resource. The Nutch classes and config files are loaded from a JAR
inside the lib directory of the app.
All of this works fine when running the app via 'grails run-app'.
However, when running the app under Tomcat via the WAR file generated
by 'grails war' I get the following stacktrace (excerpt):
Caused by: org.springframework.beans.BeanInstantiationException: Could not instantiate bean class [org.apache.nutch.analysis.NutchDocumentAnalyzer]: Constructor threw exception; nested exception is java.lang.NullPointerException
	at org.springframework.beans.BeanUtils.instantiateClass(BeanUtils.java:98)
	at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:87)
	at org.springframework.beans.factory.support.ConstructorResolver.autowireConstructor(ConstructorResolver.java:233)
	... 63 more
Caused by: java.lang.NullPointerException
	at java.io.Reader.<init>(Reader.java:61)
	at java.io.BufferedReader.<init>(BufferedReader.java:76)
	at java.io.BufferedReader.<init>(BufferedReader.java:91)
	at org.apache.nutch.analysis.CommonGrams.init(CommonGrams.java:152)
	at org.apache.nutch.analysis.CommonGrams.<init>(CommonGrams.java:52)
	at org.apache.nutch.analysis.NutchDocumentAnalyzer$ContentAnalyzer.<init>(NutchDocumentAnalyzer.java:64)
	at org.apache.nutch.analysis.NutchDocumentAnalyzer.<init>(NutchDocumentAnalyzer.java:55)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at org.springframework.beans.BeanUtils.instantiateClass(BeanUtils.java:83)
	... 65 more
This is caused by the common-terms.utf8 file not being found at line 152 of org.apache.nutch.analysis.CommonGrams. However, this file is located at the root level of the nutch.jar in the lib directory that also contains the classes themselves. I have also tried copying the file to TOMCAT/webapps/MY_APP/WEB-INF/classes, TOMCAT/webapps/MY_APP/WEB-INF/ and TOMCAT/webapps/MY_APP/WEB-INF/lib, all to no avail.
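One quick diagnostic is to check whether the webapp's context classloader can actually see the file at runtime, since that is typically the classloader Hadoop's Configuration uses to resolve conf resources. A minimal sketch (class name and placement are illustrative, not from Nutch):

```java
import java.io.InputStream;

// Minimal sketch: report whether common-terms.utf8 is visible to the
// thread's context classloader. Run it from inside the webapp (e.g. a
// startup listener) to see which classloader is in effect under Tomcat.
public class ClasspathCheck {
    public static void main(String[] args) {
        InputStream in = Thread.currentThread().getContextClassLoader()
                .getResourceAsStream("common-terms.utf8");
        System.out.println("common-terms.utf8 on classpath: " + (in != null));
    }
}
```

If this prints false inside Tomcat but true under 'grails run-app', the problem is classloader visibility rather than the file's location in the JAR.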
Does anybody know what this could possibly be caused by?
--
Best regards,
Bjoern Wilmsmann
Re: Can IndexReader be opened on a hadoop directory?
Posted by Andrzej Bialecki <ab...@getopt.org>.
Kenji wrote:
> I'm trying to open a Lucene index created on a hadoop dfs.
> Configuration nutchConf = NutchConfiguration.create();
> FileSystem fs = FileSystem.get(nutchConf);
> Path lastIndex = this.dataConf.lastIndexDir();
> IndexReader idxReader = IndexReader.open(fs.getUri().toString() + lastIndex);
>
>
> This results in an exception:
>
> hdfs://localhost:9000/user/kenji/pages/lastIndex
> Exception in thread "main" java.io.IOException: The filename,
> directory name, or volume label syntax is incorrect
> at java.io.WinNTFileSystem.canonicalize0(Native Method)
> at java.io.Win32FileSystem.canonicalize(Unknown Source)
> at java.io.File.getCanonicalPath(Unknown Source)
> at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:168)
> at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:139)
> at org.apache.lucene.index.IndexReader.open(IndexReader.java:148)
> at ix.indexer.PageIndexer.test(Unknown Source)
> at ix.indexer.PageIndexer.main(Unknown Source)
>
> I've tried without the URI; then it assumes a local file system. Is the
> index reader supposed to work only locally?
If you pass the index path as a String, then IndexReader will silently
attempt to create an FSDirectory to read from it. This works only on
local filesystems. In order to use HDFS, you need to create an instance
of org.apache.nutch.indexer.FsDirectory(Path), and create an IndexReader
using this directory.
Please note that usually the performance of using Lucene indexes
directly from HDFS is poor.
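A minimal sketch of that approach, assuming the Nutch-era FsDirectory constructor FsDirectory(FileSystem, Path, boolean create, Configuration) and a reachable DFS; the index path is illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.lucene.index.IndexReader;
import org.apache.nutch.indexer.FsDirectory;
import org.apache.nutch.util.NutchConfiguration;

// Sketch: open a Lucene IndexReader over an index stored in HDFS via
// Nutch's FsDirectory wrapper instead of passing a path String.
public class HdfsIndexReaderExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = NutchConfiguration.create();
        FileSystem fs = FileSystem.get(conf);
        Path lastIndex = new Path("/user/kenji/pages/lastIndex"); // illustrative path
        // create = false: open the existing index directory, don't create one
        FsDirectory dir = new FsDirectory(fs, lastIndex, false, conf);
        IndexReader reader = IndexReader.open(dir);
        System.out.println("docs: " + reader.maxDoc());
        reader.close();
    }
}
```

Every read then goes through HDFS, which is why performance is usually poor; copying the index to local disk first is generally faster for serving queries.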
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
Can IndexReader be opened on a hadoop directory?
Posted by Kenji <ke...@trailfire.com>.
I'm trying to open a Lucene index created on a hadoop dfs.
Configuration nutchConf = NutchConfiguration.create();
FileSystem fs = FileSystem.get(nutchConf);
Path lastIndex = this.dataConf.lastIndexDir();
IndexReader idxReader = IndexReader.open(fs.getUri().toString() + lastIndex);
This results in an exception:
hdfs://localhost:9000/user/kenji/pages/lastIndex
Exception in thread "main" java.io.IOException: The filename, directory name, or volume label syntax is incorrect
	at java.io.WinNTFileSystem.canonicalize0(Native Method)
	at java.io.Win32FileSystem.canonicalize(Unknown Source)
	at java.io.File.getCanonicalPath(Unknown Source)
	at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:168)
	at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:139)
	at org.apache.lucene.index.IndexReader.open(IndexReader.java:148)
	at ix.indexer.PageIndexer.test(Unknown Source)
	at ix.indexer.PageIndexer.main(Unknown Source)
I've tried without the URI; then it assumes a local file system. Is the
index reader supposed to work only locally?
Thanks.
-Kenji Kawai