Posted to dev@nutch.apache.org by Neal Whitley <ne...@e-travelmedia.com> on 2006/01/03 19:51:49 UTC

Nutch-87 Setup

I am looking to create a vertical/regional search application, and the 
Nutch-87 plugin sounds perfect for what I want to do.  However, this 
is all VERY new to me (Java, Ant, Tomcat, Nutch, etc.), but I was able 
to hack my way through the installation and now have a working copy of 
Nutch.

I am having problems trying to install and build the plugin.  I have 
read the docs, but they are not entirely clear on the steps needed to 
add a new plugin to Nutch.

Can anyone give me some pointers as to what's happening here?  Please 
bear in mind that I am a Nutch newbie.


Here are the steps I have taken:
1.) I downloaded the oc-0[1].3.2.zip file.
2.) FTP'd the zip to the server.
3.) Unzipped it in "/caribbeanlinks.com/nutch/nutch/src/plugin/".
4.) Created the "/epile/src/java" folder, placed the 
"/crawl/plugin/whitelisturlfilter" directory inside it, and added 
WhitelistURLFilter.java there:
/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter
5.) Created the build.xml and plugin.xml files in 
"/caribbeanlinks.com/nutch/nutch/src/plugin/epile" (see examples below).
6.) Ran "ant".

[caribmag]$ ant
Buildfile: build.xml

compile:
     [javac] Compiling 51 source files to /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/classes
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/CrawlSeedSource.java:21: <identifier> expected
     [javac]   Iterator<SeedURL> getSeedURLs() throws IOException;
     [javac]           ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/CrawlSeedSource.java:21: = expected
     [javac]   Iterator<SeedURL> getSeedURLs() throws IOException;
     [javac]                                                     ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/DefaultFetchList.java:38: <identifier> expected
     [javac]   private HashMap<String, HostQueue> hosts = new HashMap<String, HostQueue>();
     [javac]                  ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/DefaultFetchList.java:49: <identifier> expected
     [javac]   private TreeSet<HostQueue> blockedHosts = new TreeSet(new Comparator<HostQueue>() {
     [javac]                  ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/FetcherThread.java:51: <identifier> expected
     [javac]   protected LinkedHashMap<URL, ScheduledURL> linkQueue = new LinkedHashMap();
     [javac]                          ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/FileCrawlSeedSource.java:15: <identifier> expected
     [javac]   protected ArrayList<SeedURL> seeds = new ArrayList();
     [javac]                      ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/FileCrawlSeedSource.java:31: <identifier> expected
     [javac]   public Iterator<SeedURL> getSeedURLs() throws IOException {
     [javac]                  ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/FileCrawlSeedSource.java:38: ';' expected
     [javac] }
     [javac] ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/HostQueue.java:27: <identifier> expected
     [javac]   private LinkedList<ScheduledURL> pages = new LinkedList();
     [javac]                     ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/HostQueue.java:32: <identifier> expected
     [javac]   private TreeSet<Long> checksums = new TreeSet();
     [javac]                  ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/HostQueue.java:105: <identifier> expected
     [javac]   public LinkedList<ScheduledURL> getPages() {
     [javac]                    ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/HostQueue.java:146: ';' expected
     [javac] }
     [javac] ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/InMemoryFetchedURLs.java:14: <identifier> expected
     [javac]   private Set<String> fetched = new HashSet();
     [javac]              ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/NutchFetchListCrawlSeedSource.java:74: <identifier> expected
     [javac]   public Iterator<SeedURL> getSeedURLs() throws IOException {
     [javac]                  ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/NutchFetchListCrawlSeedSource.java:91: ';' expected
     [javac] }
     [javac] ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/PostFetchProcessorChain.java:13: <identifier> expected
     [javac]   private List<PostFetchProcessor> processors;
     [javac]               ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/PostFetchProcessorChain.java:15: ';' expected
     [javac]     for(PostFetchProcessor pp : processors) {
     [javac]                               ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/PostFetchProcessorChain.java:18: illegal start of expression
     [javac]   }
     [javac]   ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/PostFetchProcessorChain.java:21: ';' expected
     [javac]     for(PostFetchProcessor pp : processors) {
     [javac]                               ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/PostFetchProcessorChain.java:24: illegal start of expression
     [javac]   }
     [javac]   ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/PostFetchProcessorChain.java:26: <identifier> expected
     [javac]   public void setProcessors(List<PostFetchProcessor> processors) {
     [javac]                                 ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/PostFetchProcessorChain.java:29: ')' expected
     [javac] }
     [javac] ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/http/HttpResponse.java:39: <identifier> expected
     [javac]   static Map<String, Byte> serverHttpVersion = new Hashtable();
     [javac]             ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/http/HttpResponse.java:44: <identifier> expected
     [javac]   protected Map<String, Integer> codes = new HashMap();
     [javac]                ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/scope/AbstractScope.java:11: '{' expected
     [javac] public abstract class AbstractScope<T> {
     [javac]                                    ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/scope/AbstractScope.java:54: '}' expected
     [javac] ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/scope/FetchListScope.java:11: '{' expected
     [javac] public class FetchListScope extends AbstractScope<FetchListScope.Input> {
     [javac]                                                  ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/scope/MapFileContentSeenFilter.java:17: '{' expected
     [javac]     implements ScopeFilter<PostFetchScope.Input>, PostFetchProcessor {
     [javac]                           ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/scope/MapFileContentSeenFilter.java:51: '}' expected
     [javac] ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/scope/NutchUrlFLFilter.java:9: '{' expected
     [javac] public class NutchUrlFLFilter implements ScopeFilter<FetchListScope.Input>{
     [javac]                                                     ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/scope/NutchUrlFLFilter.java:20: '}' expected
     [javac] ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/scope/OneExternalLinkFLFilter.java:12: '{' expected
     [javac] public class OneExternalLinkFLFilter implements ScopeFilter<FetchListScope.Input>{
     [javac]                                                            ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/scope/OneExternalLinkFLFilter.java:32: '}' expected
     [javac] ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/scope/ParseScope.java:8: '{' expected
     [javac] public class ParseScope extends AbstractScope<FetcherOutput>{
     [javac]                                              ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/scope/ParseScope.java:10: '}' expected
     [javac] ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/scope/PostFetchScope.java:10: '{' expected
     [javac] public class PostFetchScope extends AbstractScope<PostFetchScope.Input> {
     [javac]                                                  ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/scope/SameParentHostFLFilter.java:8: '{' expected
     [javac] public class SameParentHostFLFilter implements ScopeFilter<FetchListScope.Input>{
     [javac]                                                           ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/scope/SameParentHostFLFilter.java:16: '}' expected
     [javac] ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/scope/SameParentPathFLFilter.java:6: '{' expected
     [javac] public class SameParentPathFLFilter implements ScopeFilter<FetchListScope.Input> {
     [javac]                                                           ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/scope/SameParentPathFLFilter.java:20: '}' expected
     [javac] ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/scope/SameParentTLDFLFilter.java:9: '{' expected
     [javac] public class SameParentTLDFLFilter implements ScopeFilter<FetchListScope.Input> {
     [javac]                                                          ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/scope/SameParentTLDFLFilter.java:30: '}' expected
     [javac] ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/scope/ScopeFilter.java:6: '{' expected
     [javac] public interface ScopeFilter<T> {
     [javac]                             ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/scope/ScopeFilter.java:19: '}' expected
     [javac] ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/scope/SizeConstrainedFLFilter.java:8: '{' expected
     [javac] public class SizeConstrainedFLFilter implements ScopeFilter<FetchListScope.Input>{
     [javac]                                                            ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/scope/SizeConstrainedFLFilter.java:24: '}' expected
     [javac] ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/scope/WebDBContentSeenFilter.java:13: '{' expected
     [javac] public class WebDBContentSeenFilter implements ScopeFilter<FetcherOutput> {
     [javac]                                                           ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/scope/WebDBContentSeenFilter.java:38: '}' expected
     [javac] ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/util/LongLongPersister.java:8: '{' expected
     [javac] public class LongLongPersister extends MapFilePersister<Long, Long> {
     [javac]                                                        ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/util/LongLongPersister.java:73: '}' expected
     [javac] ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/util/LongPersister.java:13: '{' expected
     [javac] public class LongPersister extends MapFilePersister<Long, NullWritable> {
     [javac]                                                    ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/util/LongPersister.java:61: '}' expected
     [javac] ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/util/MD5Persister.java:21: '{' expected
     [javac] public class MD5Persister extends MapFilePersister<MD5Hash, NullWritable> {
     [javac]                                                   ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/util/MD5Persister.java:56: '}' expected
     [javac] ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/util/MapFilePersister.java:20: '{' expected
     [javac] public abstract class MapFilePersister <K, V> {
     [javac]                                        ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/util/MapFilePersister.java:22: <identifier> expected
     [javac]       LogFormatter.getLogger(MapFilePersister.class.getName());
     [javac]                                                    ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/util/MapFilePersister.java:22: '{' expected
     [javac]       LogFormatter.getLogger(MapFilePersister.class.getName());
     [javac]                                                               ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/util/MapFilePersister.java:46: <identifier> expected
     [javac]   protected TreeMap<K, V> buffer = new TreeMap(getTypeComparator());
     [javac]                    ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/util/MapFilePersister.java:102: <identifier> expected
     [javac]   protected abstract Comparator<K> getTypeComparator();
     [javac]                                ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/util/MapFilePersister.java:219: ';' expected
     [javac]       for (K k : buffer.keySet()) {
     [javac]                ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/util/MapFilePersister.java:229: illegal start of expression
     [javac]     } finally {
     [javac]     ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/util/MapFilePersister.java:228: ')' expected
     [javac] ^
     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/src/java/org/supermind/crawl/util/MapFilePersister.java:308: '}' expected
     [javac] ^
     [javac] 63 errors

BUILD FAILED
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/oc/build.xml:24: Compile failed; see the compiler error output for details.

Total time: 4 seconds
[caribmag]$
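The errors in the log above all start at, or cascade from, Java 5 language features: generic type arguments such as Iterator<SeedURL> and the enhanced for (PostFetchProcessor pp : processors) loop. The verbose run later in this thread reports "Detected Java version: 1.4", so a likely explanation (an assumption; the thread does not confirm it) is that the oc sources need a 1.5 JDK. Building with a Java 5 compiler is the direct fix; the minimal sketch below only shows what such a loop looks like in 1.4-compatible form. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical example class; not part of the oc / Nutch-87 sources.
public class PreGenericsLoop {

    // Java 5 form (the style used in the oc sources, rejected by a 1.4 javac):
    //     List<String> names = ...;
    //     for (String s : names) { total += s.length(); }
    //
    // Java 1.4-compatible form: a raw List, an explicit Iterator and a cast.
    static int totalLength(List names) {
        int total = 0;
        for (Iterator it = names.iterator(); it.hasNext();) {
            String s = (String) it.next(); // cast needed without generics
            total += s.length();
        }
        return total;
    }

    public static void main(String[] args) {
        List names = new ArrayList();
        names.add("nutch");
        names.add("oc");
        System.out.println(totalLength(names)); // prints 7
    }
}
```

The raw-type version compiles with both 1.4 and 5 compilers (newer compilers only warn about unchecked calls), which is why pre-generics code was commonly written this way.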






build.xml
---------------------------
<?xml version="1.0"?>

<project name="WhitelistURLFilter" default="jar">

   <import file="../build-plugin.xml"/>

</project>
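This build.xml delegates to ../build-plugin.xml, and the verbose run posted later in the thread shows that script passing the plugin's src/java directory to javac as -sourcepath. javac looks up sources on that path in directories that mirror the package name, so an import of epile.crawl.util.StringURL is expected at src/java/epile/crawl/util/StringURL.java, and the class attribute in plugin.xml has to match the package the source file actually declares. A minimal sketch of that mapping, using a hypothetical helper name and covering top-level classes only:

```java
// Hypothetical helper: shows where javac's -sourcepath lookup expects the
// source file of a top-level class, given its fully qualified name.
public class SourcePathLayout {

    static String expectedSourceFile(String sourcePath, String className) {
        // Each package segment becomes a directory; the simple class name
        // becomes the file name with a .java extension.
        return sourcePath + "/" + className.replace('.', '/') + ".java";
    }

    public static void main(String[] args) {
        String src = "src/plugin/epile/src/java";
        System.out.println(expectedSourceFile(src, "epile.crawl.util.StringURL"));
        // prints src/plugin/epile/src/java/epile/crawl/util/StringURL.java
        System.out.println(expectedSourceFile(src, "epile.crawl.plugin.WhitelistURLFilter"));
        // prints src/plugin/epile/src/java/epile/crawl/plugin/WhitelistURLFilter.java
    }
}
```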



plugin.xml
---------------------------
<?xml version="1.0" encoding="UTF-8"?>
<plugin
    id="epile-whitelisturlfilter"
    name="Epile whitelist URL filter"
    version="1.0.0"
    provider-name="teamgigabyte.com">

    <extension-point
       id="org.apache.nutch.net.URLFilter"
       name="Nutch URL Filter"/>

    <runtime></runtime>

    <extension id="org.apache.nutch.net.urlfiler"
       name="Epile Whitelist URL Filter"
       point="org.apache.nutch.net.URLFilter">

       <implementation id="WhitelistURLFilter"
          class="epile.crawl.plugin.WhitelistURLFilter"/>
    </extension>
</plugin>
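The plugin.xml above attaches an extension to the org.apache.nutch.net.URLFilter point. Assuming that interface's contract is a single method, String filter(String urlString), which returns the URL to accept it or null to reject it (check the interface in your own Nutch tree), a whitelist filter reduces to a host check. The standalone sketch below does not implement the Nutch interface itself; the class name and the host-suffix rule are hypothetical and are not the Nutch-87 code.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Hypothetical standalone sketch following the assumed URLFilter contract:
// return the URL to accept it, or null to reject it.
public class WhitelistFilterSketch {

    // Lower-cased host names that are allowed.
    private final Set allowedHosts = new HashSet();

    public WhitelistFilterSketch(String[] hosts) {
        for (int i = 0; i < hosts.length; i++) {
            allowedHosts.add(hosts[i].toLowerCase());
        }
    }

    /** Returns urlString if its host is whitelisted, otherwise null. */
    public String filter(String urlString) {
        String host;
        try {
            host = new URL(urlString).getHost().toLowerCase();
        } catch (MalformedURLException e) {
            return null; // unparseable URLs are rejected
        }
        for (Iterator it = allowedHosts.iterator(); it.hasNext();) {
            String allowed = (String) it.next();
            // Accept the host itself or any subdomain of it.
            if (host.equals(allowed) || host.endsWith("." + allowed)) {
                return urlString;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        WhitelistFilterSketch f =
            new WhitelistFilterSketch(new String[] {"caribbeanlinks.com"});
        // accepted: prints http://www.caribbeanlinks.com/index.html
        System.out.println(f.filter("http://www.caribbeanlinks.com/index.html"));
        // rejected: prints null
        System.out.println(f.filter("http://example.org/"));
    }
}
```

The sketch sticks to raw collections so that it would also compile with the 1.4 JDK used in this thread; inside Nutch the class would presumably implement URLFilter and be the one named in the plugin.xml class attribute.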



Re: [bug] Re: NegativeArraySizeException in search server

Posted by Doug Cutting <cu...@nutch.org>.
I just committed a fix for this.  Thanks for diagnosing it!

I changed things so that hit limiting is entirely disabled when 
searcher.max.hits is not positive.

Doug
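Doug's change amounts to a guard on the configured value: bounded hit collection is only set up when searcher.max.hits is positive. A minimal sketch of that guard, assuming the value is read from the configuration as a string; the class, the method and the -1 "no limit" sentinel are hypothetical, not the committed code:

```java
// Hypothetical sketch of the guard described in the commit message;
// not the actual Nutch change.
public class HitLimitSketch {

    /**
     * Returns the hit limit to apply, or -1 to signal that hit limiting
     * should be skipped entirely because the configured value is not positive.
     */
    static int effectiveMaxHits(String configured) {
        int maxHits = Integer.parseInt(configured.trim());
        return maxHits > 0 ? maxHits : -1;
    }

    public static void main(String[] args) {
        System.out.println(effectiveMaxHits("-1"));   // prints -1 (no limiting)
        System.out.println(effectiveMaxHits("1000")); // prints 1000
    }
}
```

With this behaviour, a non-positive searcher.max.hits switches hit limiting off instead of sizing a queue from the configured number.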


Re: [bug] Re: NegativeArraySizeException in search server

Posted by Gal Nitzan <gn...@usa.net>.
Yes, correct. For a second I thought it was fixed :)






[bug] Re: NegativeArraySizeException in search server

Posted by Marko Bauhardt <mb...@media-style.com>.
Hi,
I got the same exception. It is caused by the default value of the  
searcher.max.hits property in nutch-default.xml, which is  
Integer.MAX_VALUE. The class org.apache.lucene.util.PriorityQueue  
increments this maximum by one, and Integer.MAX_VALUE + 1 overflows  
to -2147483648. You must decrease searcher.max.hits to fix this.
But note: PriorityQueue allocates an array of this size, so if too  
large a value is defined, an OutOfMemoryError occurs.
Any ideas or suggestions on how to fix this?

Marko
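The overflow Marko describes is plain Java int arithmetic: adding one to Integer.MAX_VALUE wraps around to Integer.MIN_VALUE, and allocating an array with a negative length throws NegativeArraySizeException. The sketch below reproduces that sequence in isolation; the maxSize + 1 sizing mirrors his description of PriorityQueue rather than quoting Lucene's source, and the class name is hypothetical.

```java
// Hypothetical reproduction of the failure mode described above.
public class MaxHitsOverflow {

    // Mirrors the sizing described in the thread: capacity = maxSize + 1.
    static int heapCapacity(int maxSize) {
        return maxSize + 1; // overflows silently when maxSize == Integer.MAX_VALUE
    }

    public static void main(String[] args) {
        int capacity = heapCapacity(Integer.MAX_VALUE);
        System.out.println(capacity); // prints -2147483648

        try {
            Object[] heap = new Object[capacity];
            System.out.println("allocated " + heap.length);
        } catch (NegativeArraySizeException e) {
            System.out.println("NegativeArraySizeException"); // this branch runs
        }
    }
}
```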






NegativeArraySizeException in search server

Posted by Gal Nitzan <gn...@usa.net>.
When trying to use the search server, I get the following error.

I am using the trunk from today.

060104 025549 13 Server handler 0 on 9004 call error:
java.io.IOException: java.lang.NegativeArraySizeException
java.io.IOException: java.lang.NegativeArraySizeException
        at org.apache.lucene.util.PriorityQueue.initialize(PriorityQueue.java:35)
        at org.apache.lucene.search.HitQueue.<init>(HitQueue.java:23)
        at org.apache.lucene.search.TopDocCollector.<init>(TopDocCollector.java:47)
        at org.apache.nutch.searcher.LuceneQueryOptimizer$LimitedCollector.<init>(LuceneQueryOptimizer.java:52)
        at org.apache.nutch.searcher.LuceneQueryOptimizer.optimize(LuceneQueryOptimizer.java:153)
        at org.apache.nutch.searcher.IndexSearcher.search(IndexSearcher.java:93)
        at org.apache.nutch.searcher.NutchBean.search(NutchBean.java:155)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:324)
        at org.apache.nutch.ipc.RPC$1.call(RPC.java:186)
        at org.apache.nutch.ipc.Server$Handler.run(Server.java:200)



Re: Nutch-87 Setup

Posted by Matt Kangas <ka...@gmail.com>.
Hi Neal,

The code attached to the ticket does indeed work for me, but I'm  
afraid it's a little rough around the edges. Plus, I think I forgot  
to include the WhitelistWriter class. :)

What timeframe do you need this within? I usually see one request a  
month for this, so I should clean it up at some point.

--Matt

On Jan 3, 2006, at 2:38 PM, Neal Whitley wrote:

> Sorry, I posted the wrong error output in my previous messages.
> Here is the output I get when running ant with the Nutch-87 plugin:
>
>
> [caribmag]$ ant -v
> Apache Ant version 1.6.5 compiled on June 2 2005
> Buildfile: build.xml
> Detected Java version: 1.4 in: /home/1/caribmag/j2sdk1.4.2_10/jre
> Detected OS: Linux
> parsing buildfile /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/build.xml with URI = file:///home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/build.xml
> Project base dir set to: /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile
> Importing file ../build-plugin.xml from /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/build.xml
> parsing buildfile /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/build-plugin.xml with URI = file:///home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/build-plugin.xml
>  [property] Loading /home/caribmag/WhitelistURLFilter.build.properties
>  [property] Unable to find property file: /home/caribmag/WhitelistURLFilter.build.properties
>  [property] Loading /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/build.properties
>  [property] Unable to find property file: /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/build.properties
> [available] Unable to find dir src/test to set property test.available
> Build sequence for target(s) `jar' is [init, compile, jar]
> Complete build sequence is [init, compile, jar, init-plugin, deploy, compile-test, clean, test, ]
>
> init:
>     [mkdir] Created dir: /home/1/caribmag/caribbeanlinks.com/nutch/nutch/build/WhitelistURLFilter
>     [mkdir] Created dir: /home/1/caribmag/caribbeanlinks.com/nutch/nutch/build/WhitelistURLFilter/classes
>     [mkdir] Created dir: /home/1/caribmag/caribbeanlinks.com/nutch/nutch/build/WhitelistURLFilter/test
> Project base dir set to: /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile
>   [antcall] calling target(s) [init-plugin] in build file /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/build.xml
> parsing buildfile /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/build.xml with URI = file:///home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/build.xml
> Project base dir set to: /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile
> Importing file ../build-plugin.xml from /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/build.xml
> parsing buildfile /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/build-plugin.xml with URI = file:///home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/build-plugin.xml
> Override ignored for property name
> Override ignored for property root
>  [property] Loading /home/caribmag/WhitelistURLFilter.build.properties
>  [property] Unable to find property file: /home/caribmag/WhitelistURLFilter.build.properties
>  [property] Loading /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/build.properties
>  [property] Unable to find property file: /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/build.properties
> Override ignored for property nutch.root
> Override ignored for property src.dir
> Override ignored for property src.test
> [available] Unable to find dir src/test to set property test.available
> Override ignored for property conf.dir
> Override ignored for property build.dir
> Override ignored for property build.classes
> Override ignored for property build.test
> Override ignored for property deploy.dir
> Override ignored for property javac.deprecation
> Override ignored for property javac.debug
> Override ignored for property javadoc.link
> Override ignored for property build.encoding
> Build sequence for target(s) `init-plugin' is [init-plugin]
> Complete build sequence is [init-plugin, init, compile, jar, deploy, compile-test, clean, test, ]
>   [antcall] Entering /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/build.xml...
> Build sequence for target(s) `init-plugin' is [init-plugin]
> Complete build sequence is [init-plugin, init, compile, jar, deploy, compile-test, clean, test, ]
>
> init-plugin:
>   [antcall] Exiting /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/build.xml.
>
> compile:
>      [echo] Compiling plugin: WhitelistURLFilter
>     [javac] crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java added as crawl/plugin/whitelisturlfilter/WhitelistURLFilter.class doesn't exist.
>     [javac] Compiling 1 source file to /home/1/caribmag/caribbeanlinks.com/nutch/nutch/build/WhitelistURLFilter/classes
>     [javac] Using modern compiler
> dropping /home/1/caribmag/caribbeanlinks.com/nutch/nutch/build/classes from path as it doesn't exist
> dropping /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/home/caribmag/tomcatcommonlibservlet.jar from path as it doesn't exist
>     [javac] Compilation arguments:
>     [javac] '-d'
>     [javac] '/home/1/caribmag/caribbeanlinks.com/nutch/nutch/build/ 
> WhitelistURLFilter/classes'
>     [javac] '-classpath'
>     [javac] '/home/1/caribmag/caribbeanlinks.com/nutch/nutch/build/ 
> WhitelistURLFilter/classes:/home/1/caribmag/caribbeanlinks.com/ 
> nutch/nutch/lib/commons-logging-api-1.0.4.jar:/home/1/caribmag/ 
> caribbeanlinks.com/nutch/nutch/lib/concurrent-1.3.4.jar:/home/1/ 
> caribmag/caribbeanlinks.com/nutch/nutch/lib/jakarta-oro-2.0.7.jar:/ 
> home/1/caribmag/caribbeanlinks.com/nutch/nutch/lib/jetty-5.1.2.jar:/ 
> home/1/caribmag/caribbeanlinks.com/nutch/nutch/lib/junit-3.8.1.jar:/ 
> home/1/caribmag/caribbeanlinks.com/nutch/nutch/lib/lucene-1.9-rc1- 
> dev.jar:/home/1/caribmag/caribbeanlinks.com/nutch/nutch/lib/lucene- 
> misc-1.9-rc1-dev.jar:/home/1/caribmag/caribbeanlinks.com/nutch/ 
> nutch/lib/servlet-api.jar:/home/1/caribmag/caribbeanlinks.com/nutch/ 
> nutch/lib/spring-beans.jar:/home/1/caribmag/caribbeanlinks.com/ 
> nutch/nutch/lib/spring-core.jar:/home/1/caribmag/caribbeanlinks.com/ 
> nutch/nutch/lib/taglibs-i18n.jar:/home/1/caribmag/ 
> caribbeanlinks.com/nutch/nutch/lib/xerces-2_6_2-apis.jar:/home/1/ 
> caribmag/caribbeanlinks.com/nutc
> h/nut
> c h/lib/xerces-2_6_2.jar:/home/caribmag/apache-ant/lib/ant- 
> launcher.jar:/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile:/home/caribmag/apache-ant/lib/ant-antlr.jar:/home/ 
> caribmag/apache-ant/lib/ant-apache-bcel.jar:/home/caribmag/apache- 
> ant/lib/ant-apache-bsf.jar:/home/caribmag/apache-ant/lib/ant-apache- 
> log4j.jar:/home/caribmag/apache-ant/lib/ant-apache-oro.jar:/home/ 
> caribmag/apache-ant/lib/ant-apache-regexp.jar:/home/caribmag/apache- 
> ant/lib/ant-apache-resolver.jar:/home/caribmag/apache-ant/lib/ant- 
> commons-logging.jar:/home/caribmag/apache-ant/lib/ant-commons- 
> net.jar:/home/caribmag/apache-ant/lib/ant-icontract.jar:/home/ 
> caribmag/apache-ant/lib/ant-jai.jar:/home/caribmag/apache-ant/lib/ 
> ant-javamail.jar:/home/caribmag/apache-ant/lib/ant-jdepend.jar:/ 
> home/caribmag/apache-ant/lib/ant-jmf.jar:/home/caribmag/apache-ant/ 
> lib/ant-jsch.jar:/home/caribmag/apache-ant/lib/ant-junit.jar:/home/ 
> caribmag/apache-ant/lib/ant-netrexx.jar:/home/caribmag/apache-ant/ 
> lib/ant-nodeps.jar:/home/caribmag/apache-ant/lib/ant-starteam.jar:/home/caribmag/apache- 
> ant/lib/ant-stylebook.jar:/home/caribmag/apache-ant/lib/ant- 
> swing.jar:/home/caribmag/apache-ant/lib/ant-trax.jar:/home/caribmag/ 
> apache-ant/lib/ant-vaj.jar:/home/caribmag/apache-ant/lib/ant- 
> weblogic.jar:/home/caribmag/apache-ant/lib/ant-xalan1.jar:/home/ 
> caribmag/apache-ant/lib/ant-xslp.jar:/home/caribmag/apache-ant/lib/ 
> ant.jar:/home/caribmag/apache-ant/lib/xercesImpl.jar:/home/caribmag/ 
> apache-ant/lib/xml-apis.jar:/home/1/caribmag/j2sdk1.4.2_10/lib/ 
> tools.jar'
>     [javac] '-sourcepath'
>     [javac] '/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile/src/java'
>     [javac] '-encoding'
>     [javac] 'ISO-8859-1'
>     [javac] '-g'
>     [javac]
>     [javac] The ' characters around the executable and arguments are
>     [javac] not part of the command.
>     [javac] File to be compiled:
>     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile/src/java/crawl/plugin/whitelisturlfilter/ 
> WhitelistURLFilter.java
>     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile/src/java/crawl/plugin/whitelisturlfilter/ 
> WhitelistURLFilter.java:3: package epile.crawl.util does not exist
>     [javac] import epile.crawl.util.StringURL;
>     [javac]                         ^
>     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile/src/java/crawl/plugin/whitelisturlfilter/ 
> WhitelistURLFilter.java:4: package epile.util does not exist
>     [javac] import epile.util.LogLevel;
>     [javac]                   ^
>     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile/src/java/crawl/plugin/whitelisturlfilter/ 
> WhitelistURLFilter.java:5: package org.apache.nutch.util does not  
> exist
>     [javac] import org.apache.nutch.util.NutchConf;
>     [javac]                              ^
>     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile/src/java/crawl/plugin/whitelisturlfilter/ 
> WhitelistURLFilter.java:6: package org.apache.nutch.plugin does not  
> exist
>     [javac] import org.apache.nutch.plugin.Extension;
>     [javac]                                ^
>     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile/src/java/crawl/plugin/whitelisturlfilter/ 
> WhitelistURLFilter.java:7: package org.apache.nutch.plugin does not  
> exist
>     [javac] import org.apache.nutch.plugin.PluginRepository;
>     [javac]                                ^
>     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile/src/java/crawl/plugin/whitelisturlfilter/ 
> WhitelistURLFilter.java:8: package org.apache.nutch.net does not exist
>     [javac] import org.apache.nutch.net.URLFilter;
>     [javac]                             ^
>     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile/src/java/crawl/plugin/whitelisturlfilter/ 
> WhitelistURLFilter.java:9: package org.apache.nutch.fs does not exist
>     [javac] import org.apache.nutch.fs.*;
>     [javac] ^
>     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile/src/java/crawl/plugin/whitelisturlfilter/ 
> WhitelistURLFilter.java:10: package org.apache.nutch.io does not exist
>     [javac] import org.apache.nutch.io.*;
>     [javac] ^
>     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile/src/java/crawl/plugin/whitelisturlfilter/ 
> WhitelistURLFilter.java:27: cannot resolve symbol
>     [javac] symbol  : class URLFilter
>     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
>     [javac] public class WhitelistURLFilter implements URLFilter {
>     [javac]                                            ^
>     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile/src/java/crawl/plugin/whitelisturlfilter/ 
> WhitelistURLFilter.java:59: cannot resolve symbol
>     [javac] symbol  : class NutchFileSystem
>     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
>     [javac]   static private NutchFileSystem nfs;
>     [javac]                  ^
>     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile/src/java/crawl/plugin/whitelisturlfilter/ 
> WhitelistURLFilter.java:60: package MapFile does not exist
>     [javac]   static private MapFile.Reader whitelistMap;
>     [javac]                         ^
>     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile/src/java/crawl/plugin/whitelisturlfilter/ 
> WhitelistURLFilter.java:29: cannot resolve symbol
>     [javac] symbol  : variable LogLevel
>     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
>     [javac]   private static final Logger LOG = LogLevel.get 
> (WhitelistURLFilter.class.getName());
>     [javac]                                     ^
>     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile/src/java/crawl/plugin/whitelisturlfilter/ 
> WhitelistURLFilter.java:39: cannot resolve symbol
>     [javac] symbol  : class Extension
>     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
>     [javac]     Extension[] extensions =  
> PluginRepository.getInstance().getExtensionPoint 
> (URLFilter.class.getName()).getExtentens();
>     [javac]     ^
>     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile/src/java/crawl/plugin/whitelisturlfilter/ 
> WhitelistURLFilter.java:39: cannot resolve symbol
>     [javac] symbol  : class URLFilter
>     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
>     [javac]     Extension[] extensions =  
> PluginRepository.getInstance().getExtensionPoint 
> (URLFilter.class.getName()).getExtentens();
>     [javac] ^
>     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile/src/java/crawl/plugin/whitelisturlfilter/ 
> WhitelistURLFilter.java:39: cannot resolve symbol
>     [javac] symbol  : variable PluginRepository
>     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
>     [javac]     Extension[] extensions =  
> PluginRepository.getInstance().getExtensionPoint 
> (URLFilter.class.getName()).getExtentens();
>     [javac]                              ^
>     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile/src/java/crawl/plugin/whitelisturlfilter/ 
> WhitelistURLFilter.java:42: cannot resolve symbol
>     [javac] symbol  : class Extension
>     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
>     [javac]       Extension extension = extensions[i];
>     [javac]       ^
>     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile/src/java/crawl/plugin/whitelisturlfilter/ 
> WhitelistURLFilter.java:69: cannot resolve symbol
>     [javac] symbol  : class NutchConf
>     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
>     [javac]     NutchConf nutchConf = NutchConf.get();
>     [javac]     ^
>     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile/src/java/crawl/plugin/whitelisturlfilter/ 
> WhitelistURLFilter.java:69: cannot resolve symbol
>     [javac] symbol  : variable NutchConf
>     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
>     [javac]     NutchConf nutchConf = NutchConf.get();
>     [javac]                           ^
>     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile/src/java/crawl/plugin/whitelisturlfilter/ 
> WhitelistURLFilter.java:82: cannot resolve symbol
>     [javac] symbol  : class NutchConf
>     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
>     [javac]     NutchConf nutchConf = NutchConf.get();
>     [javac]     ^
>     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile/src/java/crawl/plugin/whitelisturlfilter/ 
> WhitelistURLFilter.java:82: cannot resolve symbol
>     [javac] symbol  : variable NutchConf
>     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
>     [javac]     NutchConf nutchConf = NutchConf.get();
>     [javac]                           ^
>     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile/src/java/crawl/plugin/whitelisturlfilter/ 
> WhitelistURLFilter.java:90: cannot resolve symbol
>     [javac] symbol  : class LocalFileSystem
>     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
>     [javac]       this.nfs = new LocalFileSystem();
>     [javac]                      ^
>     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile/src/java/crawl/plugin/whitelisturlfilter/ 
> WhitelistURLFilter.java:92: package MapFile does not exist
>     [javac]         whitelistMap = new MapFile.Reader(this.nfs,  
> mapFileDir);
>     [javac]                                   ^
>     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile/src/java/crawl/plugin/whitelisturlfilter/ 
> WhitelistURLFilter.java:184: cannot resolve symbol
>     [javac] symbol  : variable StringURL
>     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
>     [javac]     String hostname = StringURL.extractHostname(url);
>     [javac]                       ^
>     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile/src/java/crawl/plugin/whitelisturlfilter/ 
> WhitelistURLFilter.java:187: cannot resolve symbol
>     [javac] symbol  : class UTF8
>     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
>     [javac]     UTF8 value = new UTF8();
>     [javac]     ^
>     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile/src/java/crawl/plugin/whitelisturlfilter/ 
> WhitelistURLFilter.java:187: cannot resolve symbol
>     [javac] symbol  : class UTF8
>     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
>     [javac]     UTF8 value = new UTF8();
>     [javac]                      ^
>     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile/src/java/crawl/plugin/whitelisturlfilter/ 
> WhitelistURLFilter.java:190: cannot resolve symbol
>     [javac] symbol  : class UTF8
>     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
>     [javac]       UTF8 entry = (UTF8) whitelistMap.get(new UTF8 
> (hostname), value);
>     [javac]       ^
>     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile/src/java/crawl/plugin/whitelisturlfilter/ 
> WhitelistURLFilter.java:190: cannot resolve symbol
>     [javac] symbol  : class UTF8
>     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
>     [javac]       UTF8 entry = (UTF8) whitelistMap.get(new UTF8 
> (hostname), value);
>     [javac]                     ^
>     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile/src/java/crawl/plugin/whitelisturlfilter/ 
> WhitelistURLFilter.java:190: cannot resolve symbol
>     [javac] symbol  : class UTF8
>     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
>     [javac]       UTF8 entry = (UTF8) whitelistMap.get(new UTF8 
> (hostname), value);
>     [javac]                                                ^
>     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile/src/java/crawl/plugin/whitelisturlfilter/ 
> WhitelistURLFilter.java:191: cannot resolve symbol
>     [javac] symbol  : variable StringURL
>     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
>     [javac]       String strippedURL = StringURL.removeHostname(url);
>     [javac]                            ^
>     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile/src/java/crawl/plugin/whitelisturlfilter/ 
> WhitelistURLFilter.java:198: cannot resolve symbol
>     [javac] symbol  : variable StringURL
>     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
>     [javac]         String domain =  
> StringURL.extractDomainFromHostname(hostname);
>     [javac]                         ^
>     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile/src/java/crawl/plugin/whitelisturlfilter/ 
> WhitelistURLFilter.java:201: cannot resolve symbol
>     [javac] symbol  : class UTF8
>     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
>     [javac]         entry = (UTF8) whitelistMap.get(new UTF8 
> (domain), value);
>     [javac]                  ^
>     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile/src/java/crawl/plugin/whitelisturlfilter/ 
> WhitelistURLFilter.java:201: cannot resolve symbol
>     [javac] symbol  : class UTF8
>     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
>     [javac]         entry = (UTF8) whitelistMap.get(new UTF8 
> (domain), value);
>     [javac]                                             ^
>     [javac] /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/ 
> plugin/epile/src/java/crawl/plugin/whitelisturlfilter/ 
> WhitelistURLFilter.java:215: cannot resolve symbol
>     [javac] symbol  : variable StringURL
>     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
>     [javac]       if (StringURL.isCGI(url))
>     [javac]           ^
>     [javac] 33 errors
>
> BUILD FAILED
> /home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/build- 
> plugin.xml:85: Compile failed; see the compiler error output for  
> details.
>         at org.apache.tools.ant.taskdefs.Javac.compile(Javac.java:933)
>         at org.apache.tools.ant.taskdefs.Javac.execute(Javac.java:757)
>         at org.apache.tools.ant.UnknownElement.execute 
> (UnknownElement.java:275)
>         at org.apache.tools.ant.Task.perform(Task.java:364)
>         at org.apache.tools.ant.Target.execute(Target.java:341)
>         at org.apache.tools.ant.Target.performTasks(Target.java:369)
>         at org.apache.tools.ant.Project.executeSortedTargets 
> (Project.java:1216)
>         at org.apache.tools.ant.Project.executeTarget(Project.java: 
> 1185)
>         at  
> org.apache.tools.ant.helper.DefaultExecutor.executeTargets 
> (DefaultExecutor.java:40)
>         at org.apache.tools.ant.Project.executeTargets(Project.java: 
> 1068)
>         at org.apache.tools.ant.Main.runBuild(Main.java:668)
>         at org.apache.tools.ant.Main.startAnt(Main.java:187)
>         at org.apache.tools.ant.launch.Launcher.run(Launcher.java:246)
>         at org.apache.tools.ant.launch.Launcher.main(Launcher.java:67)
>
> Total time: 13 seconds
> [caribmag]$
>
>
>
>
> At 01:51 PM 1/3/2006, you wrote:
>> Nutch-87 Setup
>>
>> I am looking to create a vertical/regional search application and
>> the Nutch-87 plugin sounds perfect for what I want to do.
>> However, this is all VERY new to me (java, ant, tomcat, nutch etc.),
>> but I was able to hack my way through the installation and have a
>> working copy of Nutch running.
>>
>> I am having problems trying to install and build the plugin.  I
>> have read the docs, but they're not entirely clear on the steps to
>> add a new plugin into nutch.
>>
>> Can anyone give me any pointers as to what's happening here?  Please
>> bear in mind I am a nutch newbie.
>>
>>
>> Here are the steps I have taken:
>> 1.) I downloaded the oc-0[1].3.2.zip file.
>> 2.) FTP'd the zip to the server.
>> 3.) Unzipped it in: "/caribbeanlinks.com/nutch/nutch/src/plugin/"
>> 4.) Created the "/epile/src/java" folder, placed the
>> "/crawl/plugin/whitelisturlfilter" directory inside it, and added
>> WhitelistURLFilter.java, giving:
>> /caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter
>> 5.) Created the build.xml and plugin.xml files in
>> "/caribbeanlinks.com/nutch/nutch/src/plugin/epile" (see examples
>> below)
>> 6.) ran "ant"
>>
>>
>> <snip>
>>
>>
>>
>> build.xml
>> ---------------------------
>> <?xml version="1.0"?>
>>
>> <project name="WhitelistURLFilter" default="jar">
>>
>>   <import file="../build-plugin.xml"/>
>>
>> </project>
>>
>>
>>
>> plugin.xml
>> ---------------------------
>> <?xml version="1.0" encoding="UTF-8"?>
>> <plugin
>>    id="epile-whitelisturlfilter"
>>    name="Epile whitelist URL filter"
>>    version="1.0.0"
>>    provider-name="teamgigabyte.com">
>>
>>    <extension-point
>>       id="org.apache.nutch.net.URLFilter"
>>       name="Nutch URL Filter"/>
>>
>>    <runtime></runtime>
>>
>>    <extension id="org.apache.nutch.net.urlfilter"
>>       name="Epile Whitelist URL Filter"
>>       point="org.apache.nutch.net.URLFilter">
>>
>>       <implementation id="WhitelistURLFilter"
>>          class="epile.crawl.plugin.WhitelistURLFilter"/>
>>    </extension>
>> </plugin>
>>
>

--
Matt Kangas / kangas@gmail.com
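The compile log quoted above shows Ant "dropping .../nutch/nutch/build/classes from path as it doesn't exist". That directory appears to be where the top-level Nutch build writes its core classes, such as `org.apache.nutch.net.URLFilter`, so its absence would account for every `package org.apache.nutch.* does not exist` error. Running `ant` in the Nutch root before building the plugin may clear those errors, though this has not been verified against this particular checkout. The `epile.util` and `epile.crawl.util` errors are a separate gap: `LogLevel` and `StringURL` are not in the plugin's source tree at all. The second dropped entry, `.../epile/home/caribmag/tomcatcommonlibservlet.jar`, looks like a relative classpath entry that has lost its slashes. The Java sketch below (class and method names are our own, not part of Nutch or Ant) lists classpath entries that do not exist on disk and checks whether a given class can be loaded.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical diagnostic: report classpath entries missing on disk,
 * similar in spirit to Ant's verbose "dropping ... from path" messages.
 */
public class ClasspathCheck {

    /** Returns the entries of the given classpath that do not exist on disk. */
    public static List<String> missingEntries(String classpath) {
        List<String> missing = new ArrayList<>();
        for (String entry : classpath.split(File.pathSeparator)) {
            if (entry.isEmpty()) {
                continue; // skip empty segments, e.g. from a trailing separator
            }
            if (!new File(entry).exists()) {
                missing.add(entry);
            }
        }
        return missing;
    }

    /** Returns true if the named class can be loaded without initializing it. */
    public static boolean canLoad(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Check this JVM's own classpath, or a classpath string passed as args[0].
        String cp = args.length > 0 ? args[0] : System.getProperty("java.class.path");
        for (String entry : missingEntries(cp)) {
            System.out.println("missing: " + entry);
        }
        // URLFilter is the interface the plugin implements; it should only load
        // when Nutch's compiled core classes are on the classpath.
        System.out.println("URLFilter loadable: "
                + canLoad("org.apache.nutch.net.URLFilter"));
    }
}
```

Passing the `-classpath` string from the verbose log as the first argument would list each entry that javac could not use.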


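For orientation, the compiler errors also hint at what `WhitelistURLFilter` does. It appears to extract the URL's hostname, look it up in a `MapFile`-backed whitelist, and also look up the host's domain. The sketch below reproduces only that lookup with JDK classes, so it can be compiled and tested without Nutch. It is hypothetical throughout: `WhitelistSketch` is our own name, a `HashSet` stands in for the `MapFile`, a naive two-label rule stands in for `StringURL.extractDomainFromHostname` (whose source is not in this thread), and `filter` follows what we understand to be the `URLFilter` contract of this Nutch era (return the URL to accept it, `null` to reject it). It needs Java 7 or later, whereas the log shows JDK 1.4.2.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical host/domain whitelist filter using only JDK classes. */
public class WhitelistSketch {

    // Stand-in for the MapFile-backed whitelist in the real plugin.
    private final Set<String> allowed = new HashSet<>();

    public WhitelistSketch(String... entries) {
        for (String e : entries) {
            allowed.add(e.toLowerCase());
        }
    }

    /** Returns url if its host, or the host's parent domain, is whitelisted; else null. */
    public String filter(String url) {
        String host;
        try {
            host = new URI(url).getHost();
        } catch (URISyntaxException e) {
            return null; // unparseable URLs are rejected
        }
        if (host == null) {
            return null; // e.g. "mailto:" URLs have no host
        }
        host = host.toLowerCase();
        if (allowed.contains(host)) {
            return url;
        }
        // Naive domain fallback: keep the last two labels, so
        // "www.example.com" -> "example.com". This is wrong for suffixes
        // such as ".co.uk"; a real filter would need a public-suffix list.
        String[] labels = host.split("\\.");
        if (labels.length > 2) {
            String domain = labels[labels.length - 2] + "." + labels[labels.length - 1];
            if (allowed.contains(domain)) {
                return url;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        WhitelistSketch f = new WhitelistSketch("caribbeanlinks.com");
        System.out.println(f.filter("http://www.caribbeanlinks.com/index.html")); // accepted
        System.out.println(f.filter("http://example.org/"));                      // null
    }
}
```

Keeping the lookup free of Nutch types makes it easy to test on its own; in the plugin, the same logic would sit behind the `URLFilter` interface once the core classes are on the classpath.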

Re: Nutch-87 Setup

Posted by Neal Whitley <ne...@e-travelmedia.com>.
Sorry, I posted the wrong error output in my previous
messages.  Here is the output I get when running ant with the Nutch-87 plugin:


[caribmag]$ ant -v
Apache Ant version 1.6.5 compiled on June 2 2005
Buildfile: build.xml
Detected Java version: 1.4 in: /home/1/caribmag/j2sdk1.4.2_10/jre
Detected OS: Linux
parsing buildfile 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/build.xml 
with URI = 
file:///home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/build.xml
Project base dir set to: 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile
Importing file ../build-plugin.xml from 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/build.xml
parsing buildfile 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/build-plugin.xml 
with URI = 
file:///home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/build-plugin.xml
  [property] Loading /home/caribmag/WhitelistURLFilter.build.properties
  [property] Unable to find property file: 
/home/caribmag/WhitelistURLFilter.build.properties
  [property] Loading 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/build.properties
  [property] Unable to find property file: 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/build.properties
[available] Unable to find dir src/test to set property test.available
Build sequence for target(s) `jar' is [init, compile, jar]
Complete build sequence is [init, compile, jar, init-plugin, deploy, 
compile-test, clean, test, ]

init:
     [mkdir] Created dir: 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/build/WhitelistURLFilter
     [mkdir] Created dir: 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/build/WhitelistURLFilter/classes
     [mkdir] Created dir: 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/build/WhitelistURLFilter/test
Project base dir set to: 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile
   [antcall] calling target(s) [init-plugin] in build file 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/build.xml
parsing buildfile 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/build.xml 
with URI = 
file:///home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/build.xml
Project base dir set to: 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile
Importing file ../build-plugin.xml from 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/build.xml
parsing buildfile 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/build-plugin.xml 
with URI = 
file:///home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/build-plugin.xml
Override ignored for property name
Override ignored for property root
  [property] Loading /home/caribmag/WhitelistURLFilter.build.properties
  [property] Unable to find property file: 
/home/caribmag/WhitelistURLFilter.build.properties
  [property] Loading 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/build.properties
  [property] Unable to find property file: 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/build.properties
Override ignored for property nutch.root
Override ignored for property src.dir
Override ignored for property src.test
[available] Unable to find dir src/test to set property test.available
Override ignored for property conf.dir
Override ignored for property build.dir
Override ignored for property build.classes
Override ignored for property build.test
Override ignored for property deploy.dir
Override ignored for property javac.deprecation
Override ignored for property javac.debug
Override ignored for property javadoc.link
Override ignored for property build.encoding
Build sequence for target(s) `init-plugin' is [init-plugin]
Complete build sequence is [init-plugin, init, compile, jar, deploy, 
compile-test, clean, test, ]
   [antcall] Entering 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/build.xml...
Build sequence for target(s) `init-plugin' is [init-plugin]
Complete build sequence is [init-plugin, init, compile, jar, deploy, 
compile-test, clean, test, ]

init-plugin:
   [antcall] Exiting 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/build.xml.

compile:
      [echo] Compiling plugin: WhitelistURLFilter
     [javac] crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java 
added as crawl/plugin/whitelisturlfilter/WhitelistURLFilter.class 
doesn't exist.
     [javac] Compiling 1 source file to 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/build/WhitelistURLFilter/classes
     [javac] Using modern compiler
dropping 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/build/classes from 
path as it doesn't exist
dropping 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/home/caribmag/tomcatcommonlibservlet.jar 
from path as it doesn't exist
     [javac] Compilation arguments:
     [javac] '-d'
     [javac] 
'/home/1/caribmag/caribbeanlinks.com/nutch/nutch/build/WhitelistURLFilter/classes'
     [javac] '-classpath'
     [javac] 
'/home/1/caribmag/caribbeanlinks.com/nutch/nutch/build/WhitelistURLFilter/classes:/home/1/caribmag/caribbeanlinks.com/nutch/nutch/lib/commons-logging-api-1.0.4.jar:/home/1/caribmag/caribbeanlinks.com/nutch/nutch/lib/concurrent-1.3.4.jar:/home/1/caribmag/caribbeanlinks.com/nutch/nutch/lib/jakarta-oro-2.0.7.jar:/home/1/caribmag/caribbeanlinks.com/nutch/nutch/lib/jetty-5.1.2.jar:/home/1/caribmag/caribbeanlinks.com/nutch/nutch/lib/junit-3.8.1.jar:/home/1/caribmag/caribbeanlinks.com/nutch/nutch/lib/lucene-1.9-rc1-dev.jar:/home/1/caribmag/caribbeanlinks.com/nutch/nutch/lib/lucene-misc-1.9-rc1-dev.jar:/home/1/caribmag/caribbeanlinks.com/nutch/nutch/lib/servlet-api.jar:/home/1/caribmag/caribbeanlinks.com/nutch/nutch/lib/spring-beans.jar:/home/1/caribmag/caribbeanlinks.com/nutch/nutch/lib/spring-core.jar:/home/1/caribmag/caribbeanlinks.com/nutch/nutch/lib/taglibs-i18n.jar:/home/1/caribmag/caribbeanlinks.com/nutch/nutch/lib/xerces-2_6_2-apis.jar:/home/1/caribmag/caribbeanlinks.com/nutch/nutch/lib/xerces-2_6_2.jar:/home/caribmag/apache-ant/lib/ant-launcher.jar:/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile:/home/caribmag/apache-ant/lib/ant-antlr.jar:/home/caribmag/apache-ant/lib/ant-apache-bcel.jar:/home/caribmag/apache-ant/lib/ant-apache-bsf.jar:/home/caribmag/apache-ant/lib/ant-apache-log4j.jar:/home/caribmag/apache-ant/lib/ant-apache-oro.jar:/home/caribmag/apache-ant/lib/ant-apache-regexp.jar:/home/caribmag/apache-ant/lib/ant-apache-resolver.jar:/home/caribmag/apache-ant/lib/ant-commons-logging.jar:/home/caribmag/apache-ant/lib/ant-commons-net.jar:/home/caribmag/apache-ant/lib/ant-icontract.jar:/home/caribmag/apache-ant/lib/ant-jai.jar:/home/caribmag/apache-ant/lib/ant-javamail.jar:/home/caribmag/apache-ant/lib/ant-jdepend.jar:/home/caribmag/apache-ant/lib/ant-jmf.jar:/home/caribmag/apache-ant/lib/ant-jsch.jar:/home/caribmag/apache-ant/lib/ant-junit.jar:/home/caribmag/apache-ant/lib/ant-netrexx.jar:/home/caribmag/apache-ant/lib/ant-nodeps.jar:/home/caribmag/apache-ant/lib/ant-starteam.jar:/home/caribmag/apache-ant/lib/ant-stylebook.jar:/home/caribmag/apache-ant/lib/ant-swing.jar:/home/caribmag/apache-ant/lib/ant-trax.jar:/home/caribmag/apache-ant/lib/ant-vaj.jar:/home/caribmag/apache-ant/lib/ant-weblogic.jar:/home/caribmag/apache-ant/lib/ant-xalan1.jar:/home/caribmag/apache-ant/lib/ant-xslp.jar:/home/caribmag/apache-ant/lib/ant.jar:/home/caribmag/apache-ant/lib/xercesImpl.jar:/home/caribmag/apache-ant/lib/xml-apis.jar:/home/1/caribmag/j2sdk1.4.2_10/lib/tools.jar'
     [javac] '-sourcepath'
     [javac] 
'/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java'
     [javac] '-encoding'
     [javac] 'ISO-8859-1'
     [javac] '-g'
     [javac]
     [javac] The ' characters around the executable and arguments are
     [javac] not part of the command.
     [javac] File to be compiled:
     [javac] 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java
     [javac] 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java:3: 
package epile.crawl.util does not exist
     [javac] import epile.crawl.util.StringURL;
     [javac]                         ^
     [javac] 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java:4: 
package epile.util does not exist
     [javac] import epile.util.LogLevel;
     [javac]                   ^
     [javac] 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java:5: 
package org.apache.nutch.util does not exist
     [javac] import org.apache.nutch.util.NutchConf;
     [javac]                              ^
     [javac] 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java:6: 
package org.apache.nutch.plugin does not exist
     [javac] import org.apache.nutch.plugin.Extension;
     [javac]                                ^
     [javac] 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java:7: 
package org.apache.nutch.plugin does not exist
     [javac] import org.apache.nutch.plugin.PluginRepository;
     [javac]                                ^
     [javac] 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java:8: 
package org.apache.nutch.net does not exist
     [javac] import org.apache.nutch.net.URLFilter;
     [javac]                             ^
     [javac] 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java:9: 
package org.apache.nutch.fs does not exist
     [javac] import org.apache.nutch.fs.*;
     [javac] ^
     [javac] 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java:10: 
package org.apache.nutch.io does not exist
     [javac] import org.apache.nutch.io.*;
     [javac] ^
     [javac] 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java:27: 
cannot resolve symbol
     [javac] symbol  : class URLFilter
     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
     [javac] public class WhitelistURLFilter implements URLFilter {
     [javac]                                            ^
     [javac] 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java:59: 
cannot resolve symbol
     [javac] symbol  : class NutchFileSystem
     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
     [javac]   static private NutchFileSystem nfs;
     [javac]                  ^
     [javac] 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java:60: 
package MapFile does not exist
     [javac]   static private MapFile.Reader whitelistMap;
     [javac]                         ^
     [javac] 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java:29: 
cannot resolve symbol
     [javac] symbol  : variable LogLevel
     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
     [javac]   private static final Logger LOG = 
LogLevel.get(WhitelistURLFilter.class.getName());
     [javac]                                     ^
     [javac] 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java:39: 
cannot resolve symbol
     [javac] symbol  : class Extension
     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
     [javac]     Extension[] extensions = 
PluginRepository.getInstance().getExtensionPoint(URLFilter.class.getName()).getExtentens();
     [javac]     ^
     [javac] 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java:39: 
cannot resolve symbol
     [javac] symbol  : class URLFilter
     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
     [javac]     Extension[] extensions = 
PluginRepository.getInstance().getExtensionPoint(URLFilter.class.getName()).getExtentens();
     [javac] ^
     [javac] 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java:39: 
cannot resolve symbol
     [javac] symbol  : variable PluginRepository
     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
     [javac]     Extension[] extensions = 
PluginRepository.getInstance().getExtensionPoint(URLFilter.class.getName()).getExtentens();
     [javac]                              ^
     [javac] 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java:42: 
cannot resolve symbol
     [javac] symbol  : class Extension
     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
     [javac]       Extension extension = extensions[i];
     [javac]       ^
     [javac] 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java:69: 
cannot resolve symbol
     [javac] symbol  : class NutchConf
     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
     [javac]     NutchConf nutchConf = NutchConf.get();
     [javac]     ^
     [javac] 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java:69: 
cannot resolve symbol
     [javac] symbol  : variable NutchConf
     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
     [javac]     NutchConf nutchConf = NutchConf.get();
     [javac]                           ^
     [javac] 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java:82: 
cannot resolve symbol
     [javac] symbol  : class NutchConf
     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
     [javac]     NutchConf nutchConf = NutchConf.get();
     [javac]     ^
     [javac] 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java:82: 
cannot resolve symbol
     [javac] symbol  : variable NutchConf
     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
     [javac]     NutchConf nutchConf = NutchConf.get();
     [javac]                           ^
     [javac] 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java:90: 
cannot resolve symbol
     [javac] symbol  : class LocalFileSystem
     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
     [javac]       this.nfs = new LocalFileSystem();
     [javac]                      ^
     [javac] 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java:92: 
package MapFile does not exist
     [javac]         whitelistMap = new MapFile.Reader(this.nfs, mapFileDir);
     [javac]                                   ^
     [javac] 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java:184: 
cannot resolve symbol
     [javac] symbol  : variable StringURL
     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
     [javac]     String hostname = StringURL.extractHostname(url);
     [javac]                       ^
     [javac] 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java:187: 
cannot resolve symbol
     [javac] symbol  : class UTF8
     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
     [javac]     UTF8 value = new UTF8();
     [javac]     ^
     [javac] 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java:187: 
cannot resolve symbol
     [javac] symbol  : class UTF8
     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
     [javac]     UTF8 value = new UTF8();
     [javac]                      ^
     [javac] 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java:190: 
cannot resolve symbol
     [javac] symbol  : class UTF8
     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
     [javac]       UTF8 entry = (UTF8) whitelistMap.get(new UTF8(hostname), value);
     [javac]       ^
     [javac] 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java:190: 
cannot resolve symbol
     [javac] symbol  : class UTF8
     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
     [javac]       UTF8 entry = (UTF8) whitelistMap.get(new UTF8(hostname), value);
     [javac]                     ^
     [javac] 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java:190: 
cannot resolve symbol
     [javac] symbol  : class UTF8
     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
     [javac]       UTF8 entry = (UTF8) whitelistMap.get(new UTF8(hostname), value);
     [javac]                                                ^
     [javac] 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java:191: 
cannot resolve symbol
     [javac] symbol  : variable StringURL
     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
     [javac]       String strippedURL = StringURL.removeHostname(url);
     [javac]                            ^
     [javac] 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java:198: 
cannot resolve symbol
     [javac] symbol  : variable StringURL
     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
     [javac]         String domain = StringURL.extractDomainFromHostname(hostname);
     [javac]                         ^
     [javac] 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java:201: 
cannot resolve symbol
     [javac] symbol  : class UTF8
     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
     [javac]         entry = (UTF8) whitelistMap.get(new UTF8(domain), value);
     [javac]                  ^
     [javac] 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java:201: 
cannot resolve symbol
     [javac] symbol  : class UTF8
     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
     [javac]         entry = (UTF8) whitelistMap.get(new UTF8(domain), value);
     [javac]                                             ^
     [javac] 
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter/WhitelistURLFilter.java:215: 
cannot resolve symbol
     [javac] symbol  : variable StringURL
     [javac] location: class epile.crawl.plugin.WhitelistURLFilter
     [javac]       if (StringURL.isCGI(url))
     [javac]           ^
     [javac] 33 errors

BUILD FAILED
/home/1/caribmag/caribbeanlinks.com/nutch/nutch/src/plugin/build-plugin.xml:85: Compile failed; see the compiler error output for details.
         at org.apache.tools.ant.taskdefs.Javac.compile(Javac.java:933)
         at org.apache.tools.ant.taskdefs.Javac.execute(Javac.java:757)
         at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:275)
         at org.apache.tools.ant.Task.perform(Task.java:364)
         at org.apache.tools.ant.Target.execute(Target.java:341)
         at org.apache.tools.ant.Target.performTasks(Target.java:369)
         at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1216)
         at org.apache.tools.ant.Project.executeTarget(Project.java:1185)
         at org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:40)
         at org.apache.tools.ant.Project.executeTargets(Project.java:1068)
         at org.apache.tools.ant.Main.runBuild(Main.java:668)
         at org.apache.tools.ant.Main.startAnt(Main.java:187)
         at org.apache.tools.ant.launch.Launcher.run(Launcher.java:246)
         at org.apache.tools.ant.launch.Launcher.main(Launcher.java:67)

Total time: 13 seconds
[caribmag]$
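The failing calls in the log (StringURL.extractHostname, StringURL.extractDomainFromHostname, StringURL.isCGI, and MapFile lookups keyed by UTF8) suggest the filter accepts a URL when its host, or failing that its domain, appears in a whitelist. The sketch below illustrates that lookup using only the JDK. It is not the plugin's code: a HashSet stands in for the MapFile, the three static helpers are hypothetical stand-ins for the oc package's StringURL methods, and rejecting query-string or cgi-bin URLs is an assumption. It follows the Nutch URLFilter convention of returning the URL to accept it and null to reject it, and it needs Java 5 or later (generics, varargs).

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical, JDK-only sketch of a host/domain whitelist check.
// A HashSet stands in for the plugin's MapFile, and the three static
// helpers are illustrative guesses, NOT the oc package's StringURL code.
public class WhitelistLookupSketch {

    private final Set<String> whitelist = new HashSet<String>();

    public WhitelistLookupSketch(String... entries) {
        for (String entry : entries) {
            whitelist.add(entry.toLowerCase(Locale.ENGLISH));
        }
    }

    // Nutch URLFilter convention: return the URL to accept it, null to reject it.
    public String filter(String url) {
        String hostname = extractHostname(url);
        if (hostname == null || isCGI(url)) {
            return null; // unparseable URL, or (assumption) a dynamic page
        }
        if (whitelist.contains(hostname)) {
            return url; // exact host match, e.g. "www.caribbeanlinks.com"
        }
        // Fall back to the domain, e.g. "caribbeanlinks.com".
        String domain = extractDomainFromHostname(hostname);
        return whitelist.contains(domain) ? url : null;
    }

    // Hypothetical stand-in for StringURL.extractHostname.
    static String extractHostname(String url) {
        try {
            String host = new URI(url).getHost();
            return host == null ? null : host.toLowerCase(Locale.ENGLISH);
        } catch (URISyntaxException e) {
            return null;
        }
    }

    // Hypothetical stand-in for StringURL.extractDomainFromHostname:
    // keeps only the last two labels ("www.example.com" -> "example.com").
    static String extractDomainFromHostname(String hostname) {
        int last = hostname.lastIndexOf('.');
        if (last <= 0) {
            return hostname;
        }
        int prev = hostname.lastIndexOf('.', last - 1);
        return prev < 0 ? hostname : hostname.substring(prev + 1);
    }

    // Hypothetical stand-in for StringURL.isCGI: a query string or a
    // cgi-bin path marks the URL as dynamic.
    static boolean isCGI(String url) {
        return url.indexOf('?') >= 0 || url.contains("/cgi-bin/");
    }

    public static void main(String[] args) {
        WhitelistLookupSketch f = new WhitelistLookupSketch("caribbeanlinks.com");
        System.out.println(f.filter("http://www.caribbeanlinks.com/index.html"));
        System.out.println(f.filter("http://www.example.org/"));
        System.out.println(f.filter("http://www.caribbeanlinks.com/search?q=x"));
    }
}
```

Running main prints the first URL, then null, then null. Keeping only the last two host labels is a simplification: it would treat www.example.co.uk as belonging to co.uk.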




At 01:51 PM 1/3/2006, you wrote:
>Nutch-87 Setup
>
>I am looking to create a vertical/regional search application, and
>the Nutch-87 plugin sounds perfect for what I want to do.  However,
>this is all VERY new to me (Java, Ant, Tomcat, Nutch, etc.), but I
>was able to hack my way through the installation and now have a
>working copy of Nutch.
>
>I am having problems trying to install and build the plugin.  I have
>read the docs, but they are not entirely clear on the steps needed to
>add a new plugin to Nutch.
>
>Can anyone give me some pointers as to what's happening here?  Please
>bear in mind I am a Nutch newbie.
>
>
>Here are the steps I have taken:
>1.) I downloaded the oc-0[1].3.2.zip file.
>2.) FTP'd the zip to the server
>3.) Unzipped it in:  "/caribbeanlinks.com/nutch/nutch/src/plugin/"
>4.) Created the "/epile/src/java" folder, created the
>"/crawl/plugin/whitelisturlfilter" directory inside it, and added
>WhitelistURLFilter.java:
>/caribbeanlinks.com/nutch/nutch/src/plugin/epile/src/java/crawl/plugin/whitelisturlfilter
>5.) Created the build.xml and plugin.xml files in
>"/caribbeanlinks.com/nutch/nutch/src/plugin/epile"  (see examples below)
>6.) Ran "ant"
>
>
><snip>
>
>
>
>build.xml
>---------------------------
><?xml version="1.0"?>
>
><project name="WhitelistURLFilter" default="jar">
>
>   <import file="../build-plugin.xml"/>
>
></project>
>
>
>
>plugin.xml
>---------------------------
><?xml version="1.0" encoding="UTF-8"?>
><plugin
>    id="epile-whitelisturlfilter"
>    name="Epile whitelist URL filter"
>    version="1.0.0"
>    provider-name="teamgigabyte.com">
>
>    <extension-point
>       id="org.apache.nutch.net.URLFilter"
>       name="Nutch URL Filter"/>
>
>    <runtime></runtime>
>
>    <extension id="org.apache.nutch.net.urlfiler"
>       name="Epile Whitelist URL Filter"
>       point="org.apache.nutch.net.URLFilter">
>
>       <implementation id="WhitelistURLFilter"
>          class="epile.crawl.plugin.WhitelistURLFilter"/>
>    </extension>
></plugin>
>
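By Java convention, a source file lives in a directory tree that mirrors its package name. The plugin.xml above names the implementation class epile.crawl.plugin.WhitelistURLFilter (the compiler output also reports that class), while step 4 placed the file under crawl/plugin/whitelisturlfilter, so the directory layout and the package do not match. The helper below is illustrative only (it is not part of Nutch or Ant): it derives the conventional location from the class attribute, assuming the plugin's src/java folder from step 4 is the source root.

```java
import java.io.File;

// Illustrative helper (not part of Nutch or Ant): maps a fully qualified
// class name to the conventional source path under a given source root.
public class PackagePathSketch {

    static String sourcePathFor(String sourceRoot, String className) {
        // Package segments become directories:
        // "epile.crawl.plugin.WhitelistURLFilter" -> "epile/crawl/plugin/WhitelistURLFilter"
        String relative = className.replace('.', '/');
        return sourceRoot + "/" + relative + ".java";
    }

    public static void main(String[] args) {
        // Source root relative to the nutch directory, and the class from plugin.xml.
        String path = sourcePathFor("src/plugin/epile/src/java",
                                    "epile.crawl.plugin.WhitelistURLFilter");
        System.out.println(path);
        // Reports whether a file actually exists at the conventional location.
        System.out.println(new File(path).exists() ? "found" : "missing");
    }
}
```

Run from the nutch directory, it prints src/plugin/epile/src/java/epile/crawl/plugin/WhitelistURLFilter.java and then reports whether that file exists; with only the step 4 layout in place, it reports missing.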