Posted to dev@nutch.apache.org by William Surowiec <ws...@gmail.com> on 2006/07/17 04:40:13 UTC

Possible problem in WebAppModule

I checked out a copy of nutch this morning from svn. After adding two
jars referenced in an archived email, I still had a build problem. It
occurs in org.apache.nutch.webapp.common.WebAppModule. Guessing, I
changed lines 161/162 to:

      String servletName = ((CharacterData)servlet).getData().trim();
      String urlPattern = ((CharacterData)pattern).getData().trim();

Additionally, during Nutch startup I am experiencing a problem similar to what others have reported with Hadoop:

Exception in thread "main" java.io.IOException: Input directory e:/apps/nutch/crawlTest/urls in local is invalid.
	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:274)
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:327)
	at org.apache.nutch.crawl.Injector.inject(Injector.java:138)
	at org.apache.nutch.crawl.Crawl.main(Crawl.java:105)

I have made the change to Injector suggested in a posting:

//    mergeJob.addInputPath(tempDir);
    mergeJob.setInputPath(tempDir);

It appears not to help.
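
One thing that may be worth ruling out for the IOException above is the input path itself. The sketch below is a standalone check, not part of Nutch or Hadoop, and the class name InputDirCheck and its diagnose method are hypothetical. It assumes (unconfirmed against the Hadoop source) that "is invalid" means the path is not an existing directory on the local file system. The default path is the one from the stack trace.

```java
import java.io.File;

public class InputDirCheck {

    /**
     * Returns a short diagnosis of a local input path.
     * Hypothetical helper for illustration; it does not call any Hadoop API.
     */
    public static String diagnose(File dir) {
        if (!dir.exists()) {
            return "missing";
        }
        if (!dir.isDirectory()) {
            return "not a directory";
        }
        File[] entries = dir.listFiles();
        if (entries == null || entries.length == 0) {
            return "empty directory";
        }
        return "ok";
    }

    public static void main(String[] args) {
        // Path taken from the exception message; pass another one as the first argument.
        File dir = new File(args.length > 0 ? args[0] : "e:/apps/nutch/crawlTest/urls");
        System.out.println(dir.getAbsolutePath() + ": " + diagnose(dir));
    }
}
```

If this reports anything other than "ok", the Injector change is probably not the relevant fix and the urls directory (or the path passed to the crawl command) is the first thing to look at.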

I have not been able to get past this point. Any suggestions are welcome.

And since this is my first post, I would be ungrateful if I did not express my thanks and appreciation for this awesome body of work.

Bill



Re: Possible problem in WebAppModule

Posted by William Surowiec <ws...@gmail.com>.
Sami Siren wrote:
> William Surowiec wrote:
>
>> I checked out a copy of nutch this morning from svn. After adding two
>> jars referenced in an archived email, I still had a build problem. It
>> occurs in org.apache.nutch.webapp.common.WebAppModule. Guessing, I
>> changed lines 161/162 to:
>>
>>      String servletName = ((CharacterData)servlet).getData().trim();
>>      String urlPattern = ((CharacterData)pattern).getData().trim();
>>
>>  
>>
> What is the compilation error you are seeing and in what environment
> (os, jvm)?
>
> -- 
> Sami Siren
>
After commenting out my revision and switching back to the original code, I
receive the following compilation error within Eclipse (sorry for the
lack of alignment):


Severity and Description: The method getTextContent() is undefined for the type Element
Path: nutch dist/contrib/web2/src/main/java/org/apache/nutch/webapp/common
Resource: WebAppModule.java
Location: line 163
Creation Time: 1153141616750
Id: 30768

Severity and Description: The method getTextContent() is undefined for the type Element
Path: nutch dist/contrib/web2/src/main/java/org/apache/nutch/webapp/common
Resource: WebAppModule.java
Location: line 164
Creation Time: 1153141616750
Id: 30769



I have crawled a website with nutch and built an index (neat stuff) but
have not been able to test my change (other than that it compiles).
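
For anyone hitting the same error: Element.getTextContent() was added with DOM Level 3 (Java 5), so compiling against an older org.w3c.dom API would plausibly produce it. The cast above does compile, but as far as I know an Element is not a CharacterData node, so it may well fail with a ClassCastException at runtime. A minimal sketch of an alternative that uses only DOM Level 2 calls follows. The class name DomText, the helper textOf, and the sample XML are hypothetical and are not taken from WebAppModule.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class DomText {

    /**
     * Concatenates the direct text and CDATA children of an element and trims
     * the result. Uses only DOM Level 2 methods (no getTextContent()).
     */
    public static String textOf(Element element) {
        // StringBuffer rather than StringBuilder so this also compiles on Java 1.4.
        StringBuffer sb = new StringBuffer();
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                // Text and CDATASection nodes both implement CharacterData.
                sb.append(((CharacterData) child).getData());
            }
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample input, shaped like a web.xml servlet mapping.
        String xml = "<servlet-mapping><servlet-name> search </servlet-name>"
                   + "<url-pattern>/search.do</url-pattern></servlet-mapping>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        Element servlet = (Element) doc.getElementsByTagName("servlet-name").item(0);
        Element pattern = (Element) doc.getElementsByTagName("url-pattern").item(0);
        System.out.println(textOf(servlet));
        System.out.println(textOf(pattern));
    }
}
```

Unlike getTextContent(), this only looks at the element's direct children, which should be enough for simple leaf elements such as a servlet name or URL pattern.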


Bill

Re: Possible problem in WebAppModule

Posted by Sami Siren <ss...@gmail.com>.
William Surowiec wrote:

>I checked out a copy of nutch this morning from svn. After adding two
>jars referenced in an archived email, I still had a build problem. It
>occurs in org.apache.nutch.webapp.common.WebAppModule. Guessing, I
>changed lines 161/162 to:
>
>      String servletName = ((CharacterData)servlet).getData().trim();
>      String urlPattern = ((CharacterData)pattern).getData().trim();
>
>  
>
What is the compilation error you are seeing and in what environment
(os, jvm)?

--
 Sami Siren