Posted to user@nutch.apache.org by "Ilia S. Yatsenko" <il...@gmail.com> on 2006/03/15 08:32:18 UTC

javascript in summaries [nutch-0.7.1]

Hello

Sorry for my poor English.

I am using nutch-0.7.1 and have an issue with the HTML parser.

JavaScript code shows up in the search result summaries, and I don't know how
to remove it. For example:

. \n'); } if (plugin) { document.write(' '); document.write(' ');
document.write(' '); document.write(' '); document.write(' ');
document.write ...

Or see http://62.141.52.208:8080/dual/search.jsp?query=document.write :)
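
To make the expected behavior concrete, here is a minimal sketch of what I would like to happen to the page text before it ends up in a summary. It is plain Java using only java.util.regex, not actual Nutch code: the class name ScriptStripper and the method visibleText are hypothetical, and a regex is only a rough stand-in for a real HTML parser.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: drops <script> elements
// before reducing HTML to plain text for a summary.
public class ScriptStripper {
    // A whole <script ...>...</script> element, any case, spanning line breaks.
    private static final Pattern SCRIPT = Pattern.compile(
        "<script\\b[^>]*>.*?</script\\s*>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Any remaining tag.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    // Runs of whitespace.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    public static String visibleText(String html) {
        // Remove script elements together with their content.
        String noScripts = SCRIPT.matcher(html).replaceAll(" ");
        // Remove the remaining markup, keeping the text between tags.
        String noTags = TAG.matcher(noScripts).replaceAll(" ");
        // Collapse whitespace so the result reads like a summary line.
        return WHITESPACE.matcher(noTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String html = "<p>Hello</p><script>document.write(' ');</script><p>world</p>";
        System.out.println(visibleText(html)); // prints "Hello world"
    }
}
```

With input like the example above, the document.write calls would be dropped and only the visible text would remain.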

This is the plugin property from my nutch-site configuration:

<property>
<value>nutch-extensionpoints|protocol-(http|httpclient)|urlfilter-regex|parse-html|index-(basic|more)|query-(more|stemmer|site|url)</value>
</property>

Can anybody help me?