Posted to dev@nutch.apache.org by "Hadjiat Souad (JIRA)" <ji...@apache.org> on 2015/08/03 12:23:04 UTC
[jira] [Created] (NUTCH-2074) Javascript link not parsed by JSParseFilter
Hadjiat Souad created NUTCH-2074:
------------------------------------
Summary: Javascript link not parsed by JSParseFilter
Key: NUTCH-2074
URL: https://issues.apache.org/jira/browse/NUTCH-2074
Project: Nutch
Issue Type: Bug
Components: fetcher
Affects Versions: 1.10
Reporter: Hadjiat Souad
Priority: Minor
JSParseFilter fails to extract the URL from this link:
javascript:tb_show('','http://dummy.url/3S/FRA/contenus/ext/endeca/html/dummy-page.html?TB_iframe=true&height=310&width=600','');
I ran a JUnit test in debug mode, and it seems that the regular expression JSParseFilter.STRING_PATTERN matches only ',' and does not extract the URL.
As I am not very experienced with regular expressions, I can't propose a patch.
The complete HTML element is:
<a class="last" href="javascript:tb_show('','http://dummy.url/3S/FRA/contenus/ext/endeca/html/dummy-page.html?TB_iframe=true&height=310&width=600','');">Dummy url</a>
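The behaviour described above can be reproduced outside Nutch with a small sketch. It assumes that STRING_PATTERN has roughly the shape "quote, one or more non-space and non-quote characters, same quote"; the pattern text below is an assumption and has not been checked against the 1.10 source, and it is run with java.util.regex rather than whatever regex engine JSParseFilter actually uses. The class name QuotedStringDemo, the extract helper and the ALLOW_EMPTY variant are hypothetical and only illustrate one possible direction, not a tested patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuotedStringDemo {
    // ASSUMPTION: approximate shape of JSParseFilter.STRING_PATTERN.
    // Group 2 requires at least one character, so an empty literal '' cannot match.
    static final Pattern NON_EMPTY =
        Pattern.compile("(\\\\*(?:\"|'))([^\\s\"']+?)(?:\\1)");

    // HYPOTHETICAL variant: group 2 may be empty, so '' is consumed as a whole
    // and the quotes around the URL stay correctly paired.
    static final Pattern ALLOW_EMPTY =
        Pattern.compile("(\\\\*(?:\"|'))([^\\s\"']*?)(?:\\1)");

    // Collects every non-empty quoted string (group 2) found in the input.
    static List<String> extract(Pattern pattern, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            if (!m.group(2).isEmpty()) {
                found.add(m.group(2));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String js = "javascript:tb_show('','http://dummy.url/3S/FRA/contenus/ext/endeca/html/"
            + "dummy-page.html?TB_iframe=true&height=310&width=600','');";
        System.out.println("non-empty only: " + extract(NON_EMPTY, js));
        System.out.println("allow empty:    " + extract(ALLOW_EMPTY, js));
    }
}
```

With the assumed pattern, the empty first argument '' cannot match, so the scan pairs the closing quote of '' with the opening quote of the URL and captures only the comma between them. Allowing an empty capture keeps the quotes paired; any real fix would still need to be verified against the existing JSParseFilter unit tests.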
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)