Posted to dev@nutch.apache.org by "Dennis Kubes (JIRA)" <ji...@apache.org> on 2007/06/26 05:35:26 UTC
[jira] Resolved: (NUTCH-497) Extreme Nested Tags causes StackOverflowException in DomContentUtils...Spider Trap
[ https://issues.apache.org/jira/browse/NUTCH-497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dennis Kubes resolved NUTCH-497.
--------------------------------
Resolution: Fixed
Committed with revision 550669.
> Extreme Nested Tags causes StackOverflowException in DomContentUtils...Spider Trap
> ----------------------------------------------------------------------------------
>
> Key: NUTCH-497
> URL: https://issues.apache.org/jira/browse/NUTCH-497
> Project: Nutch
> Issue Type: Bug
> Components: fetcher
> Affects Versions: 0.8.1, 0.9.0, 1.0.0
> Environment: all
> Reporter: Dennis Kubes
> Assignee: Dennis Kubes
> Fix For: 1.0.0
>
> Attachments: ExtremeNestedTags.patch, nested-tags-trap.patch, nested-tags-trap2.patch, nested-tags-trap3.patch
>
>
> Some web pages act as a spider trap: they nest tags thousands of layers deep, which causes a StackOverflowException in DomContentUtils. When extracting outlinks, DomContentUtils parses the HTML with a recursive method, so nesting this deep exhausts the call stack and the parse fails.
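
For illustration only, here is a minimal sketch of one common way to avoid this kind of failure: walking the DOM with an explicit stack instead of recursion, so nesting depth consumes heap rather than call stack. This is not the patch committed in revision 550669, whose contents are not shown in this message. The class name IterativeLinkWalker and the helpers collectHrefs and buildNested are hypothetical; only the standard org.w3c.dom and javax.xml.parsers APIs are assumed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical illustration; not code from Nutch or from the attached patches.
public class IterativeLinkWalker {

    // Collects the href value of every <a> element under root without recursion.
    static List<String> collectHrefs(Node root) {
        List<String> hrefs = new ArrayList<>();
        Deque<Node> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (node.getNodeType() == Node.ELEMENT_NODE
                    && "a".equalsIgnoreCase(node.getNodeName())) {
                String href = ((Element) node).getAttribute("href");
                if (!href.isEmpty()) {
                    hrefs.add(href);
                }
            }
            // Push children last-to-first so they are popped in document order.
            for (Node child = node.getLastChild(); child != null;
                    child = child.getPreviousSibling()) {
                pending.push(child);
            }
        }
        return hrefs;
    }

    // Builds a test document: `depth` nested <div> elements around a single <a href>.
    // The tree is assembled from the innermost element outward, using a loop.
    static Document buildNested(int depth, String href) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element inner = doc.createElement("a");
            inner.setAttribute("href", href);
            for (int i = 0; i < depth; i++) {
                Element outer = doc.createElement("div");
                outer.appendChild(inner);
                inner = outer;
            }
            doc.appendChild(inner);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A nesting depth that a naive recursive walk would be unlikely to survive
        // with a default thread stack size.
        Document doc = buildNested(100_000, "http://example.com/");
        System.out.println(collectHrefs(doc));
    }
}
```

An alternative design is to keep the recursion but stop at a fixed maximum depth. That is simpler, but it silently skips any links nested below the cutoff.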