Posted to dev@nutch.apache.org by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2022/01/14 09:43:00 UTC

[jira] [Resolved] (NUTCH-2929) Fetcher: start threads slowly to avoid that resources are temporarily exhausted

     [ https://issues.apache.org/jira/browse/NUTCH-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Nagel resolved NUTCH-2929.
------------------------------------
    Resolution: Implemented

Thanks for the reviews, [~markus17] and [~lewismc]!

> Fetcher: start threads slowly to avoid that resources are temporarily exhausted
> -------------------------------------------------------------------------------
>
>                 Key: NUTCH-2929
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2929
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.18
>            Reporter: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.19
>
>
> Fetcher spins up all threads without any delay. As a result, certain resources may be temporarily exhausted if all threads start fetching the first pages simultaneously.
> The issue was observed through Tika warnings about overuse of the SAXParser pool, which appeared only during the first 2-5 minutes of fetching a segment. See https://lists.apache.org/thread/lo6b9wdlxy2lz12wmosldgl9x9ov1cks for details. Adding a short delay between thread launches makes the warnings disappear.
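
For illustration only, the idea of a short delay between thread launches can be sketched in plain Java as below. This is not the code committed for NUTCH-2929. The class name StaggeredStart, the method startStaggered and the delayMillis parameter are hypothetical names chosen for this sketch, and the workers stand in for fetcher threads.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: launch worker threads one at a time with a pause in
// between, so they do not all hit shared resources at the same instant.
public class StaggeredStart {

  /**
   * Starts one thread per worker, sleeping delayMillis between launches
   * (no sleep before the first one). Returns the started threads.
   */
  public static List<Thread> startStaggered(List<Runnable> workers, long delayMillis)
      throws InterruptedException {
    List<Thread> threads = new ArrayList<>();
    for (int i = 0; i < workers.size(); i++) {
      if (i > 0 && delayMillis > 0) {
        Thread.sleep(delayMillis); // pause before launching the next thread
      }
      Thread t = new Thread(workers.get(i), "worker-" + i);
      t.start();
      threads.add(t);
    }
    return threads;
  }

  public static void main(String[] args) throws InterruptedException {
    List<Runnable> workers = new ArrayList<>();
    for (int i = 0; i < 4; i++) {
      // Placeholder work; a real fetcher thread would pull URLs from a queue.
      workers.add(() -> System.out.println(Thread.currentThread().getName() + " running"));
    }
    List<Thread> threads = startStaggered(workers, 50L);
    for (Thread t : threads) {
      t.join(); // wait for all workers to finish
    }
    System.out.println("started " + threads.size() + " threads");
  }
}
```

With N threads and a delay of d milliseconds, the last thread starts roughly (N - 1) * d milliseconds after the first, so the delay would need to stay small compared to the total fetch time.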



--
This message was sent by Atlassian Jira
(v8.20.1#820001)