Posted to dev@nutch.apache.org by GitBox <gi...@apache.org> on 2022/01/11 14:18:06 UTC

[GitHub] [nutch] sebastian-nagel opened a new pull request #722: NUTCH-2929 Fetcher: start threads slowly to avoid that resources are temporarily exhausted

sebastian-nagel opened a new pull request #722:
URL: https://github.com/apache/nutch/pull/722


   Sleep for a configurable delay (fetcher.threads.start.delay) before starting the next Fetcher thread, so that shared resources (DNS, Tika XML parser pools) are not temporarily exhausted when all Fetcher threads fetch their first pages simultaneously.
   
   The default delay (10 milliseconds) has practically no impact on the behavior or performance of Fetcher: 100 threads are started per second. With a longer delay (500 milliseconds), all Tika warnings about the SAXParser pool size disappeared.
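
   The idea can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the class `StaggeredStart` and the methods `startSlowly` and `rampUpMillis` are hypothetical names, and sleeping only between consecutive starts (not before the first thread) is an assumption. In Nutch the delay would presumably be read from the `fetcher.threads.start.delay` property.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration of a staggered thread start; not the Nutch Fetcher code. */
class StaggeredStart {

  /** Milliseconds until the last of {@code threads} workers has been started. */
  static long rampUpMillis(int threads, long delayMs) {
    return threads <= 1 ? 0 : (threads - 1) * delayMs;
  }

  /** Starts the workers one by one, sleeping {@code delayMs} between consecutive starts. */
  static List<Thread> startSlowly(int threads, long delayMs, Runnable work)
      throws InterruptedException {
    List<Thread> started = new ArrayList<>();
    for (int i = 0; i < threads; i++) {
      if (i > 0 && delayMs > 0) {
        // Give the threads already running a head start on shared resources
        // (e.g. DNS lookups, parser pools) before adding another consumer.
        Thread.sleep(delayMs);
      }
      Thread t = new Thread(work, "worker-" + i);
      t.start();
      started.add(t);
    }
    return started;
  }

  public static void main(String[] args) throws InterruptedException {
    AtomicInteger done = new AtomicInteger();
    List<Thread> workers = startSlowly(5, 10, done::incrementAndGet);
    for (Thread t : workers) {
      t.join(); // wait until every worker has finished
    }
    System.out.println("workers finished: " + done.get());
    System.out.println("ramp-up, 100 threads at 10 ms: " + rampUpMillis(100, 10) + " ms");
    System.out.println("ramp-up, 100 threads at 500 ms: " + rampUpMillis(100, 500) + " ms");
  }
}
```

   Under these assumptions, 100 threads are all running after just under one second with the 10 millisecond default, whereas a 500 millisecond delay spreads the start over roughly 50 seconds, which is the trade-off between a fast ramp-up and less contention on shared pools.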


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@nutch.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nutch] sebastian-nagel commented on pull request #722: NUTCH-2929 Fetcher: start threads slowly to avoid that resources are temporarily exhausted

Posted by GitBox <gi...@apache.org>.
sebastian-nagel commented on pull request #722:
URL: https://github.com/apache/nutch/pull/722#issuecomment-1011249839


   The warnings include the string: `Contention waiting for a SAXParser. Consider increasing the XMLReaderUtils.POOL_SIZE`




[GitHub] [nutch] lewismc commented on pull request #722: NUTCH-2929 Fetcher: start threads slowly to avoid that resources are temporarily exhausted

Posted by GitBox <gi...@apache.org>.
lewismc commented on pull request #722:
URL: https://github.com/apache/nutch/pull/722#issuecomment-1011263668


   ack thanks for confirming.




[GitHub] [nutch] lewismc commented on pull request #722: NUTCH-2929 Fetcher: start threads slowly to avoid that resources are temporarily exhausted

Posted by GitBox <gi...@apache.org>.
lewismc commented on pull request #722:
URL: https://github.com/apache/nutch/pull/722#issuecomment-1010465132


   @sebastian-nagel this LGTM.
   Can you please post an example Tika warning about the SAXParser pool?
   Thanks




[GitHub] [nutch] sebastian-nagel merged pull request #722: NUTCH-2929 Fetcher: start threads slowly to avoid that resources are temporarily exhausted

Posted by GitBox <gi...@apache.org>.
sebastian-nagel merged pull request #722:
URL: https://github.com/apache/nutch/pull/722


   

