Posted to user@nutch.apache.org by Zoltán Zvara <zo...@gmail.com> on 2017/11/20 15:16:40 UTC

Serious OOM while using PhantomJS on Nutch 1.13

Dear Community,

We are experiencing serious memory leaks with PhantomJS 1.9.8: neighboring services on the same node, for example a DataNode, cannot even execute a "df" command because of OOM errors. Each node has 128 GB of total memory, and a PhantomJS process easily consumes 80 GB before it is killed by the kernel. The problem is that the crawl job is co-located with other services running on YARN. These services throw OOM errors as well, resulting in cluster-wide failures.

We tried to set up the Firefox (FF) driver with Selenium 2.48.X, which is the Selenium version currently embedded in Nutch 1.13. The latest FF does not seem to be compatible with Selenium 2.48.

1. Would the Selenium embedded within Nutch 1.13 work with PhantomJS 2.1.X?
2. What FF driver version would work with the embedded Selenium? And how can we get it? :-)
3. Has anyone tried the "chrome" driver? Are there any tutorials on how to set it up?

Thanks,
Zoltán