Posted to dev@nutch.apache.org by "Greg Padiasek (JIRA)" <ji...@apache.org> on 2014/05/21 04:56:41 UTC

[jira] [Updated] (NUTCH-1746) OutOfMemoryError in Mappers

     [ https://issues.apache.org/jira/browse/NUTCH-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Greg Padiasek updated NUTCH-1746:
---------------------------------

    Attachment: ObjectCache.patch

Indeed, after investigating further I found that the problem is in ObjectCache or, strictly speaking, in how it is used. It turns out that ObjectCache.get() is called with multiple copies of Configuration, which results in multiple copies of the filters being created.

I was able to avoid the OutOfMemoryError in all mappers by changing ObjectCache to use Configuration.toString() as the CACHE key instead of the Configuration object itself. Changing CACHE into a single ObjectCache instance (shared by all Configuration objects) also works, but in that case the weak references are eliminated and the CACHE is never cleared. For that reason the first approach might be better.
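
To illustrate the difference between the two cache keys, here is a minimal, self-contained sketch. It does not use the real Nutch or Hadoop classes: FakeConf and Filters are hypothetical stand-ins, and the sketch assumes that Configuration compares by object identity while its toString() depends only on the list of loaded resources. The resource names in the string are only illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;

public class CacheKeySketch {

    /** Hypothetical stand-in for Configuration: identity equals/hashCode, toString() built from resources. */
    static final class FakeConf {
        final String resources;
        FakeConf(String resources) { this.resources = resources; }
        FakeConf(FakeConf other) { this.resources = other.resources; } // copy constructor
        @Override public String toString() { return "Configuration: " + resources; }
    }

    /** Hypothetical stand-in for an expensive filter set (e.g. millions of domains). */
    static final class Filters { }

    /** Cache keyed by the configuration object: every copy misses and builds new filters. */
    public static int buildsKeyedByInstance(int copies) {
        Map<FakeConf, Filters> cache = new WeakHashMap<>();
        List<FakeConf> live = new ArrayList<>(); // keeps copies reachable, as running mappers would
        FakeConf base = new FakeConf("nutch-default.xml, nutch-site.xml");
        int builds = 0;
        for (int i = 0; i < copies; i++) {
            FakeConf copy = new FakeConf(base);
            live.add(copy);
            if (!cache.containsKey(copy)) {
                cache.put(copy, new Filters());
                builds++;
            }
        }
        return builds;
    }

    /** Cache keyed by toString(): copies with the same resources share one entry. */
    public static int buildsKeyedByString(int copies) {
        Map<String, Filters> cache = new HashMap<>();
        FakeConf base = new FakeConf("nutch-default.xml, nutch-site.xml");
        int builds = 0;
        for (int i = 0; i < copies; i++) {
            String key = new FakeConf(base).toString();
            if (!cache.containsKey(key)) {
                cache.put(key, new Filters());
                builds++;
            }
        }
        return builds;
    }

    public static void main(String[] args) {
        System.out.println("keyed by instance:   " + buildsKeyedByInstance(5) + " builds");
        System.out.println("keyed by toString(): " + buildsKeyedByString(5) + " builds");
    }
}
```

With five copies, the instance-keyed cache builds the filters five times and the string-keyed cache builds them once. The string-keyed map in this sketch is a plain HashMap; whether a string key works well together with weak references is a separate question that the sketch does not address.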

Further investigation might reveal why multiple Configuration instances are being passed to ObjectCache, but for the time being I am using a modified ObjectCache (patch attached).


> OutOfMemoryError in Mappers
> ---------------------------
>
>                 Key: NUTCH-1746
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1746
>             Project: Nutch
>          Issue Type: Bug
>          Components: generator, injector
>    Affects Versions: 1.7
>         Environment: Nutch running in local mode with 4M+ domains in domain-urlfilter.txt
>            Reporter: Greg Padiasek
>         Attachments: Generator.patch, Injector.patch, ObjectCache.patch, domain-urlfilter-aa, domain-urlfilter-ab, domain-urlfilter-ac
>
>
> Initially I found that Generator was throwing OutOfMemoryError no matter how much RAM I allocated to the JVM. I fixed the problem by moving URLFilters, URLNormalizers and ScoringFilters to the top-level class as singletons and re-using them in all Generator mapper instances.
> Then I found the same problem in Injector and applied an analogous fix.
> It now seems that this issue may be common to all Nutch Mapper implementations.
> I was wondering if it would be possible to integrate this kind of change
> in the upstream code base and potentially update all vulnerable Mapper classes.


