Posted to dev@nutch.apache.org by "Julien Nioche (JIRA)" <ji...@apache.org> on 2014/05/19 23:07:38 UTC

[jira] [Commented] (NUTCH-1746) OutOfMemoryError in Mappers

    [ https://issues.apache.org/jira/browse/NUTCH-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002386#comment-14002386 ] 

Julien Nioche commented on NUTCH-1746:
--------------------------------------

bq. mapred.child.java.opts=-Xmx1024m
bq. Does it matter when running in local mode?

See [http://lucene.472066.n3.nabble.com/Out-of-heap-memory-on-175K-links-in-local-mode-td3760326.html]: mapred.child.java.opts is not used in local mode, so setting this value won't affect the amount of memory available.
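
One way to check how much heap a local-mode run really has is to print the limit from inside the JVM itself. The snippet below is only a minimal sketch: HeapCheck is a hypothetical helper, not part of Nutch, and it assumes that in local mode the map tasks execute inside the JVM that launched the job, so that JVM's own -Xmx is what counts.

```java
// Hypothetical helper, not part of Nutch: reports the heap ceiling of the current JVM.
final class HeapCheck {

    /** Maximum heap the running JVM will attempt to use, in megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run with e.g. "java -Xmx1024m HeapCheck" to see the effect of the flag.
        System.out.println("Max heap (MB): " + maxHeapMegabytes());
    }
}
```

If the figure does not change when mapred.child.java.opts is changed, the property is not being applied to that JVM.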

The automaton URLFilter would indeed be more efficient, but your suggestion is definitely worth considering. Before we go any further: I'm a bit puzzled by this, as in theory it should not be an issue. These things are loaded via the plugin mechanism, which holds a cache of instantiated objects. This means that regardless of how many classes declare a URLFilter, for instance, there should be only one actual instance of said filter in use. I see that you found this to be an issue on 1.7 and am wondering whether it has since been fixed in 1.8, e.g. in [https://issues.apache.org/jira/browse/NUTCH-356].
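
To illustrate what "a cache of instantiated objects" means here, the following is a rough sketch of the general idea only. It is not Nutch's actual plugin code: InstanceCache, its get method and the "urlfilter-domain" key are all hypothetical names chosen for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical illustration of an instance cache; not Nutch's plugin implementation.
final class InstanceCache {

    private final Map<String, Object> cache = new ConcurrentHashMap<>();

    /** Returns the cached instance for the key, building it on the first request only. */
    @SuppressWarnings("unchecked")
    <T> T get(String key, Supplier<T> factory) {
        return (T) cache.computeIfAbsent(key, k -> factory.get());
    }

    public static void main(String[] args) {
        InstanceCache cache = new InstanceCache();
        Object first = cache.get("urlfilter-domain", Object::new);
        Object second = cache.get("urlfilter-domain", Object::new);
        // Both lookups return the same object, so this prints "true".
        System.out.println(first == second);
    }
}
```

With a cache like this working as intended, asking for the same filter many times costs the memory of one instance, which is why a leak in the cache would be the first thing to rule out.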

Greg - would you mind giving it a try on 1.8 to make sure that it wasn't caused by the cache leaking or something similar?

As for your patch, I am not sure about the changes to the Injector. There is exactly one JVM per mapper or reducer instance, so moving the fields to static won't change much: there will be only one instance used anyway.









> OutOfMemoryError in Mappers
> ---------------------------
>
>                 Key: NUTCH-1746
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1746
>             Project: Nutch
>          Issue Type: Bug
>          Components: generator, injector
>    Affects Versions: 1.7
>         Environment: Nutch running in local mode with 4M+ domains in domain-urlfilter.txt
>            Reporter: Greg Padiasek
>         Attachments: Generator.patch, Injector.patch, domain-urlfilter-aa, domain-urlfilter-ab, domain-urlfilter-ac
>
>
> Initially I found that Generator was throwing OutOfMemoryError no matter how much RAM I allocated to the JVM. I fixed the problem by moving URLFilters, URLNormalizers and ScoringFilters to the top-level class as singletons and re-using them in all Generator mapper instances.
> Then I found the same problem in Injector and applied an analogous fix.
> Now it seems that this issue may be common to all Nutch Mapper implementations.
> I was wondering if it would be possible to integrate this kind of change
> in the upstream code base and potentially update all vulnerable Mapper classes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)