Posted to dev@nutch.apache.org by "Xin Zhang (JIRA)" <ji...@apache.org> on 2015/03/29 08:49:52 UTC
[jira] [Commented] (NUTCH-1977) commoncrawldump java heap space
[ https://issues.apache.org/jira/browse/NUTCH-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385651#comment-14385651 ]
Xin Zhang commented on NUTCH-1977:
----------------------------------
Did you try increasing the Java heap size?
You can check /src/bin/nutch and raise the Java heap space setting there.
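
To confirm that a heap change actually took effect, you can print the limit the JVM is running with. This is only a small sketch: the class name HeapCheck is made up for illustration and is not part of Nutch. The limit itself is normally set with the standard JVM option -Xmx; how the nutch script passes that option is something to check in the script itself.

```java
public class HeapCheck {
    /** Returns the maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run with e.g. "java -Xmx4g HeapCheck" and compare the reported value.
        System.out.println("Max heap: " + maxHeapMb() + " MB");
    }
}
```

If the reported value does not change after editing the script, the setting is probably being overridden somewhere else.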
> commoncrawldump java heap space
> -------------------------------
>
> Key: NUTCH-1977
> URL: https://issues.apache.org/jira/browse/NUTCH-1977
> Project: Nutch
> Issue Type: Bug
> Components: commoncrawl
> Affects Versions: 1.10
> Reporter: Jiaheng Zhang
> Priority: Minor
> Fix For: 1.10
>
>
> When using the commoncrawldump component, we get the error:
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:3236)
> at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
> at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
> at com.fasterxml.jackson.dataformat.cbor.CBORGenerator._flushBuffer(CBORGenerator.java:1365)
> at com.fasterxml.jackson.dataformat.cbor.CBORGenerator.close(CBORGenerator.java:896)
> at org.apache.nutch.tools.CommonCrawlDataDumper.serializeCBORData(CommonCrawlDataDumper.java:461)
> at org.apache.nutch.tools.CommonCrawlDataDumper.dump(CommonCrawlDataDumper.java:375)
> at org.apache.nutch.tools.CommonCrawlDataDumper.main(CommonCrawlDataDumper.java:256)
> and
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2367)
> at java.lang.StringCoding.safeTrim(StringCoding.java:89)
> at java.lang.StringCoding.access$100(StringCoding.java:50)
> at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:154)
> at java.lang.StringCoding.decode(StringCoding.java:193)
> at java.lang.StringCoding.decode(StringCoding.java:254)
> at java.lang.String.<init>(String.java:536)
> at java.io.ByteArrayOutputStream.toString(ByteArrayOutputStream.java:208)
> at org.apache.nutch.tools.CommonCrawlFormatJackson.generateJson(CommonCrawlFormatJackson.java:80)
> at org.apache.nutch.tools.AbstractCommonCrawlFormat.getJsonData(AbstractCommonCrawlFormat.java:121)
> at org.apache.nutch.tools.CommonCrawlDataDumper.dump(CommonCrawlDataDumper.java:361)
> at org.apache.nutch.tools.CommonCrawlDataDumper.main(CommonCrawlDataDumper.java:256)
> The segment files are 1.41GB in size. However, we successfully dumped segments of 100MB in size.
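
Both traces fail while a ByteArrayOutputStream is growing or being copied into a String, so the output is being held in memory at that point. The sketch below only illustrates the general difference between buffering and streaming; it is not the Nutch code, and the class and method names (StreamingDumpSketch, buffered, streamed) are hypothetical. Whether the dumper buffers one record or many at a time cannot be told from the traces alone.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamingDumpSketch {
    /** Buffers every record in memory, then copies the bytes again into a String. */
    static String buffered(Iterable<byte[]> records) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] r : records) {
            out.write(r); // the internal array is reallocated as it grows
        }
        return out.toString("UTF-8"); // a second full copy of the data
    }

    /** Writes each record straight to a file and returns the bytes written. */
    static long streamed(Iterable<byte[]> records, Path target) throws IOException {
        long written = 0;
        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(target))) {
            for (byte[] r : records) {
                out.write(r); // only a small fixed buffer is held in memory
                written += r.length;
            }
        }
        return written;
    }
}
```

With the buffered form, peak memory grows with the total output size; with the streamed form it stays roughly constant, which is why a larger heap helps the first but should not be needed for the second.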
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)