Posted to dev@mahout.apache.org by "mahmood (JIRA)" <ji...@apache.org> on 2014/03/18 18:41:46 UTC

[jira] [Issue Comment Deleted] (MAHOUT-1456) The wikipediaXMLSplitter example fails with "heap size" error

     [ https://issues.apache.org/jira/browse/MAHOUT-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mahmood updated MAHOUT-1456:
----------------------------

    Comment: was deleted

(was: In that pastebin link, I see that only the last command produces the heap size error, and that command is totally different from mine (or the mahout example in the Mahout docs).
Can you please test the wikipedia example with hadoop 1.2.1 and 2.1.0-beta to see the difference? )

> The wikipediaXMLSplitter example fails with "heap size" error
> -------------------------------------------------------------
>
>                 Key: MAHOUT-1456
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1456
>             Project: Mahout
>          Issue Type: Bug
>          Components: Examples
>    Affects Versions: 0.9
>         Environment: Solaris 11.1
> Hadoop 2.3.0
> Maven 3.2.1
> JDK 1.7.0_07-b10
>            Reporter: mahmood
>              Labels: Heap, mahout, wikipediaXMLSplitter
>
> 1- The XML file is http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
> 2- When I run "mahout wikipediaXMLSplitter -d enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64", it gets stuck at chunk #571 and, after 30 minutes, fails with a Java heap size error. Earlier chunks are created rapidly (10 chunks per second).
> 3- Increasing the heap size via the "-Xmx4096m" option doesn't work (see the sketch after this list).
> 4- No matter what the configuration is, there seems to be a memory leak that eats all available space.
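> For reference, here is a minimal sketch of how the heap is usually raised for the Mahout launcher, assuming the stock bin/mahout script, which reads the MAHOUT_HEAPSIZE variable (a value in MB) to build the client JVM's -Xmx flag:
>
>     # Sketch, assuming the stock bin/mahout launcher: MAHOUT_HEAPSIZE is
>     # read by the script and turned into -Xmx<value>m for the client JVM.
>     # 4096 mirrors the -Xmx4096m attempt from step 3 above.
>     export MAHOUT_HEAPSIZE=4096
>     mahout wikipediaXMLSplitter -d enwiki-latest-pages-articles.xml \
>            -o wikipedia/chunks -c 64
>
> Note that if the command ultimately runs as a MapReduce job, the task JVMs take their heap from Hadoop settings such as mapred.child.java.opts rather than from the client's -Xmx, which could explain why the flag alone has no visible effect.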



--
This message was sent by Atlassian JIRA
(v6.2#6252)