Posted to user@mahout.apache.org by Mahmood Naderan <nt...@yahoo.com> on 2014/03/09 09:08:33 UTC
Heap space
Hello,
I ran this command
./bin/mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
but got this error
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
There are many web pages about this, and the suggested solution is to add an option such as "-Xmx2048m". My question is that this option belongs to the java command, not to Mahout. Accordingly, running "./bin/mahout -Xmx2048m" reports that there is no such option. What should I do?
Regards,
Mahmood
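A common way around this, since the launcher builds the java command itself, is to pass the heap size through an environment variable that the script reads. A minimal sketch, assuming the bin/mahout script honors MAHOUT_HEAPSIZE (a value in MB); check your copy of the script, since the variable name can vary between releases:

```shell
# Assumption: bin/mahout reads the MAHOUT_HEAPSIZE environment variable
# (a value in MB) and turns it into the JVM's -Xmx flag; verify against
# your copy of the script, as the variable name can differ across releases.
export MAHOUT_HEAPSIZE=2048

# The real run (commented out: it needs a Mahout checkout and the dump file):
# ./bin/mahout wikipediaXMLSplitter \
#   -d examples/temp/enwiki-latest-pages-articles.xml \
#   -o wikipedia/chunks -c 64
echo "the launcher would start its JVM with -Xmx${MAHOUT_HEAPSIZE}m"
```

This way the flag reaches the JVM that bin/mahout spawns, instead of being rejected as an unknown Mahout option.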
RE: Heap space
Posted by Jason Xin <Ja...@sas.com>.
Hello, Sebastian,
Can you help me remove my email address from this list? I have tried to unsubscribe several times, but to no avail. Thanks.
The email is Jason.Xin@sas.com
Best Regards
Jason Xin
Re: Heap space
Posted by Sebastian Schelter <ss...@apache.org>.
I usually do trial and error: start with some very large value, then do a binary search :)
--sebastian
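That trial-and-error search is easy to script. A hedged sketch of the binary-search idea; run_job is a hypothetical stand-in for the real Mahout invocation, and here it simply pretends the job needs at least 1500 MB of heap:

```shell
# Binary-search the smallest heap size (in MB) at which a command succeeds.
# 'run_job' is a placeholder; substitute the real mahout command, returning
# success/failure. Here it pretends the job needs at least 1500 MB.
run_job() { [ "$1" -ge 1500 ]; }

lo=128    # known-failing heap size
hi=8192   # known-working heap size (start with some very large value)
while [ $((hi - lo)) -gt 64 ]; do
  mid=$(((lo + hi) / 2))
  if run_job "$mid"; then hi=$mid; else lo=$mid; fi
done
echo "smallest heap that still succeeds is about ${hi} MB"
```

Each iteration halves the search interval, so even a wide 128 MB to 8 GB range takes only a handful of runs.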
On 03/09/2014 01:30 PM, Mahmood Naderan wrote:
> Excuse me, I added the -Xmx option and restarted the Hadoop services using
> sbin/stop-all.sh && sbin/start-all.sh
>
> However, I still get the heap size error. How can I find the correct heap size needed?
>
>
> Regards,
> Mahmood
>
> On Sunday, March 9, 2014 1:37 PM, Mahmood Naderan <nt...@yahoo.com> wrote:
>
> OK I found that I have to add this property to mapred-site.xml
>
>
> <property>
>   <name>mapred.child.java.opts</name>
>   <value>-Xmx2048m</value>
> </property>
>
> Regards,
> Mahmood
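For reference, the property shown in the quote above must sit inside the top-level <configuration> element of mapred-site.xml; a minimal sketch of the whole file (mapred.child.java.opts is the Hadoop 1.x name, covering both map and reduce child JVMs):

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx2048m</value>
  </property>
</configuration>
```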
Re: Heap space
Posted by Mahmood Naderan <nt...@yahoo.com>.
Thanks. Let me post a new thread on both lists with details.
Regards,
Mahmood
Re: Heap space
Posted by Andrew Musselman <an...@gmail.com>.
Mahmood, just an observation and reminder: Suneel's not the only one on the list. We're all here to help.
This may be a question for a Hadoop list, unless I'm misunderstanding.
When you say "resume", what do you mean?
Re: Heap space
Posted by Mahmood Naderan <nt...@yahoo.com>.
Suneel,
One more thing.... Right now it has created 500 chunks, so 32GB out of the 48GB (the original size of the XML file) has been processed. Is it possible to resume that?
Regards,
Mahmood
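As a sketch of what a manual resume could look like (wikipediaXMLSplitter itself has no documented resume option, and the chunk file names here are hypothetical): re-run the split, but skip any chunk whose output file already exists on disk.

```shell
# Hypothetical resume: skip chunks that already exist from the aborted run.
# The chunk-NNNN.xml naming is illustrative only, not the splitter's real layout.
dir=$(mktemp -d)
touch "$dir/chunk-0001.xml" "$dir/chunk-0002.xml"   # chunks from the aborted run

for i in 1 2 3 4; do
  out=$(printf '%s/chunk-%04d.xml' "$dir" "$i")
  if [ -e "$out" ]; then
    echo "skipping existing $out"
  else
    echo "creating $out"
    touch "$out"   # stands in for writing the real chunk
  fi
done
```

With 500 of the chunks already written, a check like this would avoid redoing roughly two thirds of the work.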
Re: Heap space
Posted by Mahmood Naderan <nt...@yahoo.com>.
Suneel,
Is it possible to introduce some kind of parallelism into the process of creating the chunks, in order to divide the work into smaller pieces?
Let me explain it this way. Assume one thread needs 20GB of heap and my system cannot afford that, so I divide the work among 10 threads, each needing 2GB.
If my system supports 10GB of heap, I can then feed 5 threads at a time: when the first 5 threads are done (their chunks written), I feed in the next 5 threads, and so on.
Regards,
Mahmood
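The batch scheme described above (run a few workers at a time until all pieces are done) can be sketched generically in shell; process_piece is a placeholder, not a real Mahout entry point:

```shell
# Run 10 work items with at most 5 concurrent background jobs per batch.
# 'process_piece' is a placeholder for the real per-chunk work.
results=$(mktemp)
process_piece() { echo "piece $1 done" >> "$results"; }

for i in $(seq 1 10); do
  process_piece "$i" &
  if [ $((i % 5)) -eq 0 ]; then
    wait   # the batch of 5 must finish before the next batch starts
  fi
done
wait       # catch any jobs from a final partial batch
echo "all pieces processed"
```

Note this only helps if the tool can actually operate on independent pieces; a single splitter pass over one 48GB file cannot be divided this way without changing the tool itself.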
Re: Heap space
Posted by Mahmood Naderan <nt...@yahoo.com>.
UPDATE:
I split another 5.4GB XML file with 4GB of RAM and -Xmx128m, and it took 5 minutes.
Regards,
Mahmood
Re: Heap space
Posted by Mahmood Naderan <nt...@yahoo.com>.
The extracted size is about 960MB (enwiki-latest-pages-articles10.xml).
With 4GB of RAM for the OS and -Xmx128m for Hadoop, it took 77 seconds to create the 64MB chunks.
I was able to see 15 chunks with "hadoop dfs -ls".
P.S.: Whenever I modify the -Xmx value in mapred-site.xml, I run
$HADOOP/sbin/stop-all.sh && $HADOOP/sbin/start-all.sh
Is that necessary?
Regards,
Mahmood
Re: Heap space
Posted by Suneel Marthi <su...@yahoo.com>.
Morning Mahmood,
Please first try running this on a smaller dataset such as 'enwiki-latest-pages-articles10.xml', as opposed to running on the entire English Wikipedia.
On Monday, March 10, 2014 2:59 AM, Mahmood Naderan <nt...@yahoo.com> wrote:
Thanks for the update.
The thing is, while that command is running, I run the 'top' command in another terminal and see that the java process takes less than 1GB of memory. As another test, I increased the memory size to 48GB (since I am working with VirtualBox) and set the heap size to -Xmx45000m.
Still I get the heap error.
I would expect a more meaningful error message saying *who* needs more heap space: Hadoop, Mahout, Java, ...?
Regards,
Mahmood
On Monday, March 10, 2014 1:31 AM, Suneel Marthi <su...@yahoo.com> wrote:
Mahmood,
Firstly, thanks for starting this email thread and for
highlighting the issues with the Wikipedia example. Since you raised this issue, I updated the new Wikipedia examples page at
http://mahout.apache.org/users/classification/wikipedia-bayes-example.html
and also responded to a similar question on StackOverFlow at
http://stackoverflow.com/questions/19505422/mahout-error-when-try-out-wikipedia-examples/22286839#22286839.
I am assuming that you are running this locally on your machine and are just trying out the examples. Try Sebastian's suggestion, or else try running the example on a much smaller dataset of Wikipedia articles.
Lastly, we do realize that you have been struggling with this for about 3 days now. Mahout presently lacks an entry for 'wikipediaXmlSplitter' in driver.classes.default.props. We are not sure at what point, or in which release, that happened.
Please file a Jira for this and submit a patch.
On Sunday, March 9, 2014 2:25 PM, Mahmood Naderan <nt...@yahoo.com> wrote:
Hi Suneel,
Do you have any idea? Searching the web turns up many questions about the heap size for wikipediaXMLSplitter. I have increased the memory size to 16GB and still get that error. I should add that, using the 'top' command, I see only 1GB of memory in use, so I wonder why it reports such an error.
Is this a problem with Java, Mahout, Hadoop, ..?
Regards,
Mahmood
On Sunday, March 9, 2014 4:00 PM, Mahmood Naderan <nt...@yahoo.com> wrote:
Excuse me, I added the -Xmx option and restarted the Hadoop services using
sbin/stop-all.sh && sbin/start-all.sh
However, I still get the heap size error. How can I find the correct heap size needed?
Regards,
Mahmood
On Sunday, March 9, 2014 1:37 PM, Mahmood Naderan <nt...@yahoo.com> wrote:
OK I found that I have to add this property to mapred-site.xml
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx2048m</value>
</property>
Regards,
Mahmood
On Sunday, March 9, 2014 11:39 AM, Mahmood Naderan <nt...@yahoo.com> wrote:
Hello,
I ran this command
./bin/mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
but got this error
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
There are many web pages about this, and the suggested solution is to add, for example, "-Xmx2048m". My question is that this option should be passed to the java command, not to Mahout. As a result, running "./bin/mahout -Xmx2048m"
shows that there is no such option. What should I do?
Regards,
Mahmood
Re: Heap space
Posted by Mahmood Naderan <nt...@yahoo.com>.
Thanks for the update.
The thing is, while that command is running, I run the 'top' command in another terminal and see that the java process takes less than 1GB of memory. As another test, I increased the memory size to 48GB (since I am working with VirtualBox) and set the heap size to -Xmx45000m.
Still I get the heap error.
I would expect a more meaningful error message saying *who* needs more heap space: Hadoop, Mahout, Java, ...?
Regards,
Mahmood
On Monday, March 10, 2014 1:31 AM, Suneel Marthi <su...@yahoo.com> wrote:
Mahmood,
Firstly, thanks for starting this email thread and for
highlighting the issues with the Wikipedia example. Since you raised this issue, I updated the new Wikipedia examples page at
http://mahout.apache.org/users/classification/wikipedia-bayes-example.html
and also responded to a similar question on StackOverFlow at http://stackoverflow.com/questions/19505422/mahout-error-when-try-out-wikipedia-examples/22286839#22286839.
I am assuming that you are running this locally on your machine and are just trying out the examples. Try Sebastian's suggestion, or else try running the example on a much smaller dataset of Wikipedia articles.
Lastly, we do realize that you have been struggling with this for about 3 days now. Mahout presently lacks an entry for 'wikipediaXmlSplitter' in driver.classes.default.props. We are not sure at what point, or in which release, that happened.
Please file a Jira for this and submit a patch.
On Sunday, March 9, 2014 2:25 PM, Mahmood Naderan <nt...@yahoo.com> wrote:
Hi Suneel,
Do you have any idea? Searching the web turns up many questions about the heap size for wikipediaXMLSplitter. I have increased the memory size to 16GB and still get that error. I should add that, using the 'top' command, I see only 1GB of memory in use, so I wonder why it reports such an error.
Is this a problem with Java, Mahout, Hadoop, ..?
Regards,
Mahmood
On Sunday, March 9, 2014 4:00 PM, Mahmood Naderan <nt...@yahoo.com> wrote:
Excuse me, I added the -Xmx option and restarted the Hadoop services using
sbin/stop-all.sh && sbin/start-all.sh
However, I still get the heap size error. How can I find the correct heap size needed?
Regards,
Mahmood
On Sunday, March 9, 2014 1:37 PM, Mahmood Naderan <nt...@yahoo.com> wrote:
OK I found that I have to add this property to mapred-site.xml
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx2048m</value>
</property>
Regards,
Mahmood
On Sunday, March 9, 2014 11:39 AM, Mahmood Naderan <nt...@yahoo.com> wrote:
Hello,
I ran this command
./bin/mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
but got this error
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
There are many web pages about this, and the suggested solution is to add, for example, "-Xmx2048m". My question is that this option should be passed to the java command, not to Mahout. As a result, running "./bin/mahout -Xmx2048m"
shows that there is no such option. What should I do?
Regards,
Mahmood
Re: Heap space
Posted by Suneel Marthi <su...@yahoo.com>.
Mahmood,
Firstly, thanks for starting this email thread and for
highlighting the issues with the Wikipedia example. Since you raised this issue, I updated the new Wikipedia examples page at
http://mahout.apache.org/users/classification/wikipedia-bayes-example.html
and also responded to a similar question on StackOverFlow at http://stackoverflow.com/questions/19505422/mahout-error-when-try-out-wikipedia-examples/22286839#22286839.
I am assuming that you are running this locally on your machine and are just trying out the examples. Try Sebastian's suggestion, or else try running the example on a much smaller dataset of Wikipedia articles.
Lastly, we do realize that you have been struggling with this for about 3 days now. Mahout presently lacks an entry for 'wikipediaXmlSplitter' in driver.classes.default.props. We are not sure at what point, or in which release, that happened.
Please file a Jira for this and submit a patch.
On Sunday, March 9, 2014 2:25 PM, Mahmood Naderan <nt...@yahoo.com> wrote:
Hi Suneel,
Do you have any idea? Searching the web turns up many questions about the heap size for wikipediaXMLSplitter. I have increased the memory size to 16GB and still get that error. I should add that, using the 'top' command, I see only 1GB of memory in use, so I wonder why it reports such an error.
Is this a problem with Java, Mahout, Hadoop, ..?
Regards,
Mahmood
On Sunday, March 9, 2014 4:00 PM, Mahmood Naderan <nt...@yahoo.com> wrote:
Excuse me, I added the -Xmx option and restarted the Hadoop services using
sbin/stop-all.sh && sbin/start-all.sh
However, I still get the heap size error. How can I find the correct heap size needed?
Regards,
Mahmood
On Sunday, March 9, 2014 1:37 PM, Mahmood Naderan <nt...@yahoo.com> wrote:
OK I found that I have to add this property to mapred-site.xml
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx2048m</value>
</property>
Regards,
Mahmood
On Sunday, March 9, 2014 11:39 AM, Mahmood Naderan <nt...@yahoo.com> wrote:
Hello,
I ran this command
./bin/mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
but got this error
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
There are many web pages about this, and the suggested solution is to add, for example, "-Xmx2048m". My question is that this option should be passed to the java command, not to Mahout. As a result, running "./bin/mahout -Xmx2048m"
shows that there is no such option. What should I do?
Regards,
Mahmood
Re: Heap space
Posted by Mahmood Naderan <nt...@yahoo.com>.
Excuse me, I added the -Xmx option and restarted the Hadoop services using
sbin/stop-all.sh && sbin/start-all.sh
However, I still get the heap size error. How can I find the correct heap size needed?
Regards,
Mahmood
On Sunday, March 9, 2014 1:37 PM, Mahmood Naderan <nt...@yahoo.com> wrote:
OK I found that I have to add this property to mapred-site.xml
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx2048m</value>
</property>
Regards,
Mahmood
On Sunday, March 9, 2014 11:39 AM, Mahmood Naderan <nt...@yahoo.com> wrote:
Hello,
I ran this command
./bin/mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
but got this error
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
There are many web pages about this, and the suggested solution is to add, for example, "-Xmx2048m". My question is that this option should be passed to the java command, not to Mahout. As a result, running "./bin/mahout -Xmx2048m" shows that there is no such option. What should I do?
Regards,
Mahmood
Re: Heap space
Posted by Mahmood Naderan <nt...@yahoo.com>.
OK I found that I have to add this property to mapred-site.xml
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx2048m</value>
</property>
Regards,
Mahmood
On Sunday, March 9, 2014 11:39 AM, Mahmood Naderan <nt...@yahoo.com> wrote:
Hello,
I ran this command
./bin/mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
but got this error
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
There are many web pages about this, and the suggested solution is to add, for example, "-Xmx2048m". My question is that this option should be passed to the java command, not to Mahout. As a result, running "./bin/mahout -Xmx2048m" shows that there is no such option. What should I do?
Regards,
Mahmood