Posted to user@mahout.apache.org by "zou.cl" <zo...@neusoft.com> on 2011/11/18 07:39:26 UTC

OutofMemory problem in ClusterDumper

Hi guys,

     I just noticed an out-of-memory problem in the ClusterDumper class. It seems to load all the data (for example, the clusteredPoints) into a Map container, which costs a huge amount of memory when we have GBs of data. I think we could use MapReduce to print the results instead of loading everything into memory.
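For context, the pattern being asked for is just streaming aggregation: fold each record into a small per-cluster summary as it is read, rather than buffering every point. A minimal, Mahout-independent sketch (the record layout and names here are illustrative, not Mahout's actual API):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for reading clusteredPoints: instead of
// collecting every (clusterId, point) pair into one big Map -- which is
// what makes memory grow with the size of the input -- keep only an
// O(#clusters) map of running counts while each record is processed.
public class StreamingDump {
    public static Map<Integer, Integer> countPerCluster(int[][] records) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int[] rec : records) {
            int clusterId = rec[0];
            // Here the point itself would be printed or written out and
            // then discarded; only the per-cluster count is retained.
            counts.merge(clusterId, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        int[][] records = {{0, 5}, {1, 7}, {0, 2}};
        System.out.println(countPerCluster(records)); // {0=2, 1=1}
    }
}
```

With the real ClusterDumper the reading side would iterate a SequenceFile record by record, so the same one-pass shape applies.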








zou.cl via foxmail
---------------------------------------------------------------------------------------------------
Confidentiality Notice: The information contained in this e-mail and any accompanying attachment(s) 
is intended only for the use of the intended recipient and may be confidential and/or privileged of 
Neusoft Corporation, its subsidiaries and/or its affiliates. If any reader of this communication is 
not the intended recipient, unauthorized use, forwarding, printing,  storing, disclosure or copying 
is strictly prohibited, and may be unlawful.If you have received this communication in error,please 
immediately notify the sender by return e-mail, and delete the original message and all copies from 
your system. Thank you. 
---------------------------------------------------------------------------------------------------

Re: Re: OutofMemory problem in ClusterDumper

Posted by "zou.cl" <zo...@neusoft.com>.
Thank you very much for your help, I will check it out




zou.cl via foxmail

Sender: Paritosh Ranjan
Date: Friday, 18 November 2011, 3:16 PM
To: user
Subject: Re: OutofMemory problem in ClusterDumper
We are trying to create a cluster output post processor which will write 
cluster-specific data. You can apply the latest patch available on 
https://issues.apache.org/jira/browse/MAHOUT-843 and use 
ClusterOutputPostProcessor's distribute method; you won't get an 
out-of-memory error there, if this is what you want.

Paritosh


Re: OutofMemory problem in ClusterDumper

Posted by Paritosh Ranjan <pr...@xebia.com>.
We are trying to create a cluster output post processor which will write 
cluster-specific data. You can apply the latest patch available on 
https://issues.apache.org/jira/browse/MAHOUT-843 and use 
ClusterOutputPostProcessor's distribute method; you won't get an 
out-of-memory error there, if this is what you want.
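The idea behind a "distribute" step, as described here, is to route each clustered point to a per-cluster output as it arrives, so memory stays bounded by the number of clusters rather than the number of points. A Mahout-independent sketch of that idea (names are hypothetical; in the MAHOUT-843 patch the sinks would be files under the output directory, not in-memory buffers):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of distributing records to per-cluster outputs.
// A StringBuilder stands in for a per-cluster file writer; in the real
// patch each point would be appended to its cluster's file on disk and
// dropped from memory immediately.
public class Distribute {
    public static Map<Integer, StringBuilder> distribute(int[][] records) {
        Map<Integer, StringBuilder> outputs = new HashMap<>();
        for (int[] rec : records) {
            int clusterId = rec[0];
            // One small sink per cluster; only O(#clusters) sinks exist
            // at any time, regardless of how many points flow through.
            outputs.computeIfAbsent(clusterId, id -> new StringBuilder())
                   .append(rec[1]).append('\n');
        }
        return outputs;
    }

    public static void main(String[] args) {
        int[][] records = {{0, 5}, {1, 7}, {0, 2}};
        Map<Integer, StringBuilder> out = distribute(records);
        System.out.println("cluster 0 -> " + out.get(0).toString().trim());
    }
}
```

Check the patch on the JIRA issue for the actual method signature; this is only the shape of the approach.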

Paritosh
