Posted to user@mahout.apache.org by "zou.cl" <zo...@neusoft.com> on 2011/11/18 07:39:26 UTC
OutofMemory problem in ClusterDumper
Hi guys,
I just noticed an out-of-memory problem in the ClusterDumper class. It seems to load all the data (for example, the clusteredPoints) into a Map container, which costs huge amounts of memory when we have GBs of data. I think we could use MapReduce to print the results instead of loading everything into memory.
zou.cl via foxmail
---------------------------------------------------------------------------------------------------
Confidentiality Notice: The information contained in this e-mail and any accompanying attachment(s)
is intended only for the use of the intended recipient and may be confidential and/or privileged of
Neusoft Corporation, its subsidiaries and/or its affiliates. If any reader of this communication is
not the intended recipient, unauthorized use, forwarding, printing, storing, disclosure or copying
is strictly prohibited, and may be unlawful.If you have received this communication in error,please
immediately notify the sender by return e-mail, and delete the original message and all copies from
your system. Thank you.
---------------------------------------------------------------------------------------------------
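To illustrate the buffering issue described above, here is a minimal plain-Java sketch; this is not Mahout's actual code, and the class and record names are invented. Grouping every point into a Map needs heap proportional to the whole input, while folding points into small per-cluster summaries as they stream by needs heap proportional only to the number of clusters:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ClusterDumpSketch {

    // Illustrative stand-in for a clustered point: cluster id plus vector.
    record ClusteredPoint(int clusterId, double[] vector) {}

    // Buffering approach (roughly the pattern being complained about):
    // every point is held in the Map at once, so heap grows with the input.
    static Map<Integer, List<ClusteredPoint>> collectAll(Stream<ClusteredPoint> points) {
        return points.collect(Collectors.groupingBy(ClusteredPoint::clusterId));
    }

    // Streaming approach: fold each point into a small per-cluster summary
    // (here just a count) as it arrives; heap grows only with #clusters.
    static Map<Integer, Long> countPerCluster(Stream<ClusteredPoint> points) {
        Map<Integer, Long> counts = new HashMap<>();
        points.forEach(p -> counts.merge(p.clusterId(), 1L, Long::sum));
        return counts;
    }

    public static void main(String[] args) {
        Stream<ClusteredPoint> pts = Stream.of(
                new ClusteredPoint(0, new double[]{1, 2}),
                new ClusteredPoint(1, new double[]{3, 4}),
                new ClusteredPoint(0, new double[]{5, 6}));
        Map<Integer, Long> counts = countPerCluster(pts);
        if (counts.get(0) != 2L || counts.get(1) != 1L)
            throw new AssertionError("unexpected counts: " + counts);
        System.out.println("points per cluster: " + counts);
    }
}
```

The same fold-as-you-read shape would apply to iterating a SequenceFile of clustered points record by record instead of materializing them.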
Re: Re: OutofMemory problem in ClusterDumper
Posted by "zou.cl" <zo...@neusoft.com>.
Thank you very much for your help; I will check it out.
zou.cl via foxmail
Sender: Paritosh Ranjan
Date: Friday, November 18, 2011, 3:16 PM
To: user
Subject: Re: OutofMemory problem in ClusterDumper
We are trying to create a cluster output post-processor which will write
cluster-specific data.
You can apply the latest patch available on
https://issues.apache.org/jira/browse/MAHOUT-843 and use
ClusterOutputPostProcessor's distribute method. You won't get an
OutOfMemoryError there, if this is what you want.
Paritosh
On 18-11-2011 12:09, zou.cl wrote:
> Hi guys,
>
> I just noticed an out-of-memory problem in the ClusterDumper class. It seems to load all the data (for example, the clusteredPoints) into a Map container, which costs huge amounts of memory when we have GBs of data. I think we could use MapReduce to print the results instead of loading everything into memory.
>
> zou.cl via foxmail
Re: OutofMemory problem in ClusterDumper
Posted by Paritosh Ranjan <pr...@xebia.com>.
We are trying to create a cluster output post-processor which will write
cluster-specific data.
You can apply the latest patch available on
https://issues.apache.org/jira/browse/MAHOUT-843 and use
ClusterOutputPostProcessor's distribute method. You won't get an
OutOfMemoryError there, if this is what you want.
Paritosh
On 18-11-2011 12:09, zou.cl wrote:
> Hi guys,
>
> I just noticed an out-of-memory problem in the ClusterDumper class. It seems to load all the data (for example, the clusteredPoints) into a Map container, which costs huge amounts of memory when we have GBs of data. I think we could use MapReduce to print the results instead of loading everything into memory.
>
> zou.cl via foxmail
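I haven't applied the MAHOUT-843 patch, but to sketch the "distribute" idea suggested in this thread, routing each point to a per-cluster output as it streams by so nothing accumulates in memory, here is a minimal plain-Java analogue. The file layout, names, and record shape are invented for illustration and are not the patch's actual output format:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DistributeSketch {

    // Route each record (rec[0] = cluster id, rec[1] = point text) to its
    // own per-cluster file. Only one open writer per cluster is kept in
    // memory, independent of how many points stream through.
    static void distribute(Iterable<String[]> records, Path outDir) throws IOException {
        Files.createDirectories(outDir);
        Map<String, BufferedWriter> writers = new HashMap<>();
        try {
            for (String[] rec : records) {
                BufferedWriter w = writers.computeIfAbsent(rec[0], id -> {
                    try {
                        return Files.newBufferedWriter(outDir.resolve("cluster-" + id + ".txt"));
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
                w.write(rec[1]);
                w.newLine();
            }
        } finally {
            for (BufferedWriter w : writers.values()) w.close();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("distribute-sketch");
        distribute(List.of(new String[]{"0", "1.0 2.0"},
                           new String[]{"1", "3.0 4.0"},
                           new String[]{"0", "5.0 6.0"}), dir);
        List<String> c0 = Files.readAllLines(dir.resolve("cluster-0.txt"));
        if (c0.size() != 2)
            throw new AssertionError("expected 2 points in cluster-0, got " + c0);
        System.out.println("wrote " + c0.size() + " points for cluster-0");
    }
}
```

In a MapReduce setting the same partitioning falls out naturally: the cluster id becomes the map output key, so each reducer (or each MultipleOutputs target) sees only its own cluster's points.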