Posted to dev@mahout.apache.org by myn <my...@163.com> on 2011/08/29 12:49:42 UTC

Why not change the clusterID from int to long?

I have a dataset of about 30 billion rows. When I used createCanopyFromVectors in MeanShift, the cluster ID (an int) was not big enough.
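
To make the gap concrete, here is a quick sketch (plain Java, nothing Mahout-specific; the 30 billion figure is just my dataset size from above) showing that an int cannot number that many clusters while a long can:

    public class ClusterIdRange {
        public static void main(String[] args) {
            long rows = 30000000000L;                      // ~30 billion rows in my dataset
            System.out.println(Integer.MAX_VALUE);         // 2147483647: the most IDs an int can hold
            System.out.println(rows > Integer.MAX_VALUE);  // true: an int cluster ID cannot cover the data
            System.out.println(Long.MAX_VALUE);            // 9223372036854775807: a long has plenty of headroom
        }
    }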
Second, in the MeanShiftCanopyCreatorMapper class, the setup() method assigns

    nextCanopyId = ((1 << 31) / 50000) * (Integer.parseInt(parts[4]) % 50000);

which means each map task gets a range of only about 43,000 IDs (2^31 / 50,000 ≈ 42,949) before it runs into the range reserved for the next task. That is not big enough: the Hadoop default block size is 64 MB, and a split can sometimes contain more than 50,000 rows.
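
To illustrate what I mean, here is a rough sketch of the arithmetic (the task index 7 is just a made-up example, and I use 1L << 31 so the sketch itself does not overflow; the magnitude is the same as in setup()):

    public class CanopyIdRangeSketch {
        public static void main(String[] args) {
            // Stride between two consecutive task indices, same magnitude as
            // (1 << 31) / 50000 in MeanShiftCanopyCreatorMapper.setup().
            int stride = (int) ((1L << 31) / 50000);        // 42949 IDs per map task
            int taskIndex = 7;                              // hypothetical value of parts[4]
            int firstId = stride * (taskIndex % 50000);     // what setup() would assign to nextCanopyId
            int lastSafeId = firstId + stride - 1;          // last ID before colliding with task 8's range
            System.out.println("IDs available per mapper: " + stride);
            System.out.println("Task " + taskIndex + " range: " + firstId + " .. " + lastSafeId);
            // A 64 MB split can easily hold more than 50,000 small rows, so a mapper
            // that creates one canopy per row would run past its ~43,000-ID range.
        }
    }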