Posted to user@mahout.apache.org by Yunming Zhang <zh...@gmail.com> on 2013/01/03 17:02:59 UTC

questions about the algorithms offered by Mahout

Hi, 

I am trying to find an application within the Mahout package with the following characteristics. I have developed a mapper that should improve the performance of applications that have a lot of stragglers and a large memory footprint. I am not an expert in machine learning applications, so any suggestions would be greatly appreciated!

1. Compute intensive (the running time of the application is not dominated by I/O operations).
2. Unbalanced workload across mappers. Some mappers (child JVMs) might take a lot longer than others, creating a lot of stragglers.
3. A lot of iterations. I guess this is a common characteristic of most Mahout applications?


Beyond these three, one important but not required characteristic that I am looking for is a large memory footprint for each JVM. I am trying to find applications that can't afford to spawn 12 mappers (JVMs), because a common 8 GB machine cannot provide enough memory to support 12 JVMs simultaneously. So far I haven't had much luck finding an application with this characteristic, so I am placing more emphasis on the first three.
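
To make the memory constraint concrete, here is a rough sketch of the arithmetic I have in mind. The class and method names are only illustrative, and the memory reserved for the OS and Hadoop daemons is an assumed parameter rather than a measured value. It also treats 8 GB as 8192 MB and ignores non-heap JVM overhead, so the real limits would be somewhat tighter.

```java
public class MapperMemoryBudget {

    // Largest per-JVM budget (in MB) when `jvms` mappers share one machine.
    // `reservedMb` is a hypothetical allowance for the OS and daemons.
    static long maxHeapPerJvmMb(long totalMb, long reservedMb, int jvms) {
        return (totalMb - reservedMb) / jvms;
    }

    // Number of mapper JVMs that fit if each one needs `heapPerJvmMb`.
    static long maxJvms(long totalMb, long reservedMb, long heapPerJvmMb) {
        return (totalMb - reservedMb) / heapPerJvmMb;
    }

    public static void main(String[] args) {
        long totalMb = 8L * 1024; // the "common 8 GB machine", taken as 8192 MB

        // With nothing reserved, 12 mappers get at most 682 MB each.
        System.out.println(maxHeapPerJvmMb(totalMb, 0, 12));

        // Assumed example: 1 GB reserved, each mapper needs 2 GB of heap.
        System.out.println(maxJvms(totalMb, 1024, 2048));
    }
}
```

In other words, an application fits what I am looking for if each mapper needs clearly more than roughly 680 MB, since 12 such JVMs would then no longer fit on the machine.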

I have been testing with the k-means application in the Mahout package. It does have a lot of computation and iterations; however, the workload of each mapper JVM is fairly balanced, and the memory footprint is not large enough to prevent the user from spawning 12 JVMs for full utilization of the CPU.

Thanks

Yunming